Multimodal machine learning is a multi-disciplinary research field which addresses some of the core goals of artificial intelligence by integrating and modeling two or more data modalities (e.g., visual, linguistic, acoustic, etc.). This course will teach fundamental concepts related to multimodal machine learning, including (1) representation learning, (2) translation and mapping, and (3) modality alignment. While the fundamental techniques covered in this course are broadly applicable, the focus will be on studying them in the context of joint reasoning and understanding of images/videos and language (text). In addition to fundamentals, we will study the rich body of recent research at the intersection of vision and language, including problems of (i) generating image descriptions using natural language, (ii) visual question answering, (iii) retrieval of images based on textual queries (and vice versa), (iv) generating images/videos from textual descriptions, (v) language grounding, and many other related topics. On the technical side, we will be studying neural network architectures of various forms, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), memory networks, attention models, neural language models, and structured prediction models.

Prerequisites: You are required to have taken CPSC 340 or equivalent, with a satisfactory grade. Courses in Computer Vision or Natural Language Processing are a plus. In summary, this is intended to be a demanding graduate-level course and should not be your first encounter with Machine Learning. If you are unsure whether you have the background for this course, please e-mail or talk to me. Also, this course is heavy on programming assignments, which will be done exclusively in Python. No programming tutorials will be offered, so please ensure that you are comfortable with programming and Python.

Computational Requirements: Due to the size of the data, most of the assignments in the class will require a CUDA-capable GPU with at least 4GB of GPU RAM to execute the code. A GPU will also be needed to develop the course project, which is a very significant part of the grade. You are welcome to use your own GPU if you have one. We will also provide credits for the use of the Microsoft Azure cloud service for all students in the class. Note that the amount of credits will be limited and not replenishable, which means you have to be judicious about their use and about execution times. An optional (but extremely useful) tutorial on using Microsoft Azure will be given during the first 2 weeks of classes outside of the regular course meeting time.

Audit Policy: If you are a registered auditor, you are expected to complete assignments but not present papers or participate in the final project. Those unregistered who would like to audit are not expected, or required, to do any assignments or readings. Unregistered auditors are welcome, but will only be accommodated to the extent there is physical room in the class.

Assignment #1: Neural Networks Introduction (5%)
Assignment #2: Convolutional Neural Networks (5%)
Assignment #3: Recurrent Neural Network Language Models (10%)
Assignment #4: Neural Model for Image Captioning / Retrieval (10%)
Readings and reviews: Two papers a week after the break (10%)
Presentations and discussion: One paper per semester (10%)
Group project (proposal, final presentation and web report)

Assignments in the course are designed to build the fundamental skills necessary for you to understand how to implement most state-of-the-art papers in vision, language, or the intersection of the two. The assignments are designed to build on one another and will lay the foundation for your final project. So while an individual assignment may not be worth a lot of points in isolation, not doing one will likely have a significant effect on your grade as well as on your overall understanding of the material.

In the second half of the course, every week we will read 2 papers as a class (additional papers will be presented in class, but will not be required reading for the whole class). Each student is expected to read all assigned required papers and write writeups/reviews about the selected papers. Each student will also need to participate in paper presentation and debate. In other words, each student will need to present (defend) and argue against (attack) one paper in class (depending on the enrollment, this is likely to be done in small groups).

Note that the expectation is that all students attend all classes, including those where your peers present. While I will not be taking attendance, if you miss too many classes for unspecified reasons I reserve the right to deduct, at my discretion, up to 10% from your final grade.
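The computational requirement stated above (a CUDA-capable GPU with at least 4GB of GPU RAM) can be checked on your own machine before the first assignment. The following is a minimal sketch, not an official course tool: it assumes the NVIDIA driver's `nvidia-smi` utility is installed, and the helper names (`parse_memory_totals`, `has_sufficient_gpu`, `query_gpu_memory`) and the 4096 MiB threshold are illustrative assumptions rather than anything specified by the course.

```python
import shutil
import subprocess

# Assumption: "at least 4GB" is read as 4 * 1024 MiB. Some nominal 4GB cards
# report slightly less total memory, so lower this threshold if needed.
MIN_GPU_MEMORY_MIB = 4 * 1024


def parse_memory_totals(smi_output):
    """Return total memory in MiB for each line that holds a plain integer."""
    totals = []
    for line in smi_output.splitlines():
        value = line.strip()
        if value.isdigit():  # skip blank lines and entries such as "[N/A]"
            totals.append(int(value))
    return totals


def has_sufficient_gpu(totals_mib, minimum_mib=MIN_GPU_MEMORY_MIB):
    """True if at least one GPU reports minimum_mib or more of total memory."""
    return any(total >= minimum_mib for total in totals_mib)


def query_gpu_memory():
    """Query nvidia-smi for per-GPU total memory; return [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_memory_totals(result.stdout)


if __name__ == "__main__":
    totals = query_gpu_memory()
    if not totals:
        print("No NVIDIA GPU detected via nvidia-smi.")
    elif has_sufficient_gpu(totals):
        print("Found a GPU meeting the memory threshold:", totals, "MiB")
    else:
        print("GPU(s) found, but below the memory threshold:", totals, "MiB")
```

If no suitable GPU is reported, the Azure credits mentioned above are the alternative; the same check can be run on a cloud instance to confirm what it provides.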