Course Description

Large language models are a foundational building block of generative AI systems. Modern LLMs go far beyond the capabilities of early LLMs like GPT-3 by demonstrating the capacity for reasoning in natural language. This is a hands-on crash course in what makes these models work, covering:

  • Fundamental building blocks like tokenizers and optimizers
  • Optimizations such as FlashAttention
  • Approaches for learning reasoners such as supervised fine-tuning, reinforcement learning, and inference-time methods

Assignments will focus on implementing these approaches. We will start with a moderately intensive programming assignment in which you implement a Transformer language model from scratch and make it fast. You will be expected to complete it in the first few weeks of the semester.

Recommended Prerequisites

  • CSCI-GA.2590 Natural Language Processing or equivalent. You should be familiar with the Transformer architecture (fairly comfortable with the material in The Illustrated Transformer).
  • CSCI-GA.2565 Machine Learning or equivalent
  • Deep learning or equivalent
  • Python programming experience, including PyTorch
  • Experience with concepts from probability and linear algebra

Assignments

Schedule

| Date | Topic | Details | Assignment |
| Jan 23 | Intro, Transformers Review [slides] | Course overview; expectations; review of the Transformer architecture. | Assignment 1 released |
| Jan 30 | Tokenizers, Optimizers, and Tricks [slides] | Tokenizers, positional encodings, optimizers, and other stuff needed to make Transformer LLMs work. | |
| Feb 6 | Making LLMs Fast I | GPUs, memory layouts, and FlashAttention. | |
| Feb 13 | Making LLMs Fast II | Inference-time optimization, KV caching | Assignment 1 due · Assignment 2 released · In-class quiz 1 |
| Feb 20 | Scaling Laws | Empirical scaling laws with connections to convergence rates of estimators, model/data/compute tradeoffs. | |
| Feb 27 | Training I: SFT & RLHF | Supervised fine-tuning, reinforcement learning from human feedback, and how they're used to train LLMs | |
| Mar 6 | Training II: GRPO & RLVR | Training methods for modern reasoners | Assignment 2 due · Assignment 3 released · In-class quiz 2 |
| Mar 13 | Midterm | In-class midterm exam | |
| Mar 20 | No Class | Spring Break. | |
| Mar 27 | TBD | Topic to be announced. | Final project proposals due |
| Apr 3 | LLM Evaluation | Principles for evaluating LLMs, including statistical testing and benchmark design practices | Assignment 3 due · In-class quiz 3 |
| Apr 10 | Multimodality | LLMs beyond text: vision, audio, and other modalities. | |
| Apr 17 | Agents | Tool calling, MCP, computer use agents, and agentic workflows. | |
| Apr 24 | TBD | Topic to be announced. | |
| May 1 | Project Presentations | Final project presentations and wrap-up. | Final project due date TBD around this time |

Other Policies

A complete syllabus including policies on accommodations, late assignments, academic integrity, and other course logistics will be posted closer to the date of the course.

Lectures

Lectures will take place as scheduled on the course calendar. A complete schedule of lectures and assignments, with readings, is on the main website page.

All lectures will take place in person. Recordings will be made available for students to watch later. Prerecorded videos may also supplement the lectures so that class time can focus more on interactive problem solving and question answering.

Illness: If you become sick with COVID-19 or any other ailment and are unable to attend class, please contact the instructor if you need accommodation and we will work to support you.

Class Recordings: Class recordings are reserved only for students in this class for educational purposes. The recordings should not be shared outside the class in any form.

Office Hours: Office hours will be held both in person and on Zoom, at the discretion of the course staff. Information will be posted on the main course page at the start of the semester.

Communication

All key announcements will be made in Brightspace. We will use Discord as a communication tool for answering questions related to the lectures, assignments, and projects. The registration link for Discord will be available on Brightspace. All assignment submissions should be made in Gradescope. If you'd like to message course staff privately, please email the course staff mailing list nyubuildingllmreasoners@googlegroups.com.

Coursework

The timeline of assignments is on the course calendar. Assignment specifications, code, and data will be made available on the course website and Brightspace. Grading breakdowns are as follows:

  • Assignments: 25% of final grade (3 assignments)
  • Homework Quizzes: 25% of final grade (3–4 quizzes)
  • Midterm Exam: 25% of final grade
  • Final Project: 25% of final grade

Religious Observance: A student who is absent from an examination or cannot meet an assignment deadline due to the observance of a religious holy day may take the exam on an alternate day or submit the assignment up to 24 hours late without penalty, if proper notice of the planned absence has been given. Notice must be given at least 14 days prior to the classes which will be missed. For religious holy days that fall within the first 2 weeks of the semester, notice should be given on the first day of the semester.

Illness and Medical Extensions: Extensions may be granted in cases of illness (including COVID-19), medical emergency, or other circumstances. In all cases, the student should inform the course staff as soon as is practical, and the extension must be negotiated before the assignment's original due date.

Midterm Extensions: Any conflict with the midterm exam should be brought up with the course staff as soon as possible. Extensions will typically not be granted for personal travel.

Policy on ChatGPT, Copilot, and other AI assistants: See "Academic Honesty" and "Policy on AI Assistants" below for guidance on how to use these in this course.

Assignments

The assignments will combine written questions and coding tasks of varying scope. Detailed instructions for completion and submission are given with each assignment.

Submission: Assignments will be submitted via Gradescope. At the beginning of the semester, you will be added to the Gradescope roster through Brightspace. Please do not register on Gradescope separately or change your email; doing so will cause the rosters to fall out of sync. Coding portions of assignments will be autograded, and written portions will be assessed by course staff and returned with feedback.

Slip Days: Each student is given 5 slip days to use throughout the term. Any number of these days can be applied to any assignment to extend the deadline for that assignment by that many days. E.g., you can turn in Assignment 1 one day late and Assignment 3 one day late, using two slip days total. Slip days can only be used for assignments and not the quizzes, midterm, or final project. Slip days cannot be used fractionally: submitting an assignment 1 hour late incurs 1 slip day, 25 hours late incurs 2 slip days, etc. You do not need to ask for permission to use them, but the course staff will keep track of how many days were used.

Late Assignments: For each day an assignment is late that is not covered by a slip day or a negotiated extension, 15% of the grade the assignment would have received will be deducted. For example, an assignment that would have received 85% but is submitted one day late will receive 72.25% (85% × 0.85).
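
As a rough illustration of how the slip-day and late-penalty rules combine, here is a minimal Python sketch. It is not the official grading script. The function names are hypothetical, and two points are assumptions that the policy text does not settle: that available slip days are applied automatically before any penalty, and that the 15% deduction for multiple uncovered days accumulates linearly rather than compounding (the policy only gives a one-day example).

```python
import math

LATE_PENALTY_PER_DAY = 0.15  # from the Late Assignments policy
SLIP_DAYS_PER_TERM = 5       # from the Slip Days policy


def days_late(hours_late: float) -> int:
    """Whole days late; any fraction of a day counts as a full day."""
    return max(0, math.ceil(hours_late / 24))


def late_adjusted_grade(raw_grade: float, hours_late: float,
                        slip_days_left: int = SLIP_DAYS_PER_TERM):
    """Return (adjusted_grade, slip_days_used) for one assignment.

    Assumptions (not stated in the policy): slip days are applied
    automatically before any penalty, and the 15% deduction per
    uncovered day adds up linearly instead of compounding.
    """
    late = days_late(hours_late)
    slip_used = min(late, slip_days_left)
    uncovered = late - slip_used
    multiplier = max(0.0, 1.0 - LATE_PENALTY_PER_DAY * uncovered)
    return raw_grade * multiplier, slip_used
```

Under these assumptions, `days_late(1)` is 1 and `days_late(25)` is 2, matching the slip-day examples, and `late_adjusted_grade(85, 24, slip_days_left=0)` reproduces the 72.25% example from the late policy (up to floating-point rounding).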

Homework Quizzes

There will be 3–4 in-class quizzes throughout the semester. Each quiz will happen during the first portion of class. The quizzes will consist of questions (True/False, multiple choice, and short answer) that evaluate concepts covered during lectures in the weeks preceding the quiz. Prior to the first quiz, the course staff will post sample quiz questions to help with your preparation.

Midterm

There will be one in-class midterm as described on the course calendar. Students will be allowed one standard letter (8.5" x 11") page of notes during the exam. Use of electronic devices and AI tools (phones, laptops, calculators, ChatGPT, etc.) is not allowed during the exam.

Final Project

The final project is on a topic of your choosing. You may complete it in groups of 2–3. You will submit a project proposal before starting the project, and the course staff will provide feedback. Projects do not necessarily have to "work," but will be held to a high standard in terms of expected effort, insight, and technical sophistication.

Final Grades

Your final grade is computed from the weighted total of your assignments, quizzes, midterm, and final project, following the grading breakdown under Coursework. The final grade is mapped to a letter as follows, with grades on the boundary receiving the higher grade:

| Letter Grade | Cutoff |
| A | 93.3 |
| A- | 90 |
| B+ | 86.7 |
| B | 83.3 |
| B- | 80 |
| C+ | 76.7 |
| C | 73.3 |
| C- | 70 |
| D | 65 |
| F | Below 65 |

Depending on class performance, the instructors may shift these boundaries down to raise students' grades.
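
For concreteness, the mapping from a final score to a letter can be sketched in Python as follows. This is an illustrative sketch, not the staff's grading code. The function names and the `shift` parameter are hypothetical, `shift` models the instructors possibly lowering every boundary by the same amount (the syllabus does not say how boundaries would be shifted), and `final_score` assumes each of the four components counts for 25% as listed under Coursework.

```python
# Cutoffs copied from the grade table above, highest first.
CUTOFFS = [
    ("A", 93.3), ("A-", 90.0), ("B+", 86.7), ("B", 83.3), ("B-", 80.0),
    ("C+", 76.7), ("C", 73.3), ("C-", 70.0), ("D", 65.0),
]


def final_score(assignments: float, quizzes: float,
                midterm: float, project: float) -> float:
    """Weighted total on a 0-100 scale, assuming 25% per component."""
    return 0.25 * (assignments + quizzes + midterm + project)


def letter_grade(score: float, shift: float = 0.0) -> str:
    """Map a 0-100 score to a letter.

    A score exactly on a boundary receives the higher letter.
    `shift` (hypothetical) lowers every cutoff by the same amount.
    """
    for letter, cutoff in CUTOFFS:
        if score >= cutoff - shift:
            return letter
    return "F"
```

For example, a score of exactly 93.3 maps to A, while 93.2 maps to A- unless the boundaries are shifted down.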

Academic Honesty

Students are encouraged to discuss lecture material, homework problems, and coding assignments with others! However, your final written solution or source code must be your own, excluding the final project, which may be completed in groups.

You may not:

  • Copy all or part of an assignment from a fellow student, the Internet, or an AI assistant
  • Bring answers into an exam, or consult an AI during an exam
  • Submit any work without appropriate attribution. In the final project, you must make clear when you are using code, data, figures, concepts, or any other artifacts from prior work.

You may consult external resources such as blog posts, YouTube videos, academic papers, GitHub repositories, AI assistants, and more. However, your use of such resources, particularly GitHub repositories, must be limited in the same way as discussions with other students: you can look at these to get an idea of how to solve a problem, but you should not take external code and submit it as part of your assignment, except for the final project when it is appropriately attributed.

Be sure you respect these policies when posting on Discord. Asking clarifying questions, addressing possible bugs in the provided code, etc. are fair game, but you should not discuss solutions in a substantive way that might spoil them for others. When in doubt, discuss privately with the instructors.

Students who violate these policies may receive a grade of zero on the assignment or exam in question or for the course overall, depending on the instructors' judgment and the severity of the infraction.

Policy on AI Assistants

Understanding the capabilities of these systems and their boundaries is a major focus of this class, and there's no better way to do that than by using them!

  • We strongly encourage you to use AI assistants to understand concepts in AI and machine learning. You should see it as another tool like web search that can supplement understanding of the course material.
  • You are allowed to use Claude Code/ChatGPT/etc. for programming assignments, but your usage must be limited in the same way as usage of other resources. You should come up with the high-level skeleton of the solution and an initial implementation yourself and use these tools primarily as coding assistants for debugging. If you and another student submit exactly the same code because you both generated it with Claude Code, we will treat that as if you had copied each other's solutions.
  • You are permitted to use AI assistants for conceptual questions on assignments, but discouraged from doing so. These questions are meant to deepen your understanding of the course content, particularly in preparation for the midterm. Heavily relying on ChatGPT for your answers will negatively impact your learning.

An example of a good question is, "Write a line of Python code to reshape a PyTorch tensor x of shape [batch size, seqlen, hidden dimension] into a 2-dimensional tensor with the first two dimensions collapsed." An example of a bad question is pasting in a large chunk of the assignment code together with the problem specification copied from the assignment PDF. As a heuristic, you should be able to explain what each line of your code is doing. If you have code in your solution that is only included because an assistant told you to put it there, then it is no longer your own work.

Academic Accommodations

Academic accommodations are available for students with disabilities. Please contact the Moses Center for Student Accessibility (212-998-4980 or mosescsd@nyu.edu) for further information. Students who are requesting academic accommodations are advised to reach out to the Moses Center as early as possible in the semester for assistance. If you are already registered with the Moses Center, please deliver your Accommodation Letter to me as early as possible in the semester so we can discuss your approved accommodations and needs in this course.

Student Wellness

In a large, complex community like NYU, it's vital to reach out to others, particularly those who are isolated or engaged in self-destructive activities. Student wellness is the responsibility of all of us. The NYU Wellness Exchange is the constellation of NYU's programs and services designed to address the overall health and mental health needs of its students. Students can access this service 24 hours a day, seven days a week.

Diversity

It is our intent that students from all backgrounds and perspectives be well served by this course, that students' learning needs be addressed both in and out of class, and that the diversity that students bring to this class be viewed as a resource, strength, and benefit. It is our intent to present materials and activities that are respectful of diversity: gender, sexuality, disability, age, socioeconomic status, ethnicity, race, and culture. Your suggestions are encouraged and appreciated. Please let the course staff know of ways to improve the effectiveness of the course for you personally or for other students.

Furthermore, at times throughout the semester, we will discuss the broader cultural impact of machine learning, NLP, and language technology. I ask that students approach these topics seriously and recognize the power technology has to both support and undermine efforts to create a more inclusive society.