Project III: Applications of deep learning
Nabil Iqbal
Fall 2026 - Spring 2027

Artificial intelligence and deep learning are having a fundamental impact on all layers of society. Somewhat surprisingly, very simple mathematical principles -- when applied to large datasets -- have resulted in the creation of entities (e.g. large language models like ChatGPT) which appear to talk, respond as a human, and reason about the world.
In this project we will understand these phenomena from a principled point of view, and build both our theoretical and applied understanding of deep neural networks in parallel. We will understand how the composition of fairly simple basic ingredients allows for arbitrarily complicated functions to be approximated by neural networks, and we will discuss from information theory how this principle allows for a surprising variety of applications. This project will be extremely hands-on, in that theoretical work will be supplemented by Python programming using the standard deep learning package PyTorch: it is essential that students taking this project have very strong Python skills.
I asked ChatGPT to "draw an image in the style of Picasso of a group of students studying deep learning", and it did! I know that we're used to this sort of thing by now, but it's honestly kind of absurd that this is possible. If you want to understand how this somewhat tacky image was generated, take this project.

To the left you see the basic MNIST dataset of handwritten digits. Classifying these digits is one of the simplest tasks in machine learning, and we will build a classifier to solve this problem as one of our first exercises.
Group Project
The group project will involve an understanding of the following topics:
- Basics of deep learning (composition of linear and non-linear layers, the multilayer perceptron, etc.)
- Basics of information theory (KL divergence, Jensen's inequality, the formulation of neural network objectives)
- Basics of training a neural network (stochastic gradient descent, notion of a train and test set)
- Various architectures that are in common use (a fully connected architecture, convolutional neural networks, the transformer i.e. the architecture which is behind things like ChatGPT)
- Together as a group, building a neural network to perform some simple task (e.g. image classification)
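To give a flavour of the first two topics, here is a minimal sketch of a two-layer perceptron (a linear layer, a ReLU non-linearity, another linear layer, then a softmax) together with the KL divergence between a target distribution and the network's output. It is written in plain Python rather than PyTorch so that it runs with no installation. The layer sizes, the random weights, the input vector and all function names are illustrative assumptions, not part of the project material.

```python
import math
import random

def linear(x, W, b):
    """Affine map W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]

def relu(x):
    """Elementwise non-linearity max(0, v)."""
    return [max(0.0, v) for v in x]

def softmax(z):
    """Turn arbitrary real scores into a probability distribution."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    return -sum(p_i * math.log(p_i) for p_i in p if p_i > 0)

def cross_entropy(p, q):
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i log(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def mlp(x, params):
    """Composition: linear -> ReLU -> linear -> softmax."""
    (W1, b1), (W2, b2) = params
    hidden = relu(linear(x, W1, b1))
    return softmax(linear(hidden, W2, b2))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
params = [
    (random_matrix(4, 3, rng), [0.0] * 4),  # 3 inputs -> 4 hidden units
    (random_matrix(2, 4, rng), [0.0] * 2),  # 4 hidden units -> 2 classes
]
x = [0.5, -1.0, 2.0]   # a made-up input
p = [1.0, 0.0]         # one-hot target: the input belongs to class 0
q = mlp(x, params)     # the network's predicted distribution

print("prediction:", q)
print("KL(p || q):", kl_divergence(p, q))
print("cross-entropy:", cross_entropy(p, q))
```

Since cross-entropy equals the entropy of the target plus KL(p || q), and the entropy of a one-hot target is zero, the two printed losses agree here. Jensen's inequality guarantees that the KL divergence is never negative, which is one way to see why minimising it is a sensible training objective.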
Mode of Operation and Evidence of Learning for the group project
The project will involve both understanding mathematics of neural networks and programming in Python. Students will demonstrate their understanding by discussing the theory, performing simple derivations, and making sure their ideas are correctly implemented in code (i.e. that the neural networks they construct do manage to perform simple tasks). They will also be expected to clearly communicate their understanding in both written and oral forms.
Individual Project
The individual project will build on the base of knowledge obtained in the group project to understand a more advanced topic in deep learning. It will require understanding the research literature on the subject, reproducing the results from some papers (probably in a simplified setting), and attempting to go slightly beyond what is in the literature. Possible targets include:
- Catastrophic forgetting (a study of how neural networks forget things)
- Trainability of deep networks
- Recurrent neural networks (networks which can be thought of as moving forwards in time)
- Autoregressive transformers which generate text one character at a time (e.g. like ChatGPT! Though honestly I doubt we will be able to get to something that can respond to text in a reasonable manner. You should imagine building a kind of glorified autocomplete.)
- Honestly many other things are possible: speak to me if you're interested in something not on this list
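To make the "glorified autocomplete" idea concrete, here is a minimal sketch of autoregressive generation one character at a time. It is not a transformer: it swaps in a simple count-based bigram model, where the next character is sampled using only the previous one, so that the sampling loop itself is easy to see. The toy corpus and the names `fit_bigram` and `generate` are made up for illustration.

```python
import random
from collections import Counter, defaultdict

def fit_bigram(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, rng):
    """Sample up to `length` further characters, one at a time.

    Each new character is drawn in proportion to how often it followed
    the previous character in the training text.
    """
    out = [start]
    for _ in range(length):
        options = counts.get(out[-1])
        if not options:  # no observed successor: stop early
            break
        chars = list(options.keys())
        weights = list(options.values())
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)

# A tiny made-up corpus; a real experiment would use far more text.
corpus = "the theory of deep learning and the practice of deep learning "
model = fit_bigram(corpus)
sample = generate(model, "t", 40, random.Random(0))
print(sample)
```

An autoregressive transformer follows the same outer loop, but replaces the lookup table with a neural network that conditions on the whole preceding context rather than on a single character.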
Mode of Operation and Evidence of Learning for the individual project
Just as in the group project, the individual project will involve both understanding mathematics of neural networks and programming in Python. Students will demonstrate their understanding by discussing the theory, performing simple derivations, and making sure their ideas are correctly implemented in code (i.e. that the neural networks they construct do manage to perform the more advanced tasks of the individual project). They will also be expected to clearly communicate their understanding in both written and oral forms.
Prerequisites:
- A strong ability to program in Python.
Co-requisites:
- MATH3431: Machine Learning and Neural Networks is highly recommended but not strictly necessary. We will actually skip directly to the more "advanced" deep learning ideas: through historical accidents the more basic ideas aren't really needed to do this. But please contact me if you want to take this project without the co-requisite.
References:
- CS231N: a computer science course in machine learning at Stanford (covers things slightly more CS-y and less math-y than we will).
- Deep Learning by Goodfellow, Bengio, Courville: a great starting textbook, though the field has moved very quickly.
Email: Nabil Iqbal