Lecture 3: Deep Learning and Neural Networks — How AI Perceives the World

Deep Learning: Inspired by the Brain

Artificial neural networks (ANNs) draw inspiration from the structure of neurons in the brain — but how they actually work is quite different from biological brains.

Biological Neuron vs. Artificial Neuron
Property	Biological Neuron	Artificial Neuron
Input	Receives chemical signals via dendrites	Weighted sum of numeric inputs (each input value × its weight)
Processing	Fires an electrical signal when threshold is exceeded	Passes through an activation function (ReLU, Sigmoid)
Output	Transmitted to the next neuron via the axon	Passes a number to the next layer
Learning	Synaptic strength changes (Hebbian learning)	Weights adjusted via backpropagation
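
The right-hand column of the table can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the input values, weights, and the `neuron` helper are made up for the example, and the bias term is a standard addition that the table does not mention.

```python
import math

def relu(z):
    """ReLU activation: passes positive values through, clips negatives to 0."""
    return max(0.0, z)

def sigmoid(z):
    """Sigmoid activation: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Illustrative values only (assumptions, not taken from the lecture).
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 0.25]
bias = 0.1

# The weighted sum here is negative, so ReLU outputs 0 while
# sigmoid outputs a value below 0.5.
relu_out = neuron(inputs, weights, bias, relu)
sigmoid_out = neuron(inputs, weights, bias, sigmoid)
print(relu_out, sigmoid_out)
```

The same weighted sum gives different outputs depending on the activation function, which is why the choice of activation is part of a network's design.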

How Neural Networks Work

Forward Pass

Input data is transformed as it passes through multiple layers. At each layer, the incoming values are multiplied by weights (typically with a bias added) and passed through an activation function. The output of the last layer is the model's prediction.
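
As a rough sketch of a forward pass, the snippet below pushes one input through a tiny network with two inputs, three hidden units, and one output. The layer sizes, the hand-picked weights, and the `dense` helper are all assumptions made for illustration.

```python
def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)

def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each row of `weights` feeds one output unit."""
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs

# Hypothetical 2-3-1 network with hand-picked (not learned) weights.
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 2.0, -1.0]]
b2 = [0.5]

x = [2.0, 1.0]
hidden = dense(x, W1, b1, relu)   # layer 1: weights, bias, activation
output = dense(hidden, W2, b2)    # layer 2: linear output, no activation
print(hidden, output)             # [1.0, 1.5, 0.0] [4.5]
```

In a trained network the weights would come from training rather than being written by hand; only the flow of data from layer to layer is the point here.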

Loss Calculation

A loss function quantifies the difference between the model's output and the correct answer as a single number. The smaller the loss, the closer the model's predictions are to the correct answers on the data being measured.
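
Two commonly used loss functions illustrate the idea: mean squared error for numeric predictions and cross-entropy for classification. The lecture does not name a specific loss, so both choices and all numbers below are assumptions for the sake of example.

```python
import math

def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def cross_entropy(probabilities, true_index):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(probabilities[true_index])

targets = [3.0, 5.0]
mse_good = mean_squared_error([2.9, 5.1], targets)   # close predictions
mse_bad = mean_squared_error([1.0, 8.0], targets)    # far-off predictions

# Class 1 is the correct class in both cases.
ce_confident = cross_entropy([0.1, 0.8, 0.1], true_index=1)
ce_wrong = cross_entropy([0.7, 0.2, 0.1], true_index=1)

print(mse_good, mse_bad)
print(ce_confident, ce_wrong)
```

In both cases the better prediction produces the smaller number, which is what gives training something to minimize.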

Backpropagation

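To reduce the loss, the error is propagated backward from the output layer toward the input layer. Using the chain rule, this computes the gradient of the loss with respect to each weight, that is, how much a small change in that weight would change the loss.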
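
The chain-rule bookkeeping can be sketched on the smallest possible network: one input, one hidden unit, one output. The tanh activation, the squared-error loss, and the specific numbers are assumptions chosen to keep the derivatives short. A finite-difference estimate is included as a sanity check on the hand-derived gradients.

```python
import math

def forward(x, w1, w2):
    h = math.tanh(w1 * x)   # hidden activation
    y = w2 * h              # network output
    return h, y

def loss(x, target, w1, w2):
    _, y = forward(x, w1, w2)
    return 0.5 * (y - target) ** 2

def backward(x, target, w1, w2):
    """Chain rule, applied from the output back toward the input."""
    h, y = forward(x, w1, w2)
    dL_dy = y - target                  # derivative of 0.5 * (y - target)^2
    dL_dw2 = dL_dy * h                  # because y = w2 * h
    dL_dh = dL_dy * w2                  # error passed back to the hidden unit
    dL_dw1 = dL_dh * (1 - h ** 2) * x   # tanh'(z) = 1 - tanh(z)^2
    return dL_dw1, dL_dw2

# Hypothetical values for illustration.
x, target, w1, w2 = 1.5, 1.0, 0.4, -0.7
grad_w1, grad_w2 = backward(x, target, w1, w2)

# Sanity check: nudge each weight slightly and measure the change in loss.
eps = 1e-6
num_w1 = (loss(x, target, w1 + eps, w2) - loss(x, target, w1 - eps, w2)) / (2 * eps)
num_w2 = (loss(x, target, w1, w2 + eps) - loss(x, target, w1, w2 - eps)) / (2 * eps)
print(grad_w1, num_w1)
print(grad_w2, num_w2)
```

Deep learning frameworks automate this same computation across millions of weights.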

Gradient Descent

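Each weight is adjusted by a small step, scaled by the learning rate, in the direction opposite to its computed gradient. Repeated over many iterations (often thousands to millions), the loss typically falls and the model becomes progressively more accurate.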
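
A minimal sketch of the update rule, using a single weight and a made-up loss L(w) = (w - 3)^2 whose minimum is known to be at w = 3. The loss function, starting point, and learning rate are assumptions for illustration.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0                # arbitrary starting weight
learning_rate = 0.1    # step size (hypothetical value)
history = [loss(w)]

for step in range(50):
    # Move against the gradient: downhill on the loss surface.
    w = w - learning_rate * gradient(w)
    history.append(loss(w))

print(w, history[-1])   # w ends up close to 3, loss close to 0
```

Real networks apply the same rule to every weight at once, with gradients supplied by backpropagation.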

An intuitive analogy:
→ Descending a mountain blindfolded (gradient descent)
→ Feeling the slope underfoot (gradient = slope)
→ Taking one step downhill at a time (weight update)
→ Eventually reaching the lowest point (minimum loss)

Challenges:
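→ Global minimum vs. local minimum: the descent can settle in a valley that is not the lowest one
→ Learning rate: too large and the updates oscillate or diverge; too small and training is slow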
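
Both challenges can be reproduced on one-dimensional toy losses. In this sketch, the `descend` helper, the two loss functions, and every numeric setting are assumptions picked to make the effects easy to see.

```python
def descend(grad, w, learning_rate, steps):
    """Plain gradient descent: repeatedly step against the gradient."""
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Part 1: learning rate, on the bowl-shaped loss L(w) = w^2 (minimum at 0).
def bowl_grad(w):
    return 2.0 * w

too_large = descend(bowl_grad, 1.0, learning_rate=1.1, steps=20)    # overshoots, grows
reasonable = descend(bowl_grad, 1.0, learning_rate=0.1, steps=20)   # approaches 0
too_small = descend(bowl_grad, 1.0, learning_rate=0.001, steps=20)  # barely moves
print(too_large, reasonable, too_small)

# Part 2: local vs. global minimum, on L(w) = w^4 - 3w^2 + w,
# which has a shallow valley near w = 1.13 and a deeper one near w = -1.30.
def bumpy_loss(w):
    return w ** 4 - 3 * w ** 2 + w

def bumpy_grad(w):
    return 4 * w ** 3 - 6 * w + 1

from_right = descend(bumpy_grad, 2.0, learning_rate=0.01, steps=500)
from_left = descend(bumpy_grad, -2.0, learning_rate=0.01, steps=500)
print(from_right, bumpy_loss(from_right))   # stuck in the shallower valley
print(from_left, bumpy_loss(from_left))     # reaches the deeper valley
```

The only difference between the two runs in Part 2 is the starting point, which is one reason weight initialization matters in practice.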

Revolutionary Deep Learning Architectures

Comparison of Major Deep Learning Architectures
Architecture	Specialization	Core Idea	Key Applications
CNN (Convolutional Neural Network)	Images & video	Extracts spatial patterns using filters	Face recognition, medical imaging, self-driving cars
RNN / LSTM	Sequential data	Remembers prior information (gate mechanism)	Translation, speech recognition (pre-2017)
Transformer	Text & general purpose	Attention processes entire context in parallel	GPT, BERT, ChatGPT, Claude
GAN (Generative Adversarial Network)	Image & audio generation	Generator vs. discriminator competing	Image generation, deepfakes
Diffusion Model	Image generation	Generates images by reversing a noise process	DALL·E, Stable Diffusion

CNN — The Eyes of AI

CNN's Hierarchical Feature Extraction
Layer	Features Extracted	Analogy
Early Layers (Low-level)	Edges, lines, color changes	Basic shapes like dots, lines, circles
Middle Layers (Mid-level)	Textures, patterns, corner combinations	Body parts like eyes, nose, ears
Later Layers (High-level)	Full faces, cars, animals	Recognizing cats, dogs, people
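
The "edges" row of the table can be illustrated with a single hand-written filter. This is a sketch under stated assumptions: the `convolve2d` helper, the toy image, and the vertical-edge kernel are invented for the example, whereas a real CNN learns its filter values during training.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Toy 4x6 grayscale image: dark (0) on the left, bright (1) on the right.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]

# Hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = [[-1.0, 0.0, 1.0],
                 [-1.0, 0.0, 1.0],
                 [-1.0, 0.0, 1.0]]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)   # [0.0, 3.0, 3.0, 0.0] on each row
```

The filter produces large values only where the brightness changes, and zero over the flat regions. Later layers combine many such feature maps into the textures and object parts listed in the table.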

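In 2012, AlexNet won the ImageNet classification competition with a top-5 error rate of about 15%, compared with about 26% for the runner-up, nearly halving the error of prior methods. That moment is widely regarded as the beginning of the deep learning revolution. Since then, the best models have surpassed estimated human-level accuracy on this benchmark, although that is a narrower claim than matching human vision in general.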

Transformer — The Engine of Modern AI

Building on the introduction in Lecture 1, let’s understand Transformers more deeply.

RNN vs. Transformer
Property	RNN/LSTM	Transformer
Processing Method	Sequential (one step at a time)	All at once (parallel)
Long-Range Dependencies	Distant information fades	Attention connects any position directly
Training Speed	Cannot parallelize → slow	Parallelizable → fast
Model Size (typical)	Millions of parameters	Up to tens or hundreds of billions of parameters in large models
Representative Models	LSTM translation models	GPT-4, Claude, Gemini
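
The "sequential" and "distant information fades" rows can be sketched with a one-unit recurrent network. The scalar weights and the `run_rnn` helper are assumptions for illustration, and gated variants such as LSTM exist precisely to reduce the fading shown here.

```python
import math

def rnn_step(h_prev, x_t, w_h, w_x):
    """One recurrent step for a single hidden unit (scalar toy model)."""
    return math.tanh(w_h * h_prev + w_x * x_t)

def run_rnn(sequence, w_h=0.5, w_x=1.0):
    """Process the sequence one element at a time, carrying the state forward."""
    h = 0.0
    states = []
    for x_t in sequence:
        # Each step needs the previous state, so the steps cannot run in parallel.
        h = rnn_step(h, x_t, w_h, w_x)
        states.append(h)
    return states

# A signal at the first position, followed by nine empty positions.
states = run_rnn([1.0] + [0.0] * 9)
for t, h in enumerate(states):
    print(t, round(h, 4))
```

The hidden state shrinks at every step, so by the end of the sequence almost nothing of the first input remains.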

The Transformer's core innovation — Attention:
Sentence: "The animal didn't cross the street because it was too tired."

RNN: "it" → information passed sequentially from earlier → distant "animal" fades
Transformer: "it" → attends to every word simultaneously → connects directly to "animal"

→ Correctly identifies "it = animal"
→ This is how LLMs understand complex context
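
The attention step can be sketched as scaled dot-product attention, the form commonly used in Transformers. Everything specific here is an assumption: the three-token vocabulary, the hand-made 2-D vectors, and the fact that the query for "it" is aligned with the key for "animal" by construction. In a real model these vectors come from learned projections, and the resulting weights are not guaranteed to be this clean.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Every key is scored against the query in one pass, regardless of distance.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical, hand-made vectors (not learned).
tokens = ["animal", "street", "it"]
keys = [[4.0, 0.0], [0.0, 4.0], [1.0, 1.0]]
values = keys
query_it = [[4.0, 0.0]]   # query for "it", aligned with "animal" by construction

outputs, weights = attention(query_it, keys, values)
best = tokens[max(range(len(tokens)), key=lambda i: weights[0][i])]
print(weights[0], best)
```

Because each position is scored directly against every other position, the distance between "it" and "animal" has no effect on the score.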

Limitations of Deep Learning

Key Limitations of Deep Learning
Limitation	Explanation	Real-World Problem
Black Box	Cannot explain why it reached a conclusion	Trust issues with medical and legal AI
Data Dependency	Requires large volumes of high-quality data	Too little data to train reliable AI for diagnosing rare diseases
Distribution Shift	Fails when test conditions differ from training	Self-driving cars encountering unfamiliar roads
Compute Cost	Training requires enormous energy and expense	Environmental impact, accessibility inequality

Key Takeaways

Backpropagation: propagate error backward → adjust weights → repeat
CNN: hierarchical feature extraction (edges → parts → whole objects)
Transformer: sequential processing → parallel Attention → speed and performance revolution
Deep learning limitations: black box + data dependency + compute cost