Lecture 23: Accelerating Gradient Descent (Use Momentum)
Description
In this lecture, Professor Strang explains both momentum-based gradient descent and Nesterov’s accelerated gradient descent.
Summary
Study the zig-zag example: Minimize \(F = \frac{1}{2} (x^2 + by^2)\)
Add a momentum term: the heavy ball remembers its direction.
New point \(k+1\) comes from TWO old points, \(k\) and \(k-1\).
“1st order” becomes “2nd order” or “1st order system” as in ODEs.
Convergence rate improves: \(1-b\) becomes \(1-\sqrt{b}\)!
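The summary above can be sketched numerically. The following is a minimal illustration (the step sizes and the starting point \((b, 1)\) follow the standard heavy-ball analysis for this quadratic, and are my choices rather than values quoted from the lecture): plain gradient descent on \(F = \frac{1}{2}(x^2 + by^2)\) contracts by roughly \(1-b\) per step, while adding the momentum term \(\beta(x_k - x_{k-1})\) contracts by roughly \(1-\sqrt{b}\).

```python
import math

def F_grad(x, y, b):
    """Gradient of F = (1/2)(x^2 + b*y^2), namely (x, b*y)."""
    return x, b * y

def plain_gd(b, steps):
    """Gradient descent with the optimal single step size s = 2/(1+b)."""
    s = 2.0 / (1.0 + b)
    x, y = b, 1.0                       # classic starting point for the zig-zag example
    for _ in range(steps):
        gx, gy = F_grad(x, y, b)
        x, y = x - s * gx, y - s * gy
    return math.hypot(x, y)            # distance from the minimizer (0, 0)

def heavy_ball(b, steps):
    """Momentum (heavy ball): new point k+1 uses both old points k and k-1."""
    rb = math.sqrt(b)
    s = 4.0 / (1.0 + rb) ** 2          # optimal step size for this quadratic
    beta = ((1.0 - rb) / (1.0 + rb)) ** 2   # optimal momentum coefficient
    x, y = b, 1.0
    xp, yp = x, y                      # previous point (k-1)
    for _ in range(steps):
        gx, gy = F_grad(x, y, b)
        xn = x - s * gx + beta * (x - xp)
        yn = y - s * gy + beta * (y - yp)
        xp, yp, x, y = x, y, xn, yn
    return math.hypot(x, y)

b = 0.01
print("plain GD after 100 steps:  ", plain_gd(b, 100))
print("heavy ball after 100 steps:", heavy_ball(b, 100))
```

With \(b = 0.01\), plain descent contracts by about \((1-b)/(1+b) \approx 0.98\) per step, while the heavy ball contracts by about \((1-\sqrt{b})/(1+\sqrt{b}) \approx 0.82\), so after 100 steps the momentum iterate is many orders of magnitude closer to the minimum.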
Related section in textbook: VI.4
Instructor: Prof. Gilbert Strang
Problem for Lecture 23
From textbook Section VI.4
5. Explain why projection onto a convex set \(K\) is a contraction in equation (24). Why is the distance \(||\boldsymbol{x}-\boldsymbol{y}||\) never increased when \(\boldsymbol{x}\) and \(\boldsymbol{y}\) are projected onto \(K\)?
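The contraction claim in this problem can be checked numerically. A minimal sketch (the closed unit disk as the convex set \(K\) and the random sample points are my choices, not from the textbook): project pairs of points onto the disk and verify that the distance between them never increases.

```python
import math
import random

def project_to_disk(p, r=1.0):
    """Euclidean projection of point p onto the closed disk of radius r:
    points inside K are unchanged; points outside are scaled back to the boundary."""
    n = math.hypot(p[0], p[1])
    if n <= r:
        return p
    return (p[0] * r / n, p[1] * r / n)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

random.seed(0)
for _ in range(1000):
    x = (random.uniform(-5, 5), random.uniform(-5, 5))
    y = (random.uniform(-5, 5), random.uniform(-5, 5))
    # ||P(x) - P(y)|| <= ||x - y|| : projection onto a convex set is a contraction
    assert dist(project_to_disk(x), project_to_disk(y)) <= dist(x, y) + 1e-12
print("distance never increased on all sampled pairs")
```

This only verifies the inequality on samples for one convex set; the problem asks for the general geometric argument.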