Shai Dekel - Teaching

- Intro, basic ML models and terminology

Spring 2024 - Mathematical foundations of machine learning

Syllabus: In the course we will study Machine Learning (ML) through the lens of geometric approximation theory and modern harmonic analysis. We will review in depth the most successful tools of ML: Support Vector Machines, Random Forest, Gradient Boosting, Deep Learning networks (Multi Layer Perceptron, Convolution, Attention, Transformers). We will discuss related theory and applications in computer vision, natural language, numerical solutions to PDEs, etc.

Lesson 1 - Intro, basic ML models (linear regression, logistic regression, soft-max), ML terminology and notation, function spaces I.

Lesson 2 - Support Vector Machines I, function spaces II

Lesson 3 - Support Vector Machines II, Random Forest I, Function spaces III

Lesson 4 - Random Forest II, Wavelet decomposition of RF I, function spaces IV

Lesson 5 - Wavelet II, Tree-based Classification, function spaces V

Lesson 6 - Feature importance, Boosting, convolutions

Lesson 7 - Deep learning building blocks I

Lesson 8 - Deep learning building blocks II, DL computer vision applications I

Lesson 9 - DL Computer vision applications II, Approximation theory of DL I

Lesson 10 - Approximation theory of DL II, Application of DL in numerical PDEs

Lesson 11 - Transformers, autoregressive image generation

Presentations

- Essentials of function space theory

- Support Vector Machines

- Random Forest

- Boosting

- Deep Learning basics

- Computer vision applications of deep learning

- Approximation Theory of DL

- Applications of DL in numerical PDEs

- Transformers

- Autoregressive image generation using wavelets

Assignments

- What is approximation theory?

Final assignment

References

[1] T. Hastie, R. Tibshirani and J. Friedman, Elements of statistical learning, Springer-Verlag 2009.

[2] Y. LeCun, Y. Bengio and G. Hinton, Deep Learning, Nature 521 (2015), 436–444.

[3] R. DeVore & G. Lorentz, Constructive Approximation, Springer-Verlag, 1993.

[4] R. DeVore, Nonlinear approximation, Acta Numerica 1998, 51-150.

[5] S. Dekel and D. Leviatan, Adaptive multivariate approximation using binary space partitions and geometric wavelets, SIAM Journal on Numerical Analysis 43 (2005), 707-732.

[6] O. Elisha and S. Dekel, Wavelet decomposition of Random Forests - smoothness analysis, sparse approximation and applications, JMLR 17 (2016).

[7] O. Morgan, O. Elisha and S. Dekel, Wavelet decomposition of Gradient Boosting, https://arxiv.org/abs/1805.02642

[8] I. Ben-Shaul, S. Dekel and O. Elisha, Sparse Besov space analysis of deep learning representation layers in high dimensions, Pure and Applied Functional Analysis, to appear.

[9] S. Ruder, An overview of gradient descent optimization algorithms, https://arxiv.org/abs/1609.04747

[10] I. Ben-Shaul and S. Dekel, Nearest class center simplification through intermediate layers, PMLR 196, 2022.

[11] I. Ben-Shaul, T. Galanti and S. Dekel, Exploring the approximation capabilities of multiplicative neural networks for smooth functions, TMLR 2023.

[12] M. Phuong and M. Hutter, Formal algorithms for transformers, DeepMind, 2022.

Fall 2023 - Foundations of approximation theory

Syllabus: Approximation theory is one of the main theoretical pillars of applied mathematics. One of its goals is to characterize the classes of functions that can be approximated by a specified algorithm with the error decaying at a certain qualitative rate. Examples of approximation algorithms are Fourier series, algebraic polynomials, splines, wavelets, finite elements, etc. To provide the theoretical foundations of signal processing & machine learning, approximation theory applies tools that measure weak-type smoothness of functions, which makes it possible to assess the ‘smoothness’ of functions that are not even continuous. One of the main challenges in the theory is multivariate approximation, where modeling the geometry of the approximated function plays an important role. The syllabus includes: weak-type smoothness, function spaces, trigonometric approximation, local polynomial approximation, splines, multiresolution, non-linear approximation using piecewise polynomials and wavelets, approximation spaces, the machinery of the Jackson-Bernstein theorems for the characterization of approximation spaces, geometric approximation.
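As a small illustration of how an approximation rate can serve as a measure of smoothness, the following is a minimal numerical sketch and is not taken from the course material. It approximates two functions on [0,1] by piecewise constants on n uniform cells and estimates how fast the L2 error decays with n. The function names, the choice of test functions and the sampling resolution are all assumptions made for this example.

```python
import numpy as np

def l2_error_piecewise_constant(f, n, samples_per_cell=1000):
    """Discrete L2([0,1]) error of replacing f by its mean on each of n uniform cells."""
    m = n * samples_per_cell
    x = (np.arange(m) + 0.5) / m                      # midpoint samples of [0, 1]
    values = f(x).reshape(n, samples_per_cell)        # one row per cell
    cell_means = values.mean(axis=1, keepdims=True)   # best L2 constant on each cell
    return np.sqrt(np.mean((values - cell_means) ** 2))

def estimated_rate(f, ns):
    """Least-squares slope of log(error) against log(n)."""
    errors = [l2_error_piecewise_constant(f, n) for n in ns]
    slope, _ = np.polyfit(np.log(ns), np.log(errors), 1)
    return slope

ns = 2 ** np.arange(2, 9)  # n = 4, 8, ..., 256 cells

# Hypothetical test functions: a smooth one and a discontinuous one.
rate_smooth = estimated_rate(lambda x: x, ns)
rate_step = estimated_rate(lambda x: (x >= 1 / 3).astype(float), ns)

print(f"smooth function f(x) = x:     error ~ n^({rate_smooth:.2f})")
print(f"step function with one jump:  error ~ n^({rate_step:.2f})")
```

Under these assumptions the estimated exponents come out close to -1 for the smooth function and close to -1/2 for the step function, so the discontinuous function still has a positive amount of smoothness when it is measured through the L2 approximation rate.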

Lesson 1 - Introduction, Lp and Hilbert spaces

Lesson 2 - Fourier series, Approximation using trigonometric polynomials I

Lesson 3 - Fourier integral, Approximation with piecewise constants I

Lesson 4 - Review

Lesson 5 - Approximation with piecewise constants II, Modulus of smoothness, K-functional

Lesson 6 - Lipschitz spaces, nonlinear approximation I, Jackson theorem for trigonometric polynomials I, Besov space I

Lesson 7 - Besov space II, Shift invariant spaces I

Lesson 8 - Shift invariant spaces II, wavelets I

Lesson 9 - Jackson theorem for trigonometric polynomials II, Approximation spaces I (Dany Leviatan)

Lesson 10 - Approximation spaces II, Jackson-Bernstein machinery, Bernstein-type theorems (Dany Leviatan)

Lesson 11 - wavelets II

Lesson 12 - Review of theorems

Presentations:

- Lecture notes

- List of theorems for exam

Assignments:

- Intro, sequential sampling, basic ML models, ML terminology

References:

R. DeVore & G. Lorentz, Constructive Approximation, Springer-Verlag, 1993.

R. DeVore, Nonlinear Approximation, Acta Numerica (1998), 51-150.

S. Brenner and L. Scott, The mathematical theory of finite elements, Springer 1994.

L. Grafakos, Classical and modern harmonic analysis, Prentice-Hall, 2004.

R. Adams and J. Fournier, Sobolev Spaces (2nd edition).

Spring 2023 - Mathematical foundations of machine learning

With Ido Ben-Shaul and Yuval Zelig

Syllabus: In the course we will approach Machine Learning (ML) from the perspective of geometric approximation theory and modern harmonic analysis. We will review in depth the most successful tools of ML: Support Vector Machines, Random Forest, Gradient Boosting, Deep Learning networks (Multi Layer Perceptron, Convolution, Attention, Transformers). We will discuss related theory and applications in computer vision, natural language, numerical solutions to PDEs, etc.

Lesson 1 - Intro, sequential sampling (normal distribution, Beta distribution, Dirichlet distribution), basic ML models (linear regression, logistic regression, soft-max)

Lesson 2 - ML terminology and notation, function spaces I, Random Forest I

Lesson 3 - Function spaces II, Random Forest II

Lesson 4 - Function spaces III, Wavelet decomposition of RF I

Lesson 5 - Function spaces IV, Wavelet decomposition of RF II

Lesson 6 - Function spaces V, feature importance

Lesson 7 - Approximation spaces, Besov smoothness of datasets, Support Vector Machines, anisotropic RF using SVM, AdaBoost

Lesson 8 - Gradient Boosting, Wavelet-based Gradient Boosting, Deep Learning basics I

Lesson 9 - Deep Learning basics II, Computer vision applications of DL I

Lesson 10 - Computer vision applications of DL II, Mathematical analysis of DL I

Lesson 11 - Mathematical analysis of DL II, Applications of DL in numerical solutions for PDEs

Lesson 12 - NLP, Transformers

Lesson 13 - Applied ML workshop I

Lesson 14 - Applied ML workshop II, Review of Summer projects

Presentations:

- Function spaces, approximation theory

- Decision Trees, Random Forest, Wavelet decomposition of RF

- Support Vector Machines

- Boosting

- Deep learning - basic concepts

- Computer vision applications of deep learning

- Deep Learning Theory

- PDE applications of deep learning

- NLP, Transformers, SSL

Assignments:

References:

[1] T. Hastie, R. Tibshirani and J. Friedman, Elements of statistical learning, Springer-Verlag 2009.

[2] Y. LeCun, Y. Bengio and G. Hinton, Deep Learning, Nature 521 (2015), 436–444.

[3] R. DeVore & G. Lorentz, Constructive Approximation, Springer-Verlag, 1993.

[4] R. DeVore, Nonlinear approximation, Acta Numerica 1998, 51-150.

[5] S. Dekel and D. Leviatan, Adaptive multivariate approximation using binary space partitions and geometric wavelets, SIAM Journal on Numerical Analysis 43 (2005), 707-732.

[6] O. Elisha and S. Dekel, Wavelet decomposition of Random Forests - smoothness analysis, sparse approximation and applications, JMLR 17 (2016).

[7] O. Morgan, O. Elisha and S. Dekel, Wavelet decomposition of Gradient Boosting, https://arxiv.org/abs/1805.02642

[8] O. Elisha and S. Dekel, Function space analysis of deep learning representation layers, https://arxiv.org/abs/1710.03263

[9] S. Ruder, An overview of gradient descent optimization algorithms, https://arxiv.org/abs/1609.04747

[10] T. Chen, Introduction to boosted trees, 2014.

[11] I. Ben-Shaul and S. Dekel, Sparsity-Probe: analysis tool for deep learning models, IMVC 2021.

[12] I. Ben-Shaul and S. Dekel, Nearest class center simplification through intermediate layers, PMLR 196, 2022.

[13] I. Ben-Shaul, T. Galanti and S. Dekel, Exploring the approximation capabilities of multiplicative neural networks for smooth functions, submitted.

[14] M. Phuong and M. Hutter, Formal algorithms for transformers, DeepMind, 2022.

[15] ChatGPT - https://openai.com/blog/chatgpt/

Spring 2022 - Mathematical foundations of machine learning

With Ido Ben-Shaul

Syllabus: In the course we will approach Machine Learning (ML) from the perspective of geometric approximation theory and modern harmonic analysis. We will review in depth the most successful tools of ML: Prophet, Gaussian Processes, Support Vector Machines, Random Forest, Gradient Boosting, Deep Learning. We will discuss related theory and applications in computer vision, numerical solutions to PDEs, etc.

Lesson 1 - Introduction, adaptive sampling using the Beta and Dirichlet distributions, basic ML models: linear regression, logistic regression, soft-max.

Lesson 2 - Basic definitions of ML, function spaces, Gaussian Processes for noisy time series

Lesson 3 - Function spaces II, Decision Trees, Random Forest

Lesson 4 - Function spaces III, Wavelet decomposition of Random Forest

Lesson 5 - Function spaces IV, feature importance using linear correlation, tree-based feature importance, wavelet-based feature importance.

Lesson 6 - Function spaces V, Besov space smoothness of datasets, Support Vector Machines (SVM), anisotropic RF using SVM.

Lesson 7 - AdaBoost and additive models, Gradient Boosting, wavelet-based Gradient Boosting, Deep Learning I

Lesson 8 - Deep Learning II, Computer vision/imaging applications of DL I

Lesson 9 - Computer vision/imaging applications of DL II, Mathematical analysis of DL I

Lesson 10 - Mathematical analysis of DL II, PDE applications of DL I

Lesson 11 - PDE applications of DL II, Attention models (Transformers)

Lesson 12 - Applied ML workshop (Ido Ben-Shaul)

Lesson 13 - Review of projects

Presentations

- Intro, statistics and basic models

- Gaussian processes

- Function spaces

- Wavelet decomposition of Random Forest

- Support Vector Machines

- Boosting models

- Deep learning building blocks

- Computer vision/imaging applications of DL

- Mathematical analysis of DL

- Applications of DL in PDEs

- Applied workshop with Ido Ben-Shaul

Assignments

Summer project list, summer assignment

References:

[1] T. Hastie, R. Tibshirani and J. Friedman, Elements of statistical learning, Springer-Verlag 2009.

[2] Y. LeCun, Y. Bengio and G. Hinton, Deep Learning, Nature 521 (2015), 436–444.

[3] R. DeVore & G. Lorentz, Constructive Approximation, Springer-Verlag, 1993.

[4] R. DeVore, Nonlinear approximation, Acta Numerica 1998, 51-150.

[5] S. Dekel and D. Leviatan, Adaptive multivariate approximation using binary space partitions and geometric wavelets, SIAM Journal on Numerical Analysis 43 (2005), 707-732.

[6] O. Elisha and S. Dekel, Wavelet decomposition of Random Forests - smoothness analysis, sparse approximation and applications, JMLR 17 (2016).

[7] O. Morgan, O. Elisha and S. Dekel, Wavelet decomposition of Gradient Boosting, https://arxiv.org/abs/1805.02642

[8] O. Elisha and S. Dekel, Function space analysis of deep learning representation layers, https://arxiv.org/abs/1710.03263

[9] S. Ruder, An overview of gradient descent optimization algorithms, https://arxiv.org/abs/1609.04747

[10] T. Chen, Introduction to boosted trees, 2014.

[11] S. Roberts, M. Osborne, M. Ebden, S. Reece, N. Gibson and S. Aigrain, Gaussian processes for time-series modelling, Philosophical Transactions of the Royal Society A 371 (2013).

[12] I. Ben-Shaul and S. Dekel, Sparsity-Probe: analysis tool for deep learning models, IMVC 2021.

[13] I. Ben-Shaul and S. Dekel, Nearest class center simplification through intermediate layers, submitted.

[14] R. DeVore, B. Hanin and G. Petrova, Neural network approximation, Acta Numerica (2021), 327-444.

Spring 2021 - Introduction to function space theory

Syllabus: In the course we will review the range of function spaces that are fundamental to mathematical analysis and their various characterizations through harmonic analysis, atomic representations and approximation spaces: Lp spaces, Hardy spaces, Sobolev spaces, Triebel-Lizorkin spaces, Besov spaces. Time permitting, we will also cover interpolation of function spaces and function spaces over manifolds.

References

E. Stein, Harmonic analysis, real variable methods, orthogonality and oscillatory integrals.

L. Grafakos, Classical and modern harmonic analysis, Prentice-Hall, 2004.

L. Tartar, An introduction to Sobolev spaces and interpolation spaces.

R. Adams and J. Fournier, Sobolev Spaces (2nd edition).

R. DeVore & G. Lorentz, Constructive Approximation, Springer-Verlag, 1993.

Lesson 1 - Lp spaces, weak Lp spaces.

Lesson 2 - First glimpse into function space interpolation, first glimpse into Hardy spaces, Schwartz class, distributions, convolutions, Sobolev spaces I.

Lesson 3 - Sobolev spaces II, Fourier transform of Schwartz functions.

Lesson 4 - Fourier transform II, Fourier transform of distributions, Fourier representation of W^r_2.

Lesson 5 - Derivation of Fourier integral from Heat equation, Maximal functions, Hardy spaces I.

Lesson 6 - Hardy spaces II

Lesson 7 - Hardy spaces III, Moduli of smoothness I

Lesson 8 - Moduli of smoothness II, K-functional

Lesson 9 - Generalized Lipschitz spaces, Approximation with piecewise constants, Besov spaces I

Lesson 10 - Besov spaces II

Fall 2020 - Foundations of approximation theory

Syllabus: Approximation theory is one of the main theoretical pillars of applied mathematics. One of its goals is to characterize the classes of functions that can be approximated by a specified algorithm with the error decaying at a certain qualitative rate. Examples of approximation algorithms are Fourier series, algebraic polynomials, splines, wavelets, finite elements, etc. To provide the theoretical foundations of signal & data analysis, approximation theory applies tools that measure weak-type smoothness of functions, which makes it possible to assess the ‘smoothness’ of functions that are not even continuous. One of the main challenges in the theory is multivariate approximation, where modeling the geometry of the approximated function plays an important role. The syllabus includes: weak-type smoothness, function spaces, trigonometric approximation, local polynomial approximation, splines, multiresolution, non-linear approximation using piecewise polynomials and wavelets, approximation spaces, the machinery of the Jackson-Bernstein theorems for the characterization of approximation spaces, geometric approximation.

R. DeVore & G. Lorentz, Constructive Approximation, Springer-Verlag, 1993.

R. DeVore, Nonlinear Approximation, Acta Numerica (1998), 51-150.

S. Brenner and L. Scott, The mathematical theory of finite elements, Springer 1994.

L. Grafakos, Classical and modern harmonic analysis, Prentice-Hall, 2004.

Lesson 1 - Introduction, Lp spaces, Smoothness spaces I

Lesson 2 - Smoothness spaces II, Trigonometric polynomials and Fourier series approximation, Dirichlet, Fejér and summability kernels, Fourier integral I

Lesson 3 - Fourier integral II, approximation with piecewise constants, modulus of smoothness

Lesson 4 - K-functional, Lip spaces, first glimpse at nonlinear approximation (free knot piecewise constants), Jackson theorem for trigonometric polynomials

Lesson 5 - Besov spaces, "local" algebraic polynomial approximation I

Lesson 6 - "local" algebraic polynomial approximation II

Lesson 7 - Approximation from shift invariant spaces

Lesson 8 - Approximation from shift invariant spaces II, Wavelets I

Lesson 9 - Wavelets II

Lesson 10 - Wavelets III, Approximation spaces I

Lesson 11 - Approximation spaces II

Lesson 12 - Approximation spaces III

Lesson 13 - Review of theorems for the exam

Lecture notes: notes, local polynomial approximation notes

List of theorems for the exam

Spring 2020 - Mathematical foundations of machine learning

Syllabus: In the course we will approach Machine Learning (ML) from the perspective of geometric approximation theory and modern harmonic analysis. We will review in depth the most successful tools of ML: Prophet, Gaussian Processes, Support Vector Machines, Random Forest, Gradient Boosting, Deep Learning. We will discuss related theory and applications in computer vision, numerical solutions to PDEs, etc.

Lesson 1 - Introduction to the course - presentation

Lesson 2 - Linear regression, logistic regression, soft-max, statistical evaluation metrics - notes, Function space theory I [3] - notes

Lesson 3 - Function space theory II [3] - notes, Gaussian Processes + application for noisy time series [12] - notes

Lesson 4 - Function space theory III [3] - notes, Decision trees, Random Forest [6],[7].

Lesson 5 - Besov spaces [3] - notes, Besov smoothness of indicator function - notes, RF classification - mapping to vector regression [6], [8] - notes, standard methods [1] - notes, scikit-learn RF

Lesson 6 - Kaggle datasets, Wavelet decomposition of RF, Besov index of datasets - theory and applications [6],[8], Linear correlation - notes

Lesson 7 - Feature importance - standard methods & wavelet method [6], Support Vector Machines ([1], Chapter 12), Anisotropic RF using linear SVM, additional notes

Lesson 8 - AdaBoost and additive models [1], Wavelet-based Gradient Boosting [7], Jackson theorem for wavelet decomposition of RF [6] - additional notes, Deep learning building blocks I - Convolutions I

Lesson 9 - Deep learning building blocks II - Convolutions II, non-linearities, pooling methods, Loss functions, Back Propagation [2] - additional notes.

Lesson 10 - Gradient descent [9], applications of DL in CV, additional notes.

Lesson 11 - Applications of DL in CV II, applications of DL in numerical PDEs

Lesson 12 - Deep neural decision forests & wavelets [13], Prophet model for time series forecasting [14]

Lesson 13 - Review of summer assignment & projects