Recent teaching @ School of Mathematical Sciences, Tel Aviv University

Spring 2019 - Mathematical foundations of machine learning

 

Syllabus: In this course we will approach Machine Learning (ML) from the perspective of geometric approximation theory and modern harmonic analysis. We will review in depth the most successful tools of ML: Support Vector Machines, Random Forests, Gradient Boosting and Deep Learning, and discuss related theory and applications.

Lesson 1 - Introduction to the course (presentation)

Lesson 2 - Linear regression, logistic regression, soft-max, statistical evaluation metrics, Function space theory I [4]
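A small illustrative sketch (added here for reference, not part of the course materials) of the soft-max function and a single logistic-regression gradient step; the NumPy implementation and the learning rate are illustrative choices:

```python
# Toy sketch: soft-max and one gradient step of logistic regression.
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def logistic_step(w, X, y, lr=0.1):
    p = 1.0 / (1.0 + np.exp(-X @ w))     # sigmoid predictions in (0, 1)
    grad = X.T @ (p - y) / len(y)        # gradient of the average log-loss
    return w - lr * grad

print(softmax(np.array([1.0, 2.0, 3.0])))   # positive entries that sum to 1
```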

Lesson 3 - Function space theory II [4], Decision tree I [1]

Lesson 4 - Function space theory III [4], Decision tree II [1]

 

Lesson 5 - Decision tree III, Wavelet decomposition of a decision tree, Random Forest [7]

Lesson 6 - Wavelet decomposition of RF [7], numeric estimate of the Besov smoothness of a data set [9], RF-based feature importance: standard methods [1] and wavelet-based methods [7].
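The sketch below is a rough illustration of the idea only (not the exact construction of [7], [9]): each child node of a trained regression tree carries a wavelet term whose norm is the jump in node means weighted by the square root of the node's sample fraction, and the decay rate of the sorted norms serves as a crude numeric smoothness indicator. The use of scikit-learn and all parameter values are my assumptions:

```python
# Rough sketch: wavelet norms of a regression tree and their decay rate.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float) + 0.05 * rng.normal(size=2000)

t = DecisionTreeRegressor(min_samples_leaf=5).fit(X, y).tree_
means = t.value.ravel()                  # mean response on each node
sizes = t.n_node_samples.astype(float)   # sample count in each node

norms = []
for parent in range(t.node_count):
    for child in (t.children_left[parent], t.children_right[parent]):
        if child != -1:                  # -1 marks "no child" (leaf node)
            norms.append(abs(means[child] - means[parent])
                         * np.sqrt(sizes[child] / sizes[0]))

norms = np.sort(norms)[::-1]             # sort wavelet norms, largest first
k = np.arange(1, len(norms) + 1)
slope = np.polyfit(np.log(k), np.log(norms + 1e-12), 1)[0]
print(f"decay exponent of sorted wavelet norms: {-slope:.2f}")
```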

Lesson 7 - Jackson theorem for the wavelet decomposition of RF [7], Support Vector Machines ([1] Chapter 12), Anisotropic RF using linear SVM, tree-based Gradient Boosting [1], [11], wavelet-based Gradient Boosting [8].
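A minimal sketch of tree-based gradient boosting for the squared loss, in the spirit of [1], [11]: each new tree is fit to the current residuals, i.e. the negative gradient of the loss. The helper names and hyper-parameters are illustrative, not the course's code:

```python
# Gradient boosting with regression trees for the squared loss.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gb(X, y, n_trees=100, lr=0.1, depth=3):
    base = y.mean()                      # start from the constant model
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        resid = y - pred                 # negative gradient of 0.5*(y - pred)^2
        h = DecisionTreeRegressor(max_depth=depth).fit(X, resid)
        pred += lr * h.predict(X)        # shrunken step along the new tree
        trees.append(h)
    return base, trees

def predict_gb(base, trees, X, lr=0.1):
    return base + lr * sum(h.predict(X) for h in trees)
```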

21st April - No lesson (Passover vacation)

Lesson 8 - Deep learning building blocks I - Convolutions, non-linearities, pooling methods, Loss functions I, DL architectures I

Lesson 9 - Deep learning building blocks II - Loss functions II, Backpropagation [3], Gradient descent I [10], DL architectures II.
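A short sketch of gradient descent with momentum, one of the variants surveyed in [10]; the objective (grad_f below is the gradient of f(w) = ||w||^2/2) and the step sizes are stand-ins chosen only for illustration:

```python
# Heavy-ball / momentum gradient descent on a toy quadratic objective.
import numpy as np

def grad_f(w):                           # gradient of f(w) = ||w||^2 / 2
    return w

w = np.array([3.0, -2.0])
v = np.zeros_like(w)
lr, mu = 0.1, 0.9
for _ in range(100):
    v = mu * v - lr * grad_f(w)          # accumulate a decaying velocity
    w = w + v                            # then step along the velocity
print(w)                                 # approaches the minimizer at 0
```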

 

Lesson 10 - DL computer vision applications, AI-based numerical solutions of PDEs

 

Lesson 11 - Overview of platforms for projects I - Azure Cloud, software packages (Oren Elisha, Microsoft) (presentation)

Lesson 12 - Review of projects.

The lesson on June 2nd is cancelled.

Assignment I   

Assignment II

Project list

Course mini-conference/project presentations - Sunday, September 22nd, 2019, WIX offices, Bitan 26, TLV port (schedule)

References:

 

[1] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, Springer-Verlag, 2009.

[2] S. Mallat, Understanding deep convolutional networks, Phil. Trans. R. Soc. A 374 (2016).

[3] Y. LeCun, Y. Bengio and G. Hinton, Deep Learning, Nature 521 (2015), 436–444.

[4] R. DeVore and G. Lorentz, Constructive Approximation, Springer-Verlag, 1993.

[5] R. DeVore, Nonlinear approximation, Acta Numerica 1998, 51-150.

[6] S. Dekel and D. Leviatan, Adaptive multivariate approximation using binary space partitions and geometric wavelets, SIAM Journal on Numerical Analysis 43 (2005), 707-732.  

[7] O. Elisha and S. Dekel, Wavelet decompositions of Random Forests - smoothness analysis, sparse approximation and applications, JMLR 17 (2016).

[8] O. Morgan, O. Elisha and S. Dekel, Wavelet decomposition of Gradient Boosting,  https://arxiv.org/abs/1805.02642

[9] O. Elisha and S. Dekel, Function space analysis of deep learning representation layers, https://arxiv.org/abs/1710.03263

[10] S. Ruder, An overview of gradient descent optimization algorithms, https://arxiv.org/abs/1609.04747

[11] T. Chen, Introduction to Boosted Trees, 2014.

[12] Theories of Deep Learning, lecture notes, Stanford University, 2017.

======================

Spring 2018 - Mathematical foundations of machine learning

Syllabus: In this course we will approach Machine Learning (ML) from the perspective of geometric approximation theory and modern harmonic analysis. We will review in depth the most successful tools of ML: Support Vector Machines, Random Forests, Gradient Boosting and Deep Learning, and discuss related theory and applications.

Lesson 1 - Quick introduction, Statistics - basic definitions

 

Lesson 2 - Function spaces I ([4], parts of sections 2.1, 2.5, 2.6, 2.7)

Lesson 3 - Function Spaces II ([4], parts of sections 2.7, 2.9, [5] parts of sections 3.1, 3.2), Decision Trees I ([1], [6], [7])

Lesson 4 - Decision Trees II ([1], [6], [7]), Wavelet decomposition of decision trees ([6], [7]), Random Forest ([1], [7]), Examples of RF code in R, Wavelet decomposition of RF ([7])

Lesson 5 - RF-based feature importance: standard methods [1] and wavelet-based methods [7]

Lesson 6 - Function Spaces III ([4] Section 2.10), definition and numeric estimate of the Besov smoothness of a data set [9]

Lesson 7 - Jackson theorem for the wavelet decomposition of RF [7], Support Vector Machines ([1] Chapter 12), Anisotropic RF using linear SVM

Lesson 8 - AdaBoost [1], tree-based Gradient Boosting [1], wavelet-based GB [8], Deep learning building blocks I - convolutions [3]

Lesson 9 - Deep learning building blocks II - Convolutions II, Logistic Regression, Soft-Max layer [3], Gradient descent I [10], Backpropagation [3]

Lesson 10 - Deep Learning architectures. Computer vision using DL - applications (presentation). With Meir Perez (AI team leader, WIX) 

[20th May - Shavuot holiday, no lesson]

Lesson 11 - Applied ML hands-on session I (presentation). With Oren Elisha (Microsoft R&D)

Lesson 12 - Applied ML hands-on session II (presentation). With Oren Elisha (Microsoft R&D)

Lesson 13 - Gradient descent II [10], Projects review, epilogue [11]

Oren Elisha - Further DL implementation tips

Course mini-conference/project presentations - Wednesday, September 5th, 2018, WIX offices, Bitan 26, TLV port (schedule)

Project list

References:

 

[1] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, Springer-Verlag, 2009.

[2] S. Mallat, Understanding deep convolutional networks, Phil. Trans. R. Soc. A 374 (2016).

[3] Y. LeCun, Y. Bengio and G. Hinton, Deep Learning, Nature 521 (2015), 436–444.

[4] R. DeVore and G. Lorentz, Constructive Approximation, Springer-Verlag, 1993.

[5] R. DeVore, Nonlinear approximation, Acta Numerica 1998, 51-150.

[6] S. Dekel and D. Leviatan, Adaptive multivariate approximation using binary space partitions and geometric wavelets, SIAM Journal on Numerical Analysis 43 (2005), 707-732.  

[7] O. Elisha and S. Dekel, Wavelet decompositions of Random Forests - smoothness analysis, sparse approximation and applications, JMLR 17 (2016).

[8] O. Morgan, O. Elisha and S. Dekel, Wavelet decomposition of Gradient Boosting,  https://arxiv.org/abs/1805.02642 

[9] O. Elisha and S. Dekel, Function space analysis of deep learning representation layers, https://arxiv.org/abs/1710.03263

[10] S. Ruder, An overview of gradient descent optimization algorithms, https://arxiv.org/abs/1609.04747

[11] Theories of Deep Learning, lecture notes, Stanford University, 2017.

Fall 2017 - Introduction to function space theory

Syllabus: In this course we will review the range of function spaces that are fundamental to mathematical analysis, and their various characterizations through harmonic analysis, atomic representations and approximation spaces: Lp spaces, Hardy spaces, Sobolev spaces, Triebel-Lizorkin spaces and Besov spaces. Time allowing, we will also cover interpolation of function spaces and function spaces over manifolds.

References

E. Stein, Harmonic Analysis: Real-Variable Methods, Orthogonality, and Oscillatory Integrals, 1993.

L. Grafakos, Classical and Modern Fourier Analysis, 2004.

L. Tartar, An Introduction to Sobolev Spaces and Interpolation Spaces.

R. Adams and J. Fournier, Sobolev Spaces (2nd edition).

R. DeVore and G. Lorentz, Constructive Approximation, Springer-Verlag, 1993.

Lesson 1 - Lp spaces, Hilbert spaces, first glimpse into Hardy spaces, weak Lp spaces

Lesson 2 - First glimpse into function space interpolation, Fourier analysis on the torus, Schwartz functions

 

Lesson 3 - Fourier analysis on R^n, distributions, convolutions

 

Lesson 4 - Real Hardy spaces I

Lesson 5 - Real Hardy spaces II

Lesson 6 - Real Hardy spaces III

 

Lesson 7 - Sobolev spaces I

 

Lesson 8 - Sobolev spaces II, Modulus of smoothness I

 

Lesson 9 - Modulus of smoothness II, Lipschitz spaces
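For reference, the standard definitions behind these two lessons (cf. DeVore & Lorentz, Constructive Approximation, Chapter 2): the r-th difference, the modulus of smoothness, and the Lipschitz-type condition it induces.

```latex
\[
\Delta_h^r f(x) = \sum_{k=0}^{r} (-1)^{r-k} \binom{r}{k} f(x+kh),
\qquad
\omega_r(f,t)_p = \sup_{0<|h|\le t} \big\| \Delta_h^r f \big\|_{L_p},
\]
% and, for 0 < alpha <= 1,
\[
f \in \operatorname{Lip}(\alpha, L_p) \iff \omega_1(f,t)_p \le C\, t^{\alpha}.
\]
```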

 

Lesson 10 - Besov spaces I
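The Besov seminorm referenced in these lessons, in its standard integral form built from the modulus of smoothness, with any r > α (cf. DeVore & Lorentz, Chapter 2):

```latex
\[
|f|_{B^{\alpha}_{q}(L_p)} =
\Big( \int_0^{\infty} \big[ t^{-\alpha}\, \omega_r(f,t)_p \big]^q \,\frac{dt}{t} \Big)^{1/q},
\qquad 0 < q < \infty,
\]
% with the usual supremum modification when q = infinity.
```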

Lesson 11 - Besov spaces II, review of theorem list for exam 

Theorem list for exam

Spring 2017 - Foundations of approximation theory

 

Syllabus: Approximation theory is one of the main theoretical pillars of applied mathematics. One of its goals is to characterize the classes of functions that can be approximated by a specified algorithm with the error decaying at a certain qualitative rate. Examples of approximation algorithms are Fourier series, algebraic polynomials, splines, wavelets, finite elements, etc. To provide the theoretical foundations of signal and data analysis, approximation theory applies tools that measure weak-type smoothness of functions, which makes it possible to assess the ‘smoothness’ of functions that are not even continuous. One of the main challenges in the theory is multivariate approximation, where modeling the geometry of the approximated function plays an important role. The syllabus includes: weak-type smoothness, function spaces, trigonometric approximation, local polynomial approximation, splines, multiresolution, non-linear approximation using wavelets, approximation spaces, the machinery of the Jackson-Bernstein theorems for the characterization of approximation spaces, and geometric approximation.

 

R. DeVore and G. Lorentz, Constructive Approximation, Springer-Verlag, 1993.

R. DeVore, Nonlinear Approximation, Acta Numerica 1998, 51-150.

Lesson 1 - Quick overview, Banach Spaces, Lp spaces, Fourier analysis I

Lesson 2 - Fourier analysis II, Heat Kernel, Fourier transform I

Lesson 3 - Fourier transform II, Smoothness spaces I, Modulus of Smoothness I

Lesson 4 - Modulus of Smoothness II, Smoothness spaces II, Jackson theorem for trigonometric polynomial approximation
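The prototypical estimate of this lesson, in its standard form: the error of best L_p approximation by trigonometric polynomials of degree at most n is controlled by the modulus of smoothness (cf. DeVore & Lorentz, Chapter 7):

```latex
\[
E_n(f)_p := \inf_{T \in \mathcal{T}_n} \| f - T \|_{L_p[-\pi,\pi]}
\;\le\; C(r)\, \omega_r\!\Big(f, \frac{1}{n}\Big)_p .
\]
```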

Lesson 5 - K-functional, Smoothness spaces III (Lip)

 

Lesson 6 - Smoothness spaces IV (Besov), Local polynomial approximation

 

Lesson 7 - Approximation from Shift-Invariant Spaces I

Lesson 8 - Approximation from Shift-Invariant Spaces II, Multiresolution, Free-knot piecewise constant approximation

 

Lesson 9 - Wavelets

Lesson 10 - Jackson theorem for Wavelets, Approximation spaces

Lesson 11 - Bernstein theorem for Trigonometric polynomials, Jackson-Bernstein machinery, interpolation spaces

Lesson 12 - Bernstein theorem for N-term wavelets, more examples of characterizations of approximation spaces, review of theorem list for the exam 

Theorem List for exam 

Fall 2016 - Wavelets

 

Syllabus: Quick introduction using the Haar system, properties of the Fourier transform, sampling theorems, the continuous wavelet transform, frames, singularity analysis, wavelet bases, the fast wavelet transform, nonlinear approximation, and applications: compressed sensing, image compression, denoising, data science (deep convolutional networks, random forests, gradient boosting).

 

S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way, 3rd edition, 2009.

I. Daubechies, Ten Lectures on Wavelets, 1992.

A. Cohen, Numerical Analysis of Wavelet Methods, 2003.

R. DeVore, Nonlinear approximation, Acta Numerica 1998, 51-150.

O. Christensen, An Introduction to Frames and Riesz Bases, 2003.

Lesson 1 - Quick introduction using the Haar system, review of some applications of wavelets.
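A minimal sketch (added for illustration, not the course's code) of one level of the orthonormal Haar transform: pairwise averages and differences scaled by 1/sqrt(2), assuming NumPy and a signal of even length:

```python
# One analysis step of the orthonormal Haar transform.
import numpy as np

def haar_step(x):
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass: local averages
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass: local differences
    return approx, detail

a, d = haar_step([4.0, 2.0, 5.0, 5.0])
print(a, d)   # constant pairs give zero detail coefficients
```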

Lesson 2 - Fourier analysis I

Lesson 3 - Fourier analysis II, Shift Invariant spaces

Lesson 4 - Shift Invariant spaces II, Sinc and Ideal low pass

 

Lesson 5 - Multiresolution

 

Lesson 6 - Wavelets, Wavelet transforms, Cascade algorithm

Lesson 7 - Cascade algorithm II, Biorthogonal wavelets

Lesson 8 - Multivariate wavelets, Applications: image compression, inpainting, compressed sensing

Lesson 9 - Wavelet decompositions of Random Forests (Oren Elisha)

Lesson 10 - Continuous wavelet transform

 

Lesson 11 - Frames, Riesz bases

Lesson 12 - Scattering Networks, Convolutional Networks (Leon Gugel)

Lesson 13 - Review of project list

Project list

Spring 2016 - Analysis of spaces of homogeneous type

 

Syllabus: Spaces of homogeneous type are an important modern generalization of the Euclidean space. In the course we shall see that a rather weak assumption, on the relationship between the measure of ‘volume’ and the metric, provides the setup for a wide-ranging theory, as in the Euclidean case. Some examples of spaces of homogeneous type are manifolds, anisotropic spaces, graphs, various matrix and group structures, etc. The course will introduce function spaces over such geometric structures and harmonic analysis tools that replace the Fourier series or transform. The syllabus includes: Schwartz functions and distributions, Fourier analysis over R^n, function space characterization, homogeneous spaces, spectral representation, Laplace and heat kernel operators on manifolds, functional calculus, and spectral analysis on manifolds. We will also discuss applications in data science, for example deep learning where the data is collected over a manifold.
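For concreteness, the "rather weak assumption" alluded to above is the doubling condition: a quasi-metric measure space (X, ρ, μ) is of homogeneous type when

```latex
\[
\mu\big(B(x,2r)\big) \;\le\; C_0\, \mu\big(B(x,r)\big),
\qquad x \in X, \; r > 0,
\]
```

with a doubling constant C_0 independent of x and r.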

 

[S] E. Stein, Harmonic Analysis: Real-Variable Methods, Orthogonality, and Oscillatory Integrals, 1993.

[Gra] L. Grafakos, Classical and Modern Fourier Analysis, 2004.

[Gri] A. Grigor’yan, Heat kernel and analysis on manifolds, 2009.

[CKP] T. Coulhon, G. Kerkyacharian and P. Petrushev, Heat kernel generated frames in the setting of Dirichlet spaces, Journal of Fourier Analysis and Applications 18 (2012), 995-1066.

[KP] G. Kerkyacharian, P. Petrushev, Heat kernel based decomposition of spaces of distributions in the framework of Dirichlet spaces, Transactions of the American Mathematical Society 367 (2015), 121-189.

[DDP] W. Dahmen, S. Dekel and P. Petrushev, Two-level splits of anisotropic Besov spaces, Constructive Approximation 31 (2010), 149-194.

[DL] R. DeVore and G. Lorentz, Constructive Approximation, Springer-Verlag, 1993.

 

Lesson 1 - Banach spaces, Fourier series, heat equation on the torus, functional calculus on manifolds (intro)

 

Lesson 2 - Fourier series (cont.), definition & examples of homogeneous spaces ([S] 8-11), ellipsoid covers I ([DDP], Section 2).

Lesson 3 - Ellipsoid covers II, spaces of homogeneous type II ([KP]), Schwartz class ([Gra] Sections 2.2, 2.3).

 

Lesson 4 - Distributions ([Gra] Sections 2.2, 2.3), Schwartz functions & distributions on manifolds [KP], Fourier transform, Fourier transform of distributions [Gra].

 

Lesson 5 - Laplace operator on manifolds [Gri, Chapter 3], setup of localized heat kernels [KP].

 

Lesson 6 - Applications - deep learning on manifolds (Leon Gugel's presentation).

 

Lesson 7 - Davies-Gaffney estimate, finite speed propagation, kernel localization [KP].

 

Lesson 8 - Kernel localization II [KP]

 

Lesson 9 - Kernel localization III, spaces of bandlimited functions, spectral approximation (Euclidean case)

 

Lesson 10 - Spectral spaces on manifolds [KP], Nikolskii inequality, Bernstein inequality (trigonometric polynomials) [DL]

 

Lesson 11 - Bernstein & Jackson inequalities (spectral spaces) [CKP], Maximal function [S]

 

Lesson 12 - Maximal function II, Hardy spaces (def), uniformly bounded Hardy norm of atoms [S], review of list of theorems for exam.  


List of theorems for exam