## Teaching @ School of Mathematical Sciences, Tel-Aviv university

Spring 2022 - Mathematical foundations of machine learning

Syllabus: In the course we will approach Machine Learning (ML) from the perspective of geometric approximation theory and modern harmonic analysis. We will review in depth the most successful tools of ML: Prophet, Gaussian Processes, Support Vector Machines, Random Forest, Gradient Boosting, Deep Learning. We will discuss related theory and applications in computer vision, numerical solutions to PDEs, etc.

Spring 2021 - Introduction to function space theory

Syllabus: In the course we will review the range of function spaces that are fundamental to mathematical analysis and their various characterizations through Harmonic analysis, atomic representations and approximation spaces: Lp Spaces, Hardy spaces, Sobolev spaces, Triebel-Lizorkin spaces, Besov spaces. Time allowing we will also cover interpolation of functions spaces and function spaces over manifolds.

References

E. Stein, Harmonic analysis, real variable methods, orthogonality and oscillatory integrals,

L. Grafakos, Classical and modern harmonic analysis,

L. Tartar, An introduction to Sobolev spaces and interpolation spaces,

R. Adams and J. Fournier, Sobolev Space (2nd edition),

R. DeVore & G. Lorentz, Constructive Approximation

Lesson 1 - Lp spaces, weak Lp spaces.

Lesson 2 - first glimpse into function space interpolation, first glimpse into Hardy spaces. Schwartz class, Distributions, convolutions, Sobolev spaces I.

Lesson 3 - Sobolev spaces II, Fourier transform of Schwartz functions.

Lesson 4 - Fourier transform II, Fourier transform of distributions, Fourier representation of W^r_2.

Lesson 5 - Derivation of Fourier integral from Heat equation, Maximal functions, Hardy spaces I.

Lesson 6 - Hardy spaces II

Lesson 7 - Hardy spaces III, Moduli of smoothness I

Lesson 8 - Moduli of smoothness II, K-functional

Lesson 9 - Generalized Lipschitz spaces, Approximation with piecewise constants, Besov spaces I

Lesson 10 - Besov spaces II

Fall 2020 - Foundations of approximation theory

Syllabus Approximation theory is one of the main theoretical pillars of applied mathematics. One of its goals is to characterize the classes of functions that can be approximated by a specified algorithm with the error decaying at a certain qualitative rate. Examples for approximation algorithms are: Fourier series, algebraic polynomials, splines, wavelets, finite elements, etc. So as to provide the theoretical foundations of signal & data analysis, approximation theory applies tools to measure weak-type smoothness of functions, which allows to assess the ‘smoothness’ of functions that are not even continuous. One of the main challenges in the theory is multivariate approximation where modeling of the geometry of the approximated function plays an important role. The syllabus includes: weak-type smoothness, functions spaces, trigonometric approximation, local polynomial approximation, splines, multiresolution, non-linear approximation using piecewise polynomials and wavelets, approximation spaces, the machinery of the Jackson-Bernstein theorems for the characterization of approximation spaces, geometric approximation.

R. DeVore & G. Lorentz, Constructive Approximation, Springer-Velag, 1993.

R. DeVore, Nonlinear Approximation, Acta Numerica (1998), 51-150.

S. Brenner and L. Scott, The mathematical theory of finite elements, Springer 1994.

L. Grafakos, Classical and modern harmonic analysis, Prentice-Hall, 2004.

Lesson 1 - Introduction, Lp spaces, Smoothness spaces I

Lesson 2 - Smoothness spaces II, Trigonometric polynomials and Fourier series approximation, Dirichelet, Fejer, Summability kernels, Fourier integral I

Lesson 3 - Fourier integral II, approximation with piecewise constants, modulus of smoothness

Lesson 4 - K-functional, Lip spaces, first glimpse at nonlinear approximation (free knot piecewise constants), Jackson theorem for trigonometric polynomials

Lesson 5 - Besov spaces, "local" algebraic polynomial approximation I

Lesson 6 - "local" algebraic polynomial approximation II

Lesson 7 - Approximation from shift invariant spaces

Lesson 8 - Approximation from shift invariant spaces II, Wavelets I

Lesson 9 - Wavelets II

Lesson 10 - Wavelets III, Approximation spaces I

Lesson 11 - Approximation spaces II

Lesson 12 - Approximation spaces III

Lesson 13 - Review of theorems for the exam

Lectures notes: notes, local polynomial approximation notes

Spring 2020- Mathematical foundations of machine learning

Syllabus: In the course we will approach Machine Learning (ML) from the perspective of geometric approximation theory and modern harmonic analysis. We will review in depth the most successful tools of ML: Prophet, Gaussian Processes, Support Vector Machines, Random Forest, Gradient Boosting, Deep Learning. We will discuss related theory and applications in computer vision, numerical solutions to PDEs, etc.

Lesson 1 - Introduction to the course - presentation

Lesson 2 - Linear regression, logistic regression, soft-max, statistical evaluation metrics - notes, Function space theory I [3]- notes

Lesson 3 - Function space theory II [3] - notes, Gaussian Processes + application for noisy time series [12]- notes

Lesson 4 - Function space theory III [3] - notes, Decision trees, Random Forest [6],[7].

Lesson 5 - Besov spaces [3] - notes, Besov smoothness of indicator function - notes, RF classification - mapping to vector regression [6], [8] - notes , standard methods [1] - notes. Scikit Learn RF

Lesson 6 - Kaggle datasets, Wavelet decomposition of RF, Besov index of datasets - theory and applications [6],[8], Linear correlation - notes

Lesson 7 - Feature importance - standard methods & wavelet method [6], Support Vector Machines ([1] Chapter 12) , Anisotropic RF using linear SVM, additional notes

Lesson 8 - AdaBoost and additive models [1], Wavelet-based Gradient Boosting [7], Jackson theorem for wavelet decomposition of RF [6]- additional notes , Deep learning building blocks I - Convolutions I

Lesson 9 - Deep learning building blocks II - Convolutions II, non-linearities, pooling methods, Loss functions, Back Propagation [2] - additional notes.

Lesson 10 - Gradient descent [9], applications of DL in CV, additional notes.

Lesson 11 - Applications of DL in CV II, applications of DL in numerical PDEs

Lesson 12 - Deep neural decision forests & wavelets [13], Prophet model for time series forecasting [14]

Lesson 13 - Review of summer assignment & projects

Summer projects, summer assignment - submission on 31st August 2020

References:

[1] T. Hastie, R. Tibshirani and J. Friedman, Elements of statistical learning, Springer-Verlag 2009.

[2] Y. LeCun, Y. Bengio and G. Hinton, Deep Learning, Nature 521 (2015), 436–444.

[3] R. DeVore & G. Lorentz, Constructive Approximation

[4] R. DeVore, Nonlinear approximation, Acta Numerica 1998, 51-150.

[5] S. Dekel and D. Leviatan, Adaptive multivariate approximation using binary space partitions and geometric wavelets, SIAM Journal on Numerical Analysis 43 (2005), 707-732.

[6] O. Elisha and S. Dekel, Wavelet decomposition of Random Forests - smoothness analysis,sparse approximation and applications, JMLR 17 (2016). link

[7] O. Morgan, O. Elisha and S. Dekel, Wavelet decomposition of Gradient Boosting, https://arxiv.org/abs/1805.02642

[8] O. Elisha and S. Dekel, Function space analysis of deep learning representation layers, https://arxiv.org/abs/1710.03263

[9] S. Ruder, On overview of gradient descent optimization algorithms, https://arxiv.org/abs/1609.04747

[10] Introduction to boosted trees, Tianqi Chen, 2014.

[11] Theories of Deep Learning lecture notes, STATS 385, Stanford University 2017, 2019.

[12] S. Roberts, M. Osborne, M. Ebden, S. Reece, N. Gibson and S. Aigrain, Gaussian processes for time-series modelling, Philosophical transactions of the royal society 371 (2013)

[13] P. Kontschieder, M. Fiterau, A. Criminisi, S. Rota Bulo, Deep neural decision forests, ICCV 2015.

[14] S. Taylor and B. Letham, Forecasting at scale, 2017.

======================

Spring 2019 - Mathematical foundations of machine learning

Syllabus: In the course we will approach Machine Learning (ML) from the perspective of geometric approximation theory and modern harmonic analysis. We will review in depth the most successful tools of ML: Support Vector Machines, Random Forest, Gradient Boosting and Deep Learning and discuss related theory and applications.

Lesson 1 - Introduction to the course (presentation)

Lesson 2 - Linear regression, logistic regression, soft-max, statistical evaluation metrics, Function space theory I [4]

Lesson 3 - Function space theory II [4], Decision tree I [1]

Lesson 4 - Function space theory III [4], Decision tree II [1]

Lesson 5 - Decision tree III, Wavelet decomposition of a decision tree , Random Forest [7]

Lesson 6 - Wavelet decomposition of RF [7], numeric estimate of Besov smoothness of a data set [9], RF-based feature importance - standard methods [1], wavelet-based [7].

Lesson 7 - Jackson theorem for wavelet decomposition of RF [7], Support Vector Machines ([1] Chapter 12), Anisotropic RF using linear SVM,

Tree based Gradient Boosting [1], [11], Wavelet-based Gradient Boosting [8].

21st April - No lesson (Passover vacation)

Lesson 8 - Deep learning building blocks I - Convolutions, non-linearities, pooling methods, Loss functions I, DL architectures I

Lesson 9 - Deep learning building blocks II - Loss functions II, Back Propagation [3], Gradient descent I [10], DL architectures II.

Lesson 10 - DL computer vision applications, AI based numerical solutions for PDEs

Lesson 11 - Overview of platform for projects I - Azure Cloud, software packages (Oren Elisha, Microsoft) presentation

Lesson 12 - Review of projects.

Lesson on 2nd of June is cancelled

Course mini-conference/project presentations - Sunday, September 22nd 2019, WIX offices, Bitan 26 TLV port schedule

References:

[1] T. Hastie, R. Tibshirani and J. Friedman, Elements of statistical learning, Springer-Verlag 2009.

[2] S. Mallat, Understanding deep convolutional networks, Phil. Trans. R. Soc 374 (2016).

[3] Y. LeCun, Y. Bengio and G. Hinton, Deep Learning, Nature 521 (2015), 436–444.

[4] R. DeVore & G. Lorentz, Constructive Approximation

[5] R. DeVore, Nonlinear approximation, Acta Numerica 1998, 51-150.

[6] S. Dekel and D. Leviatan, Adaptive multivariate approximation using binary space partitions and geometric wavelets, SIAM Journal on Numerical Analysis 43 (2005), 707-732.

[7] O. Elisha and S. Dekel, Wavelet decompositions of Random Forests - smoothness analysis,sparse approximation and applications, JMLR 17 (2016). link

[8] O. Morgan, O. Elisha and S. Dekel, Wavelet decomposition of Gradient Boosting, https://arxiv.org/abs/1805.02642

[9] O. Elisha and S. Dekel, Function space analysis of deep learning representation layers, https://arxiv.org/abs/1710.03263

[10] S. Ruder, On overview of gradient descent optimization algorithms, https://arxiv.org/abs/1609.04747

[11] Introduction to boosted trees, Tianqi Chen, 2014.

[12] Theories of Deep Learning lecture notes, Stanford University 2017.

======================

Spring 2018 - Mathematical foundations of machine learning

Syllabus: In the course we will approach Machine Learning (ML) from the perspective of geometric approximation theory and modern harmonic analysis. We will review in depth the most successful tools of ML: Support Vector Machines, Random Forest, Gradient Boosting and Deep Learning and discuss related theory and applications.

Lesson 1 - Quick introduction, Statistics - basic definitions

Lesson 2 - Function spaces I ([4], parts of sections 2.1, 2.5, 2.6, 2.7)

Lesson 3 - Function Spaces II ([4], parts of sections 2.7, 2.9, [5] parts of sections 3.1, 3.2), Decision Trees I ([1], [6], [7])

Lesson 4 - Decision Trees II ([1], [6], [7]), Wavelet decomposition of decision trees ([6],[7]), Random Forest ([1], [7]), Examples of RF code in R, Wavelet decomposition of RF ([7])

Lesson 5 - RF-based feature importance - standard methods [1], wavelet-based [7]

Lesson 6 - Function Spaces III ([4] Section 2.10), Definition and numeric estimate of Besov smoothness of data set [9]

Lesson 7 - Jackson theorem for wavelet decomposition of RF [7], Support Vector Machines ([1] Chapter 12), Anisotropic RF using linear SVM,

Lesson 8 - Ada Boost [1], Tree-based Gradient Boosting [1], Wavelet-based GB [8], Deep learning building blocks I - convolutions [3]

Lesson 9 - Deep learning building blocks II - Convolutions II, Logistic Regression, Soft-Max layer [3], Gradient descent I [10], Back Propagation [3].

Lesson 10 - Deep Learning architectures. Computer vision using DL - applications (presentation). With Meir Perez (AI team leader, WIX)

[20th May, Shavuhot holiday, no lesson]

Lesson 11 - Applied ML hands-on session I (presentation). With Oren Elisha (Microsoft R&D)

Lesson 12 - Applied ML hands-on session II (presentation). With Oren Elisha (Microsoft R&D)

Lesson 13 - Gradient descent II [10], Projects review, epilogue [11]

Oren Elisha - Further DL implementation tips

Course mini-conference/project presentations - Wednesday, September 5th, WIX offices, Bitan 26 TLV port Schedule

References:

[1] T. Hastie, R. Tibshirani and J. Friedman, Elements of statistical learning, Springer-Verlag 2009.

[2] S. Mallat, Understanding deep convolutional networks, Phil. Trans. R. Soc 374 (2016).

[3] Y. LeCun, Y. Bengio and G. Hinton, Deep Learning, Nature 521 (2015), 436–444.

[4] R. DeVore & G. Lorentz, Constructive Approximation

[5] R. DeVore, Nonlinear approximation, Acta Numerica 1998, 51-150.

[6] S. Dekel and D. Leviatan, Adaptive multivariate approximation using binary space partitions and geometric wavelets, SIAM Journal on Numerical Analysis 43 (2005), 707-732.

[7] O. Elisha and S. Dekel, Wavelet decompositions of Random Forests - smoothness analysis,sparse approximation and applications, JMLR 17 (2016).

[8] O. Morgan, O. Elisha and S. Dekel, Wavelet decomposition of Gradient Boosting, https://arxiv.org/abs/1805.02642

[9] O. Elisha and S. Dekel, Function space analysis of deep learning representation layers, https://arxiv.org/abs/1710.03263

[10] S. Ruder, On overview of gradient descent optimization algorithms, https://arxiv.org/abs/1609.04747

[11] Theories of Deep Learning lecture notes, Stanford University 2017.

Fall 2017 - Introduction to function space theory

Syllabus: In the course we will review the range of function spaces that are fundamental to mathematical analysis and their various characterizations through Harmonic analysis, atomic representations and approximation spaces: Lp Spaces, Hardy spaces, Sobolev spaces, Triebel-Lizorkin spaces, Besov spaces. Time allowing we will also cover interpolation of functions spaces and function spaces over manifolds.

References

E. Stein, Harmonic analysis, real variable methods, orthogonality and oscillatory integrals,

L. Grafakos, Classical and modern harmonic analysis,

L. Tartar, An introduction to Sobolev spaces and interpolation spaces,

R. Adams and J. Fournier, Sobolev Space (2nd edition),

R. DeVore & G. Lorentz, Constructive Approximation

Lesson 1 - Lp spaces, Hilbert spaces, first glimpse into Hardy spaces, weak Lp spaces

Lesson 2 - first glimpse into function space interpolation, Fourier analysis on the Torus, Schwartz functions,

Lesson 3 - Fourier analysis on R^n, distributions, convolutions

Lesson 4 - Real Hardy spaces I

Lesson 5 - Real Hardy spaces II

Lesson 6 - Real Hardy spaces III

Lesson 7 - Sobolev spaces I

Lesson 8 - Sobolev spaces II, Modulus of smoothness I,

Lesson 9 - Modulus of smoothness II, Lipschitz spaces

Lesson 10 - Besov spaces I

Lesson 11 - Besov spaces II, review of theorem list for exam

Spring 2017 - Foundations of approximation theory

Syllabus Approximation theory is one of the main theoretical pillars of applied mathematics. One of its goals is to characterize the classes of functions that can be approximated by a specified algorithm with the error decaying at a certain qualitative rate. Examples for approximation algorithms are: Fourier series, algebraic polynomials, splines, wavelets, finite elements, etc. So as to provide the theoretical foundations of signal & data analysis, approximation theory applies tools to measure weak-type smoothness of functions, which allows to assess the ‘smoothness’ of functions that are not even continuous. One of the main challenges in the theory is multivariate approximation where modeling of the geometry of the approximated function plays an important role. The syllabus includes: weak-type smoothness, functions spaces, trigonometric approximation, local polynomial approximation, splines, multiresolution, non-linear approximation using wavelets, approximation spaces, the machinery of the Jackson-Bernstein theorems for the characterization of approximation spaces, geometric approximation.

R. DeVore & G. Lorentz, Constructive Approximation, Springer-Velag, 1993.

R. DeVore, Nonlinear Approximation, Acta Numerica 1998, 51-150.

Lesson 1 - Quick overview, Banach Spaces, Lp spaces, Fourier analysis I

Lesson 2 - Fourier analysis II, Heat Kernel, Fourier transform I

Lesson 3 - Fourier transform II, Smoothness spaces I, Modulus of Smoothness I

Lesson 4 - Modulus of Smoothness II, Smoothness spaces II, Jackson theorem for trigonometric polynomial approximation

Lesson 5 - K-functional, Smoothness spaces III (Lip)

Lesson 6- Smoothness spaces IV (Besov) , Local polynomial approximation

Lesson 7 - Approximation from Shift-Invariant Spaces I

Lesson 8 - Approximation from Shift-Invariant Spaces II, Multiresolution, Free-knot piecewise constant approximation,

Lesson 9 - Wavelets

Lesson 10 - Jackson theorem for Wavelets, Approximation spaces

Lesson 11 - Bernstein theorem for Trigonometric polynomials, Jackson - Bernstein machinery, interpolation spaces

Lesson 12 - Bernstein theorem for N-term wavelets, more examples of characterizations of approximation spaces, review of theorem list for the exam

Fall 2016 - Wavelets

Syllabus Quick introduction using the Haar system, properties of the Fourier transform, sampling theorems, continuous wavelet transform, frames, singularity analysis, wavelet bases, fast wavelet transform, nonlinear approximation, applications: compressed sensing, image compression, denoising, data science (deep convolution networks, random forests, gradient boosting)

S. Mallat, A Wavelet tour of signal processing, 3rd edition (the sparse way), 2009.

I. Daubechies, Ten lectures on wavelets, 1992.

A. Cohen - Numerical analysis of wavelet methods, 2003

R. DeVore - Nonlinear Approximation, Acta Numerica 1998, 51-150.

O. Christensen - An introduction to frames and Riesz bases, 2003.

Lesson 1 - Quick introduction using the Haar system, review of some applications of wavelets.

Lesson 2 - Fourier analysis I

Lesson 3 - Fourier analysis II, Shift Invariant spaces

Lesson 4 - Shift Invariant spaces II, Sinc and Ideal low pass

Lesson 5 - Multiresolution

Lesson 6 - Wavelets, Wavelet transforms, Cascade algorithm

Lesson 7- Cascade II, Biorthogonal wavelets

Lesson 8 - Mutlivariate wavelets, Applications: image compression, inpainting, compressed sensing

Lesson 9- Wavelet compositions of Random Forests (Oren Elisha)

Lesson 10 - Wavelet continuous transform

Lesson 11 - Frames, Riesz basis

Lesson 12 - Scattering Networks, Convolution Networks (Leon Gugel)

Lesson 13 - Review of project list

Spring 2016 - Analysis of spaces of homogeneous type

Syllabus Spaces of homogeneous type are an important modern generalization of the Euclidian space. In the course we shall see that a rather weak assumption, on the relationship between the measure of ‘volume’ and the metric, provides the setup for wide-ranging theory, as in the Euclidian case. Some examples for spaces of homogeneous type are: manifolds, anisotropic spaces, graphs, various matrix and group structures, etc. The course will introduce function spaces over such geometric structures and harmonic analysis tools that replace the Fourier series or transform. The syllabus includes: Schwartz functions and distributions, Fourier analysis over R^n, function space characterization, homogeneous spaces, spectral representation, Laplace and heat kernel operators on manifolds, functional calculus, spectral analysis on manifolds. We will also discuss applications in data science, for example: deep learning where the data is collected over a manifold.

[S] E. Stein, Harmonic analysis: real variable methods, orthogonality and Oscillatory integrals, 1993.

[Gra] L. Grafakos, Classical and Modern Fourier Analysis, 2004.

[Gri] A. Grigor’yan, Heat kernel and analysis on manifolds, 2009.

[CKP] T. Coulhon, G. Kerkyacharian, P. Petrushev, Heat kernel generated frames in the setting of Dirichlet spaces, Jounral of Fourier analysis and Applications 18 (2012), 995-1066.

[KP] G. Kerkyacharian, P. Petrushev, Heat kernel based decomposition of spaces of distributions in the framework of Dirichlet spaces, Transactions of the American Mathematical Society 367 (2015), 121-189.

[DDP] W. Dahmen, D. Dekel and P. Petrushev, Two-level splits of anisotropic Besov spaces, Constructive Approximation 31 (2010), 149-194.

[DL] R. DeVore & G. Lorentz, Constructive Approximation, Springer-Velag, 1993.

Lesson 1 - Banach spaces, Fourier series, heat equation on the torus, functional calculus on manifolds (intro)

Lesson 2 - Fourier series (cont), definition & examples of homogenuous spaces ([S] 8-11), elliposid covers I ([DDP], Section 2).

Lesson 3- Ellipsoid covers II, spaces of homogenuous type II ([KP]), Schwartz class ([Gra] Sections 2.2, 2.3).

Lesson 4 - distributions ([Gra] Sections 2.2, 2.3), Schwartz & distributions on manifolds [KP], Fourier transform, Fourier transform of distributions [Gra],

Lesson 5 - Laplace operator on manifolds [Gri, Chapter 3], setup of localized heat kernels [KP].

Lesson 6 - Applications - deep learning on manifolds (Leon Gugel's presentation).

Lesson 7 - Davies-Gaffney estimate, Finite Speed Propogation, Kernel localization [KP].

Lesson 8 - Kernel localization II [KP]

Lesson 9 - Kernel localization III, spaces of bandlimited functions, spectral approximation (Euclidian case)

Lesson 10 - Spectral space on manifolds [KP], Nikolskii inequality, Bernstein inequality (trigonometric polynomials) [DL],

Lesson 11 - Bernstein & Jackson inequalities (spectral spaces) [CKP], Maximal function [S]

Lesson 12 - Maximal function II, Hardy spaces (def), uniformly bounded Hardy norm of atoms [S], review of list of theorems for exam.