Teacher
CASTIGLIONE Filippo
(syllabus)
Introduction and overview: what machine learning is; definitions; supervised and unsupervised learning; regression and clustering.
Univariate linear regression: representation; the hypothesis function; choosing the parameters of the hypothesis function; the cost function; the gradient descent algorithm; choosing the learning-rate parameter alpha.
Multivariate linear regression: vector notation for the hypothesis and cost functions; gradient descent for the multivariate case; matrix notation; feature scaling and normalization; polynomial regression; the normal equation for multivariate regression; final notes comparing gradient descent with the computation of the normal equation (see the first sketch after this list).
Logistic regression: binary classification; representation of the hypothesis; the logistic function; the decision boundary; the cost function for logistic regression; the gradient descent algorithm for logistic regression; analytic derivation of the gradient of the cost function; notes on the Octave implementation of the cost function and of gradient descent for logistic regression (sketch below); considerations on advanced optimization methods; multi-class classification; the one-vs-all method.
Regularization: the problem of overfitting/underfitting (i.e., high variance/high bias); modification of the cost function; the regularization parameter; regularization of linear regression; gradient descent with regularization; the regularized normal equation; logistic regression with regularization.
Neural networks, history: AI and connectionism; the perceptron; Rosenblatt's learning rule; learning Boolean functions; the limits of the perceptron.
Neural networks: motivations; neurons; neuroplasticity and the one-learning-algorithm hypothesis; model representation; the neuron as a logistic unit; the weight matrix; the bias; the activation function; forward propagation; vectorized version; neural networks as an extension of logistic regression; computing the Boolean functions AND, OR, NOT, XNOR (sketch below); multi-class classification with neural networks.
Neural network learning: the cost function of a multilayer perceptron; the backpropagation algorithm, intuition and formalization; the error backpropagation algorithm (scalar version, vector version); notes on implementation; rolling and unrolling the parameters for passing the weight matrices in Octave; gradient checking by computing a numerical approximation of the gradient (sketch below); initialization of the weights and symmetry breaking; the ALVINN network (an autonomous driving system).
Machine learning diagnostics: evaluating a learning algorithm; the test-set error; model selection and the training/validation/test sets; the concepts of bias and variance; regularization and bias/variance; choosing the regularization parameter; putting it all together: a diagnostic method; learning curves.
Machine learning system design: debugging a learning algorithm; diagnosing neural networks; model selection; error analysis; the importance of numerical evaluation; error metrics for skewed classes; precision, recall, and accuracy; trading off precision and recall; the F1 score (sketch below); data for machine learning; designing a high-accuracy learning system; the rationale for large data.
Support vector machines: the SVM cost function; SVMs as large-margin classifiers; kernels; the choice of landmarks; the choice of the parameters C and sigma; multi-class classification with SVMs; comparison between logistic regression and SVMs, and between neural networks and SVMs.
Clustering: the K-means algorithm; the cluster-assignment step; the move-centroids step (sketch below); the optimization objective; choosing the number of clusters; the elbow method.
Dimensionality reduction: principal component analysis; motivation I: data compression; motivation II: data visualization; problem formulation; the goal of PCA; the role of the singular value decomposition in the PCA algorithm; reconstruction from the compressed representation; the algorithm for choosing k (sketch below); advice for applying PCA; the most common use of PCA; misuse of PCA.
Anomaly detection: problem motivation; density estimation; the Gaussian distribution; parameter estimation; the anomaly detection algorithm (sketch below); anomaly detection vs. supervised learning; the multivariate Gaussian distribution.
Recommender systems: collaborative filtering; motivation; problem formulation; content-based recommendations; notation; optimization objective; gradient descent update; low-rank matrix factorization.
Learning with large datasets: online learning; stochastic gradient descent; mini-batch gradient descent; checking for convergence; map-reduce and data parallelism.
Machine learning pipelines: the OCR system; ceiling analysis.
Laboratory: exercise on recommender systems.
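For the comparison between gradient descent and the normal equation, a minimal Octave sketch (the function and variable names here are illustrative, not taken from the course materials):

    function theta = gradientDescent(X, y, theta, alpha, num_iters)
      % Vectorized batch gradient descent; X is the m x (n+1) design matrix
      % (first column of ones), y is m x 1, theta is (n+1) x 1.
      m = length(y);
      for iter = 1:num_iters
        theta = theta - (alpha / m) * (X' * (X * theta - y));  % simultaneous update
      end
    end

    % The normal equation yields the same minimizer in closed form, with no
    % alpha and no iterations, at O(n^3) cost for the n x n inversion:
    %   theta = pinv(X' * X) * X' * y;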
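For the notes on the Octave implementation of logistic regression, a sketch of the cost function and its gradient in the form expected by Octave's fminunc optimizer (again, names are illustrative):

    function g = sigmoid(z)
      g = 1 ./ (1 + exp(-z));      % element-wise logistic function
    end

    function [J, grad] = costFunction(theta, X, y)
      % Cross-entropy cost and gradient for (unregularized) logistic regression.
      m = length(y);
      h = sigmoid(X * theta);      % m x 1 vector of predicted probabilities
      J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
      grad = (1 / m) * (X' * (h - y));
    end

    % Typical use with an advanced optimizer instead of hand-rolled gradient descent:
    %   options = optimset('GradObj', 'on', 'MaxIter', 400);
    %   theta = fminunc(@(t) costFunction(t, X, y), initial_theta, options);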
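For computing Boolean functions with a neural network, a sketch of forward propagation through the classical XNOR construction, where an AND unit and a (NOT x1 AND NOT x2) unit are combined by an OR unit (the sigmoid saturates to roughly 0 or 1 at these weight magnitudes):

    sigmoid = @(z) 1 ./ (1 + exp(-z));
    Theta1 = [-30 20 20;     % hidden unit 1: x1 AND x2
               10 -20 -20];  % hidden unit 2: (NOT x1) AND (NOT x2)
    Theta2 = [-10 20 20];    % output unit: OR of the two hidden units

    X  = [0 0; 0 1; 1 0; 1 1];                 % every Boolean input, one per row
    A1 = sigmoid([ones(4, 1) X] * Theta1');    % hidden activations, 4 x 2
    h  = sigmoid([ones(4, 1) A1] * Theta2');   % output approx [1; 0; 0; 1] = XNOR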
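For gradient checking, a sketch of the centered finite-difference approximation, to be compared component by component against the analytic gradient from backpropagation (J is assumed to be a function handle returning the cost at theta):

    function numgrad = computeNumericalGradient(J, theta)
      numgrad = zeros(size(theta));
      perturb = zeros(size(theta));
      e = 1e-4;                                % perturbation size
      for p = 1:numel(theta)
        perturb(p) = e;
        numgrad(p) = (J(theta + perturb) - J(theta - perturb)) / (2 * e);
        perturb(p) = 0;                        % reset before the next component
      end
    end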
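For the error metrics on skewed classes, a sketch computing precision, recall, and the F1 score from a vector of predicted labels pred and ground truth y (both assumed to be 0/1 vectors, with y = 1 the rare positive class):

    tp = sum((pred == 1) & (y == 1));          % true positives
    fp = sum((pred == 1) & (y == 0));          % false positives
    fn = sum((pred == 0) & (y == 1));          % false negatives
    precision = tp / (tp + fp);
    recall    = tp / (tp + fn);
    F1 = 2 * precision * recall / (precision + recall);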
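For K-means, a sketch of one iteration, i.e., the cluster-assignment step followed by the move-centroids step (empty clusters are left unhandled in this sketch):

    function [idx, centroids] = kmeansStep(X, centroids)
      K = size(centroids, 1);
      m = size(X, 1);
      idx = zeros(m, 1);
      for i = 1:m                               % cluster-assignment step
        d = sum((centroids - X(i, :)) .^ 2, 2); % squared distance to each centroid
        [~, idx(i)] = min(d);
      end
      for k = 1:K                               % move-centroids step
        centroids(k, :) = mean(X(idx == k, :), 1);
      end
    end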
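For PCA, a sketch of the SVD-based algorithm with projection, reconstruction, and the retained-variance test for choosing k (X is assumed already mean-normalized and feature-scaled):

    [m, n] = size(X);
    Sigma = (1 / m) * (X' * X);      % n x n covariance matrix
    [U, S, V] = svd(Sigma);          % columns of U are the principal components
    k = 2;                           % illustrative value; see the test below
    Z = X * U(:, 1:k);               % compressed representation, m x k
    X_rec = Z * U(:, 1:k)';          % reconstruction from the compressed representation

    s = diag(S);                     % choose the smallest k retaining, e.g., 99%
    retained = sum(s(1:k)) / sum(s); % of the variance: retained >= 0.99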
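For anomaly detection, a sketch of per-feature Gaussian parameter estimation and density evaluation; Xval and the threshold epsilon are assumptions here (epsilon would be chosen on a labeled validation set):

    mu = mean(X);                    % 1 x n maximum-likelihood means
    sigma2 = var(X, 1);              % 1 x n variances (1/m normalization)
    % p(x) as a product of independent univariate Gaussians, one per feature
    % (contrast with the single multivariate Gaussian covered in the course):
    p = prod(exp(-(Xval - mu) .^ 2 ./ (2 * sigma2)) ./ sqrt(2 * pi * sigma2), 2);
    anomalies = p < epsilon;         % flag examples with density below the threshold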
(reference books)
J. Watt, R. Borhani, A. K. Katsaggelos. Machine Learning Refined. Cambridge University Press, 2016.