class: center, middle, inverse, title-slide

# Introduction to Machine Learning
### Alexander Goncearenco
### Martin Skarzynski
### 2019-02-07

---

![](assets/img/image1.png)

---

![](assets/img/image3.png)

<br>
<br>

--

![](assets/img/image2.png)

---

![](assets/img/image4.png)

---

# Problem formulation

.pull-left[
+ Data set
  + **X** [N, M]: N samples, M features
  + **y** : N labels (outcomes / responses)
+ Machine Learning:
  + Type of task
  + Batch vs online
  + Supervision
]

.pull-right[![](assets/img/image5.png)]

---

# By task:

+ In **classification** the algorithm must assign each input to one of two (or more) classes
+ In **regression** the algorithm returns a numeric value for each set of inputs received
+ In **clustering** the algorithm divides the supplied inputs into two or more subgroups
+ In **density estimation** the algorithm constructs an estimate of the population distribution based on a small subset of the whole
+ Finally, **dimensionality reduction** maps inputs onto a lower-dimensional space

---

# Batch vs Online

+ In **batch learning** all the data is available and can be processed at one time
+ In **online learning** only part of the data is available for inclusion at any one time

---

# Supervised vs Unsupervised

+ In **supervised learning** each of the supplied inputs is labeled with the desired output
+ In **unsupervised learning** no labels are available and the algorithm must find the underlying structure in the data itself
+ Somewhere in the middle is **semi-supervised learning**, in which only some of the supplied inputs are labeled with the desired output
+ In **reinforcement learning** the algorithm interacts with an environment and must choose actions that maximize reward

---

![](assets/img/image6.png)

---

# Classification

.pull-left[![](assets/img/image7.png)]

.pull-right[![](assets/img/image5.png)]

---

# Classification

.pull-left[![](assets/img/image8.png)]

.pull-right[![](assets/img/image5.png)]

---

# Regression

+ Predicted variable
+ Observed variable

![](assets/img/image9.png)

---

# Clustering

![](assets/img/image10.png)

---

# Clustering

![](assets/img/image11.png)

---

# Latent variable models, density estimation

![](assets/img/image12.png)

---

# Dimensionality reduction, feature selection

<img src='assets/img/image13.png' width='500px'></img>

---

# Dimensionality reduction for visualization

.pull-left[![](assets/img/image15.png)]

.pull-right[![](assets/img/image14.png)]

---

# Simple neural networks and Deep Learning

![](assets/img/image16.png)

---

# When to use Machine Learning?

+ **You cannot code the rules:** Many tasks cannot be adequately solved using a deterministic, rule-based solution, because a large number of factors can influence the answer (the feature space is large)
+ **You cannot scale:** ML solutions are effective at handling large-scale problems (the sample space is large)
+ The benefits of machine learning come at a price (computational cost, design effort)
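---

# Example: supervised vs unsupervised on a toy data set

The problem formulation from earlier in the deck (**X** with N samples and M features, **y** with N labels) can be sketched in plain Python. This is only an illustration under stated assumptions: the data values are made up, and the helper names (`fit_centroids`, `predict`, `two_means`) are hypothetical, not taken from any library. A nearest-centroid classifier stands in for supervised classification and a two-cluster k-means for unsupervised clustering; both run in batch mode on the full data set.

```python
# Toy data set: X has shape [N, M] with N = 6 samples and M = 2 features;
# y holds one label per sample. All values are made up for illustration.
X = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
     [4.0, 4.2], [4.1, 3.9], [3.9, 4.0]]
y = [0, 0, 0, 1, 1, 1]


def mean(rows):
    """Column-wise mean of a non-empty list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))


def fit_centroids(X, y):
    """Supervised: one centroid per class, computed using the labels y."""
    classes = sorted(set(y))
    centers = [mean([x for x, label in zip(X, y) if label == c])
               for c in classes]
    return classes, centers


def predict(x, classes, centers):
    """Assign x to the class whose centroid is closest."""
    return classes[nearest(x, centers)]


def two_means(X, n_iter=10):
    """Unsupervised: 2-means clustering that never looks at y."""
    centers = [X[0], X[-1]]  # naive initialization, fine for this toy data
    assign = []
    for _ in range(n_iter):
        # Assignment step: attach every sample to its closest center.
        assign = [nearest(x, centers) for x in X]
        # Update step: move each center to the mean of its members.
        for k in range(2):
            members = [x for x, a in zip(X, assign) if a == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = mean(members)
    return assign


classes, centers = fit_centroids(X, y)
print(predict([1.0, 1.0], classes, centers))  # class of a new, unseen sample
print(two_means(X))                           # cluster index per sample
```

The classifier needs **y** to build its centroids, while `two_means` recovers the same two groups from **X** alone. The cluster indices it returns are arbitrary identifiers and only happen to line up with the class labels on this toy data.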