Course Outline:
Learning goals: recognize ML problem types; understand terminology, ML workflow, role of ML engineer
• supervised learning – classification and regression • unsupervised learning – clustering (introduced, but not practiced)
Learning goals: analyze data; sample, prepare, and organize data in data matrices; identify features and labels for supervised learning
• sampling, filtering, and cleaning data • plotting data with Seaborn and Matplotlib • data types • feature engineering (mapping predictive or causal concepts to data representation, manipulating data so that it is appropriate for common machine learning APIs) • feature transformations (binary indicator, one-hot encoding, functional transformations, interaction terms, binning, scaling) • correlation, covariance, mutual information • outliers and missing data • common statistics review
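As a taste of the feature transformations listed above, a binary indicator / one-hot encoding can be sketched in a few lines of pure Python (`one_hot` is an illustrative helper name, not a library API; in practice tools like scikit-learn's `OneHotEncoder` or `pandas.get_dummies` handle this):

```python
def one_hot(values, categories):
    """Map each categorical value to a binary indicator vector,
    one position per known category."""
    return [[1 if v == c else 0 for c in categories] for v in values]

colors = ["red", "blue", "red"]
encoded = one_hot(colors, categories=["blue", "red"])
# each row contains exactly one 1, marking the observed category
```

The resulting columns are exactly the kind of numeric representation that common machine learning APIs expect in a data matrix.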
Learning goals: train and evaluate supervised learning models (k-nearest neighbors, decision trees), understand overfitting and underfitting
• splitting data into testing, training, validation sets • zero-one loss function • hyperparameters • model-based learning vs instance-based learning • k-nearest neighbors (kNN) and decision trees for classification • distance functions • drawing 2D decision boundaries • entropy (for decision trees) • optimizing models • bias-variance tradeoff
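The kNN classifier above follows directly from its definition: compute a distance to every training point, then take a majority vote among the k nearest neighbors. A toy pure-Python sketch using Euclidean distance (`knn_predict` is an illustrative name, not a library function):

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training point (instance-based:
    # there is no training step, the data itself is the model)
    dists = sorted(
        (math.dist(p, x), label) for p, label in zip(train_X, train_y)
    )
    # majority vote among the k closest neighbors
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

For example, `knn_predict([(0, 0), (0, 1), (5, 5), (6, 5)], ["a", "a", "b", "b"], (0.5, 0.5))` finds two "a" points among the three nearest neighbors and returns "a".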
Learning goals: train and evaluate linear models – logistic regression, linear regression; understand mechanics and use of gradient descent for optimization and loss functions for evaluation; understand math using vectors and matrices and its role in implementing a linear model
• advantages and disadvantages of linear models • improving linear model by minimizing loss function • loss functions for classification and regression problems (log loss, mean squared error) • use of weighted sum and sigmoid in making predictions • review of vector and matrix math • optimization using gradient descent • learning rates • regularization • difference between logistic and linear regression
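The core loop here — improve a linear model by minimizing a loss function with gradient descent at a chosen learning rate — can be sketched for one-feature linear regression under mean squared error (a minimal pure-Python version; the name `fit_line` and the default hyperparameters are illustrative):

```python
def fit_line(xs, ys, lr=0.01, steps=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # gradients of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # step downhill, scaled by the learning rate
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

On data generated by y = 2x + 1, the loop converges to roughly w ≈ 2, b ≈ 1; logistic regression uses the same machinery with a sigmoid on the weighted sum and log loss in place of MSE.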
Learning goals: create and select model candidates using evaluation metrics, set up training, validation, and test splits; tune hyperparameters; select features to improve model performance
• out-of-sample validation • k-fold cross validation • feature selection (heuristic, stepwise, regularization) • hyperparameter optimization • confusion matrix and classification metrics (accuracy, precision, recall) • AUC-ROC curve for model evaluation • calibration curve
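The index bookkeeping behind k-fold cross validation can be sketched as follows (a simplified pure-Python version that uses contiguous, unshuffled folds; `k_fold_indices` is an illustrative name — libraries like scikit-learn provide a full-featured `KFold`):

```python
def k_fold_indices(n, k):
    """Yield k (train, validation) index pairs covering indices 0..n-1.
    Each point appears in exactly one validation fold."""
    fold = n // k
    for i in range(k):
        # the last fold absorbs any remainder when k does not divide n
        val = list(range(i * fold, (i + 1) * fold if i < k - 1 else n))
        train = [j for j in range(n) if j not in val]
        yield train, val
```

Training on each `train` split and scoring on the matching `val` split, then averaging the k scores, gives the out-of-sample estimate used for hyperparameter tuning and model selection.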
Learning goals: understand what ensemble methods are and when to use them, understand mechanics of random forests and gradient boosted decision trees, build and tune different models to improve performance using ensemble methods
• model error (bias + variance) • types of ensemble methods: stacking, bagging, boosting • random forests • gradient boosting and gradient boosted decision trees • optimizing gradient boosted decision trees
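Of the ensemble mechanics above, bagging is the simplest to sketch: resample the training data with replacement, fit a model on each resample, and average the models' predictions to reduce variance (a toy pure-Python illustration; the function names are hypothetical):

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) points with replacement: the resample behind bagging."""
    return [rng.choice(data) for _ in data]

def bagged_predict(models, x):
    """Average individual model predictions; averaging reduces variance."""
    return sum(m(x) for m in models) / len(models)

rng = random.Random(0)
sample = bootstrap_sample([1, 2, 3, 4], rng)  # some values may repeat
```

A random forest is this idea applied to decision trees, with an extra random subsample of features at each split; boosting instead fits models sequentially, each one targeting the previous ensemble's errors.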
Learning goals: use NLP preprocessing techniques to convert text to data suitable for ML, understand how word embeddings are used to convert text into numerical features, implement ML models to make predictions from text data, understand basic ideas behind feedforward and recurrent neural networks
• preprocessing methods: lemmatization, n-grams, stop words • vectorization: binary, count, TF-IDF • word embeddings • cosine similarity • using word embeddings with sparse data sets • pooling approaches to capture different concepts in text • neural networks and non-linear transformations (high-level description and interactive) • RNNs (high-level description)
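One of the vectorization schemes above, TF-IDF, can be sketched in pure Python. This is one common variant (term frequency times log inverse document frequency); real implementations such as scikit-learn's `TfidfVectorizer` add smoothing and normalization:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each term in each tokenized document: frequent in this
    document, discounted if it appears across many documents."""
    n = len(docs)
    # document frequency: how many docs contain each term at least once
    df = Counter(t for doc in docs for t in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf}
        )
    return scores
```

A term that appears in every document (like a stop word) scores zero, while a term unique to one document scores highest — exactly the behavior that makes TF-IDF features useful for prediction from text.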
Learning goals: apply the bias-variance tradeoff to model evaluation, diagnose how feature issues contribute to degraded model performance, understand sources of discriminatory bias and how to measure and mitigate them, understand how to improve the fairness and accountability of a model
• machine learning-based risk and ML model failure modes • model developer best practices (Agile model development, applying unit tests, writing code to be reproducible, creating good documentation) • model deployment • data issues: bottlenecks, bias-variance tradeoff, class imbalance • feature issues: irrelevant features, feature leakage • algorithmic fairness (allocative harm, representational harm) • algorithmic accountability (transparency) • roles on ML project teams
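The class-imbalance issue listed above is easy to make concrete with the accuracy and recall metrics from the evaluation module: on a 90%-negative data set, a model that always predicts the majority class looks accurate yet finds no positives (a toy illustration; function names are hypothetical):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Of the truly positive examples, the fraction the model caught."""
    preds_on_pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_on_pos) / len(preds_on_pos)

y_true = [0] * 9 + [1]   # 90% negative class
y_pred = [0] * 10        # majority-class baseline: always predict 0
# accuracy is high (0.9) while recall is 0.0 — accuracy alone hides the failure
```

This is one reason a confusion matrix and per-class metrics, not a single accuracy number, belong in any evaluation of a deployed model.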
As a result of this intentional design, you will learn the habits, skills, and mindsets to be a successful entry-level ML or AI engineer, including the following competencies:
• Master machine learning fundamentals to develop your portfolio of artifacts. • Cultivate your authentic leadership and community to use machine learning for social good. • Master project management and collaboration skills to maximize your impact. • Become an effective communicator, presenter, and interviewer. • Learn the skills to effectively navigate your work environment.
This course will provide you with the skills to unlock value in unstructured data sets and to make informed recommendations about which models to use when solving machine learning problems. You will become familiar with the factors to consider and the questions to ask to make sure those models are implemented effectively, including how to train a more powerful model using advanced evaluation and hyperparameter tuning methods.