Training details

As organizations' data science practices mature, the ability to move projects into production quickly and deliver value to users becomes increasingly crucial. This 2-day intensive training program is designed to bridge the gap between data science and operational deployment. It focuses on MLOps, an emerging field that combines machine learning, DevOps, and data engineering practices. Participants will learn how to streamline the end-to-end process of designing, building, and maintaining machine learning models so that they are production-ready, scalable, and robust. The training covers current industry practices for automating and optimizing ML systems, integrating models into production environments, and managing their lifecycle effectively.

Objectives

Understanding the lifecycle of ML models from development to production.
Applying software engineering principles to data science.
Gaining proficiency in MLOps tools for deploying, evaluating, and monitoring ML systems.
Learning best practices for maintaining and improving deployed ML models.
Developing skills to manage and automate ML systems in production.

Target Audience

Analysts
Statisticians
Data Scientists
Data Engineers
Machine Learning Engineers
Developers

Prerequisites

Understanding of Data Science fundamentals (models, bias, variance, etc.).
Familiarity with Python data manipulation libraries (pandas, numpy, etc.).
Command-line proficiency in Linux (e.g., bash).
Laptop with at least 8 GB of RAM and an IDE installed.

Pedagogical method

A mix of lectures, case studies, and hands-on workshops.
A hands-on training in which participants evolve a codebase from exploration to production-ready code.
Collaborative learning through group activities and discussions.
Proportion of presentations: 20%.
Proportion of practical cases: 70%.
Proportion of experience sharing: 10%.

Evaluation and follow-up mode

Participants' skills are assessed throughout the course via workshops and practical exercises. Post-session evaluations measure trainee satisfaction immediately after the course, and participants receive a certificate detailing the course's objectives, content, duration, and the skills developed.

Program

Day 1: Foundations and Best Practices in Data Science and MLOps
- Introduction to MLOps
  - Role of Data Scientists in MLOps
  - Core Beliefs in Data Science and MLOps
- Setting Up Development Environments
  - Jupyter Notebook for Data Analysis
  - PyCharm for Advanced Python Development
  - Anaconda Environment Management
- Implementing Clean Code Practices
  - Cleaning and Organizing Jupyter Notebooks
  - Effective Variable Naming and Configuration Management
  - Functional Programming Techniques in Data Science
  - Principles of Immutability
  - Function Creation in Data Analysis
  - Hands-On: Refining a Data Science Notebook
- Testing Your Data Science Code
  - Test-Driven Development (TDD) Fundamentals
  - Python's unittest Framework
  - Structuring Test Classes
  - Writing and Executing First Tests
  - Setup and Teardown Methods
  - Setuptools Integration for Testing
  - Hands-On: Developing Unit Tests
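To give a flavour of the Day 1 material on immutability, function creation, and unit testing, here is a minimal sketch using only Python's standard library. The function `min_max_scale` and the test class `TestMinMaxScale` are hypothetical illustrations, not the course's actual exercise code. The function returns a new list rather than modifying its input, and the test class uses `setUp` and `tearDown` to give every test a fresh fixture.

```python
import unittest


def min_max_scale(values: list[float]) -> list[float]:
    """Return a new list scaled to [0, 1]; the input list is left unchanged."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: avoid division by zero and map every value to 0.0
        return [0.0 for _ in values]
    # Build a new list instead of modifying `values` in place (immutability)
    return [(v - lo) / (hi - lo) for v in values]


class TestMinMaxScale(unittest.TestCase):
    def setUp(self):
        # Runs before each test method: a fresh fixture, so tests cannot affect each other
        self.values = [2.0, 4.0, 6.0]

    def tearDown(self):
        # Runs after each test method, whether the test passed or failed
        self.values = None

    def test_scales_to_unit_interval(self):
        self.assertEqual(min_max_scale(self.values), [0.0, 0.5, 1.0])

    def test_input_is_not_mutated(self):
        min_max_scale(self.values)
        self.assertEqual(self.values, [2.0, 4.0, 6.0])

    def test_constant_input(self):
        self.assertEqual(min_max_scale([3.0, 3.0]), [0.0, 0.0])
```

If this is saved as a file named, for example, `test_scaling.py`, the suite runs with `python -m unittest test_scaling`. Writing the transformation as a pure function is what makes the non-mutation test possible: the test only has to compare the input before and after the call.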
Day 2: Operationalizing and Documenting ML Projects
- Version Control in Data Science
  - Git Basics for Code Version Management
  - Effective Tagging and Versioning
  - Dataset and Model Management with Git
- Documenting Machine Learning Projects
  - Importance of Documentation in ML Projects
  - Best Practices for Writing Clear and Concise Documentation
  - Tools and Techniques for Documenting ML Code and Results
- Making ML Code Deployable
  - Packaging Defined
  - Setuptools for Code Packaging
  - Dependency Management in Python
  - Local Code Installation
  - Separating Training and Inference Phases
  - Data Preparation and Object Saving
  - Hands-On: Package Creation and Deployment
- Conclusion and Wrap-Up
  - Reviewing Key MLOps Concepts
  - Sharing Experiences and Feedback for Practical Application
  - Final Q&A and Feedback Session
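The Day 2 topics "Separating Training and Inference Phases" and "Data Preparation and Object Saving" can be sketched as two independent entry points that share only a saved artifact. The example below is a minimal sketch that assumes a toy model and uses the standard-library `pickle` module for persistence. `MeanBaseline`, `train`, and `predict` are hypothetical names chosen for illustration, not the course's reference implementation.

```python
import pickle
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class MeanBaseline:
    """Hypothetical toy model: predicts the training-set mean for every input row."""
    mean: float

    def predict(self, n_rows: int) -> list[float]:
        return [self.mean] * n_rows


def train(targets: list[float], model_path: Path) -> MeanBaseline:
    """Training phase: fit the model on the targets and persist it to disk."""
    model = MeanBaseline(mean=sum(targets) / len(targets))
    with model_path.open("wb") as f:
        pickle.dump(model, f)
    return model


def predict(model_path: Path, n_rows: int) -> list[float]:
    """Inference phase: load the saved artifact; no training data or code is needed."""
    with model_path.open("rb") as f:
        model = pickle.load(f)
    return model.predict(n_rows)
```

For instance, `train([1.0, 2.0, 3.0], Path("model.pkl"))` writes the artifact, and a separate process can later call `predict(Path("model.pkl"), 2)` to get `[2.0, 2.0]`. Keeping the two phases apart means the inference side can be packaged and deployed without the training dependencies. Note that `pickle` should only be used to load files from a trusted source.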

MLOps and LLMOps - Industrializing a Data Science Project