Training details

This training is for computer scientists and statisticians who want to pursue a career in data science and data engineering, covering the field's scientific, technical, and methodological aspects. It reviews the major theoretical concepts that form the scientific foundation of data science and encourages participants to reflect on them so that they are properly understood. It also covers machine learning algorithms from both a theoretical and a practical perspective. Finally, the data engineering component is taught through hands-on exercises that give a solid understanding of this time-consuming phase of data science projects.

This training was designed as a bridge between two related but distinct worlds: the world of computing, which is deterministic and technical, and the world of statistics, which emphasizes reflection, method, and trial and error, and is closely tied to business problems.

Objective

The pedagogical objective of the training is to provide theoretical and practical support to new data scientists, building on their existing computing knowledge.

Target Audience

Computer scientists and statisticians with solid computing foundations and a standard level of mathematics.

Prerequisites

A laptop with at least 8 GB of memory and the Anaconda Python distribution installed.

Pedagogical method

Training that combines reflection, case studies, and practice.

Proportion of presentations: 60%
Proportion of practical cases: 30%
Proportion of experience sharing: 10%

Evaluation and follow-up mode

Practical exercises, reflections on case studies, post-training homework, and follow-up between the training (T) and T+3 for questions and inquiries.

Program

Day 1: Foundation in Big Data and Data Science Essentials
- Big Data Insights
  - Exploring the concept and scope of Big Data
  - Understanding the technological landscape of Big Data
- Diving into Data Science
  - Mastering the terminology used in Data Science problems
  - Transitioning from statistical analysis to machine learning
  - Overview of machine learning capabilities and applications
- Machine Learning Problem Formulation
  - Defining inputs and outputs in machine learning problems
  - Practical Case Study: "Optical Character Recognition (OCR)" - Approaches and modeling
- Exploring Machine Learning Algorithms
  - Comparative analysis of supervised vs. unsupervised learning
  - Clarifying classification and regression concepts
- Deep Dive into Algorithms - Linear Regression
  - Fundamental concepts: hypothesis functions and optimization
  - Building and understanding cost functions
  - Introduction to the gradient descent method
- Deep Dive into Algorithms - Logistic Regression
  - Concepts around decision boundaries
  - Formulating convex cost functions for classification
- Tools for Data Scientists
  - Introduction to essential tools and software
  - Beginner's guide to Python, Pandas, and Scikit-learn
- Case Study 1: "Real-World Data Analysis"
  - Problem definition and approach
  - Initial data handling and analysis using Python
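
The Day 1 linear regression topics (hypothesis function, cost function, gradient descent) can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's own material, and it rests on stated assumptions: a mean-squared-error cost J(θ) = (1/2m) Σ (h_θ(x) − y)², a bias column of ones prepended to the features, and synthetic data generated from y = 3 + 2x plus small noise. The function names `hypothesis`, `cost`, and `gradient_descent` are illustrative, not taken from any library.

```python
import numpy as np


def hypothesis(X, theta):
    """Linear hypothesis h_theta(x) = X @ theta (X includes a bias column of ones)."""
    return X @ theta


def cost(X, y, theta):
    """Mean squared error cost J(theta) = 1/(2m) * sum((h - y)^2)."""
    m = len(y)
    residuals = hypothesis(X, theta) - y
    return (residuals @ residuals) / (2 * m)


def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent: theta <- theta - alpha * (1/m) * X^T (X theta - y)."""
    m, n = X.shape
    theta = np.zeros(n)
    history = []  # cost after each update, to check that it decreases
    for _ in range(n_iters):
        gradient = X.T @ (hypothesis(X, theta) - y) / m
        theta = theta - alpha * gradient
        history.append(cost(X, y, theta))
    return theta, history


# Synthetic data (assumption for illustration): y = 3 + 2x plus small Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=100)
y = 3 + 2 * x + rng.normal(0, 0.05, size=100)
X = np.column_stack([np.ones_like(x), x])  # prepend the bias column

theta, history = gradient_descent(X, y, alpha=0.5, n_iters=2000)
print("intercept, slope:", theta)
```

Because this cost is convex, a small enough learning rate (here `alpha=0.5`) makes the recorded cost decrease at every step and drives `theta` to the least-squares solution that `np.linalg.lstsq` would compute directly; a learning rate that is too large would make the iterates diverge instead.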

Day 2: Advanced Modeling and Machine Learning Challenges
- Day 1 Recap and Review
  - Revisiting and consolidating key learnings from Day 1
- Characteristics of a Robust Model
  - Delving into cross-validation techniques
  - Discussing various evaluation metrics: Precision, Recall, ROC, MAPE
- Machine Learning Pitfalls
  - Understanding and addressing overfitting
  - Examining bias vs. variance balance
  - Introduction to regularization: Ridge and Lasso
- Data Cleaning Techniques
  - Handling diverse data types
  - Strategies for outlier detection and management
  - Approaches to missing value handling
  - Practical Exercise: "Data Cleansing Strategies"
- Art of Feature Engineering
  - Handling non-continuous variables
  - Techniques for creating impactful features
- Case Study 2: "Advanced Data Analysis Scenario"
  - Developing features and building models
  - Practical implementation and evaluation
- Visualizing Data Insights
  - Techniques for effective data visualization
  - Understanding algorithms through visual representation
- Ensemble Methods Introduction
  - Fundamentals of decision trees
  - Overview of ensemble strategies: Bagging and Boosting
  - Practical Application: "Improving Model Performance with Ensemble Methods"
- Semi-Supervised Learning Applications
  - Exploring unsupervised algorithms: Clustering, PCA
  - Practical Case Study: "Anomaly Detection Techniques"
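
Cross-validation and regularization from Day 2 fit together: k-fold cross-validation estimates the out-of-sample error for each candidate regularization strength, and the strength with the lowest average validation error is kept. The sketch below is an illustration under explicit assumptions, not the course's code: it uses ridge regression in its closed form θ = (XᵀX + λI)⁻¹Xᵀy with the bias term left unpenalized, plain NumPy rather than scikit-learn, and a synthetic dataset with 30 features but only 40 samples so that the unregularized fit (λ = 0) overfits. The helper names `ridge_fit`, `k_fold_indices`, and `cross_val_mse` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge solution; the bias column (index 0) is not penalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


def k_fold_indices(n_samples, k, rng):
    """Shuffle the sample indices and split them into k roughly equal folds."""
    return np.array_split(rng.permutation(n_samples), k)


def cross_val_mse(X, y, lam, k=5, seed=0):
    """Average validation MSE over k folds (same folds for every lam via the seed)."""
    folds = k_fold_indices(len(y), k, np.random.default_rng(seed))
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        theta = ridge_fit(X[train_idx], y[train_idx], lam)
        residuals = X[val_idx] @ theta - y[val_idx]
        scores.append(np.mean(residuals ** 2))
    return float(np.mean(scores))


# Synthetic data (assumption): only 3 of 30 features matter, and samples are scarce.
rng = np.random.default_rng(42)
n_samples, n_features = 40, 30
X_raw = rng.normal(size=(n_samples, n_features))
true_theta = np.zeros(n_features)
true_theta[:3] = [2.0, -1.0, 0.5]
y = X_raw @ true_theta + rng.normal(0, 1.0, size=n_samples)
X = np.column_stack([np.ones(n_samples), X_raw])  # bias column first

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
cv_errors = {lam: cross_val_mse(X, y, lam) for lam in lambdas}
best_lam = min(cv_errors, key=cv_errors.get)

for lam, err in cv_errors.items():
    print(f"lambda={lam:>6}: CV MSE={err:.3f}")
print("selected lambda:", best_lam)
```

With 32 training rows and 31 parameters per fold, the unregularized fit has almost no residual degrees of freedom, so its validation error is large; a moderate penalty trades a little bias for much lower variance. In scikit-learn the same experiment would typically use `Ridge` or `Lasso` together with `cross_val_score`; Lasso has no closed-form solution, which is why only Ridge is written out by hand here.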

Day 3: Practical Application and Competition Engagement
- Synthesis and In-depth Review
  - Recap of key concepts and methodologies
  - Focused sessions based on participant interest
- Extensive Practical Exercises
  - Hands-on application of theories and concepts

Data Science Fundamentals
