Training details

This 3-day training program gives data scientists and developers the skills to work with the unstructured language data found throughout businesses. Building on recent advances in deep learning, especially conversational language models such as ChatGPT, the course explores Natural Language Processing (NLP) and Speech Processing in depth, with a focus on AI projects that use text and voice data. By the end of the training, you will have a solid understanding of the potential and the state of the art in NLP and Speech Processing, including the Transformer architectures underlying models like ChatGPT. The practical exercises prepare you to deploy language-data solutions independently and to create value from written and spoken language.

Objective

Develop proficiency in structuring text and voice data for analysis and processing.
Gain expertise in analyzing large volumes of text and/or voice data, applying advanced machine learning models effectively.
Acquire skills to process voice and/or text data in real time, adapting to dynamic data flows.
Learn to implement intelligent search mechanisms within documents and audio recordings, enhancing data retrieval efficiency.
Master the creation of intent detection and entity recognition models to extract meaningful insights from language data.
Understand the methodologies underlying advanced language models such as ChatGPT and BERT, and their applications in various contexts.

Target Audience

Data Scientists.
AI and Machine Learning Developers.
Back-end/front-end technologists interested in pursuing a career in LLM application development.
Data scientists with a foundation in NLP who are looking to deepen their understanding of the LLM revolution and its impact on data science.
Technology Professionals interested in NLP and Speech Processing.

Prerequisites

Completion of the Data Science Fundamentals training program is preferable.
Basic understanding of machine learning concepts and models.
Familiarity with programming languages like Python.
Knowledge of deep learning frameworks is advantageous.

Pedagogical method

Theoretical instruction combined with practical, hands-on exercises.
Case studies and real-life applications of NLP and Speech Processing.
Interactive sessions for a deeper understanding of concepts.
Group activities to foster collaborative learning.
Proportion of presentations: 50%
Proportion of practical cases: 40%
Proportion of experience sharing: 10%

Evaluation and follow-up mode

Continuous assessment through practical exercises and projects.
Feedback sessions for progress evaluation.
Post-training resources for extended learning.
Certification of completion highlighting skills acquired.

Program

Day 1: Foundations and Theoretical Aspects of NLP and LLMs
- Introduction to Text and Voice Analysis
  - Exploring NLP, NLU, Speech Processing, and Understanding
  - Impact of conversational language models like ChatGPT
- Natural Language Processing (NLP) Fundamentals
  - Basics of NLP: encoding, regex, tokenization, n-grams, bag of words
  - Dimensionality reduction in NLP
  - Text cleaning techniques: stemming, lemmatization
  - Topic modeling: SVD, NMF, LDA
  - Word embedding methods: Word2Vec, FastText
- Information Retrieval (IR): Building a Search Engine
  - Fundamentals of content indexing and simple search engines
  - Creating intelligent search engines using language models (GPT, BERT, etc.)
- Deep Dive into Theoretical Aspects
  - "Attention Is All You Need" and other foundational papers
  - Analyzing AGI hype and LLM capabilities
  - Techniques in prompt engineering and prompt hacking
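
To give a flavor of the Day 1 fundamentals, here is a minimal sketch of regex tokenization, n-grams, and bag of words using only the Python standard library. The function names (`tokenize`, `ngrams`, `bag_of_words`) and the sample sentence are illustrative assumptions, not course material; the tokenizer is deliberately simplistic and ignores punctuation, accents, and language-specific rules.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens with a simple regex."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Count how often each token occurs, ignoring word order."""
    return Counter(tokens)


# Hypothetical sample text, chosen only for illustration.
sample = "The cat sat on the mat. The mat was warm."
tokens = tokenize(sample)
bow = bag_of_words(tokens)
bigrams = ngrams(tokens, 2)
bigram_counts = Counter(bigrams)
```

The bag of words discards word order, while the bigrams keep a little local context; that trade-off is one reason embedding methods such as Word2Vec and FastText come next in the program.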

Day 2: Deep Learning Methodologies and Language Model Revolution
- Deep Learning Methodologies for Language Processing
  - Basics of neural networks
  - Sequential models: RNNs
  - Understanding the Transformer revolution and mastering multi-head attention
- Revolution of Language Models for Conversation - ChatGPT
  - Overview of Large Language Models (LLMs): BERT and GPT families
  - Introduction to "Reinforcement Learning from Human Feedback" (RLHF)
  - Practical uses of these models in NLP tasks: summarization, sentiment analysis, content generation
- Working with Tokens, Embeddings, and Limitations
  - Understanding tokens and embeddings in language models
  - Analyzing existing models and their limitations
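
The attention mechanism covered on Day 2 can be sketched in a few lines. The example below is a simplified, single-head version of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, written in plain Python for readability. The toy vectors are assumptions made up for illustration; a real multi-head layer would add learned projection matrices, run several heads in parallel, and use a tensor library.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy inputs (illustrative only): the query matches the first key best.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
focused = attention([[1.0, 0.0]], keys, values)[0]
neutral = attention([[0.0, 0.0]], keys, values)[0]
```

Here `focused` leans toward the first value vector because the query aligns with the first key, while `neutral` is an even blend of both values because a zero query scores every key equally.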

Day 3: Audio Processing, Speech Recognition, and Session Wrap-Up
- Audio Processing
  - Basics of audio data: digital signal, encoding
  - Structuring audio data: Fourier transform, Mel spectrogram, and MFCC, using Librosa and PyAudio
  - Training machine learning models on audio data
- Speech Recognition
  - Implementing transcription models (Speech to Text)
  - Using open-source models like Whisper (OpenAI) and external APIs
  - Real-time transcription: challenges and methodologies
  - Context-aware transcription: fine-tuning Speech to Text models
  - Speaker diarization methodologies
  - Advanced topics: managing temporal information and transcription confidence
- Review and Training Conclusion
  - Recap and synthesis of concepts covered
  - Open discussion and feedback session
  - Additional Q&A and clarifications
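
For the audio topics on Day 3, the following sketch shows the idea behind the Fourier transform step: a naive discrete Fourier transform recovers the frequency of a synthetic sine wave. The sample rate, tone frequency, and the function name `dft_magnitudes` are assumptions chosen for illustration. In practice a library such as Librosa would typically be used to compute spectrograms and MFCCs far more efficiently; this version only exposes the underlying arithmetic.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Naive O(n^2) DFT; returns magnitudes for bins 0..n/2."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(coeff))
    return magnitudes


# Synthetic one-second signal: an 8 Hz sine sampled at 64 Hz (assumed values).
sample_rate = 64
tone_hz = 8
signal = [
    math.sin(2 * math.pi * tone_hz * t / sample_rate)
    for t in range(sample_rate)
]

mags = dft_magnitudes(signal)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
# Convert the bin index back to a frequency in Hz.
peak_hz = peak_bin * sample_rate / len(signal)
```

The strongest bin corresponds to the tone's frequency. A Mel spectrogram applies the same transform to short overlapping windows and then groups the bins on a perceptual frequency scale.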

Mastering Natural Language Processing and Building LLM-Powered Applications
