Slides, exercises, and exams for my course "Natural Language Processing" (École Pour l'Informatique et les Techniques Avancées, 2024 and 2025)

gustavecortal/natural-language-processing

Natural Language Processing

Slides and exercises for my course "Natural Language Processing" (École Pour l'Informatique et les Techniques Avancées, 2024 and 2025). Semester-long courses (60 hours) taught to 150 final-year engineering students.

An introduction to the fundamentals of natural language processing. We explore various algorithms for text classification and generation. The goal is to start with simple language models such as n-grams and progress towards understanding modern architectures such as transformers. In the exercises, we implement these algorithms from scratch.

Syllabus:

0 - Course introduction

1 - Tokenization: regular expressions and the Byte-Pair Encoding algorithm

2 - N-grams

3 - Text classification with naive Bayes

4 - Text classification with logistic regression

5 - Vector semantics: tf-idf and Word2vec

6 - Feedforward neural networks

7 - Recurrent neural networks and attention mechanisms

8 - Transformer
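
To give a flavour of the "from scratch" exercises, here is a minimal sketch of a bigram language model (topic 2). It is not taken from the course materials: the function names `train_bigram` and `bigram_prob`, the whitespace tokenization, and the `<s>` / `</s>` boundary markers are all assumptions made for illustration, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count bigrams over lowercased, whitespace-tokenized sentences.

    Hypothetical helper: each sentence is padded with <s> and </s> markers.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts


def bigram_prob(counts, prev, curr):
    """Maximum-likelihood estimate of P(curr | prev).

    Returns 0.0 when `prev` was never seen (no smoothing in this sketch).
    """
    following = counts.get(prev)
    if not following:
        return 0.0
    return following[curr] / sum(following.values())


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog sat", "the cat ran"]
    model = train_bigram(corpus)
    print(bigram_prob(model, "the", "cat"))
    print(bigram_prob(model, "cat", "sat"))
```

Unseen bigrams get probability zero here, which is the kind of limitation that smoothing techniques are meant to address.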

Final project

The final project involves applying the various algorithms covered in the course to multiple datasets and identifying limitations and possible improvements.

Students' projects have been very diverse: generating song lyrics and poems, automatically moderating Twitch conversations, detecting spoilers in movie comments, detecting machine-generated text, etc.