Seasons Of Code

NLPlay with Transformers    • Tezan Sahu    • Shreya Laddha   

WnCC - Seasons of Code

Seasons of Code is a programme launched by WnCC along the lines of the Google Summer of Code. It provides an opportunity to learn and participate in a variety of interesting projects under the mentorship of the very best in our institute.


NLPlay with Transformers


The core idea behind this project is to get familiar with Deep NLP (one of the most sought-after domains of AI) & gain hands-on experience with various deep network architectures while working on 2 major tasks: Sentiment Analysis (Classification) & Automatic Lyrics Generation (Text Generation).

No. of mentees: 8-9

Description: Natural Language Processing (NLP) is a field of AI that gives machines the ability to read, understand and derive meaning from human languages. Back in 2014, Sequence to Sequence (Seq2Seq) models revolutionized the field of NLP. Machine Learning is a quickly developing discipline, and it wasn’t long before even more groundbreaking work, built on the shoulders of giants, came along. Today, the hype is around Transformer models such as BERT and, most recently, GPT-3.

This project involves learning NLP from the ground up. You will be introduced to the fundamental concepts and algorithms used for Natural Language Processing through an in-depth exploration of two different examples: sentiment classification and song lyrics generation. Starting the journey by understanding the basic theory behind Deep NLP, you will maneuver your way through RNNs, LSTMs and GRUs while simultaneously applying the gained knowledge to the sentiment classification task. In the second stage of the project, we will focus on transformers, beginning with using BERT, RoBERTa & XLM for the sentiment classification task, and concluding this project by employing models like GPT & T5 for the song lyrics generation task.
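As a small taste of the first stage, here is a minimal sketch of the kind of basic preprocessing that comes before a sentiment classifier: splitting text into tokens, building a vocabulary, and encoding each text as a bag-of-words count vector. It is written in plain Python rather than NLTK or PyTorch so that it runs without any installs. The function names (`tokenize`, `build_vocab`, `bag_of_words`) and the sample reviews are made up for illustration and are not part of the project material.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts, min_count=1):
    """Map every token seen at least `min_count` times to an integer index."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    tokens = sorted(tok for tok, count in counts.items() if count >= min_count)
    return {tok: index for index, tok in enumerate(tokens)}


def bag_of_words(text, vocab):
    """Encode a text as a vector of token counts; unknown tokens are dropped."""
    vector = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vector[vocab[tok]] += 1
    return vector


# Hypothetical toy data, only for illustration.
reviews = ["A great, great film", "A dull film"]
vocab = build_vocab(reviews)
features = [bag_of_words(review, vocab) for review in reviews]
```

Vectors like `features` could then be fed to a feedforward network; RNNs, LSTMs and Transformers instead work on the token sequence itself, so that word order is not thrown away.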

Prereqs - Enthu !!!

Tentative Project Timeline

Week Number | Tasks to be Completed
Week 1 (22/03 - 28/03) | Basics of Neural Networks & NLP, with introduction to PyTorch & NLTK
Week 2 (29/03 - 04/04) | Implement Basic Preprocessing & Feedforward Network for Sentiment Analysis; Get started with Deep NLP
Week 3 (05/04 - 11/04) | Understand RNN, LSTM & GRU; implement them for Sentiment Analysis
Week 4 (10/05 - 16/05) | Introduction to Transformers & the Huggingface library
Week 5 (17/05 - 23/05) | Deep Dive into Transformer architectures; Fine-tune BERT, XLM, etc. for Sentiment Analysis
Week 6 (24/05 - 30/05) | Introduction to Text Generation using Transformers
Week 7 (31/05 - 06/06) | Buffer Week (to account for overspills or dive deeper into theoretical aspects of transformers)
Week 8 (28/06 - 04/07) | Get started with Lyrics Generation using LSTM
Week 9 (05/07 - 11/07) | Fine-tune GPT-2, T5, etc. for Lyrics Generation & study their performance
Week 10 (12/07 - 18/07) | Buffer Week (to account for overspills) + Final Documentation


Checkpoint Number | Progress
1 (04/04) | Implement Sentiment Analysis using Feedforward Network in PyTorch
2 (23/05) | Implement LSTM-based Sentiment Analysis & gain a theoretical understanding of Transformers
3 (06/06) | Fine-tune BERT-based Sentiment Analysis models & get familiar with Text Generation tasks
4 (18/07) | Fine-tune GPT / T5-based Lyrics Generation models & complete project documentation