Sequence Models in NLP – RNN & LSTM for Text Processing in Machine Learning
Language is sequential. The meaning of a sentence depends not only on individual words but also on the order in which they appear. Traditional feedforward neural networks cannot naturally capture this sequential dependency. This is where sequence models such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) became foundational in NLP.
1. Why Sequence Modeling Is Necessary
Consider the sentences:
"The movie was not good." "The movie was good."
The word "not" completely changes the sentiment. A model must remember previous words while reading the sentence. Sequence models enable this memory mechanism.
2. Introduction to Recurrent Neural Networks (RNNs)
An RNN processes input sequentially, maintaining a hidden state that carries information from previous time steps.
At time step t:
h_t = f(W x_t + U h_{t-1} + b)
Where:
- x_t = current input
- h_{t-1} = previous hidden state
- W and U = weight matrices
- b = bias vector
- f = nonlinear activation, typically tanh
This recurrence allows memory across time.
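The recurrence above can be sketched in a few lines of numpy. The dimensions, random weights, and tanh activation here are illustrative assumptions, not values from a trained model:

```python
import numpy as np

# Illustrative sizes (assumptions for this sketch).
hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
U = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden weights
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # h_t = f(W x_t + U h_{t-1} + b), with f = tanh
    return np.tanh(W @ x_t + U @ h_prev + b)

# Process a short sequence; the same W, U, b are reused at every step.
h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h = rnn_step(x_t, h)
```

Note that the final hidden state h summarizes the whole sequence, which is why a classifier can be attached to it.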
3. RNN Architecture Unfolded
When unfolded across time:
x1 → h1 → y1
x2 → h2 → y2
x3 → h3 → y3
The hidden state flows forward (h1 → h2 → h3), and the same weights are shared across all time steps.
4. Applications of Basic RNNs
- Sentiment analysis
- Language modeling
- Part-of-speech tagging
- Text generation
5. The Vanishing Gradient Problem
During backpropagation through time (BPTT), gradients are multiplied repeatedly across time steps.
If the repeated factors are smaller than 1:
- They shrink exponentially
- Early time steps learn very slowly
This makes basic RNNs struggle with long-term dependencies.
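A small numerical sketch makes this concrete: the BPTT gradient through t steps contains a product of per-step Jacobians, and with a tanh RNN each factor typically has norm below 1. The weight scale here is an assumption chosen to make the contraction visible:

```python
import numpy as np

rng = np.random.default_rng(0)
# Small recurrent weights (assumption), so each Jacobian is contractive.
U = rng.normal(scale=0.2, size=(4, 4))

grad = np.eye(4)
norms = []
for t in range(50):
    h = rng.normal(size=4)
    # Jacobian of one tanh step: d h_t / d h_{t-1} = diag(1 - tanh(h)^2) @ U
    jacobian = np.diag(1 - np.tanh(h) ** 2) @ U
    grad = jacobian @ grad
    norms.append(np.linalg.norm(grad))

# After 50 steps the gradient norm has shrunk by many orders of magnitude,
# so early time steps receive almost no learning signal.
```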
6. Long Short-Term Memory (LSTM)
LSTM networks were introduced to solve long-term dependency issues.
They use gated mechanisms to control information flow.
7. LSTM Architecture Components
An LSTM cell contains:
- Forget Gate
- Input Gate
- Cell State
- Output Gate
8. Forget Gate
f_t = σ(W_f [h_{t-1}, x_t] + b_f)
Determines which information to discard.
9. Input Gate
i_t = σ(W_i [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C [h_{t-1}, x_t] + b_C)
Decides which new candidate information (C̃_t) to store.
10. Cell State Update
C_t = f_t * C_{t-1} + i_t * C̃_t
This enables long-term memory retention.
11. Output Gate
o_t = σ(W_o [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
Controls what information is exposed.
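Putting the gate equations together, one full LSTM cell step can be sketched in numpy. Sizes and random weights are illustrative assumptions; each line maps to one of the equations above:

```python
import numpy as np

hidden_size, input_size = 4, 3
rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, applied to the concatenation [h_{t-1}, x_t].
concat = hidden_size + input_size
W_f, W_i, W_c, W_o = [rng.normal(size=(hidden_size, concat)) for _ in range(4)]
b_f, b_i, b_c, b_o = [np.zeros(hidden_size) for _ in range(4)]

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)          # forget gate
    i_t = sigmoid(W_i @ z + b_i)          # input gate
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde    # cell state update
    o_t = sigmoid(W_o @ z + b_o)          # output gate
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x_t in rng.normal(size=(6, input_size)):
    h, c = lstm_step(x_t, h, c)
```

Because the cell state update is additive (f_t * C_{t-1} + i_t * C̃_t) rather than a repeated matrix multiplication, gradients can flow through C_t over long spans.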
12. Why LSTMs Work Better
- Preserve long-term dependencies
- Reduce vanishing gradient impact
- More stable training
13. Practical NLP Example – Sentiment Analysis
Workflow:
- Tokenize text
- Convert words to embeddings
- Pass embeddings into LSTM layer
- Final dense layer for classification
LSTM learns contextual relationships across the sentence.
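The workflow above can be sketched end to end with toy, untrained weights. The vocabulary, dimensions, and all weights here are illustrative assumptions, so the output is a well-formed probability but not a meaningful prediction:

```python
import numpy as np

# Toy vocabulary for tokenization (assumption).
vocab = {"<unk>": 0, "the": 1, "movie": 2, "was": 3, "not": 4, "good": 5}
embed_dim, hidden = 8, 16
rng = np.random.default_rng(2)
embeddings = rng.normal(scale=0.1, size=(len(vocab), embed_dim))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Single matrix producing all four gate pre-activations at once (common layout).
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + embed_dim))
b = np.zeros(4 * hidden)
w_out = rng.normal(scale=0.1, size=hidden)  # final dense layer

def predict(text):
    tokens = [vocab.get(w, 0) for w in text.lower().split()]  # tokenize
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for t in tokens:
        z = W @ np.concatenate([h, embeddings[t]]) + b       # embed + project
        f_t = sigmoid(z[:hidden])                            # forget gate
        i_t = sigmoid(z[hidden:2 * hidden])                  # input gate
        o_t = sigmoid(z[2 * hidden:3 * hidden])              # output gate
        c = f_t * c + i_t * np.tanh(z[3 * hidden:])          # cell update
        h = o_t * np.tanh(c)
    return sigmoid(w_out @ h)  # probability of positive sentiment

p = predict("The movie was not good")
```

In practice the embeddings, gate weights, and dense layer would all be learned jointly from labeled data.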
14. Bidirectional LSTMs
Process sequence in both forward and backward directions.
Improves contextual understanding.
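The bidirectional idea is simply two recurrent passes, one left-to-right and one right-to-left, with their hidden states concatenated at each position. A plain tanh RNN step stands in for the LSTM cell here to keep the sketch short (an assumption):

```python
import numpy as np

hidden, dim = 4, 3
rng = np.random.default_rng(3)
# Separate weights for the forward and backward passes.
Wf, Uf = rng.normal(size=(hidden, dim)), rng.normal(size=(hidden, hidden))
Wb, Ub = rng.normal(size=(hidden, dim)), rng.normal(size=(hidden, hidden))

def run(xs, W, U):
    h, out = np.zeros(hidden), []
    for x in xs:
        h = np.tanh(W @ x + U @ h)
        out.append(h)
    return out

xs = list(rng.normal(size=(5, dim)))
forward = run(xs, Wf, Uf)
backward = run(xs[::-1], Wb, Ub)[::-1]  # reversed pass, re-aligned
# Each position now sees both left and right context.
states = [np.concatenate([f, b]) for f, b in zip(forward, backward)]
```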
15. Limitations of RNN & LSTM
- Sequential computation (slow training)
- Hard to parallelize
- Struggles with extremely long sequences
These limitations motivated attention mechanisms and transformers.
16. Enterprise Applications
- Chatbots
- Speech recognition
- Email classification
- Text summarization (early systems)
17. Comparison – RNN vs LSTM
- RNN → Simple but limited memory
- LSTM → Gated memory control
- LSTM → Better for long sequences
18. Final Summary
Sequence models such as RNNs and LSTMs were foundational in enabling deep learning for NLP. By maintaining hidden states across time, they capture contextual relationships in language. While basic RNNs struggle with long-term dependencies, LSTMs introduced gating mechanisms that significantly improved performance. Although transformers now dominate modern NLP, understanding sequence models remains essential for grasping the evolution of language AI systems.

