Question Answering & Chatbots: Reading comprehension QA (extractive)

NLP (Natural Language Processing) · 22 min read · Updated: Feb 27, 2026 · Beginner

Beginner Topic 3 of 8

Module: Question Answering & Chatbots. This lesson is written to feel like a mentor sitting next to you: clear, practical, and deep. By the end, you should be able to explain the concept, implement a baseline, and know the mistakes to avoid.

Quick promise: if you read this carefully and do the exercises, you’ll stop “copying NLP code” and start building NLP systems.

What you’re really learning here

Question Answering & Chatbots: Reading comprehension QA (extractive) sounds like a single idea, but it actually touches multiple layers: (1) how language behaves, (2) how we convert language into a usable signal, and (3) how we measure whether the signal is useful. In production, these layers show up as separate components—data, preprocessing, representation, model, and evaluation—even if you prototype them in one notebook.

Key terms and intuition

  • Input text: raw user text, documents, chat messages, transcripts, or logs.
  • Signal: whatever part of language helps your task (meaning, intent, tone, entities, etc.).
  • Representation: numbers that approximate the signal.
  • Model: a function that maps representation to output (class, score, sequence, answer).
  • Evaluation: how you prove the system is good, not just “it looks good”.

Deep dive: how to think like an NLP engineer

When you meet a new NLP problem, ask these questions in order:

  1. What is the output? A class, a score, a span, a sequence, or generated text?
  2. What is the unit of meaning? word-level, sentence-level, document-level, or conversation-level?
  3. What hurts you most? ambiguity, domain shift, low data, class imbalance, latency, or safety?
  4. What baseline wins quickly? Often TF‑IDF + Logistic Regression or a small transformer fine-tune.
  5. What does “good” mean? choose metrics that match the business failure mode.

This mindset is what separates “knows NLP terms” from “can ship NLP”.

Mini walkthrough (Python-style)

This is a compact baseline you can run with any dataset (CSV with text and label). It’s not meant to be perfect; it’s meant to be a reliable starting point.

# Assumes a CSV with "text" and "label" columns; the filename is a placeholder.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report

df = pd.read_csv("your_dataset.csv")   # placeholder path
texts, labels = df["text"], df["label"]

# Hold out 20% for testing; fix the seed so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)

pipe = Pipeline([
    # Unigrams + bigrams; ignore terms seen in fewer than 2 documents.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
    ("clf", LogisticRegression(max_iter=1000)),
])

pipe.fit(X_train, y_train)
pred = pipe.predict(X_test)
print(classification_report(y_test, pred))

Why show this here? Because even advanced NLP work benefits from strong baselines. Baselines reveal data issues, label noise, and metric mistakes early—before you spend time on complex models.
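
One concrete metric mistake a baseline exposes is class imbalance: a model can score high accuracy by always predicting the majority class. A minimal sketch of that sanity check, using an invented toy label set:

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label.

    If your model barely beats this number, the headline metric is
    probably hiding class imbalance.
    """
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Toy, invented label set: 90% of examples share one class.
labels = ["no_answer"] * 90 + ["has_answer"] * 10
print(majority_baseline_accuracy(labels))  # 0.9
```

Compare your pipeline's accuracy against this number before celebrating; per-class F1 from classification_report tells the rest of the story.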

Concept in plain words

Start with the human version of the idea. If you can explain it to a smart friend without equations, you’ll build the right mental model. Then we’ll map that model to the math and the code.

In practice, you will iterate. Your first version will be wrong in some way: it may over-clean, under-clean, overfit, or fail on edge cases. That’s normal. The key is to set up the workflow so you can learn fast: keep a small validation set of hard examples, log errors, and treat failures as design inputs.

If you are building for Indian users (Hinglish, spelling variations, transliteration), you must assume mixed scripts and code-mixed tokens. Your preprocessing and evaluation must reflect that reality; otherwise your model will look great in offline tests and disappoint in real traffic.
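
A cheap first step toward handling mixed scripts is simply detecting them, so you can measure how much of your traffic is code-mixed. The helper below is a hypothetical sketch using Unicode ranges, covering only Latin and Devanagari:

```python
def token_scripts(token):
    """Report which scripts a token uses (hypothetical helper).

    Only Latin and Devanagari are distinguished here; extend the
    ranges for any other scripts you expect in real traffic.
    """
    scripts = set()
    for ch in token:
        if "\u0900" <= ch <= "\u097f":       # Devanagari Unicode block
            scripts.add("devanagari")
        elif ch.isascii() and ch.isalpha():  # plain ASCII letters
            scripts.add("latin")
    return scripts

# A code-mixed (Hinglish) sentence: Latin and Devanagari tokens side by side.
for tok in "mujhe यह product pasand hai".split():
    print(tok, token_scripts(tok))
```

Logging the share of code-mixed tokens per request gives you an early warning when offline test data stops resembling production input.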

Where people go wrong

Most learners don’t fail because the topic is hard. They fail because they apply the wrong technique to the wrong problem. We’ll highlight the common traps and give you rules of thumb that actually hold up in production.

A working example

We will use a small but realistic example so you can see the full flow: input → preprocessing → representation → model → evaluation. The goal is not ‘hello world’. The goal is to develop the habit of building end-to-end.
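
For extractive QA specifically, the simplest end-to-end flow is: take a question and a passage, score each passage sentence against the question, and return the best-scoring sentence as the "extracted" answer. A baseline sketch with an invented toy passage (real readers predict answer spans with a trained model):

```python
def extract_answer_sentence(question, passage):
    """Naive extractive 'reader': return the passage sentence with the
    largest word overlap with the question. A lexical baseline only."""
    q_words = set(question.lower().split())
    # Crude sentence split on periods; real pipelines use a tokenizer.
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q_words & set(s.lower().split())))

# Invented toy passage for illustration.
passage = ("Agra is a city in Uttar Pradesh. "
           "Shah Jahan commissioned the Taj Mahal in 1632. "
           "The city attracts millions of tourists every year.")
print(extract_answer_sentence("Who commissioned the Taj Mahal", passage))
# → Shah Jahan commissioned the Taj Mahal in 1632
```

This baseline fails exactly where lexical overlap fails (paraphrase, synonyms, negation), which is a useful way to see what a trained span-prediction model must add.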

Architecture notes

When you ship NLP, you care about latency, costs, monitoring, and data drift. We’ll add a production lens: what to log, how to version artifacts, and how to avoid silent quality regressions.
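
A concrete starting point for "what to log": write every prediction as one JSON line stamped with a model version, so you can diff quality across deployments. This is a sketch under assumed names; the version tag and field names are illustrative, not a standard schema:

```python
import json
import time

MODEL_VERSION = "qa-baseline-0.1"   # hypothetical artifact tag

def log_prediction(question, answer, score, path="predictions.jsonl"):
    """Append one prediction as a JSON line, stamped with the model
    version, so silent quality regressions can be traced later."""
    record = {
        "ts": time.time(),
        "model_version": MODEL_VERSION,
        "question": question,
        "answer": answer,
        "score": score,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Append-only JSONL is easy to grep, easy to load into pandas, and survives partial writes better than one growing JSON array.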

Practice

You’ll get exercises that force you to make choices: which preprocessing, which metric, which baseline, and what trade-offs you accept. This is exactly how interviews and real projects work.

Interview-style questions (with answers)

  • Q: Why do we start with baselines in NLP?
    A: Baselines expose data leakage, label noise, and metric issues early, and they provide a fair yardstick for complex models.
  • Q: What’s the biggest risk in text preprocessing?
    A: Removing information that carries meaning for the task (e.g., negation, emojis, punctuation that signals tone).
  • Q: How do you debug a weak NLP model?
    A: Slice errors by category, inspect misclassified examples, check class balance, and verify your train/test split and leakage.
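
The "inspect misclassified examples" step above can be sketched in a few lines: collect every (text, gold, predicted) triple that disagrees, then read them by slice. The toy predictions below are invented; note how the one error involves negation, the exact failure mode the preprocessing question warns about:

```python
def misclassified(texts, y_true, y_pred):
    """Collect (text, gold, predicted) triples for every error, so you
    can read failures by slice instead of trusting one aggregate score."""
    return [(t, g, p) for t, g, p in zip(texts, y_true, y_pred) if g != p]

# Invented toy predictions; the model missed the negated compliment.
errors = misclassified(
    ["not bad at all", "great phone", "terrible battery"],
    ["positive", "positive", "negative"],
    ["negative", "positive", "negative"],
)
print(errors)  # → [('not bad at all', 'positive', 'negative')]
```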

Exercises

  1. Pick 50 examples from your dataset and manually label what “signal” matters. Write 5 rules you think the model should learn.
  2. Build a baseline (TF‑IDF + Logistic Regression). Record F1 score and list top 10 false positives and false negatives.
  3. Change one thing: add bigrams OR change min_df OR add char-ngrams. Measure the difference.
  4. Write a short evaluation note: what failed, why it failed, what you would try next.

Recommended next lessons

  • Question Answering & Chatbots: Generative QA with RAG
  • Question Answering & Chatbots: Conversation memory and context windows
  • Question Answering & Chatbots: Tool calling and agent patterns

Summary

You now have a clear, practical understanding of Question Answering & Chatbots: Reading comprehension QA (extractive). The goal was not to memorize definitions, but to build instincts: when to use a technique, what trade-off it implies, and how to validate the result. Keep your pipeline simple, measure properly, and iterate with real examples.
