Decision Trees – Entropy, Gini Index and Tree-Based Learning Explained in Machine Learning
Decision Trees are one of the most intuitive and interpretable supervised learning algorithms. They mimic human decision-making by splitting data into branches based on feature conditions.
Because of their clarity and flexibility, decision trees are widely used in both classification and regression tasks.
1. What is a Decision Tree?
A decision tree splits data recursively into smaller subsets based on feature values. Each split aims to create more homogeneous groups.
Key components:
- Root Node
- Internal Nodes
- Leaf Nodes
- Branches
2. How Decision Trees Make Decisions
If Feature A > threshold → go left
Else → go right
The algorithm selects features that provide the best split.
3. Entropy – Measuring Impurity
Entropy measures randomness in the dataset.
Entropy(S) = - Σ p_i log2(p_i)
- Entropy = 0 → pure node
- Entropy = 1 → maximum impurity (binary case)
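The formula can be computed directly from class proportions. A minimal stdlib-only sketch (the function name `entropy` is our own):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    n = len(labels)
    # p_i = fraction of samples belonging to class i
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["yes", "yes", "yes", "yes"]))  # pure node → 0.0
print(entropy(["yes", "no"]))                 # 50/50 binary node → 1.0
```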
4. Information Gain
Information Gain measures reduction in entropy after a split.
IG = Entropy(parent) - Weighted Entropy(children)
The feature with highest information gain is selected for splitting.
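Building on the entropy definition above, information gain for a candidate split can be sketched as follows (stdlib only; `information_gain` is our own name, and the entropy helper is repeated so the snippet is self-contained):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """IG = Entropy(parent) - Σ (|child| / |parent|) * Entropy(child)."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# A perfect split separates the classes completely: IG equals the parent entropy.
parent = ["yes", "yes", "no", "no"]
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0
```

The tree-growing algorithm evaluates this score for every candidate feature/threshold pair and keeps the split with the highest gain.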
5. Gini Index – Alternative to Entropy
Gini Index measures impurity using:
Gini = 1 - Σ p_i²
Lower Gini → purer node
Differences:
- Entropy uses logarithm
- Gini is computationally faster
- Gini is the default criterion in the CART algorithm (and in scikit-learn)
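The Gini formula is even simpler to implement than entropy, since it avoids the logarithm (again a stdlib-only sketch with our own function name):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - Σ p_i² over the class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["yes", "yes", "yes"]))  # pure node → 0.0
print(gini(["yes", "no"]))          # maximum binary impurity → 0.5
```

Note that maximum binary Gini is 0.5, whereas maximum binary entropy is 1.0; the two measures usually pick very similar splits in practice.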
6. Regression Trees
For regression tasks, trees minimize variance instead of entropy.
Objective:
Minimize Mean Squared Error within nodes
Leaves output average target value.
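The regression objective can be sketched the same way: each node's prediction is the mean of its targets, and a candidate split is scored by the weighted MSE of its children (lower is better). The helper names here are our own:

```python
def node_mse(y):
    """Mean squared error around the node mean (the leaf's prediction)."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / len(y)

def split_score(left, right):
    """Weighted MSE of a candidate split; the tree keeps the lowest score."""
    n = len(left) + len(right)
    return len(left) / n * node_mse(left) + len(right) / n * node_mse(right)

# Splitting [1, 1, 9, 9] into [1, 1] and [9, 9] removes all variance.
print(node_mse([1, 1, 9, 9]))       # 16.0
print(split_score([1, 1], [9, 9]))  # 0.0
```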
7. Decision Boundaries
Decision trees create axis-aligned splits.
- Produces rectangular decision regions
- Handles non-linear relationships
8. Overfitting in Decision Trees
Deep trees memorize training data, causing overfitting.
Indicators:
- High training accuracy
- Low test accuracy
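This signature is easy to reproduce. A quick illustrative demonstration, assuming scikit-learn is available (the dataset and parameters are arbitrary choices for the demo):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% label noise, so memorization cannot generalize.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No depth limit: the tree grows until it fits the training set exactly.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", deep.score(X_train, y_train))  # near-perfect
print("test accuracy: ", deep.score(X_test, y_test))    # noticeably lower
```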
9. Pruning Techniques
Pre-Pruning
- Max depth
- Minimum samples per leaf
- Minimum information gain threshold
Post-Pruning
- Cost complexity pruning
Pruning improves generalization.
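Assuming scikit-learn, cost complexity pruning is exposed through the `ccp_alpha` parameter; the alpha value below is illustrative, not tuned:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, flip_y=0.2, random_state=0)

# An unpruned tree memorizes the noisy labels and grows many nodes.
full = DecisionTreeClassifier(random_state=0).fit(X, y)
# A positive ccp_alpha collapses subtrees whose complexity is not justified.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X, y)

print("nodes before pruning:", full.tree_.node_count)
print("nodes after pruning: ", pruned.tree_.node_count)  # fewer nodes
```

In practice, candidate alpha values can be obtained from `cost_complexity_pruning_path` and selected by cross-validation.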
10. Advantages of Decision Trees
- Easy to interpret
- No feature scaling required
- Handles categorical data
- Works for classification and regression
11. Limitations
- Prone to overfitting
- Unstable (small data changes affect tree)
- Axis-aligned splits only
12. Enterprise Use Cases
- Credit approval systems
- Fraud detection
- Medical diagnosis
- Risk assessment
- Customer segmentation
Decision trees are often used as base learners in ensemble models.
13. Relationship to Ensemble Methods
Decision trees are building blocks for:
- Random Forest
- Gradient Boosting
- XGBoost
- LightGBM
Many production ML systems, especially those working with tabular data, rely on tree ensembles.
14. Practical Workflow
1. Clean and preprocess data
2. Select features
3. Train tree
4. Tune depth and hyperparameters
5. Evaluate on validation set
6. Apply pruning if required
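The steps above can be sketched end to end. This assumes scikit-learn; the dataset and the depth grid are illustrative choices, not recommendations:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)        # already clean example data
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

best_depth, best_score = None, 0.0
for depth in [2, 3, 4, 5, None]:                  # tune depth
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)                   # train tree
    score = model.score(X_val, y_val)             # evaluate on validation set
    if score > best_score:
        best_depth, best_score = depth, score

print("best max_depth:", best_depth,
      "validation accuracy:", round(best_score, 3))
```

A held-out test set (not shown) should be reserved for the final accuracy estimate once tuning is done.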
15. When to Use Decision Trees
- When interpretability is important
- When feature scaling is undesirable
- When quick baseline model needed
Final Summary
Decision Trees provide a powerful yet interpretable way to model complex relationships in data. By minimizing impurity using entropy or Gini index, they create structured decision rules that mirror human reasoning. While single trees may overfit, they form the backbone of advanced ensemble models widely used in enterprise machine learning systems.

