Hierarchical Clustering – Agglomerative vs Divisive Methods Explained in Machine Learning
Hierarchical clustering is an unsupervised learning method that builds a tree-like structure of clusters. Unlike K-Means, it does not require specifying the number of clusters in advance.
Instead, it creates a hierarchy of nested clusters that can be visualized using a dendrogram.
1. Core Idea of Hierarchical Clustering
Hierarchical clustering creates clusters by either:
- Starting with individual points and merging them (Agglomerative)
- Starting with one cluster and splitting it (Divisive)
The result is a hierarchical tree structure.
2. Agglomerative Clustering (Bottom-Up)
Agglomerative clustering begins with each data point as its own cluster.
Process:
1. Start with N clusters (one per data point)
2. Compute the distance between every pair of clusters
3. Merge the two closest clusters
4. Repeat until a single cluster remains
This is the most commonly used hierarchical approach.
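The merge loop above can be sketched with SciPy's `scipy.cluster.hierarchy` module, which implements agglomerative clustering directly. This is a minimal sketch; the toy array `X` is invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated groups of 2-D points (toy data for illustration)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

# Bottom-up merging: each row of Z records one merge as
# (cluster_i, cluster_j, merge distance, new cluster size)
Z = linkage(X, method="ward")

# N points always produce N - 1 merges
print(Z.shape)  # (5, 4)

# Cutting the hierarchy into 2 clusters recovers the two groups
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Note that `linkage` accepts the raw observation matrix and computes pairwise distances internally; the linkage matrix `Z` is the full merge history, not a single partition.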
3. Divisive Clustering (Top-Down)
Divisive clustering starts with all points in one cluster.
Process:
1. Start with a single cluster containing all points
2. Split the cluster into two sub-clusters
3. Recursively split sub-clusters until each point stands alone or a stopping criterion is met
Divisive methods are computationally expensive and less commonly used.
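Libraries rarely ship a dedicated divisive routine, but the top-down idea can be sketched by recursively bisecting the data with 2-means. The helper `divisive_split` below is hypothetical, not a library function; real divisive algorithms such as DIANA use more principled splitting rules.

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_split(X, depth=1):
    """Recursively bisect X with 2-means, returning leaf index arrays.
    A minimal top-down sketch, not a full divisive algorithm."""
    def split(indices, d):
        if d == 0 or len(indices) < 2:
            return [indices]
        halves = KMeans(n_clusters=2, n_init=10,
                        random_state=0).fit_predict(X[indices])
        return (split(indices[halves == 0], d - 1)
                + split(indices[halves == 1], d - 1))
    return split(np.arange(len(X)), depth)

# Two obvious groups; one bisection should separate them
X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
leaves = divisive_split(X, depth=1)
print([sorted(l.tolist()) for l in leaves])  # {0,1} and {2,3}, order may vary
```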
4. Distance Between Clusters (Linkage Criteria)
The key design decision is how to define the distance between two clusters — the linkage criterion.
Single Linkage
Distance between the closest pair of points, one from each cluster. Tends to produce long, "chained" clusters.
Complete Linkage
Distance between the farthest pair of points, one from each cluster. Favors compact clusters.
Average Linkage
Mean distance over all cross-cluster pairs of points.
Ward's Method
Merges the pair of clusters that yields the smallest increase in total within-cluster variance.
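The four criteria genuinely produce different merge distances on the same data. A minimal sketch on four 1-D points forming two tight pairs (`{0, 1}` and `{5, 6}`, invented for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Two tight pairs on a line: {0, 1} and {5, 6}
X = np.array([[0.0], [1.0], [5.0], [6.0]])

# Distance reported for the final merge under each linkage rule
final = {}
for method in ("single", "complete", "average", "ward"):
    Z = linkage(X, method=method)
    final[method] = Z[-1, 2]  # column 2 of Z holds the merge distance

print(final)
# single   -> 4.0 (closest cross-pair: 1 and 5)
# complete -> 6.0 (farthest cross-pair: 0 and 6)
# average  -> 5.0 (mean of the four cross-pair distances)
```

Ward's value is not a raw point distance but a variance-based quantity, which is why it is printed rather than compared directly against the others.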
5. Dendrogram Visualization
A dendrogram is a tree diagram that shows how clusters are merged or split.
Vertical axis:
- Represents distance between merged clusters
Cutting the dendrogram at a certain height determines the number of clusters.
6. Choosing Number of Clusters
Unlike K-Means, hierarchical clustering does not require K initially.
The cluster count is determined by cutting the dendrogram at a chosen distance threshold.
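Both ideas can be sketched with SciPy: `dendrogram(..., no_plot=True)` returns the tree structure without drawing (with matplotlib installed, `dendrogram(Z)` draws it), and `fcluster` with `criterion="distance"` performs the cut. The toy array `X` is invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Two tight pairs plus one distant outlier (toy data)
X = np.array([[0.0, 0.0], [0.3, 0.1], [4.0, 4.0], [4.2, 4.1], [8.0, 0.0]])
Z = linkage(X, method="average")

# Inspect the tree structure without plotting
tree = dendrogram(Z, no_plot=True)
print(tree["ivl"])  # leaf order along the dendrogram's x-axis

# Cut the tree at height 2.0: only merges below that distance survive
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)  # three clusters: the two pairs plus the lone outlier
```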
7. Computational Complexity
Time complexity:
Standard agglomerative implementations run in O(n² log n) time (a naive version is O(n³)) and need O(n²) memory to store the pairwise distance matrix.
This makes hierarchical clustering unsuitable for extremely large datasets.
8. Advantages of Hierarchical Clustering
- No need to predefine K
- Produces interpretable hierarchy
- Flexible linkage options
9. Limitations
- Computationally expensive in both time and memory
- Sensitive to noise and outliers, especially with single linkage
- Merges and splits are greedy and irreversible, so early mistakes cannot be undone
10. Comparison with K-Means
- K-Means → Flat clusters
- Hierarchical → Nested clusters
- K-Means requires K upfront
- Hierarchical allows flexible cluster selection
11. Real-World Applications
- Customer segmentation
- Gene sequence analysis
- Document clustering
- Image segmentation
- Market research analysis
Hierarchical clustering is popular in bioinformatics.
12. Distance Metrics Used
- Euclidean distance
- Manhattan distance
- Cosine distance (1 − cosine similarity)
Choice depends on domain and data type.
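The metric can be swapped by precomputing distances with `scipy.spatial.distance.pdist` and feeding the condensed matrix to `linkage`. A minimal sketch with invented data (note that Ward linkage assumes Euclidean distances, so `average` is used with the cosine matrix):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])

# Same data, three notions of "far apart"; pdist returns the
# condensed pairwise distances in order (0,1), (0,2), (1,2)
d_euc = pdist(X, metric="euclidean")
d_man = pdist(X, metric="cityblock")  # Manhattan
d_cos = pdist(X, metric="cosine")     # 1 - cosine similarity

# A condensed distance matrix can feed linkage() directly
Z = linkage(d_cos, method="average")
print(Z.shape)  # (2, 4): three points -> two merges
```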
13. When to Use Hierarchical Clustering
- Small to medium datasets
- Need hierarchical structure
- When K is unknown
14. Enterprise Workflow
1. Normalize features
2. Choose a distance metric
3. Select a linkage method
4. Generate the dendrogram
5. Choose a cut level
6. Interpret the resulting clusters
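The workflow above can be sketched end to end with scikit-learn and SciPy. This is a minimal sketch on an invented feature matrix whose columns have very different scales; in place of visually inspecting a dendrogram, the cut is applied directly with `fcluster`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from scipy.cluster.hierarchy import linkage, fcluster

# Toy features on very different scales (illustrative values)
X = np.array([[1.0, 100.0], [1.2, 110.0], [0.9, 105.0],
              [5.0, 900.0], [5.3, 950.0], [4.8, 920.0]])

# 1. Normalize so no single feature dominates the distance
X_scaled = StandardScaler().fit_transform(X)

# 2-3. Euclidean distance with Ward linkage
Z = linkage(X_scaled, method="ward")

# 4-5. Cut the hierarchy into 2 clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Without step 1, the second column (in the hundreds) would dominate the Euclidean distances and the first feature would barely matter.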
15. Practical Considerations
- Standardize data before clustering
- Experiment with multiple linkage methods
- Validate clusters using silhouette score
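Silhouette validation can be sketched by scoring several candidate cuts of the same linkage matrix and keeping the best. The toy two-blob dataset is invented for illustration; the silhouette score peaks (maximum 1.0) when clusters are tight and well separated.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

# Two tight, well-separated blobs (toy data)
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [6.0, 6.0], [6.1, 6.3], [5.9, 6.2]])
Z = linkage(X, method="ward")

# Score several candidate cuts of the same hierarchy
scores = {}
for k in (2, 3, 4):
    labels = fcluster(Z, t=k, criterion="maxclust")
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # the two-blob data should favor k = 2
```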
Final Summary
Hierarchical clustering builds a tree of clusters using bottom-up or top-down approaches. With flexible linkage strategies and dendrogram visualization, it provides rich insights into data structure. While computationally heavier than K-Means, it offers interpretability and flexibility that are valuable in enterprise analytics and research domains.

