In our recent class, our professor introduced the K-Medoids clustering algorithm and the concept of hierarchical clustering, including how dendrograms are used to visualize it. Here’s a summary of what was covered:
K-Medoids:
K-Medoids is a partitioning clustering algorithm known for its robustness, especially to outliers. K-Means uses the mean (average) of a cluster’s points as its center. K-Medoids instead uses an actual data point from the cluster, called the medoid: the point that minimizes the sum of distances to all other points in the same cluster. A single extreme value can pull a mean far away, but it rarely changes which existing point is most central, so K-Medoids is less sensitive to outliers. Because it only needs pairwise dissimilarities between points, it can also be used with distance measures other than Euclidean distance, such as Manhattan distance.
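To make the medoid update concrete, here is a minimal Python sketch of the simple alternating (Voronoi-iteration) variant of K-Medoids. It works like K-Means but replaces the mean with the medoid. It is not the classic PAM swap algorithm. It assumes NumPy, Euclidean distance, and starting medoids passed in explicitly. The function name `k_medoids` and its `init` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np


def k_medoids(X, init, max_iter=100):
    """Alternating (Voronoi-iteration) K-Medoids sketch.

    X: (n, d) array of points. init: indices of the starting medoids.
    Returns (medoid_indices, labels).
    """
    # Pairwise Euclidean distances; any dissimilarity matrix would also work.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids = np.array(init, dtype=int)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(len(medoids)):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster ends up empty
            # Update step: the new medoid is the member with the smallest
            # total distance to all other members of its cluster.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break  # medoids stopped changing: converged
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels


# Two tight groups plus one far outlier at index 6.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],
              [50.0, 50.0]])
medoids, labels = k_medoids(X, init=[0, 3])
print("medoid of cluster 1:", X[medoids[1]])
print("mean of cluster 1:  ", X[labels == 1].mean(axis=0))
```

In this example the outlier at (50, 50) lands in the second cluster. That cluster’s medoid still stays at (8, 8), while its mean is pulled to (18.5, 18.5). Switching to another dissimilarity only requires changing how `D` is computed. The result depends on the starting medoids, so in practice they are often chosen randomly or greedily, and the run may be repeated.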
Hierarchical Clustering:
Hierarchical clustering builds a tree-like structure of nested clusters, establishing a hierarchical relationship between data points. There are two primary approaches: agglomerative (bottom-up) and divisive (top-down). Agglomerative clustering starts with each data point as its own cluster and repeatedly merges the two closest clusters until only one remains. How the distance between two clusters is measured depends on the linkage criterion. For example, single linkage uses the closest pair of points, complete linkage the farthest pair, and average linkage the mean pairwise distance. The sequence of merges can be drawn as a dendrogram. Divisive clustering, in contrast, begins with all data points in a single cluster and recursively splits them into smaller clusters. One notable advantage of hierarchical clustering is that it doesn’t require specifying the number of clusters in advance. It also provides a visual representation of how the data groups at different levels of similarity.
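The agglomerative procedure can be sketched in plain Python as below. The sketch is deliberately naive: it rescans every pair of clusters at each step, so it only suits small inputs. The function name `agglomerative` and its merge-history format are hypothetical. It assumes Euclidean distance and supports single, complete, and average linkage.

```python
import math


def agglomerative(points, linkage="single"):
    """Naive bottom-up (agglomerative) clustering.

    Returns the merge history as a list of (cluster_a, cluster_b, distance),
    where each cluster is a frozenset of point indices.
    """
    def cluster_distance(a, b):
        pair_dists = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(pair_dists)  # closest pair of members
        if linkage == "complete":
            return max(pair_dists)  # farthest pair of members
        if linkage == "average":
            return sum(pair_dists) / len(pair_dists)  # mean pairwise distance
        raise ValueError(f"unknown linkage: {linkage}")

    # Start with every point in its own cluster.
    clusters = [frozenset([i]) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        # Find the two closest clusters under the chosen linkage.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        history.append((clusters[x], clusters[y], d))
        # Replace the two merged clusters with their union.
        merged = clusters[x] | clusters[y]
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)]
        clusters.append(merged)
    return history


pts = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
for a, b, d in agglomerative(pts):
    print(sorted(a), "+", sorted(b), f"at distance {d:.2f}")
```

With single linkage on these five points, the two tight pairs merge first, at distance 1. The pairs then join each other at about 6.40. The far point (20, 20) joins last, at about 20.52. The returned history is exactly the information a dendrogram draws.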
Dendrograms:
A dendrogram is a tree-like diagram used to visualize the hierarchical structure produced by hierarchical clustering. It shows the sequence of merges (or splits) along with the distance at which each one occurs. The height at which two branches are joined indicates the dissimilarity between the clusters being merged. Clusters that join near the bottom are very similar, while those that join near the top are far apart. Cutting the dendrogram with a horizontal line at a chosen height yields a flat clustering, and the number of vertical branches the line crosses is the number of clusters. Choosing the cut height is therefore a practical way to select the number of clusters.
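For the single-linkage case, cutting at a height can be sketched without building the full tree. Single-linkage merges happen in the same order as the edges accepted by Kruskal’s minimum-spanning-tree algorithm. Replaying point pairs in order of distance with a union-find structure, and stopping at the cut, therefore gives the flat clusters. The function `cut_single_linkage` is hypothetical. It assumes Euclidean distance and accepts either a cut height or a target number of clusters.

```python
import math
from itertools import combinations


def cut_single_linkage(points, height=None, n_clusters=None):
    """Flat cluster labels from cutting a single-linkage dendrogram.

    Cut either at a given height (merge distance) or so that exactly
    n_clusters remain. Merges are replayed in height order with union-find.
    """
    if (height is None) == (n_clusters is None):
        raise ValueError("pass exactly one of height or n_clusters")
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster's representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All point pairs sorted by distance: the candidate merge heights.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    remaining = n
    for d, i, j in edges:
        if height is not None and d > height:
            break  # every remaining merge happens above the cut
        if n_clusters is not None and remaining == n_clusters:
            break  # reached the requested number of clusters
        ri, rj = find(i), find(j)
        if ri != rj:  # a new join in the dendrogram at height d
            parent[ri] = rj
            remaining -= 1
    # Relabel representatives as 0, 1, 2, ... in order of first appearance.
    roots, labels = {}, []
    for i in range(n):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels


pts = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
print(cut_single_linkage(pts, height=2))
print(cut_single_linkage(pts, height=10))
print(cut_single_linkage(pts, n_clusters=2))
```

On these points, a cut at height 2 gives three clusters. A cut at height 10, or asking for two clusters, leaves (20, 20) on its own. For real work, SciPy’s `scipy.cluster.hierarchy` module provides `linkage`, `dendrogram`, and `fcluster` for building, plotting, and cutting dendrograms with other linkage methods.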
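Together, these techniques offer useful tools for exploring the underlying structure and relationships within datasets. K-Medoids provides robust partitioning into a fixed number of clusters with any dissimilarity measure. Hierarchical clustering, read through a dendrogram, lets you examine groupings at multiple levels without fixing the number of clusters in advance.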