Affinity Propagation in Scikit-learn
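Affinity Propagation is a clustering algorithm that exchanges messages between pairs of data points to select exemplars, which are representative points drawn from the data itself, and then forms clusters around those exemplars. Unlike k-means, it does not require the number of clusters to be specified in advance. In this article, we will implement Affinity Propagation using Scikit-learn, a popular Python library for machine learning.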
1. Introduction
In this example, we will use Affinity Propagation to cluster a synthetic dataset. The goal is to demonstrate how to apply the algorithm in Scikit-learn, understand the output, and visualize the clustering results.
2. Dataset
For simplicity, we'll use Scikit-learn's make_blobs function to generate a synthetic dataset with three distinct clusters. This allows us to clearly visualize the clustering performance of Affinity Propagation.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
# Generate synthetic dataset
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=42)
# Plot the dataset
plt.scatter(X[:, 0], X[:, 1], s=50)
plt.title("Synthetic Dataset")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
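Before clustering, it can help to confirm what make_blobs returned. The short check below is a sketch that regenerates the same dataset and inspects its shape and ground-truth labels. The names labels_present and counts are ours and do not appear in the original listing.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Regenerate the same dataset as above (same parameters and seed)
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=42)

# X has one row per sample and one column per feature
# (make_blobs produces 2 features unless n_features is set)
print(X.shape)

# y_true holds the index of the blob each sample was drawn from
labels_present, counts = np.unique(y_true, return_counts=True)
print(labels_present, counts)
```

The 300 samples are split evenly across the three blobs, so each ground-truth label appears 100 times.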
3. Implementing Affinity Propagation
3.1 Importing Necessary Libraries
To implement Affinity Propagation, we first need to import the necessary modules from Scikit-learn.
from sklearn.cluster import AffinityPropagation
from sklearn import metrics
3.2 Applying Affinity Propagation
Next, we'll create an instance of the AffinityPropagation class and fit it to our synthetic dataset.
# Initialize Affinity Propagation
aff_prop = AffinityPropagation(random_state=42)
# Fit the model
aff_prop.fit(X)
# Predict the cluster labels
labels = aff_prop.predict(X)
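The constructor call above leaves the preference parameter unset, in which case Scikit-learn uses the median of the input similarities (negative squared Euclidean distances under the default affinity). Lower preference values generally lead to fewer exemplars. The sketch below fits the model with a few preference values and records how many clusters each one yields. The specific values are arbitrary illustrations rather than recommendations, and the resulting counts depend on the data and on whether the algorithm converges.

```python
import warnings

from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs
from sklearn.exceptions import ConvergenceWarning

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=42)

# Hypothetical preference values chosen only for illustration;
# None means "use the median of the input similarities" (the default)
preference_values = [None, -50, -500]

cluster_counts = {}
for pref in preference_values:
    model = AffinityPropagation(preference=pref, random_state=42)
    with warnings.catch_warnings():
        # A run that fails to converge emits a ConvergenceWarning; silence it here
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        model.fit(X)
    # One exemplar index per cluster, so the length is the cluster count
    cluster_counts[pref] = len(model.cluster_centers_indices_)
    print(f"preference={pref}: {cluster_counts[pref]} clusters")
```

If the default settings produce more clusters than expected on your data, adjusting preference in this way is usually the first thing to try.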
3.3 Understanding the Output
After fitting the model, we can inspect the exemplars it selected. The cluster_centers_indices_ attribute holds the row index of each exemplar in X, and cluster_centers_ holds the corresponding coordinates.
# Retrieve the cluster centers (exemplars)
cluster_centers_indices = aff_prop.cluster_centers_indices_
n_clusters = len(cluster_centers_indices)
exemplars = aff_prop.cluster_centers_
print(f"Number of clusters identified: {n_clusters}")
print("Cluster centers (exemplars):")
print(exemplars)
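Unlike k-means centroids, exemplars are actual rows of the dataset. The check below is a small sketch that refits the model on the same data and confirms that indexing X with cluster_centers_indices_ reproduces cluster_centers_, and that each exemplar carries the label of its own cluster. The variable names idx, exemplars_match, and exemplar_labels are ours.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=42)
model = AffinityPropagation(random_state=42).fit(X)

idx = model.cluster_centers_indices_

# The exemplar coordinates are simply the rows of X at the exemplar indices
exemplars_match = np.array_equal(X[idx], model.cluster_centers_)
print(f"Exemplars are rows of X: {exemplars_match}")

# The k-th exemplar is assigned to cluster k
exemplar_labels = model.labels_[idx]
print(f"Labels of the exemplars: {exemplar_labels}")
```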
4. Visualizing the Clusters
We can now visualize the clusters identified by Affinity Propagation, along with their exemplars.
# Plot the clusters with their exemplars
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=50)
plt.scatter(exemplars[:, 0], exemplars[:, 1], c='red', s=200, alpha=0.75, marker='X')
plt.title("Affinity Propagation Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
In the visualization:
- Colored Points: Represent the data points, colored by their assigned cluster.
- Red Crosses: Represent the exemplars (cluster centers) identified by Affinity Propagation.
5. Evaluating the Clustering Performance
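To evaluate the clustering, we can use two complementary metrics. The Adjusted Rand Index (ARI) compares the predicted labels with the ground-truth labels y_true, which we have here only because the dataset is synthetic. The Silhouette Score needs no ground truth: it measures how close each point is to its own cluster compared with the nearest other cluster.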
# Adjusted Rand Index
ari = metrics.adjusted_rand_score(y_true, labels)
print(f"Adjusted Rand Index: {ari:.2f}")
# Silhouette Score
# Stored as `silhouette` so the variable does not share a name with the metric function
silhouette = metrics.silhouette_score(X, labels, metric='euclidean')
print(f"Silhouette Score: {silhouette:.2f}")
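Cluster label numbers are arbitrary, so it is worth knowing that ARI ignores how clusters are numbered and compares only the groupings. The toy example below uses made-up label lists (not the article's data) to illustrate this.

```python
from sklearn import metrics

# Toy labels invented for illustration
truth = [0, 0, 1, 1, 2, 2]
same_grouping_renamed = [2, 2, 0, 0, 1, 1]  # identical partition, different label numbers
split_cluster = [0, 1, 2, 2, 3, 3]          # first true cluster split into two

ari_renamed = metrics.adjusted_rand_score(truth, same_grouping_renamed)
ari_split = metrics.adjusted_rand_score(truth, split_cluster)

print(f"Renamed labels: {ari_renamed:.2f}")
print(f"Split cluster:  {ari_split:.2f}")
```

The renamed labelling scores a perfect 1.0, while splitting a true cluster lowers the score. This matters here because Affinity Propagation may return more clusters than the three used to generate the data.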
6. Conclusion
In this article, we implemented Affinity Propagation using Scikit-learn to cluster a synthetic dataset, visualized the clusters, and evaluated the result with the Adjusted Rand Index and the Silhouette Score. Affinity Propagation chooses both the number of clusters and the exemplars itself, which makes it useful when the cluster count is not known in advance. Keep in mind that the count it settles on is influenced by the preference parameter.
In upcoming articles, we will explore implementations of Affinity Propagation in TensorFlow and PyTorch, giving you several ways to apply this algorithm in different machine learning environments.