
Affinity Propagation in Scikit-learn

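Affinity Propagation is a clustering algorithm that selects exemplars, actual data points that best represent their neighbors, by exchanging messages between pairs of points, and then assigns every other point to one of those exemplars. Unlike k-means, it does not require the number of clusters to be specified in advance. In this article, we will implement Affinity Propagation using Scikit-learn, a popular Python library for machine learning.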


1. Introduction

In this example, we will use Affinity Propagation to cluster a synthetic dataset. The goal is to demonstrate how to apply the algorithm in Scikit-learn, understand the output, and visualize the clustering results.


2. Dataset

For simplicity, we'll use Scikit-learn's make_blobs function to generate a synthetic dataset with three distinct clusters. This allows us to clearly visualize the clustering performance of Affinity Propagation.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# Generate synthetic dataset
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=42)

# Plot the dataset
plt.scatter(X[:, 0], X[:, 1], s=50)
plt.title("Synthetic Dataset")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
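
As a quick sanity check on the generated data, the short sketch below regenerates the dataset with the same arguments and prints its shape and the number of samples per true cluster. When n_samples is an integer divisible by the number of centers, make_blobs splits the samples evenly among them.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same arguments as above, so the data are identical
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=42)

print(X.shape)              # (300, 2): 300 samples, 2 features (the default)
print(np.bincount(y_true))  # [100 100 100]: samples per true cluster
```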

3. Implementing Affinity Propagation

3.1 Importing Necessary Libraries

To implement Affinity Propagation, we first need to import the necessary modules from Scikit-learn.

from sklearn.cluster import AffinityPropagation
from sklearn import metrics

3.2 Applying Affinity Propagation

Next, we'll create an instance of the AffinityPropagation class and fit it to our synthetic dataset.

# Initialize Affinity Propagation
aff_prop = AffinityPropagation(random_state=42)

# Fit the model
aff_prop.fit(X)

# Assign each point to its nearest exemplar; fit also stores training labels in aff_prop.labels_
labels = aff_prop.predict(X)
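
The number of clusters is not set directly. It is influenced mainly by the preference parameter (by default, the median of the input similarities) and, to a lesser extent, by damping (a value in the range 0.5 to 1). The following is a minimal sketch of how one might compare a few settings; the specific preference values and the damping of 0.9 are arbitrary choices for illustration, not tuned recommendations. Lower preference values generally lead to fewer clusters.

```python
import warnings
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs
from sklearn.exceptions import ConvergenceWarning

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=42)

# Illustrative values only (an assumption, not tuned for this dataset).
# None keeps the default: the median of the input similarities.
preferences = [None, -50, -500]

cluster_counts = {}
for pref in preferences:
    model = AffinityPropagation(preference=pref, damping=0.9, random_state=42)
    with warnings.catch_warnings():
        # A run that fails to converge emits a ConvergenceWarning; silence it here
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        model.fit(X)
    cluster_counts[pref] = len(model.cluster_centers_indices_)

for pref, count in cluster_counts.items():
    print(f"preference={pref}: {count} clusters")
```

Comparing the printed counts against the three generated blobs gives a rough sense of which preference range suits the data.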

3.3 Understanding the Output

After fitting the model, we can extract the cluster centers (exemplars) and the labels for each data point.

# Retrieve the cluster centers (exemplars)
cluster_centers_indices = aff_prop.cluster_centers_indices_
n_clusters = len(cluster_centers_indices)
exemplars = aff_prop.cluster_centers_

print(f"Number of clusters identified: {n_clusters}")
print("Cluster centers (exemplars):")
print(exemplars)
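
Unlike k-means centroids, exemplars are actual members of the dataset. The sketch below checks this by indexing X with cluster_centers_indices_ and comparing the result with cluster_centers_; it refits the model so that it can run on its own. It also prints n_iter_, the number of iterations the fit ran.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=42)
aff_prop = AffinityPropagation(random_state=42).fit(X)

indices = aff_prop.cluster_centers_indices_
# Each exemplar is a row of X, so indexing X recovers the cluster centers
same = np.allclose(X[indices], aff_prop.cluster_centers_)
print(f"Exemplars are rows of X: {same}")
print(f"Iterations run: {aff_prop.n_iter_}")
```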

4. Visualizing the Clusters

We can now visualize the clusters identified by Affinity Propagation, along with their exemplars.

# Plot the clusters with their exemplars
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=50)
plt.scatter(exemplars[:, 0], exemplars[:, 1], c='red', s=200, alpha=0.75, marker='X')
plt.title("Affinity Propagation Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

In the visualization:

  • Colored Points: Represent the data points, colored by their assigned cluster.
  • Red Crosses: Represent the exemplars (cluster centers) identified by Affinity Propagation.
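
To make the point-to-exemplar assignment more visible, one option is to draw a thin line from each point to its exemplar. The helper below is a hypothetical sketch (plot_exemplar_links is not part of Scikit-learn or Matplotlib); it assumes labels index into the rows of exemplars and skips points labeled -1, which Scikit-learn uses when no exemplar was assigned.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_exemplar_links(X, labels, exemplars, ax=None):
    """Scatter the points and draw a line from each point to its exemplar.

    Hypothetical helper for illustration; not part of Scikit-learn.
    """
    if ax is None:
        _, ax = plt.subplots()
    ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=30)
    for point, label in zip(X, labels):
        if label < 0 or len(exemplars) == 0:
            continue  # label -1 means no exemplar was assigned
        center = exemplars[label]
        ax.plot([point[0], center[0]], [point[1], center[1]],
                color="gray", linewidth=0.5, zorder=0)
    if len(exemplars) > 0:
        ax.scatter(exemplars[:, 0], exemplars[:, 1],
                   c="red", s=200, alpha=0.75, marker="X")
    ax.set_title("Points linked to their exemplars")
    return ax
```

With the variables from the earlier sections, calling plot_exemplar_links(X, labels, exemplars) followed by plt.show() displays the figure.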

5. Evaluating the Clustering Performance

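To evaluate the clustering, we can use the Adjusted Rand Index (ARI) and the Silhouette Score. ARI compares the predicted labels with the ground-truth labels (y_true), which are available here only because the dataset is synthetic; a value of 1 indicates perfect agreement. The Silhouette Score needs no ground truth and ranges from -1 to 1, with higher values indicating better-separated clusters.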

# Adjusted Rand Index
ari = metrics.adjusted_rand_score(y_true, labels)
print(f"Adjusted Rand Index: {ari:.2f}")

# Silhouette Score
silhouette = metrics.silhouette_score(X, labels, metric='euclidean')
print(f"Silhouette Score: {silhouette:.2f}")
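
Two details of these metrics are easy to overlook. ARI ignores the label names themselves, so a clustering that swaps cluster ids still scores perfectly. The silhouette score is only defined for at least 2 and at most n_samples - 1 distinct labels, and Scikit-learn raises a ValueError otherwise, which can happen if Affinity Propagation fails to converge and labels every point -1. The sketch below illustrates both points on a tiny hand-made array; safe_silhouette is a hypothetical helper, not a Scikit-learn function.

```python
import numpy as np
from sklearn import metrics

def safe_silhouette(X, labels):
    """Return the silhouette score, or None when it is undefined.

    Hypothetical helper for illustration; not part of Scikit-learn.
    """
    n_labels = len(np.unique(labels))
    # silhouette_score requires 2 <= n_labels <= n_samples - 1
    if n_labels < 2 or n_labels > len(labels) - 1:
        return None
    return metrics.silhouette_score(X, labels, metric="euclidean")

# ARI ignores label names: swapping 0 and 1 still gives a perfect score
swapped_ari = metrics.adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])
print(swapped_ari)  # 1.0

points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
single_cluster = safe_silhouette(points, np.array([0, 0, 0, 0]))
two_clusters = safe_silhouette(points, np.array([0, 0, 1, 1]))
print(single_cluster)  # None: only one cluster, so the score is undefined
print(two_clusters)
```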

6. Conclusion

In this article, we implemented Affinity Propagation using Scikit-learn to cluster a synthetic dataset. We visualized the clusters and evaluated the result using standard clustering metrics. Because Affinity Propagation selects exemplars and determines the number of clusters from the data, guided by its preference and damping parameters rather than a preset cluster count, it is a useful option when the number of clusters is not known in advance.

In upcoming articles, we will explore implementations of Affinity Propagation in TensorFlow and PyTorch, giving you several ways to apply this algorithm in different machine learning environments.