Affinity Propagation in PyTorch
Affinity Propagation is a clustering algorithm that identifies exemplars, which are representative data points, and forms clusters by assigning each data point to one of them. In this article, we will implement Affinity Propagation using PyTorch, showing that the library's tensor operations are useful for tasks beyond neural networks.
1. Introduction
Affinity Propagation clusters data by exchanging two kinds of messages between data points, responsibilities and availabilities, until the choice of exemplars stabilizes. Unlike K-Means, it does not require specifying the number of clusters in advance, which makes it particularly useful when that number is unknown.
2. Dataset
We'll use a synthetic dataset generated by Scikit-learn's make_blobs function. This dataset contains three distinct clusters, which will allow us to visually assess the output of the Affinity Propagation algorithm.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
# Generate synthetic dataset
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=42)
# Plot the dataset
plt.scatter(X[:, 0], X[:, 1], s=50)
plt.title("Synthetic Dataset")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
3. Implementing Affinity Propagation in PyTorch
3.1 PyTorch Setup
We'll begin by setting up our PyTorch environment and defining the necessary tensors and parameters.
import torch
# Convert the dataset to PyTorch tensors
X_tensor = torch.tensor(X, dtype=torch.float32)
# Constants
DAMPING = 0.9 # Damping factor to avoid oscillations
MAX_ITERATIONS = 200 # Maximum number of iterations
# Data shape
n_samples = X_tensor.size(0)
# Initialize similarity, responsibility, and availability tensors
S = torch.zeros((n_samples, n_samples))
R = torch.zeros((n_samples, n_samples))
A = torch.zeros((n_samples, n_samples))
3.2 Similarity Matrix
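The similarity matrix drives Affinity Propagation: s(i, j) measures how well point j would serve as the exemplar for point i. Here we use the negative Euclidean distance. The diagonal entries s(k, k), called preferences, control how readily a point becomes an exemplar; we set them to the median of the off-diagonal similarities, a common default.
# Compute the similarity matrix: s(i, j) = -||x_i - x_j|| (vectorized)
S = -torch.cdist(X_tensor, X_tensor)
# Set the diagonal (the preference) to the median off-diagonal similarity.
# Leaving it at 0, the largest possible value, would push every point to be its own exemplar.
off_diag = ~torch.eye(n_samples, dtype=torch.bool)
S.fill_diagonal_(S[off_diag].median().item())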
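The diagonal of the similarity matrix, often called the preference, influences how many exemplars emerge. The sketch below builds the matrix for three hand-picked points and shows two ways of setting that diagonal. The helper `build_similarity`, the data in `toy_points`, and the value -20.0 are all illustrative assumptions, not part of the original walkthrough.

```python
import torch

def build_similarity(points, preference=None):
    """Negative Euclidean distances with a shared preference on the diagonal.

    Hypothetical helper for illustration only.
    """
    S = -torch.cdist(points, points)
    off_diag = ~torch.eye(points.size(0), dtype=torch.bool)
    if preference is None:
        # Assumed default: median of the off-diagonal similarities
        preference = S[off_diag].median().item()
    S.fill_diagonal_(preference)
    return S

# Three collinear points with pairwise distances 5, 5, and 10
toy_points = torch.tensor([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
S_median = build_similarity(toy_points)                 # diagonal = median similarity
S_low = build_similarity(toy_points, preference=-20.0)  # diagonal = a lower, hand-picked value
```

As a rule of thumb, lowering the preference tends to produce fewer clusters and raising it tends to produce more.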
3.3 Responsibility Update
The responsibility r(i, j) measures how well suited point j is to serve as the exemplar for point i, compared with every other candidate exemplar for i.
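For reference, the responsibility update in the standard formulation of Affinity Propagation can be written as below. Here s(i,k) is the similarity, a(i,k) is the availability, and \lambda is the damping factor (DAMPING in the setup above); the tilde marks the undamped value.

```latex
\begin{aligned}
\tilde{r}(i,k) &= s(i,k) - \max_{k' \neq k} \bigl\{ a(i,k') + s(i,k') \bigr\} \\
r_{t}(i,k) &= \lambda \, r_{t-1}(i,k) + (1 - \lambda) \, \tilde{r}(i,k)
\end{aligned}
```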
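def update_responsibility(S, A):
    # r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
    top2 = (A + S).topk(2, dim=1)  # largest and second-largest value in each row
    rows = torch.arange(S.size(0))
    best_idx = top2.indices[:, 0]
    R_new = S - top2.values[:, :1]
    # Where k is itself the row maximum, compare against the runner-up instead
    R_new[rows, best_idx] = S[rows, best_idx] - top2.values[:, 1]
    # Damping: blend the previous messages (module-level R) with the new ones, in place
    R.mul_(DAMPING).add_(R_new, alpha=1 - DAMPING)
    return R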
3.4 Availability Update
The availability a(i, j) accumulates the evidence that point i should choose point j as its exemplar, based on how strongly other points support j in that role.
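For reference, the availability update in the standard formulation of Affinity Propagation can be written as below, with r(i,k) the responsibility and \lambda the damping factor (DAMPING in the setup above). The self-availability a(k,k) has its own rule and is not capped at zero.

```latex
\begin{aligned}
\tilde{a}(i,k) &= \min\Bigl\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\bigl\{0,\, r(i',k)\bigr\} \Bigr\}, \quad i \neq k \\
\tilde{a}(k,k) &= \sum_{i' \neq k} \max\bigl\{0,\, r(i',k)\bigr\} \\
a_{t}(i,k) &= \lambda \, a_{t-1}(i,k) + (1 - \lambda) \, \tilde{a}(i,k)
\end{aligned}
```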
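def update_availability(R):
    # Clip responsibilities at zero, except the self-responsibilities r(k, k)
    Rp = torch.clamp(R, min=0)
    Rp.diagonal().copy_(R.diagonal())
    # Column sums minus each point's own contribution
    A_new = Rp.sum(dim=0, keepdim=True) - Rp
    self_avail = A_new.diagonal().clone()  # a(k, k) is not capped at zero
    A_new.clamp_(max=0)
    A_new.diagonal().copy_(self_avail)
    # Damping: blend the previous messages (module-level A) with the new ones, in place
    A.mul_(DAMPING).add_(A_new, alpha=1 - DAMPING)
    return A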
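To make the two message updates concrete, here is a minimal sketch of one undamped iteration on a hand-made 3-point similarity matrix, following the standard Affinity Propagation rules. The function names `responsibility_step` and `availability_step` and the matrix `S_toy` are hypothetical and exist only for this example; damping is left out so the numbers can be checked by hand.

```python
import torch

def responsibility_step(S, A):
    """Undamped rule: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]."""
    top2 = (A + S).topk(2, dim=1)
    rows = torch.arange(S.size(0))
    best_idx = top2.indices[:, 0]
    R = S - top2.values[:, :1]
    # The column holding the row maximum is compared against the runner-up
    R[rows, best_idx] = S[rows, best_idx] - top2.values[:, 1]
    return R

def availability_step(R):
    """Undamped rule: a(i,k) = min(0, r(k,k) + sum of positive r(i',k)), i' not in {i,k}."""
    Rp = torch.clamp(R, min=0)
    Rp.diagonal().copy_(R.diagonal())          # keep r(k,k) unclipped
    A = Rp.sum(dim=0, keepdim=True) - Rp       # drop each point's own term
    self_avail = A.diagonal().clone()          # a(k,k) = sum of positive r(i',k), i' != k
    A.clamp_(max=0)
    A.diagonal().copy_(self_avail)
    return A

# Hypothetical similarities: points 0 and 1 are close, point 2 is far away
S_toy = torch.tensor([[-2.0, -1.0, -5.0],
                      [-1.0, -2.0, -4.0],
                      [-5.0, -4.0, -2.0]])
A_toy = torch.zeros_like(S_toy)

R_toy = responsibility_step(S_toy, A_toy)
# R_toy is [[-1, 1, -4], [1, -1, -3], [-3, -2, 2]]
A_toy = availability_step(R_toy)
# A_toy is [[1, -1, 0], [-1, 1, 0], [0, 0, 0]]
```

After this single pass only point 2 has a positive r(k, k) + a(k, k), which is why the messages need to be iterated, with damping, before the exemplars are read off.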
3.5 Iterative Updates
We now define the iterative process for updating the responsibility and availability matrices.
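def affinity_propagation(S):
    # Start from zero messages and alternate the two updates
    R = torch.zeros_like(S)
    A = torch.zeros_like(S)
    for _ in range(MAX_ITERATIONS):
        R = update_responsibility(S, A)
        A = update_availability(R)
    return R + A
# Run Affinity Propagation
R_A = affinity_propagation(S)
# For each point i, the column k that maximizes r(i, k) + a(i, k) is its chosen exemplar,
# so labels holds exemplar indices rather than consecutive cluster ids
labels = torch.argmax(R_A, dim=1)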
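Taking the row-wise argmax of R + A is one way to read off the clustering. Another common convention is to treat every point k with r(k, k) + a(k, k) > 0 as an exemplar and then assign each remaining point to its most similar exemplar. The sketch below illustrates that convention; `extract_clusters` is a hypothetical helper, and the message values in `R_demo` and `A_demo` are chosen by hand rather than taken from a real run.

```python
import torch

def extract_clusters(S, R, A):
    """Return (exemplar indices, label per point); labels are exemplar indices.

    Hypothetical helper. Returns labels of -1 if no exemplar was found.
    """
    evidence = torch.diagonal(R + A)
    exemplars = torch.nonzero(evidence > 0).flatten()
    if exemplars.numel() == 0:
        return exemplars, torch.full((S.size(0),), -1, dtype=torch.long)
    # Assign every point to the exemplar it is most similar to
    nearest = torch.argmax(S[:, exemplars], dim=1)
    labels = exemplars[nearest]
    # An exemplar always belongs to its own cluster
    labels[exemplars] = exemplars
    return exemplars, labels

# Four points on a line: two near 0 and two near 10
points_demo = torch.tensor([[0.0], [1.0], [9.0], [10.0]])
S_demo = -torch.cdist(points_demo, points_demo)

# Hand-picked messages whose diagonal marks points 0 and 3 as exemplars
R_demo = torch.diag(torch.tensor([1.0, -1.0, -2.0, 0.5]))
A_demo = torch.zeros_like(S_demo)

exemplars_demo, labels_demo = extract_clusters(S_demo, R_demo, A_demo)
# exemplars_demo is [0, 3] and labels_demo is [0, 0, 3, 3]
```

With this convention the number of clusters is simply the number of exemplars found.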
3.6 Visualizing the Results
Finally, we visualize the cluster assignments produced in the previous step.
# Convert labels to numpy for visualization
cluster_labels = labels.numpy()
# Plotting the results
plt.scatter(X[:, 0], X[:, 1], c=cluster_labels, cmap='viridis')
plt.title("Affinity Propagation Clustering (PyTorch)")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
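Because each label is the index of an exemplar rather than a consecutive cluster id, it can be convenient to remap the labels to 0, 1, 2, ... before counting clusters or choosing a discrete colormap. A minimal sketch using torch.unique follows; the tensor `exemplar_labels` is a made-up stand-in for the real labels.

```python
import torch

# Hypothetical exemplar-index labels for five points (exemplars are points 2 and 7)
exemplar_labels = torch.tensor([7, 7, 2, 2, 7])

# exemplar_ids holds the sorted distinct exemplars; compact_labels indexes into it
exemplar_ids, compact_labels = torch.unique(exemplar_labels, return_inverse=True)
n_clusters = exemplar_ids.numel()
# exemplar_ids is [2, 7], compact_labels is [1, 1, 0, 0, 1], n_clusters is 2
```

The same call applied to the article's labels tensor would give the number of clusters found on the synthetic dataset.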
4. Conclusion
In this article, we implemented Affinity Propagation using PyTorch, demonstrating how the library can be applied to machine learning tasks beyond deep learning. Affinity Propagation is particularly useful when the number of clusters is unknown, and because it selects actual data points as exemplars, it also yields a representative point for each cluster.
Next, we will explore common mistakes and best practices when using Affinity Propagation in your machine learning projects.