Theory Behind Affinity Propagation

Affinity Propagation is a clustering algorithm that automatically identifies representative data points, known as exemplars, which serve as cluster centers. Unlike many traditional clustering methods, Affinity Propagation does not require the number of clusters to be specified in advance. Instead, it determines the number of clusters from the data's internal structure.


1. Key Concepts in Affinity Propagation

1.1 Similarity

The Affinity Propagation algorithm starts by defining a similarity measure between pairs of data points. The similarity s(i, k) represents how well-suited data point k is to be the exemplar for data point i. The similarity can be calculated using various distance metrics, such as Euclidean distance or cosine similarity. In the case of Euclidean distance, the similarity is typically defined as:

s(i, k) = -\|x_i - x_k\|^2

Here, x_i and x_k are the feature vectors of data points i and k, respectively, and \|x_i - x_k\| denotes the Euclidean distance between them. The negative sign ensures that points closer to each other have higher similarity.
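
As a concrete illustration, here is a minimal NumPy sketch of this similarity computation; the function name similarity_matrix and the toy data are illustrative choices, not part of the algorithm itself:

```python
import numpy as np

def similarity_matrix(X):
    """Pairwise similarities s(i, k) = -||x_i - x_k||^2."""
    diff = X[:, None, :] - X[None, :, :]   # shape (n, n, d) via broadcasting
    return -(diff ** 2).sum(axis=2)

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0]])
S = similarity_matrix(X)
print(S)   # nearby points get higher (less negative) similarities
```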

1.2 Responsibility and Availability

Affinity Propagation relies on two types of messages exchanged between data points: responsibility and availability. These messages are iteratively updated to determine which data points should be exemplars.

  • Responsibility r(i, k): The responsibility message reflects how well-suited point k is to be the exemplar for point i, taking into account other potential exemplars for i. Mathematically, it is defined as:

    r(i, k) = s(i, k) - \max_{k' \neq k} \{a(i, k') + s(i, k')\}

    In words, the responsibility r(i, k) is the similarity s(i, k) minus the strongest competing offer, i.e., the maximum over all other candidate exemplars k' of the availability plus the similarity. This keeps the message focused on the best alternative candidate.

  • Availability a(i, k): The availability message reflects how appropriate it would be for point i to choose point k as its exemplar, considering the support k has gathered from other points. It is computed as:

    a(i, k) = \min\left(0, r(k, k) + \sum_{i' \notin \{i, k\}} \max(0, r(i', k))\right)

    Here, r(k, k) is the self-responsibility, indicating the extent to which k is suitable as its own exemplar, and the sum accumulates the positive responsibilities that other points send to k. A vectorized sketch of both updates follows this list.
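
Both updates vectorize naturally. The following NumPy sketch is one possible implementation, assuming a precomputed similarity matrix S whose diagonal holds the preferences (see Section 2); the function names are illustrative:

```python
import numpy as np

def update_responsibility(S, A):
    """r(i, k) = s(i, k) - max_{k' != k} [ a(i, k') + s(i, k') ]."""
    M = A + S
    n = len(S)
    rows = np.arange(n)
    top = np.argmax(M, axis=1)            # index of each row's largest entry
    first = M[rows, top]
    M[rows, top] = -np.inf                # mask it to find the runner-up
    second = M.max(axis=1)
    R = S - first[:, None]                # subtract the row max everywhere...
    R[rows, top] = S[rows, top] - second  # ...except at the max's own column
    return R

def update_availability(R):
    """a(i, k) = min(0, r(k, k) + sum_{i' not in {i, k}} max(0, r(i', k)))."""
    Rp = np.maximum(R, 0)
    np.fill_diagonal(Rp, R.diagonal())    # keep r(k, k) itself unclipped
    col = Rp.sum(axis=0)                  # r(k, k) + sum of positive r(i', k)
    A = np.minimum(0, col[None, :] - Rp)  # drop point i's own contribution
    # Self-availability: a(k, k) = sum_{i' != k} max(0, r(i', k)).
    np.fill_diagonal(A, col - R.diagonal())
    return A
```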

1.3 Message Passing

The key to Affinity Propagation lies in iteratively updating these responsibility and availability messages until convergence. Initially, all availability values are set to zero:

a(i, k) = 0 \quad \text{for all } i, k

The responsibility and availability messages are then updated iteratively according to the following rules:

  1. Update Responsibility:

    r(i, k) \leftarrow s(i, k) - \max_{k' \neq k} \{a(i, k') + s(i, k')\}

  2. Update Availability (for i ≠ k):

    a(i, k) \leftarrow \min\left(0, r(k, k) + \sum_{i' \notin \{i, k\}} \max(0, r(i', k))\right)

  3. Update Self-Availability:

    For each point k, the self-availability accumulates the positive evidence from other points that k should serve as an exemplar:

    a(k, k) \leftarrow \sum_{i' \neq k} \max(0, r(i', k))

These updates continue until the changes in responsibility and availability are below a certain threshold, indicating convergence.
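
Putting the pieces together, a minimal sketch of the loop might look as follows. It assumes the two update functions from the sketch in Section 1.2 are in scope, and the tolerance and iteration cap are illustrative choices rather than prescribed values:

```python
import numpy as np

# Assumes update_responsibility and update_availability (Section 1.2)
# are already defined.
def affinity_propagation_messages(S, max_iter=200, tol=1e-6):
    """Undamped message passing; returns the converged R and A matrices."""
    n = S.shape[0]
    R = np.zeros((n, n))
    A = np.zeros((n, n))          # availabilities start at zero
    for _ in range(max_iter):
        R_new = update_responsibility(S, A)
        A_new = update_availability(R_new)
        # Converged once neither message matrix is changing appreciably.
        if max(np.abs(R_new - R).max(), np.abs(A_new - A).max()) < tol:
            return R_new, A_new
        R, A = R_new, A_new
    return R, A
```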

1.4 Selection of Exemplars

After the message-passing process converges, the exemplars are selected. For each data point i, the chosen exemplar k is the candidate that maximizes the sum of responsibility and availability:

k = \arg\max_{k'} \{r(i, k') + a(i, k')\}

A point for which this maximum is attained at k = i selects itself and becomes an exemplar; every other point is assigned to the cluster of the exemplar it selects.
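
In code, this selection is a single row-wise argmax over R + A; a minimal sketch, with an illustrative function name:

```python
import numpy as np

def assign_exemplars(R, A):
    """Pick, for each point i, the k maximizing r(i, k) + a(i, k)."""
    choice = np.argmax(R + A, axis=1)
    # Points that choose themselves are the exemplars (cluster centers).
    exemplars = np.flatnonzero(choice == np.arange(len(choice)))
    return choice, exemplars
```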

2. Preference Parameter

The preference parameter p plays a crucial role in Affinity Propagation. It is the self-similarity s(k, k) = p placed on the diagonal of the similarity matrix, and it determines how likely each point is to be chosen as an exemplar. Typically, p is set to the median of the input similarities (giving a moderate number of clusters) or to their minimum (giving fewer clusters). A higher preference value increases the number of clusters, while a lower value reduces it.
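
In implementations, the preference is usually written onto the diagonal of S before message passing begins. A brief sketch; the strategy argument is an illustrative convenience, not standard terminology:

```python
import numpy as np

def set_preference(S, strategy="median"):
    """Write the preference p onto the diagonal of the similarity matrix."""
    off_diag = S[~np.eye(len(S), dtype=bool)]
    p = np.median(off_diag) if strategy == "median" else off_diag.min()
    S = S.copy()
    np.fill_diagonal(S, p)   # higher p -> more exemplars -> more clusters
    return S
```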

3. Convergence and Stability

Affinity Propagation can be sensitive to parameter choices, particularly the preference. Convergence is not guaranteed, and the algorithm may oscillate between candidate exemplar configurations. To mitigate this, a damping factor is introduced into the update equations to stabilize the iterations.

3.1 Damping Factor

The damping factor λ blends each newly computed message with its value from the previous iteration:

r(i, k) \leftarrow \lambda \cdot r_{\text{old}}(i, k) + (1 - \lambda) \cdot r_{\text{new}}(i, k)

a(i, k) \leftarrow \lambda \cdot a_{\text{old}}(i, k) + (1 - \lambda) \cdot a_{\text{new}}(i, k)

The damping factor λ typically ranges from 0.5 to 0.9, with larger values applying heavier damping; this prevents oscillations and helps the algorithm converge more reliably.
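
A sketch of how damping slots into the loop, assuming the update functions from the earlier sketches; the value lam=0.7 is an illustrative default within the usual range:

```python
import numpy as np

def damp(old, new, lam=0.7):
    """Damped update: keep a fraction lam of the previous message values."""
    return lam * old + (1.0 - lam) * new

# Inside the message-passing loop, the raw updates are blended in:
#   R = damp(R, update_responsibility(S, A))
#   A = damp(A, update_availability(R))
```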

4. Advantages and Disadvantages

4.1 Advantages

  • Automatic Cluster Detection: Affinity Propagation does not require the number of clusters to be specified beforehand.
  • Flexibility: It operates on an arbitrary similarity matrix rather than requiring points in a vector space, making it applicable to diverse data types.
  • Exemplar-Based: The algorithm identifies actual data points as exemplars, providing a clear representative for each cluster.

4.2 Disadvantages

  • Computationally Intensive: The algorithm can be expensive for large datasets; each iteration updates n × n message matrices, so time and memory grow quadratically with the number of points.
  • Sensitivity to Parameters: The algorithm's performance can be highly sensitive to the choice of the preference parameter.
  • Convergence Issues: Without careful tuning, the algorithm may fail to converge, especially in the presence of noise.

Conclusion

Affinity Propagation is a powerful clustering algorithm that offers a unique approach by automatically determining the number of clusters and identifying exemplars within the data. Its reliance on message passing and iterative refinement allows it to handle a variety of clustering tasks. However, its computational complexity and sensitivity to parameters require careful consideration when applying it to real-world problems.