Theory Behind Affinity Propagation

Affinity Propagation is a clustering algorithm that automatically identifies representative data points, known as exemplars, which serve as cluster centers. Unlike many traditional clustering methods, Affinity Propagation does not require the number of clusters to be specified in advance. Instead, it determines the number of clusters from the data's internal structure.


1. Key Concepts in Affinity Propagation

1.1 Similarity

The Affinity Propagation algorithm starts by defining a similarity measure between pairs of data points. The similarity s(i, k) represents how well-suited data point k is to be the exemplar for data point i. The similarity can be calculated using various distance metrics, such as Euclidean distance or cosine similarity. In the case of Euclidean distance, the similarity is typically defined as:

s(i, k) = -\|x_i - x_k\|^2

Here, x_i and x_k are the feature vectors of data points i and k, respectively, and \|x_i - x_k\| denotes the Euclidean distance between them. The negative sign ensures that points closer to each other have higher similarity.
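
As a concrete illustration, here is a minimal NumPy sketch of this similarity computation; the function name similarity_matrix and the toy data are illustrative choices, not part of the algorithm itself:

```python
import numpy as np

def similarity_matrix(X):
    """Pairwise similarities s(i, k) = -||x_i - x_k||^2."""
    diff = X[:, None, :] - X[None, :, :]   # shape (n, n, d) via broadcasting
    return -(diff ** 2).sum(axis=2)

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0]])
S = similarity_matrix(X)
print(S)   # nearby points get higher (less negative) similarities
```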

1.2 Responsibility and Availability

Affinity Propagation relies on two types of messages exchanged between data points: responsibility and availability. These messages are iteratively updated to determine which data points should be exemplars.

  • Responsibility r(i, k): The responsibility message reflects how well-suited point k is to be the exemplar for point i, taking into account other potential exemplars for i. Mathematically, it is defined as:

    r(i, k) = s(i, k) - \max_{k' \neq k} \{a(i, k') + s(i, k')\}

    In words, the responsibility r(i, k) is the similarity s(i, k) minus the strongest competing offer, i.e., the maximum over all other candidate exemplars k' of the availability plus the similarity. This keeps the message focused on the best alternative candidate.

  • Availability a(i, k): The availability message reflects how appropriate it would be for point i to choose point k as its exemplar, considering the support k has gathered from other points. It is computed as:

    a(i, k) = \min\left(0, r(k, k) + \sum_{i' \notin \{i, k\}} \max(0, r(i', k))\right)

    Here, r(k, k) is the self-responsibility, indicating the extent to which k is suitable as its own exemplar, and the sum accumulates the positive responsibilities that other points send to k. A vectorized sketch of both updates follows this list.
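
Both updates vectorize naturally. The following NumPy sketch is one possible implementation, assuming a precomputed similarity matrix S whose diagonal holds the preferences (see Section 2); the function names are illustrative:

```python
import numpy as np

def update_responsibility(S, A):
    """r(i, k) = s(i, k) - max_{k' != k} [ a(i, k') + s(i, k') ]."""
    M = A + S
    n = len(S)
    rows = np.arange(n)
    top = np.argmax(M, axis=1)            # index of each row's largest entry
    first = M[rows, top]
    M[rows, top] = -np.inf                # mask it to find the runner-up
    second = M.max(axis=1)
    R = S - first[:, None]                # subtract the row max everywhere...
    R[rows, top] = S[rows, top] - second  # ...except at the max's own column
    return R

def update_availability(R):
    """a(i, k) = min(0, r(k, k) + sum_{i' not in {i, k}} max(0, r(i', k)))."""
    Rp = np.maximum(R, 0)
    np.fill_diagonal(Rp, R.diagonal())    # keep r(k, k) itself unclipped
    col = Rp.sum(axis=0)                  # r(k, k) + sum of positive r(i', k)
    A = np.minimum(0, col[None, :] - Rp)  # drop point i's own contribution
    # Self-availability: a(k, k) = sum_{i' != k} max(0, r(i', k)).
    np.fill_diagonal(A, col - R.diagonal())
    return A
```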

1.3 Message Passing

The key to Affinity Propagation lies in iteratively updating these responsibility and availability messages until convergence. Initially, all availability values are set to zero:

a(i, k) = 0 \quad \text{for all } i, k

The responsibility and availability messages are then updated iteratively according to the following rules:

  1. Update Responsibility:

    r(i, k) \leftarrow s(i, k) - \max_{k' \neq k} \{a(i, k') + s(i, k')\}

  2. Update Availability (for i ≠ k):

    a(i, k) \leftarrow \min\left(0, r(k, k) + \sum_{i' \notin \{i, k\}} \max(0, r(i', k))\right)

  3. Update Self-Availability:

    For each point k, the self-availability accumulates the positive evidence from other points that k should serve as an exemplar:

    a(k, k) \leftarrow \sum_{i' \neq k} \max(0, r(i', k))

These updates continue until the changes in responsibility and availability are below a certain threshold, indicating convergence.
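
Putting the pieces together, a minimal sketch of the loop might look as follows. It assumes the two update functions from the sketch in Section 1.2 are in scope, and the tolerance and iteration cap are illustrative choices rather than prescribed values:

```python
import numpy as np

# Assumes update_responsibility and update_availability (Section 1.2)
# are already defined.
def affinity_propagation_messages(S, max_iter=200, tol=1e-6):
    """Undamped message passing; returns the converged R and A matrices."""
    n = S.shape[0]
    R = np.zeros((n, n))
    A = np.zeros((n, n))          # availabilities start at zero
    for _ in range(max_iter):
        R_new = update_responsibility(S, A)
        A_new = update_availability(R_new)
        # Converged once neither message matrix is changing appreciably.
        if max(np.abs(R_new - R).max(), np.abs(A_new - A).max()) < tol:
            return R_new, A_new
        R, A = R_new, A_new
    return R, A
```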

1.4 Selection of Exemplars

After the message-passing process converges, the exemplars are selected. For each data point i, the chosen exemplar k is the candidate that maximizes the sum of responsibility and availability:

k = \arg\max_{k'} \{r(i, k') + a(i, k')\}

A point for which this maximum is attained at k = i selects itself and becomes an exemplar; every other point is assigned to the cluster of the exemplar it selects.
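
In code, this selection is a single row-wise argmax over R + A; a minimal sketch, with an illustrative function name:

```python
import numpy as np

def assign_exemplars(R, A):
    """Pick, for each point i, the k maximizing r(i, k) + a(i, k)."""
    choice = np.argmax(R + A, axis=1)
    # Points that choose themselves are the exemplars (cluster centers).
    exemplars = np.flatnonzero(choice == np.arange(len(choice)))
    return choice, exemplars
```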

2. Preference Parameter

The preference parameter p plays a crucial role in Affinity Propagation. It is the self-similarity s(k, k) = p placed on the diagonal of the similarity matrix, and it determines how likely each point is to be chosen as an exemplar. Typically, p is set to the median of the input similarities (giving a moderate number of clusters) or to their minimum (giving fewer clusters). A higher preference value increases the number of clusters, while a lower value reduces it.
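
In implementations, the preference is usually written onto the diagonal of S before message passing begins. A brief sketch; the strategy argument is an illustrative convenience, not standard terminology:

```python
import numpy as np

def set_preference(S, strategy="median"):
    """Write the preference p onto the diagonal of the similarity matrix."""
    off_diag = S[~np.eye(len(S), dtype=bool)]
    p = np.median(off_diag) if strategy == "median" else off_diag.min()
    S = S.copy()
    np.fill_diagonal(S, p)   # higher p -> more exemplars -> more clusters
    return S
```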

3. Convergence and Stability

Affinity Propagation can be sensitive to parameter choices, particularly the preference. Convergence is not guaranteed, and the algorithm may oscillate between candidate exemplar configurations. To mitigate this, a damping factor is introduced into the update equations to stabilize the iterations.

3.1 Damping Factor

The damping factor λ blends each newly computed message with its value from the previous iteration:

r(i, k) \leftarrow \lambda \cdot r_{\text{old}}(i, k) + (1 - \lambda) \cdot r_{\text{new}}(i, k)

a(i, k) \leftarrow \lambda \cdot a_{\text{old}}(i, k) + (1 - \lambda) \cdot a_{\text{new}}(i, k)

The damping factor λ typically ranges from 0.5 to 0.9, with larger values applying heavier damping; this prevents oscillations and helps the algorithm converge more reliably.
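
A sketch of how damping slots into the loop, assuming the update functions from the earlier sketches; the value lam=0.7 is an illustrative default within the usual range:

```python
import numpy as np

def damp(old, new, lam=0.7):
    """Damped update: keep a fraction lam of the previous message values."""
    return lam * old + (1.0 - lam) * new

# Inside the message-passing loop, the raw updates are blended in:
#   R = damp(R, update_responsibility(S, A))
#   A = damp(A, update_availability(R))
```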

4. Advantages and Disadvantages

4.1 Advantages

  • Automatic Cluster Detection: Affinity Propagation does not require the number of clusters to be specified beforehand.
  • Flexibility: It operates on an arbitrary similarity matrix rather than requiring points in a vector space, making it applicable to diverse data types.
  • Exemplar-Based: The algorithm identifies actual data points as exemplars, providing a clear representative for each cluster.

4.2 Disadvantages

  • Computationally Intensive: The algorithm can be expensive for large datasets; each iteration updates n × n message matrices, so time and memory grow quadratically with the number of points.
  • Sensitivity to Parameters: The algorithm's performance can be highly sensitive to the choice of the preference parameter.
  • Convergence Issues: Without careful tuning, the algorithm may fail to converge, especially in the presence of noise.

Conclusion

Affinity Propagation is a powerful clustering algorithm that offers a unique approach by automatically determining the number of clusters and identifying exemplars within the data. Its reliance on message passing and iterative refinement allows it to handle a variety of clustering tasks. However, its computational complexity and sensitivity to parameters require careful consideration when applying it to real-world problems.