
Theoretical Foundations of Spectral Clustering

Spectral Clustering is a powerful technique rooted in graph theory and linear algebra. It is particularly effective when clusters are non-convex or defined by connectivity rather than compactness, cases where centroid-based methods like K-Means struggle. In this article, we delve into the mathematical foundations of Spectral Clustering, covering key concepts such as similarity graphs, graph Laplacians, and eigenvalue decomposition.

1. Overview of Spectral Clustering

Spectral Clustering can be viewed as a graph partitioning problem. The main idea is to represent the data points as nodes in a graph, with edges connecting pairs of nodes that are similar. The problem of clustering then becomes one of partitioning this graph into subgraphs (clusters) such that the connections (similarities) between points within the same cluster are strong, while those between different clusters are weak.

2. Constructing the Similarity Graph

The first step in Spectral Clustering is constructing a similarity graph, which encodes the relationships between data points.

2.1 Similarity Matrix

Given a dataset $\{x_1, x_2, \dots, x_n\}$, the similarity between two points $x_i$ and $x_j$ is often represented by a similarity function $s(x_i, x_j)$. This leads to the similarity matrix $S$, where each entry $S_{ij} = s(x_i, x_j)$.

Common Similarity Functions

  • Gaussian (RBF) Kernel:

    s(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)

    where $\sigma$ is a scale parameter controlling the width of the Gaussian kernel (a minimal sketch of this kernel follows this list).

  • k-Nearest Neighbors (k-NN): Here, $s(x_i, x_j)$ is set to 1 if $x_j$ is among the k nearest neighbors of $x_i$, and 0 otherwise. Because this relation is not symmetric, the resulting matrix is usually symmetrized, for example by connecting two points whenever either is among the other's k nearest neighbors.
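
As a concrete illustration, here is a minimal NumPy sketch of the Gaussian similarity matrix; the function name and the example points are illustrative, not part of any particular library.

```python
import numpy as np

def gaussian_similarity_matrix(X, sigma=1.0):
    """Pairwise Gaussian similarities S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    # Squared Euclidean distances between every pair of rows of X.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Four collinear points, the same ones used in the worked example of Section 4.2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
S = gaussian_similarity_matrix(X, sigma=1.0)
print(np.round(S, 4))
```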

2.2 Types of Graphs

  • Fully Connected Graph: Every pair of points is connected, with edge weights given by the similarity function. This approach is computationally expensive for large datasets.

  • k-Nearest Neighbors Graph: Each point is connected to its k-nearest neighbors. This graph is sparse and more scalable for larger datasets.

  • $\epsilon$-Neighborhood Graph: Points are connected if their distance is less than some threshold $\epsilon$. This method is sensitive to the choice of $\epsilon$. (A sketch of the k-NN and $\epsilon$-neighborhood constructions follows this list.)
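
For the sparse graph types, scikit-learn provides ready-made constructors. The sketch below assumes scikit-learn is available; the choices n_neighbors=2 and radius=1.5 are illustrative, not recommendations.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]])

# k-nearest-neighbors graph: sparse 0/1 adjacency; entry (i, j) is 1 when x_j is
# among the k nearest neighbors of x_i. Symmetrize to obtain an undirected graph.
knn = kneighbors_graph(X, n_neighbors=2, mode="connectivity", include_self=False)
knn_sym = knn.maximum(knn.T)

# epsilon-neighborhood graph: connect points whose distance is below the threshold.
eps_graph = radius_neighbors_graph(X, radius=1.5, mode="connectivity", include_self=False)

print(knn_sym.toarray())
print(eps_graph.toarray())
```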

3. Graph Laplacian

Once the similarity graph is constructed, the next step is to compute the graph Laplacian, a matrix that plays a central role in the clustering process.

3.1 Degree Matrix

The degree matrix $D$ is a diagonal matrix in which each diagonal element $D_{ii}$ is the sum of the similarities of the edges incident to node $i$:

D_{ii} = \sum_{j} S_{ij}
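
In code, the degree matrix is simply the diagonal of the row sums of S. A short NumPy sketch, reusing the Gaussian similarities of the four-point example:

```python
import numpy as np

# Similarity matrix of the four-point example (Gaussian kernel, sigma = 1).
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
S = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1) / 2.0)

degrees = S.sum(axis=1)   # D_ii = sum_j S_ij
D = np.diag(degrees)      # diagonal degree matrix
print(np.round(degrees, 4))
```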

3.2 Graph Laplacian

There are several forms of the graph Laplacian, but the most common ones are:

  • Unnormalized Laplacian:

    L = D - S
  • Normalized Laplacian:

    L_{\text{sym}} = I - D^{-1/2} S D^{-1/2}

    or

    L_{\text{rw}} = I - D^{-1} S

where $I$ is the identity matrix. The normalized Laplacians are particularly useful because they compensate for varying node degrees, which helps when the data contain regions of different density.
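
The three Laplacians are straightforward to form once S and D are available. The following is a minimal dense-matrix sketch; it assumes a symmetric S with no isolated nodes, so that D is invertible. For sparse graphs, scipy.sparse.csgraph.laplacian can compute the unnormalized and symmetric-normalized forms directly.

```python
import numpy as np

def graph_laplacians(S):
    """Return the unnormalized, symmetric, and random-walk Laplacians of S."""
    d = S.sum(axis=1)                       # node degrees
    D = np.diag(d)
    L = D - S                               # unnormalized: L = D - S
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}, requires all degrees > 0
    L_sym = np.eye(len(d)) - d_inv_sqrt @ S @ d_inv_sqrt
    L_rw = np.eye(len(d)) - np.diag(1.0 / d) @ S
    return L, L_sym, L_rw
```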

3.3 Properties of the Laplacian

  • Eigenvalues: The eigenvalues of the graph Laplacian provide important information about the structure of the graph. The smallest eigenvalue $\lambda_1$ of the unnormalized Laplacian is always 0, corresponding to the eigenvector $\mathbf{1}$ (a vector of all ones). The second smallest eigenvalue $\lambda_2$ is known as the algebraic connectivity and reflects how well connected the graph is.

  • Eigenvectors: The eigenvectors corresponding to the smallest eigenvalues (beyond the trivial constant one) are used to cluster the data. These eigenvectors provide a low-dimensional embedding of the data in which the cluster structure becomes more apparent.
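
Both properties follow from a standard identity for the unnormalized Laplacian: for any vector $\mathbf{f} \in \mathbb{R}^n$,

\mathbf{f}^\top L \mathbf{f} = \frac{1}{2} \sum_{i,j} S_{ij} \left(f_i - f_j\right)^2 \ge 0

so $L$ is positive semi-definite, every eigenvalue is non-negative, and the constant vector $\mathbf{1}$ (for which all differences $f_i - f_j$ vanish) attains the minimum value 0.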

4. Spectral Clustering Algorithm

4.1 Step-by-Step Procedure

  1. Construct the Similarity Graph: Compute the similarity matrix $S$ and construct the graph.

  2. Compute the Laplacian: Calculate the graph Laplacian $L$, using either the unnormalized or normalized form.

  3. Eigenvalue Decomposition: Solve the eigenvalue problem for the Laplacian to obtain the eigenvalues and eigenvectors:

    L \mathbf{v} = \lambda \mathbf{v}
  4. Select k Smallest Eigenvectors: Select the eigenvectors corresponding to the k smallest eigenvalues to form a matrix $V$ (for a connected graph, the constant eigenvector of the zero eigenvalue carries no cluster information and is often dropped).

  5. Cluster the Rows of V: Treat the rows of $V$ as data points in a lower-dimensional space and apply a standard clustering algorithm like K-Means to these points to obtain the final clusters. A compact sketch of these five steps appears after this list.
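
A minimal sketch of the five steps, using the unnormalized Laplacian, might look as follows. The function name is illustrative; scikit-learn's sklearn.cluster.SpectralClustering provides a full implementation with normalized Laplacians and sparse eigensolvers.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(S, k, random_state=0):
    """Spectral clustering on a symmetric similarity matrix S (unnormalized variant)."""
    # Step 2: degree matrix and unnormalized Laplacian.
    D = np.diag(S.sum(axis=1))
    L = D - S
    # Step 3: eigendecomposition; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(L)
    # Step 4: embedding from the eigenvectors of the k smallest eigenvalues
    # (the constant eigenvector adds nothing for a connected graph, but does no harm).
    V = eigvecs[:, :k]
    # Step 5: cluster the rows of the embedding with K-Means.
    labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(V)
    return labels
```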

4.2 Mathematical Example

Consider a simple dataset with four points:

\{x_1 = (0, 0),\; x_2 = (1, 0),\; x_3 = (3, 0),\; x_4 = (4, 0)\}

Using a Gaussian kernel with $\sigma = 1$, the similarity matrix $S$ is:

S = \begin{pmatrix} 1 & 0.6065 & 0.0111 & 0.0003 \\ 0.6065 & 1 & 0.1353 & 0.0111 \\ 0.0111 & 0.1353 & 1 & 0.6065 \\ 0.0003 & 0.0111 & 0.6065 & 1 \end{pmatrix}

For example, $S_{12} = \exp(-1^2/2) = 0.6065$ and $S_{23} = \exp(-2^2/2) = 0.1353$, since $\|x_1 - x_2\| = 1$ and $\|x_2 - x_3\| = 2$.

The degree matrix $D$ is:

D = \begin{pmatrix} 1.6179 & 0 & 0 & 0 \\ 0 & 1.7529 & 0 & 0 \\ 0 & 0 & 1.7529 & 0 \\ 0 & 0 & 0 & 1.6179 \end{pmatrix}

The unnormalized Laplacian $L$ is then:

L = D - S = \begin{pmatrix} 0.6179 & -0.6065 & -0.0111 & -0.0003 \\ -0.6065 & 0.7529 & -0.1353 & -0.0111 \\ -0.0111 & -0.1353 & 0.7529 & -0.6065 \\ -0.0003 & -0.0111 & -0.6065 & 0.6179 \end{pmatrix}

Solving the eigenvalue problem for $L$, we find the eigenvalues $\lambda$ and eigenvectors $\mathbf{v}$. The smallest eigenvalue is 0 with a constant eigenvector; the signs of the entries of the second eigenvector (the Fiedler vector) split the points into $\{x_1, x_2\}$ and $\{x_3, x_4\}$, which is exactly the clustering suggested by the geometry.
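
The example can be checked numerically; here is a small sketch with NumPy and scikit-learn (cluster labels are arbitrary, so [0 0 1 1] and [1 1 0 0] are equivalent results):

```python
import numpy as np
from sklearn.cluster import KMeans

# Four collinear points and their Gaussian similarities (sigma = 1).
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
S = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1) / 2.0)
L = np.diag(S.sum(axis=1)) - S

eigvals, eigvecs = np.linalg.eigh(L)
print(np.round(eigvals, 4))        # smallest eigenvalue is (numerically) 0
print(np.round(eigvecs[:, 1], 4))  # Fiedler vector: signs split {x1, x2} from {x3, x4}

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(eigvecs[:, :2])
print(labels)                      # two clusters: {x1, x2} and {x3, x4}
```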

4.3 Geometric Interpretation

The eigenvectors corresponding to the smallest eigenvalues of the Laplacian matrix give us a low-dimensional embedding of the data in which the clusters are more apparent. This is because these eigenvectors solve a relaxed version of the graph-cut problem, yielding a partition that respects the connectivity (similarity) structure of the data.
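
This intuition can be made precise. For the unnormalized Laplacian, spectral clustering is the standard spectral relaxation of the RatioCut objective

\text{RatioCut}(A_1, \dots, A_k) = \sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{|A_i|}, \qquad W(A, \bar{A}) = \sum_{i \in A,\, j \in \bar{A}} S_{ij}

Encoding a partition by scaled indicator vectors and dropping the requirement that they be discrete leads to the relaxed problem

\min_{H \in \mathbb{R}^{n \times k},\; H^\top H = I} \operatorname{trace}\left(H^\top L H\right)

whose solution, by the Rayleigh-Ritz theorem, is the matrix whose columns are the eigenvectors of the k smallest eigenvalues of $L$, exactly the embedding used in Step 4 of the algorithm.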

5. Conclusion

Spectral Clustering is a mathematically rich method that leverages the power of graph theory and linear algebra to uncover complex cluster structures in data. By transforming the clustering problem into an eigenvalue problem, Spectral Clustering can identify clusters that are not easily detected by traditional methods. Understanding the theory behind this method allows practitioners to apply it effectively to a wide range of datasets, especially those with intricate structures.