
Theoretical Foundations of Spectral Clustering

Spectral Clustering is a powerful technique rooted in graph theory and linear algebra. It is particularly effective when clusters are non-convex or defined by connectivity rather than compactness, cases where centroid-based methods like K-Means struggle. In this article, we delve into the mathematical foundations of Spectral Clustering, covering key concepts such as similarity graphs, graph Laplacians, and eigenvalue decomposition.

1. Overview of Spectral Clustering

Spectral Clustering can be viewed as a graph partitioning problem. The main idea is to represent the data points as nodes in a graph, with edges connecting pairs of nodes that are similar. The problem of clustering then becomes one of partitioning this graph into subgraphs (clusters) such that the connections (similarities) between points within the same cluster are strong, while those between different clusters are weak.

2. Constructing the Similarity Graph

The first step in Spectral Clustering is constructing a similarity graph, which encodes the relationships between data points.

2.1 Similarity Matrix

Given a dataset $\{x_1, x_2, \dots, x_n\}$, the similarity between two points $x_i$ and $x_j$ is often represented by a similarity function $s(x_i, x_j)$. This leads to the similarity matrix $S$, where each entry $S_{ij} = s(x_i, x_j)$.

Common Similarity Functions

  • Gaussian (RBF) Kernel:

    s(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)

    where $\sigma$ is a scale parameter controlling the width of the Gaussian kernel (a minimal sketch of this kernel follows this list).

  • k-Nearest Neighbors (k-NN): Here, $s(x_i, x_j)$ is set to 1 if $x_j$ is among the k nearest neighbors of $x_i$, and 0 otherwise. Because this relation is not symmetric, the resulting matrix is usually symmetrized, for example by connecting two points whenever either is among the other's k nearest neighbors.
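
As a concrete illustration, here is a minimal NumPy sketch of the Gaussian similarity matrix; the function name and the example points are illustrative, not part of any particular library.

```python
import numpy as np

def gaussian_similarity_matrix(X, sigma=1.0):
    """Pairwise Gaussian similarities S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    # Squared Euclidean distances between every pair of rows of X.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Four collinear points, the same ones used in the worked example of Section 4.2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
S = gaussian_similarity_matrix(X, sigma=1.0)
print(np.round(S, 4))
```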

2.2 Types of Graphs

  • Fully Connected Graph: Every pair of points is connected, with edge weights given by the similarity function. This approach is computationally expensive for large datasets.

  • k-Nearest Neighbors Graph: Each point is connected to its k-nearest neighbors. This graph is sparse and more scalable for larger datasets.

  • $\epsilon$-Neighborhood Graph: Points are connected if their distance is less than some threshold $\epsilon$. This method is sensitive to the choice of $\epsilon$. (A sketch of the k-NN and $\epsilon$-neighborhood constructions follows this list.)
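
For the sparse graph types, scikit-learn provides ready-made constructors. The sketch below assumes scikit-learn is available; the choices n_neighbors=2 and radius=1.5 are illustrative, not recommendations.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]])

# k-nearest-neighbors graph: sparse 0/1 adjacency; entry (i, j) is 1 when x_j is
# among the k nearest neighbors of x_i. Symmetrize to obtain an undirected graph.
knn = kneighbors_graph(X, n_neighbors=2, mode="connectivity", include_self=False)
knn_sym = knn.maximum(knn.T)

# epsilon-neighborhood graph: connect points whose distance is below the threshold.
eps_graph = radius_neighbors_graph(X, radius=1.5, mode="connectivity", include_self=False)

print(knn_sym.toarray())
print(eps_graph.toarray())
```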

3. Graph Laplacian

Once the similarity graph is constructed, the next step is to compute the graph Laplacian, a matrix that plays a central role in the clustering process.

3.1 Degree Matrix

The degree matrix $D$ is a diagonal matrix in which each diagonal element $D_{ii}$ is the sum of the similarities of the edges incident to node $i$:

D_{ii} = \sum_{j} S_{ij}
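
In code, the degree matrix is simply the diagonal of the row sums of S. A short NumPy sketch, reusing the Gaussian similarities of the four-point example:

```python
import numpy as np

# Similarity matrix of the four-point example (Gaussian kernel, sigma = 1).
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
S = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1) / 2.0)

degrees = S.sum(axis=1)   # D_ii = sum_j S_ij
D = np.diag(degrees)      # diagonal degree matrix
print(np.round(degrees, 4))
```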

3.2 Graph Laplacian

There are several forms of the graph Laplacian, but the most common ones are:

  • Unnormalized Laplacian:

    L = D - S
  • Normalized Laplacian:

    L_{\text{sym}} = I - D^{-1/2} S D^{-1/2}

    or

    L_{\text{rw}} = I - D^{-1} S

where $I$ is the identity matrix. The normalized Laplacians are particularly useful because they compensate for varying node degrees, which helps when the data contain regions of different density.
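
The three Laplacians are straightforward to form once S and D are available. The following is a minimal dense-matrix sketch; it assumes a symmetric S with no isolated nodes, so that D is invertible. For sparse graphs, scipy.sparse.csgraph.laplacian can compute the unnormalized and symmetric-normalized forms directly.

```python
import numpy as np

def graph_laplacians(S):
    """Return the unnormalized, symmetric, and random-walk Laplacians of S."""
    d = S.sum(axis=1)                       # node degrees
    D = np.diag(d)
    L = D - S                               # unnormalized: L = D - S
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}, requires all degrees > 0
    L_sym = np.eye(len(d)) - d_inv_sqrt @ S @ d_inv_sqrt
    L_rw = np.eye(len(d)) - np.diag(1.0 / d) @ S
    return L, L_sym, L_rw
```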

3.3 Properties of the Laplacian

  • Eigenvalues: The eigenvalues of the graph Laplacian provide important information about the structure of the graph. The smallest eigenvalue $\lambda_1$ of the unnormalized Laplacian is always 0, corresponding to the eigenvector $\mathbf{1}$ (a vector of all ones). The second smallest eigenvalue $\lambda_2$ is known as the algebraic connectivity and reflects how well connected the graph is.

  • Eigenvectors: The eigenvectors corresponding to the smallest eigenvalues (beyond the trivial constant one) are used to cluster the data. These eigenvectors provide a low-dimensional embedding of the data in which the cluster structure becomes more apparent.
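
Both properties follow from a standard identity for the unnormalized Laplacian: for any vector $\mathbf{f} \in \mathbb{R}^n$,

\mathbf{f}^\top L \mathbf{f} = \frac{1}{2} \sum_{i,j} S_{ij} \left(f_i - f_j\right)^2 \ge 0

so $L$ is positive semi-definite, every eigenvalue is non-negative, and the constant vector $\mathbf{1}$ (for which all differences $f_i - f_j$ vanish) attains the minimum value 0.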

4. Spectral Clustering Algorithm

4.1 Step-by-Step Procedure

  1. Construct the Similarity Graph: Compute the similarity matrix $S$ and construct the graph.

  2. Compute the Laplacian: Calculate the graph Laplacian $L$, using either the unnormalized or normalized form.

  3. Eigenvalue Decomposition: Solve the eigenvalue problem for the Laplacian to obtain the eigenvalues and eigenvectors:

    L \mathbf{v} = \lambda \mathbf{v}
  4. Select k Smallest Eigenvectors: Select the eigenvectors corresponding to the k smallest eigenvalues to form a matrix $V$ (for a connected graph, the constant eigenvector of the zero eigenvalue carries no cluster information and is often dropped).

  5. Cluster the Rows of V: Treat the rows of $V$ as data points in a lower-dimensional space and apply a standard clustering algorithm like K-Means to these points to obtain the final clusters. A compact sketch of these five steps appears after this list.
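
A minimal sketch of the five steps, using the unnormalized Laplacian, might look as follows. The function name is illustrative; scikit-learn's sklearn.cluster.SpectralClustering provides a full implementation with normalized Laplacians and sparse eigensolvers.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(S, k, random_state=0):
    """Spectral clustering on a symmetric similarity matrix S (unnormalized variant)."""
    # Step 2: degree matrix and unnormalized Laplacian.
    D = np.diag(S.sum(axis=1))
    L = D - S
    # Step 3: eigendecomposition; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(L)
    # Step 4: embedding from the eigenvectors of the k smallest eigenvalues
    # (the constant eigenvector adds nothing for a connected graph, but does no harm).
    V = eigvecs[:, :k]
    # Step 5: cluster the rows of the embedding with K-Means.
    labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(V)
    return labels
```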

4.2 Mathematical Example

Consider a simple dataset with four points:

\{x_1 = (0, 0),\; x_2 = (1, 0),\; x_3 = (3, 0),\; x_4 = (4, 0)\}

Using a Gaussian kernel with $\sigma = 1$, the similarity matrix $S$ is:

S = \begin{pmatrix} 1 & 0.6065 & 0.0111 & 0.0003 \\ 0.6065 & 1 & 0.1353 & 0.0111 \\ 0.0111 & 0.1353 & 1 & 0.6065 \\ 0.0003 & 0.0111 & 0.6065 & 1 \end{pmatrix}

For example, $S_{12} = \exp(-1^2/2) = 0.6065$ and $S_{23} = \exp(-2^2/2) = 0.1353$, since $\|x_1 - x_2\| = 1$ and $\|x_2 - x_3\| = 2$.

The degree matrix $D$ is:

D = \begin{pmatrix} 1.6179 & 0 & 0 & 0 \\ 0 & 1.7529 & 0 & 0 \\ 0 & 0 & 1.7529 & 0 \\ 0 & 0 & 0 & 1.6179 \end{pmatrix}

The unnormalized Laplacian $L$ is then:

L = D - S = \begin{pmatrix} 0.6179 & -0.6065 & -0.0111 & -0.0003 \\ -0.6065 & 0.7529 & -0.1353 & -0.0111 \\ -0.0111 & -0.1353 & 0.7529 & -0.6065 \\ -0.0003 & -0.0111 & -0.6065 & 0.6179 \end{pmatrix}

Solving the eigenvalue problem for $L$, we find the eigenvalues $\lambda$ and eigenvectors $\mathbf{v}$. The smallest eigenvalue is 0 with a constant eigenvector; the signs of the entries of the second eigenvector (the Fiedler vector) split the points into $\{x_1, x_2\}$ and $\{x_3, x_4\}$, which is exactly the clustering suggested by the geometry.
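
The example can be checked numerically; here is a small sketch with NumPy and scikit-learn (cluster labels are arbitrary, so [0 0 1 1] and [1 1 0 0] are equivalent results):

```python
import numpy as np
from sklearn.cluster import KMeans

# Four collinear points and their Gaussian similarities (sigma = 1).
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
S = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1) / 2.0)
L = np.diag(S.sum(axis=1)) - S

eigvals, eigvecs = np.linalg.eigh(L)
print(np.round(eigvals, 4))        # smallest eigenvalue is (numerically) 0
print(np.round(eigvecs[:, 1], 4))  # Fiedler vector: signs split {x1, x2} from {x3, x4}

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(eigvecs[:, :2])
print(labels)                      # two clusters: {x1, x2} and {x3, x4}
```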

4.3 Geometric Interpretation

The eigenvectors corresponding to the smallest eigenvalues of the Laplacian matrix give us a low-dimensional embedding of the data in which the clusters are more apparent. This is because these eigenvectors solve a relaxed version of the graph-cut problem, yielding a partition that respects the connectivity (similarity) structure of the data.
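
This intuition can be made precise. For the unnormalized Laplacian, spectral clustering is the standard spectral relaxation of the RatioCut objective

\text{RatioCut}(A_1, \dots, A_k) = \sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{|A_i|}, \qquad W(A, \bar{A}) = \sum_{i \in A,\, j \in \bar{A}} S_{ij}

Encoding a partition by scaled indicator vectors and dropping the requirement that they be discrete leads to the relaxed problem

\min_{H \in \mathbb{R}^{n \times k},\; H^\top H = I} \operatorname{trace}\left(H^\top L H\right)

whose solution, by the Rayleigh-Ritz theorem, is the matrix whose columns are the eigenvectors of the k smallest eigenvalues of $L$, exactly the embedding used in Step 4 of the algorithm.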

5. Conclusion

Spectral Clustering is a mathematically rich method that leverages the power of graph theory and linear algebra to uncover complex cluster structures in data. By transforming the clustering problem into an eigenvalue problem, Spectral Clustering can identify clusters that are not easily detected by traditional methods. Understanding the theory behind this method allows practitioners to apply it effectively to a wide range of datasets, especially those with intricate structures.