Support Vector Machines with PyTorch
In this article, we will walk through a practical example of implementing a linear Support Vector Machine (SVM) using PyTorch. We will build the model from scratch, define the hinge loss function, train the model with gradient descent, and evaluate its performance on a binary classification task.
Steps Covered:
- Loading and preparing the dataset.
- Building a custom SVM model in PyTorch.
- Training the model with gradient descent.
- Evaluating the model’s performance.
- Making predictions on new data.
1. Load and Prepare the Dataset
We will use the Iris dataset for classification and focus on a binary classification problem (Setosa and Versicolor classes).
# Import necessary libraries
import torch
import torch.nn as nn
import torch.optim as optim
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Load the Iris dataset
iris = datasets.load_iris()
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)
data['target'] = iris.target
# Select only two classes for binary classification (Setosa and Versicolor)
data = data[data['target'] != 2]
# Split the data into features (X) and target (y)
X = data.drop('target', axis=1).values
y = data['target'].values
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the features (SVM is sensitive to feature scaling)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Convert the data to PyTorch tensors
X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
# Remap the 0/1 labels to {-1, +1}, as required by the hinge loss
y_train_tensor = torch.tensor(y_train * 2 - 1, dtype=torch.float32).unsqueeze(1)  # Shape: (n_samples, 1)
y_test_tensor = torch.tensor(y_test * 2 - 1, dtype=torch.float32).unsqueeze(1)
Explanation:
- We loaded the Iris dataset and reduced it to a binary classification problem (Setosa vs. Versicolor).
- The data was split into training and test sets, standardized, and converted to PyTorch tensors; the 0/1 labels were remapped to {-1, +1} because the hinge loss expects labels with those signs (a quick sanity check follows below).
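Before moving on, you can verify the tensor shapes and label values (a minimal sanity-check sketch using the variables defined above):
# Optional sanity check on shapes and labels
print(X_train_tensor.shape)     # torch.Size([80, 4]) for the 80/20 split used here
print(y_train_tensor.unique())  # tensor([-1., 1.])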
2. Build the Custom SVM Model in PyTorch
Now, we will build a simple linear SVM model in PyTorch. The model consists of a single linear layer that produces a raw score f(x) = w^T x + b; the sign of this score determines the predicted class.
# Create the SVM model class
class SVM(nn.Module):
    def __init__(self, input_size):
        super(SVM, self).__init__()
        self.linear = nn.Linear(input_size, 1)  # One output score for binary classification

    def forward(self, x):
        return self.linear(x)  # Linear combination w^T x + b
# Initialize the model
model = SVM(input_size=X_train_tensor.shape[1])
# Print the model architecture
print(model)
Explanation:
- We defined a custom SVM model using PyTorch, which consists of a single linear layer.
- The model takes the input features and applies a linear transformation: f(x) = w^T x + b. The sign of f(x) gives the predicted class, as illustrated below.
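To make the decision rule concrete, here is a small illustration (the model is still untrained at this point, so the outputs are arbitrary; torch.sign is used purely to show the thresholding):
# Illustration of the decision rule: the predicted class is the sign of the score
scores = model(X_train_tensor[:5])    # Raw scores w^T x + b for five samples
print(torch.sign(scores).squeeze(1))  # -1 or +1 (0 only if a score is exactly 0)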
3. Define the Hinge Loss and Optimizer
We will now define the hinge loss function, the standard loss for SVMs, and use Stochastic Gradient Descent (SGD) as the optimizer to minimize it. Hinge loss assumes labels in {-1, +1}, which is why the labels were remapped earlier.
# Define the hinge loss function
def hinge_loss(y_true, y_pred):
    return torch.mean(torch.clamp(1 - y_true * y_pred, min=0))
# Define the optimizer (Stochastic Gradient Descent)
optimizer = optim.SGD(model.parameters(), lr=0.01)
Explanation:
- Hinge Loss: The hinge loss penalizes samples that are misclassified or fall inside the margin; each sample contributes max(0, 1 - y * f(x)) to the loss, which is zero only when the sample is correctly classified with a margin of at least 1 (see the worked example after this list).
- SGD: We used Stochastic Gradient Descent (SGD) to optimize the model parameters with a learning rate of 0.01.
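A small worked example makes the loss behavior concrete (illustrative values only): a confidently correct prediction incurs no loss, a correct prediction inside the margin incurs a small loss, and a wrong prediction incurs a large one.
# Worked example: per-sample hinge loss for three cases (illustrative values)
y_true = torch.tensor([1.0, -1.0, 1.0])
y_pred = torch.tensor([2.5, -0.4, -1.0])        # Confident correct, inside margin, wrong
print(torch.clamp(1 - y_true * y_pred, min=0))  # tensor([0.0000, 0.6000, 2.0000])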
4. Train the Model
We will now define the training loop to minimize the hinge loss and update the model’s parameters. We also add L2 regularization to avoid overfitting.
# Training loop
epochs = 100
lambda_reg = 0.01  # Strength of the L2 penalty on the weights

for epoch in range(epochs):
    # Set the model to training mode
    model.train()

    # Zero the gradients
    optimizer.zero_grad()

    # Forward pass: compute the model's raw scores
    y_pred = model(X_train_tensor)

    # Compute the hinge loss plus the L2 penalty on the weights
    loss = hinge_loss(y_train_tensor, y_pred) + lambda_reg * torch.norm(model.linear.weight) ** 2

    # Backward pass: compute gradients
    loss.backward()

    # Update the model parameters
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
Explanation:
- L2 Regularization: We added a penalty term, λ‖w‖², to the hinge loss to control the magnitude of the weights; λ corresponds to lambda_reg in the code. Note that in classical SVM notation the trade-off parameter C scales the hinge loss instead, so a small λ corresponds to a large C (see the weight-decay alternative sketched after this list).
- Training Loop: The model is trained for 100 epochs, and after every 10 epochs, the current loss is printed.
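As an aside, PyTorch's SGD can apply an equivalent L2 penalty through its weight_decay argument, so the explicit penalty term could be dropped from the loss. A sketch of that alternative (note that weight_decay also penalizes the bias unless parameter groups are used):
# Alternative: apply the L2 penalty through the optimizer instead of the loss
# weight_decay adds weight_decay * w to each gradient, which is equivalent to
# adding (weight_decay / 2) * ||w||^2 to the loss
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=0.01)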
5. Evaluate the Model
After training, we can evaluate the model’s performance on the test set by calculating the accuracy of the predictions.
# Set the model to evaluation mode
model.eval()
# Disable gradient calculations for evaluation
with torch.no_grad():
    # Make predictions on the test set
    y_pred_test = model(X_test_tensor)
    # Convert raw scores to class labels in {-1, +1} (threshold at 0)
    y_pred_labels = torch.where(y_pred_test >= 0, torch.tensor(1.0), torch.tensor(-1.0))
    # Calculate accuracy against the {-1, +1} test labels
    accuracy = torch.mean((y_pred_labels == y_test_tensor).float())
    print(f"Test Accuracy: {accuracy.item() * 100:.2f}%")
Explanation:
- We evaluated the model by making predictions on the test set and converting the raw scores to class labels (-1 or +1) by thresholding at 0; a per-class breakdown is sketched after this list.
- The accuracy is calculated by comparing the predicted labels with the true test labels.
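If a single accuracy number is not enough, a per-class breakdown can be computed from the same tensors (a minimal sketch using y_pred_labels and y_test_tensor from the evaluation code above):
# Optional: accuracy broken down by class
for cls in (-1.0, 1.0):
    mask = (y_test_tensor == cls).squeeze(1)
    class_acc = (y_pred_labels[mask] == y_test_tensor[mask]).float().mean()
    print(f"Class {int(cls):+d} accuracy: {class_acc.item() * 100:.2f}%")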
6. Make Predictions on New Data
Finally, we can use the trained model to make predictions on new, unseen data.
# New raw (unscaled) data point for prediction
new_data = [[5.1, 3.5, 1.4, 0.2]]  # Example input: four Iris feature values
new_data_scaled = scaler.transform(new_data)  # Scale with the scaler fitted on the training data
# Convert to tensor
new_data_tensor = torch.tensor(new_data_scaled, dtype=torch.float32)
# Make a prediction
with torch.no_grad():
    y_new_pred = model(new_data_tensor)
    predicted_class = 1 if y_new_pred.item() >= 0 else 0  # Map the sign back to the original 0/1 labels
    print(f"Predicted Class: {predicted_class}")
Explanation:
- We provided a new raw data point, scaled it with the scaler fitted on the training data, and used the trained model to predict its class.
- The sign of the raw score is mapped back to the original labels, so the output is the predicted class (0 = Setosa, 1 = Versicolor); a reusable helper is sketched below.
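For repeated use, the scaling, tensor conversion, and thresholding can be bundled into a small helper; the predict function below is a hypothetical convenience wrapper, not part of the original code (the second sample's values are typical Versicolor measurements):
# Hypothetical helper that bundles scaling, tensor conversion, and thresholding
def predict(model, scaler, rows):
    """Return 0/1 class labels for a list of raw feature rows."""
    x = torch.tensor(scaler.transform(rows), dtype=torch.float32)
    with torch.no_grad():
        scores = model(x)
    return [1 if s.item() >= 0 else 0 for s in scores.squeeze(1)]

print(predict(model, scaler, [[5.1, 3.5, 1.4, 0.2], [6.4, 3.2, 4.5, 1.5]]))  # e.g. [0, 1]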
Summary
In this article, we implemented a linear Support Vector Machine (SVM) from scratch using PyTorch. We:
- Loaded and prepared the Iris dataset for a binary classification problem.
- Built a custom SVM model using PyTorch.
- Defined the hinge loss and added L2 regularization.
- Trained the model using Stochastic Gradient Descent (SGD).
- Evaluated the model’s performance and made predictions on new data.
This example demonstrates how to implement a linear SVM from scratch in PyTorch with a custom training loop and how to apply it to a binary classification task.