LightGBM Practical Example with TensorFlow

While LightGBM is typically used as a standalone framework, it can also be integrated with TensorFlow workflows for certain tasks. In this example, we will show how to combine LightGBM and TensorFlow to predict house prices using the California Housing Dataset.

This is a hybrid approach: we first train a LightGBM model and then bring its output into a TensorFlow workflow. Doing so lets us use TensorFlow tooling such as TensorBoard, or feed the LightGBM predictions into a larger neural network.


1. Install Dependencies

Before proceeding, ensure that both LightGBM and TensorFlow are installed. You can install them via pip:

pip install lightgbm tensorflow

2. Load and Preprocess the Dataset

We will use the California Housing Dataset from sklearn.datasets. Let's load the dataset and split it into training and test sets.

import numpy as np
import lightgbm as lgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the California Housing dataset
data = fetch_california_housing()
X, y = data.data, data.target

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Normalize the features using StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In this step:

  • We load the dataset and split it into training and test sets.
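  • We standardize the features with StandardScaler, fitting it on the training set only and reusing the same statistics for the test set. Tree-based models such as LightGBM are largely insensitive to feature scaling, but neural networks and many other models train better on standardized inputs.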
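
To see what the scaler actually does, it helps to inspect the statistics of the transformed arrays. The following is a minimal sketch that uses a synthetic array instead of the real dataset (an assumption made so that it runs without a download); the names X_demo, scaler_demo and so on are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing features (assumption: 8 columns, like the real data)
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(1000, 8))
y_demo = rng.normal(size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

scaler_demo = StandardScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)  # statistics learned from training rows only
X_te_scaled = scaler_demo.transform(X_te)      # same statistics reused, no refitting

# Training columns end up with mean ~0 and standard deviation ~1
print(X_tr_scaled.mean(axis=0).round(3))
print(X_tr_scaled.std(axis=0).round(3))
# Test columns are close to, but not exactly, 0 and 1
print(X_te_scaled.mean(axis=0).round(3))
```

Fitting the scaler on the training rows only keeps information about the test set out of the preprocessing step.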

3. Train the LightGBM Model

Now, let's train a LightGBM model. We'll first train it separately before integrating it with TensorFlow.

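# Initialize the LightGBM regressor
model = lgb.LGBMRegressor(
    n_estimators=1000,   # Upper bound on boosting rounds; early stopping usually ends sooner
    learning_rate=0.05,  # Shrinkage applied to each tree's contribution
    max_depth=7,         # Cap on tree depth to limit overfitting
    random_state=42
)

# Train the model. LightGBM 4.0 removed the early_stopping_rounds and verbose
# arguments of fit(); early stopping is now configured through callbacks.
model.fit(
    X_train_scaled, y_train,
    eval_set=[(X_test_scaled, y_test)],
    callbacks=[lgb.early_stopping(stopping_rounds=10, verbose=False)]
)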

# Evaluate model on test set
y_pred_lgbm = model.predict(X_test_scaled)

print(f"LightGBM model trained. Sample prediction: {y_pred_lgbm[:5]}")

In this step:

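  • We train a LightGBM regressor on the standardized training data, stopping early once the evaluation metric has not improved for 10 rounds.
  • We then predict on the test set. Because the same test set also drives early stopping here, scores computed on it are slightly optimistic; a separate validation split avoids this.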

4. Convert the LightGBM Model for TensorFlow

To use the trained LightGBM model inside a TensorFlow workflow, we convert its predictions (not the model itself) into a tensor with tf.convert_to_tensor. TensorFlow has no native way to load a LightGBM model into its pipelines, so the practical route is to pass LightGBM predictions to a TensorFlow model as inputs and let TensorFlow layers learn on top of them.

import tensorflow as tf

# Convert LightGBM predictions to TensorFlow tensors
y_pred_tensor = tf.convert_to_tensor(y_pred_lgbm, dtype=tf.float32)

# Check the TensorFlow tensor
print(f"TensorFlow tensor from LightGBM predictions: {y_pred_tensor[:5]}")

Here, the predictions from LightGBM are transformed into a TensorFlow tensor, which can then be used in a larger neural network or deep learning workflow.
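
In practice the prediction tensor usually needs a feature axis before a Keras model can consume it, and it can also be appended to the original features. The following sketch shows both steps on small made-up arrays; features_demo and preds_demo are hypothetical stand-ins for X_test_scaled and y_pred_lgbm.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins: 5 rows of 8 scaled features and 5 LightGBM predictions
features_demo = np.random.default_rng(0).normal(size=(5, 8))
preds_demo = np.array([2.1, 0.9, 3.4, 1.7, 2.8])  # NumPy defaults to float64

# Convert both to float32 tensors; predictions start as a 1-D tensor of shape (5,)
features_t = tf.convert_to_tensor(features_demo, dtype=tf.float32)
preds_t = tf.convert_to_tensor(preds_demo, dtype=tf.float32)

# Add a feature axis so the predictions form one column: shape (5, 1)
preds_col = tf.reshape(preds_t, (-1, 1))

# Append the prediction column to the original features: shape (5, 9)
stacked = tf.concat([features_t, preds_col], axis=1)
print(preds_t.shape, preds_col.shape, stacked.shape)
```

Whether to feed the network only the prediction column or the stacked features is a design choice; the stacked variant gives the network access to information the trees may not have captured.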


5. Integrating LightGBM with a Neural Network (Optional)

If you want to combine the LightGBM predictions with a deep learning model, you can embed the predictions as features or an initial layer in a neural network.

# Create a simple TensorFlow neural network model
model_nn = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),  # One input feature: the LightGBM prediction
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1)  # Output layer for regression
])

# Compile the model
model_nn.compile(optimizer='adam', loss='mean_squared_error')

# Keras expects 2-D input of shape (n_samples, 1), so add a feature axis
nn_inputs = tf.reshape(y_pred_tensor, (-1, 1))

# Train the neural network using the LightGBM predictions as input
model_nn.fit(nn_inputs, y_test, epochs=10, batch_size=32, validation_split=0.2)

# Predict with the combined model on the first five samples
combined_predictions = model_nn.predict(nn_inputs[:5])
print(f"Combined Model Predictions: {combined_predictions}")

Explanation:

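  • The neural network takes the LightGBM predictions as its only input and learns a correction on top of them, which is a simple form of model stacking.
  • We use a small feed-forward network with ReLU activations to predict house prices from the LightGBM predictions.
  • Only the network's weights are learned by backpropagation; the LightGBM model itself is not updated. Because the network is fit on test-set predictions here, its outputs on those same rows are not an unbiased estimate of performance.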

6. Model Evaluation

You can evaluate the model's performance using Mean Absolute Error (MAE) and R-squared (R^2), just like in the scikit-learn example.

from sklearn.metrics import mean_absolute_error, r2_score

# Calculate metrics for LightGBM predictions
mae = mean_absolute_error(y_test, y_pred_lgbm)
r2 = r2_score(y_test, y_pred_lgbm)

print(f"LightGBM predictions - MAE: {mae:.2f}, R-squared: {r2:.2f}")

Interpretation:

  • MAE: Measures the average magnitude of errors in a set of predictions, without considering their direction. Lower values indicate a better fit.
  • R-squared (R^2): Measures the proportion of variance explained by the model. An R^2 value closer to 1 indicates a better fit.
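
Both metrics are easy to compute by hand, which can help when checking what the library functions report. The following worked example uses four made-up values (y_true and y_hat are illustrative, not taken from the housing data) and plain NumPy.

```python
import numpy as np

y_true = np.array([3.0, 1.0, 2.0, 4.0])
y_hat = np.array([2.5, 1.0, 2.0, 5.0])

# MAE: mean of absolute errors -> (0.5 + 0 + 0 + 1) / 4
mae_manual = np.mean(np.abs(y_true - y_hat))

# R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)          # 0.25 + 1.0 = 1.25
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # 0.25 + 2.25 + 0.25 + 2.25 = 5.0
r2_manual = 1.0 - ss_res / ss_tot

print(f"MAE: {mae_manual:.3f}, R^2: {r2_manual:.3f}")  # MAE: 0.375, R^2: 0.750
```

Note that R^2 can be negative when a model fits worse than simply predicting the mean of the targets.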

Summary

In this example, we demonstrated how to:

  1. Load and preprocess the data using scikit-learn.
  2. Train a LightGBM model for predicting house prices.
  3. Convert LightGBM predictions to TensorFlow tensors for integration into a larger deep learning workflow.
  4. Combine LightGBM with a simple neural network using TensorFlow's Sequential API.
  5. Evaluate the LightGBM predictions using performance metrics such as MAE and R-squared.

Although LightGBM is typically used separately, it can be integrated with TensorFlow for more complex workflows, such as combining gradient boosting with deep learning models.

Next, we’ll look at how to implement LightGBM using PyTorch.