In this practical, we will look at the mechanics behind perceptrons and stacking. We will start by building a simple perceptron model and then move on to stacking multiple models together to improve performance.

Perceptrons

Let’s begin by training a perceptron model on the weather_classification_data that we met in Practical 2.

# Load the data
weather_full <- read.csv("https://www.maths.dur.ac.uk/users/john.p.gosling/MATH3431_practicals/weather_classification_data.csv")

# Display the first few rows
head(weather_full)

# Select the features of interest
weather <- weather_full[,c(1:6)]

# Pick 1000 random rows
set.seed(1312)
weather <- weather[sample(1:nrow(weather), 1000),]

# Convert Cloud.Cover to a binary variable (clear vs not)
weather$Cloud.Cover <- ifelse(weather$Cloud.Cover == "clear", 1, -1)

# Add in a variable for a constant term
weather <- cbind(weather, 1)

# Summarise the data
summary(weather)

We will also split the data into a training and testing set (70/30).

# Set the seed
set.seed(141)

# Split the data
train_indices <- ????
train_data <- ????
test_data <- ????
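
If you get stuck, one possible completion is sketched below; it assumes a simple random 70/30 split of the 1000 rows.

# A sketch of a random 70/30 split using sample()
train_indices <- sample(1:nrow(weather), size = round(0.7 * nrow(weather)))
train_data <- weather[train_indices, ]
test_data <- weather[-train_indices, ]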

Task 1.1 - Build your own perceptron

Build a perceptron model that predicts the binary Cloud.Cover class using the other variables as predictors. You should use the base R strategy given in the notes.

# Initialise the weights to zero
weights <- ????

# Set the learning rate
alpha <- 0.1

# Set the maximum number of iterations
max_iter <- 30

# Repeat the following steps until the maximum number of 
# iterations is reached
for (i in 1:max_iter) {
  # For each input in the training data
  for (j in 1:nrow(train_data)) {
    # Compute the predicted class label
    predicted <- ????
      
    # Update the weights based on the classification error
    weights <- ????
  }
}
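
If you get stuck, a sketch of one possible completion is given below. It assumes, as the barplot call below suggests, that the class label (Cloud.Cover) sits in column 5 and that the remaining six columns, including the constant, are the inputs. The update shown adds alpha * y * x whenever a point is misclassified; the version in the notes may be written slightly differently, but it should be equivalent up to a constant factor.

# Sketch: one weight per input column (six in total)
weights <- rep(0, ncol(train_data) - 1)
alpha <- 0.1
max_iter <- 30

for (i in 1:max_iter) {
  for (j in 1:nrow(train_data)) {
    x_j <- as.numeric(train_data[j, -5])   # inputs, including the constant
    y_j <- train_data[j, 5]                # true label, +1 or -1
    
    # Predicted label from the sign of the weighted sum
    predicted <- ifelse(sum(weights * x_j) >= 0, 1, -1)
    
    # Only moves the weights when the point is misclassified
    weights <- weights + alpha * (y_j - predicted) / 2 * x_j
  }
}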

Visualise the weights.

weights
barplot(as.numeric(weights), names.arg = colnames(train_data)[-5])

Make predictions on the test data and evaluate the model’s performance using accuracy.

# Make predictions
predictions <- NULL
for (j in 1:nrow(test_data)) {
  predictions[j] <- ????
}

# Calculate the accuracy
accuracy <- ????
accuracy
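
A sketch of one way to complete the prediction and accuracy steps, again assuming the class label sits in column 5:

# Sketch: classify each test point with the learned weights
predictions <- NULL
for (j in 1:nrow(test_data)) {
  x_j <- as.numeric(test_data[j, -5])
  predictions[j] <- ifelse(sum(weights * x_j) >= 0, 1, -1)
}

# Proportion of test points classified correctly
accuracy <- mean(predictions == test_data[, 5])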

Task 1.2 - Perceptron using standardised data

Create a standardised version of the five explanatory variables and repeat the above steps.

# Standardise the data by subtracting the mean 
# and dividing by the standard deviation
standardised_weather <- ????

# Combine the standardised data with the class variable
# and the constant term
standardised_weather <- ????
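
One possible completion is sketched below. It assumes that Cloud.Cover is the fifth of the seven columns in weather, so the five explanatory variables are columns 1 to 4 and 6, and that the constant column should not be rescaled.

# Sketch: scale() centres each column and divides by its standard deviation
standardised_weather <- data.frame(scale(weather[, c(1:4, 6)]),
                                   Cloud.Cover = weather$Cloud.Cover,
                                   constant = 1)

With this layout, the class label sits in column 6 and the constant in column 7, which is why the barplot call at the end of this task drops column 6 rather than column 5.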

# Split the data
set.seed(141)
train_indices <- ????
train_data <- ????
test_data <- ????

# Initialise the weights to zero
weights <- ????

# Set the learning rate
alpha <- 0.1

# Set the maximum number of iterations
max_iter <- 30

# Repeat the following steps until the maximum number of
# iterations is reached
for (i in 1:max_iter) {
  # For each input in the training data
  for (j in 1:nrow(train_data)) {
    # Compute the predicted class label
    predicted <- ????
      
    # Update the weights based on the classification error
    weights <- ????
  }
}

# Make predictions
predictions <- NULL
for (j in 1:nrow(test_data)) {
  predictions[j] <- ????
}

# Calculate the accuracy
std_accuracy <- ????
std_accuracy

Is this transformation necessary? Is it beneficial?

weights
barplot(as.numeric(weights), names.arg = colnames(train_data)[-6])

Task 2 - Changing the parameters

Repeat the above steps for the original (unstandardised) data, but try different learning rates and maximum numbers of iterations.

# Set the seed
set.seed(141)

# Split the data
train_indices <- ????
train_data <- ????
test_data <- ????

# Initialise the weights to zero
weights <- ????

# Set the learning rate
alpha <- ????

# Set the maximum number of iterations
max_iter <- ????

# Fit the perceptron model
????

# Make predictions
predictions <- NULL
for (j in 1:nrow(test_data)) {
  predictions[j] <- ????
}

# Calculate the accuracy
accuracy_try <- ????

Have things improved?

accuracy_try
weights
barplot(as.numeric(weights), names.arg = colnames(train_data)[-6])

To get a handle on what is going on, consider the interplay between the learning rate alpha and whether the data have been standardised.

Stacking

Let’s go back to the Glass dataset that we first met in Practical 1. We will use this dataset to build a stacking model for the RI response variable.

# Load in the data
Glass <- read.csv("https://www.maths.dur.ac.uk/users/john.p.gosling/MATH3431_practicals/Glass.csv")

# Look at the first few rows
head(Glass)

# Let's split the data into a training and testing set (70/30)
set.seed(123)
train_indices <- ????
train_data <- ????
test_data <- ????
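
If needed, the split can be done in the same way as for the weather data.

# Sketch: a random 70/30 split of the Glass data
train_indices <- sample(1:nrow(Glass), size = round(0.7 * nrow(Glass)))
train_data <- Glass[train_indices, ]
test_data <- Glass[-train_indices, ]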

Task 3 - Building poor models

Let’s start by building four weak learners.

  • Model 1: A linear regression utilising just Na and Mg as predictors.
  • Model 2: A linear regression utilising just Al as a predictor with no intercept term.
  • Model 3: A 1-NN model utilising just Ca, Ba and Fe.
  • Model 4: A decision tree model utilising all variables but with a maximum depth of 2.

Task 3.1 - Model 1

Build the model and evaluate its performance on the test data (MSE and MAE).

model_1 <- ????

# Make predictions
predictions_1 <- ????
  
# Calculate the MSE and MAE
mse_1 <- ????
mae_1 <- ????
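
A sketch of one possible completion using lm():

# Sketch: linear regression of RI on Na and Mg only
model_1 <- lm(RI ~ Na + Mg, data = train_data)
predictions_1 <- predict(model_1, newdata = test_data)
mse_1 <- mean((test_data$RI - predictions_1)^2)
mae_1 <- mean(abs(test_data$RI - predictions_1))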

Task 3.2 - Model 2

Build the model and evaluate its performance on the test data (MSE and MAE).

model_2 <- ????

# Make predictions
predictions_2 <- ????

# Calculate the MSE and MAE
mse_2 <- ????
mae_2 <- ????
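
A sketch of one possible completion; the "- 1" in the formula removes the intercept term.

# Sketch: linear regression of RI on Al with no intercept
model_2 <- lm(RI ~ Al - 1, data = train_data)
predictions_2 <- predict(model_2, newdata = test_data)
mse_2 <- mean((test_data$RI - predictions_2)^2)
mae_2 <- mean(abs(test_data$RI - predictions_2))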

Task 3.3 - Model 3

Build the model and evaluate its performance on the test data (MSE and MAE).

library(caret)

model_3 <- train(RI ~ Ca + Ba + Fe,
                 method = ????,
                 data = train_data)

# Make predictions
predictions_3 <- ????

# Calculate the MSE and MAE
mse_3 <- ????
mae_3 <- ????
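
One possible completion is sketched below: caret's "knn" method becomes a 1-NN model when the tuning grid fixes k at 1.

# Sketch: 1-NN regression via caret
model_3 <- train(RI ~ Ca + Ba + Fe,
                 method = "knn",
                 tuneGrid = data.frame(k = 1),
                 data = train_data)
predictions_3 <- predict(model_3, newdata = test_data)
mse_3 <- mean((test_data$RI - predictions_3)^2)
mae_3 <- mean(abs(test_data$RI - predictions_3))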

Task 3.4 - Model 4

Build the model and evaluate its performance on the test data (MSE and MAE).

library(rpart)

model_4 <- ????

# Make predictions
predictions_4 <- ????

# Calculate the MSE and MAE
mse_4 <- ????
mae_4 <- ????
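
A sketch of one possible completion; it assumes that every other column of Glass is treated as a predictor and that the depth is limited through rpart.control().

# Sketch: a shallow regression tree
model_4 <- rpart(RI ~ ., data = train_data,
                 control = rpart.control(maxdepth = 2))
predictions_4 <- predict(model_4, newdata = test_data)
mse_4 <- mean((test_data$RI - predictions_4)^2)
mae_4 <- mean(abs(test_data$RI - predictions_4))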

Which model is best so far?

Task 4 - Stacking models

Now we will stack the models together to see if we can improve performance.

Task 4.1 - Build the meta-model

Build a decision tree model that takes the predictions from the four weak learners as input. We want the possibility of a more detailed model, so set the maximum depth to 5.

# Create a new data frame with the predictions from the training data alongside
# the actual response variable
stacking_data <- ????

# Build the meta-model
stacking_model <- ????
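
One possible completion is sketched below. The column names pred_1 to pred_4 are just illustrative choices; the important point is that the meta-model's inputs are the weak learners' predictions on the training data.

# Sketch: assemble the meta-model's training data
stacking_data <- data.frame(pred_1 = predict(model_1, newdata = train_data),
                            pred_2 = predict(model_2, newdata = train_data),
                            pred_3 = predict(model_3, newdata = train_data),
                            pred_4 = predict(model_4, newdata = train_data),
                            RI = train_data$RI)

# Sketch: a deeper decision tree as the meta-model
stacking_model <- rpart(RI ~ ., data = stacking_data,
                        control = rpart.control(maxdepth = 5))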

Plot the tree.

library(rpart.plot)

????
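
Assuming stacking_model is an rpart object, rpart.plot() will draw it directly:

rpart.plot(stacking_model)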

What is notable about the tree?

Task 4.2 - Evaluate the meta-model

Make predictions using the meta-model and evaluate its performance on the test data (MSE and MAE).

# Create the test set by combining the predictions from the weak learners
# alongside the actual response variable
stacking_test_data <- ????

# Make predictions
stacking_predictions <- ????

# Calculate the MSE and MAE
stacking_mse <- ????
stacking_mae <- ????
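
A sketch of one possible completion; it reuses the test-set predictions computed earlier for each weak learner and keeps the same column names as stacking_data so that predict() can find them.

# Sketch: assemble the meta-model's test data
stacking_test_data <- data.frame(pred_1 = predictions_1,
                                 pred_2 = predictions_2,
                                 pred_3 = predictions_3,
                                 pred_4 = predictions_4,
                                 RI = test_data$RI)

stacking_predictions <- predict(stacking_model, newdata = stacking_test_data)
stacking_mse <- mean((stacking_test_data$RI - stacking_predictions)^2)
stacking_mae <- mean(abs(stacking_test_data$RI - stacking_predictions))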

How does the meta-model perform compared to the individual models?

Given these results, what model would you recommend using for this dataset?