##
## Call:
## lm(formula = alcohol ~ pH + sulphates + pH:sulphates, data = wine)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3232 -0.8016 -0.2270 0.6386 4.9341
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.44185 0.02597 402.102 < 2e-16 ***
## pH 0.23441 0.02635 8.897 < 2e-16 ***
## sulphates 0.23082 0.03134 7.365 2.82e-13 ***
## pH:sulphates 0.09599 0.02009 4.777 1.95e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.026 on 1595 degrees of freedom
## Multiple R-squared: 0.07422, Adjusted R-squared: 0.07247
## F-statistic: 42.62 on 3 and 1595 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = alcohol ~ pH, data = wine)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8085 -0.8378 -0.2253 0.6509 4.9470
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.42298 0.02609 399.522 <2e-16 ***
## pH 0.21914 0.02610 8.397 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.043 on 1597 degrees of freedom
## Multiple R-squared: 0.04228, Adjusted R-squared: 0.04169
## F-statistic: 70.51 on 1 and 1597 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = alcohol ~ sulphates, data = wine)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1006 -0.8593 -0.2535 0.6377 4.3700
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.42298 0.02654 392.707 < 2e-16 ***
## sulphates 0.09974 0.02655 3.757 0.000178 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.061 on 1597 degrees of freedom
## Multiple R-squared: 0.00876, Adjusted R-squared: 0.008139
## F-statistic: 14.11 on 1 and 1597 DF, p-value: 0.0001783
We can see a clear effect of removing variables on the model fit.
| Model | Adjusted \(R^2\) | RMSE |
|---|---|---|
| Full | 0.072 | 1.025 |
| pH only | 0.042 | 1.043 |
| sulphates only | 0.008 | 1.061 |
Removing the pH variable from the model has a greater effect on the model fit than removing the sulphates variable.
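The comparison above could be reproduced along the following lines. This is a sketch, not the original code: it assumes the UCI red wine quality CSV is available, and standardises the two predictors (which the near-identical intercepts in the three summaries suggest was done).

```r
# Sketch: refit the three models and tabulate adjusted R^2 and RMSE.
wine <- read.csv("winequality-red.csv", sep = ";")
wine$pH        <- as.numeric(scale(wine$pH))
wine$sulphates <- as.numeric(scale(wine$sulphates))

fits <- list(
  full           = lm(alcohol ~ pH * sulphates, data = wine),
  pH_only        = lm(alcohol ~ pH,             data = wine),
  sulphates_only = lm(alcohol ~ sulphates,      data = wine)
)

sapply(fits, function(m) c(
  adj_r2 = summary(m)$adj.r.squared,
  rmse   = sqrt(mean(residuals(m)^2))
))
```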
Is it fair to compare models with a different number of variables? Note that dropping either variable here actually removes two parameters, because the interaction term goes with it.
Let’s permute the pH variable and see how it affects the model fit.
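Permuting a column breaks its relationship with the response while preserving its marginal distribution. A minimal sketch, assuming the `wine` data frame is already loaded:

```r
set.seed(1)                           # for a reproducible permutation
wine_perm <- wine
wine_perm$pH <- sample(wine_perm$pH)  # shuffle pH, leave everything else intact
summary(lm(alcohol ~ pH + sulphates + pH:sulphates, data = wine_perm))
```

The permuted model keeps the full set of terms, so the comparison is between models with the same number of parameters.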
##
## Call:
## lm(formula = alcohol ~ pH + sulphates + pH:sulphates, data = wine_perm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1010 -0.8641 -0.2572 0.6422 4.3734
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.42279 0.02655 392.592 < 2e-16 ***
## pH -0.01956 0.02657 -0.736 0.461778
## sulphates 0.10035 0.02656 3.778 0.000164 ***
## pH:sulphates 0.02189 0.02691 0.813 0.416170
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.062 on 1595 degrees of freedom
## Multiple R-squared: 0.009532, Adjusted R-squared: 0.007669
## F-statistic: 5.117 on 3 and 1595 DF, p-value: 0.001591
##
## Call:
## lm(formula = alcohol ~ pH + sulphates + pH:sulphates, data = wine_perm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7969 -0.8466 -0.2272 0.6507 4.9025
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.42308 0.02610 399.322 <2e-16 ***
## pH 0.21857 0.02613 8.363 <2e-16 ***
## sulphates 0.01521 0.02616 0.582 0.561
## pH:sulphates -0.01166 0.03031 -0.385 0.700
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.044 on 1595 degrees of freedom
## Multiple R-squared: 0.04259, Adjusted R-squared: 0.04079
## F-statistic: 23.65 on 3 and 1595 DF, p-value: 5.604e-15
We can again see a clear effect of removing variables on the model fit, but permutation may be a more honest comparison: every model retains the same number of parameters.
| Model | Adjusted \(R^2\) | RMSE |
|---|---|---|
| Full | 0.072 | 1.025 |
| pH permuted | 0.008 | 1.06 |
| sulphates permuted | 0.041 | 1.042 |
One way to think about variable importance in the context of decision trees is to consider the number of times a variable is used to split the data.
However, early splits in the tree are more influential than later splits. We might therefore count the number of times a variable is used for the first split.
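For a single tree, the splits are easy to inspect directly. A sketch using rpart (the `wine` data frame and a `quality` response are assumptions here):

```r
library(rpart)

# Sketch: which variables does a single tree split on?
tree <- rpart(quality ~ ., data = wine)

splits <- tree$frame$var[tree$frame$var != "<leaf>"]
table(droplevels(splits))  # how often each variable is used to split
tree$frame$var[1]          # the variable used for the root (first) split
```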
In the randomForest package, the `importance` function can be used to calculate an importance metric for each variable in the model.
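The scores below could be produced along these lines. A sketch: the response is assumed to be `quality` here, since MeanDecreaseGini is a classification metric and `alcohol` appears among the predictors.

```r
library(randomForest)
set.seed(1)

# Sketch: Gini-based variable importance from a random forest classifier
rf <- randomForest(factor(quality) ~ ., data = wine)
importance(rf)  # one MeanDecreaseGini value per predictor
```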
## MeanDecreaseGini
## fixed.acidity 77.00334
## volatile.acidity 105.82246
## citric.acid 75.69601
## residual.sugar 71.87860
## chlorides 83.52143
## free.sulfur.dioxide 68.86417
## total.sulfur.dioxide 107.32531
## density 95.55846
## pH 78.32486
## sulphates 114.57800
## alcohol 148.13808
We had some success with the handwriting data using a k-nearest neighbours model.
## k-Nearest Neighbors
##
## 2000 samples
## 784 predictor
## 10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 1600, 1601, 1599, 1600, 1600
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 1 0.9065198 0.8960155
## 2 0.8910160 0.8787635
## 3 0.8995098 0.8882166
## 4 0.8890135 0.8765324
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 1.
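A fit like the one summarised above could be set up in caret roughly as follows. This is a sketch: the `MNIST_train` object and its `label` column are assumed from the earlier handwriting example.

```r
library(caret)
set.seed(1)

# Sketch: 5-fold cross-validation over k = 1..4, as in the summary above
knn_model <- train(factor(label) ~ ., data = MNIST_train,
                   method    = "knn",
                   tuneGrid  = data.frame(k = 1:4),
                   trControl = trainControl(method = "cv", number = 5))
```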
We can use the DALEX package to interpret the model.
```r
library(DALEX)

# Create an explainer object
explainer <- explain(knn_model,
                     data = MNIST_train[, -1],
                     y = as.factor(MNIST_train$label))
```
## Preparation of a new explainer is initiated
## -> model label : train ( default )
## -> data : 2000 rows 784 cols
## -> data : tibble converted into a data.frame
## -> target variable : 2000 values
## -> predict function : yhat.train will be used ( default )
## -> predicted values : No value for predict function target column. ( default )
## -> model_info : package caret , ver. 6.0.94 , task multiclass ( default )
## -> predicted values : predict function returns multiple columns: 10 ( default )
## -> residual function : difference between 1 and probability of true class ( default )
## -> residuals : numerical, min = 0 , mean = 0 , max = 0
## A new explainer has been created!
This could take a very long time to run…
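Presumably the slow step is permutation-based variable importance, which requires many repeated predictions; with 784 pixel predictors and a kNN model, that is expensive. A sketch of what the call might look like, using DALEX's `model_parts` function on the explainer created above:

```r
# Sketch: permutation variable importance with DALEX.
# Very slow for a kNN model with 784 predictors.
vi <- model_parts(explainer)
plot(vi)
```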
It would be nice to be able to visualise the effect of a variable on the model output.
There are many variants of this: we will concentrate on partial dependence plots (PDPs) and individual conditional expectation (ICE) plots. Both of these plot types can be produced using the iml package.
We will fit a random forest model to the wine data with the alcohol variable as the response.
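The effect of pH could then be visualised with iml along these lines. A sketch, assuming the `wine` data frame is loaded; the object names are illustrative:

```r
library(randomForest)
library(iml)
set.seed(1)

# Sketch: random forest with alcohol as the response
rf <- randomForest(alcohol ~ ., data = wine)

# Wrap the model so iml can query its predictions
pred <- Predictor$new(rf,
                      data = wine[, setdiff(names(wine), "alcohol")],
                      y = wine$alcohol)

# PDP (average effect) and ICE (one curve per observation) for pH
eff <- FeatureEffect$new(pred, feature = "pH", method = "pdp+ice")
plot(eff)
```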
What is the effect of the pH variable on the model output?
Let’s create some data with a strong interaction between two variables.
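One minimal way to do this; a sketch in which the variable names and coefficients are purely illustrative:

```r
set.seed(42)
n  <- 500
x1 <- runif(n)
x2 <- runif(n)

# y depends on the product x1 * x2: the effect of x1 grows with x2,
# so neither main effect alone captures the relationship
y   <- 10 * x1 * x2 + rnorm(n, sd = 0.5)
sim <- data.frame(x1, x2, y)
```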