Simple Examples with Torch
The only constants in life are death, taxes, and the RStudio team continually crushing it. This time, they’ve ported Torch into R. I’m a fairly heavy tensorflow
user, and coming from an R background had a steep learning curve incorporating it into my toolkit. While torch
is simpler in a lot of ways (specifically, not requiring a python environment), these deep learning frameworks can be intimidating. What I hope to do here is demystify torch
workflows a little bit by providing some overly simple use cases. Specifically, show regression and classification (binary and multiclass) models build with torch, and how to extract the correct output in order to feed into your model evaluation/post-modeling process. To be clear, I re-use a lot of the code from the vignettes and examples from the torch package website. They’ve done a great job, I just wanted to put my own spin on it in case it could be helpful to someone in the future.
Setup
I’m going to keep dependencies to a minimum, so I’ll only be using the torch
and palmerpenguins
libraries.
library(torch)
library(palmerpenguins)
For each task I’ll be using the palmerpenguins
dataset. For sake of simplicity, I’ve just removed the cases that have missing values. There isn’t a clear binary target variable, so I create one (flagging if the penguin is of the Adelie species). I also create a train/test split
penguins <- na.omit(penguins)
penguins$is_adelie <- as.numeric(penguins$species=='Adelie')
train_idx <- sample(nrow(penguins), nrow(penguins)*.7)
The final setup step is to create a function that converts the data we want into torch tensors.(side note: this is optional, but recommended way to load data into torch models. When deep learning, you probably can’t have all your data in memory at once, and this process helps batch it up). This code mimics python classes, so it may look a little funky to R users, but just know that the main purpose is to convert data from R datatypes into torch tensors. I’ve left the helpful comments from this tutorial on the torch package site
df_dataset <- dataset(
name = "Penguins",
# the input data to your dataset goes in the initialize function.
# our dataset will take a dataframe and the name of the response
# variable.
initialize = function(df, feature_variables, response_variable) {
# conveniently, the categorical data are already factors
df$species <- as.numeric(df$species)
df$island <- as.numeric(df$island)
self$df <- df[, feature_variables]
self$response_variable <- df[[response_variable]]
},
# the .getitem method takes an index as input and returns the
# corresponding item from the dataset.
# the index could be anything. the dataframe could have many
# rows for each index and the .getitem method would do some
# kind of aggregation before returning the element.
# in our case the index will be a row of the data.frame,
.getitem = function(index) {
response <- torch_tensor(self$response_variable[index], dtype = torch_float())
x <- torch_tensor(as.numeric(self$df[index,]))
# note that the dataloaders will automatically stack tensors
# creating a new dimension
list(x = x, y = response)
},
# It's optional, but helpful to define the .length method returning
# the number of elements in the dataset. This is needed if you want
# to shuffle your dataset.
.length = function() {
length(self$response_variable)
}
)
Regression
The first regression task is to predict a penguins weight using their other measurements, which island they were observed on, and their species.
Will this be a very good model? No. The relationship between these variables isn’t super strong, I’ve done no preprocessing, I’m just going through the process of building the model and getting the correct output. When building in real life, absolutely take all those preprocessing steps. Perhaps in a future post I’ll show how torch
models can integrate into a tidymodels
workflow of some kind.
First, I pass the names of the features and response variables I want through that data conversion function
features <- c('bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'island', 'species')
response <- 'body_mass_g'
penguins_train <- df_dataset(penguins[train_idx,],
feature_variables = features,
response_variable = response)
penguins_test <- df_dataset(penguins[-train_idx,],
feature_variables = features,
response_variable = response)
If I want to look at an example case, here is how you could do that.
penguins_train$.getitem(1)
## $x
## torch_tensor
## 42.3000
## 21.2000
## 191.0000
## 2.0000
## 1.0000
## [ CPUFloatType{5} ]
##
## $y
## torch_tensor
## 4150
## [ CPUFloatType{1} ]
Next, I’ll pass this converted data through a data loader. This will explicitly batch my data, and be what feeds into the model. You can specify your batch size here. For sake of simplicity, I’m just going to set 10
dl_train <- dataloader(penguins_train, batch_size = 10, shuffle = TRUE)
dl_test <- dataloader(penguins_test, batch_size = 10)
What that did was allow me to load 10 cases at a time. It’s helpful to get an idea of what that actually looks like. I’ll iterate through an example here to show.
iter <- dl_train$.iter()
iter$.next()
## [[1]]
## torch_tensor
## 45.8000 18.9000 197.0000 3.0000 1.0000
## 48.2000 14.3000 210.0000 1.0000 3.0000
## 37.2000 19.4000 184.0000 3.0000 1.0000
## 42.6000 13.7000 213.0000 1.0000 3.0000
## 49.0000 19.6000 212.0000 2.0000 2.0000
## 49.6000 15.0000 216.0000 1.0000 3.0000
## 46.6000 17.8000 193.0000 2.0000 2.0000
## 43.2000 18.5000 192.0000 2.0000 1.0000
## 35.5000 17.5000 190.0000 3.0000 1.0000
## 42.4000 17.3000 181.0000 2.0000 2.0000
## [ CPUFloatType{10,5} ]
##
## [[2]]
## torch_tensor
## 4150
## 4600
## 3900
## 4950
## 4300
## 4750
## 3800
## 4100
## 3700
## 3600
## [ CPUFloatType{10,1} ]
We can see here a single batch of 10 cases. That’s what’s going to be fed to the model in order to update the weights. If I kept executing iter$.next()
I’d see the next 10 cases, and so on until I had gone through the entire dataset.
Now, for modeling. The overall structure is a bit different than tensorflow
, but still intuitive in it’s own way.
I’d highly recommend reading Sigrid Keydana’s initial blog post on torch for more info on torch model structure
net <- nn_module(
"PenguinNet",
initialize = function() {
self$fc1 <- nn_linear(length(features), 16)
self$fc2 <- nn_linear(16, 8)
self$fc3 <- nn_linear(8, 1)
},
forward = function(x) {
x %>%
self$fc1() %>%
nnf_relu() %>%
self$fc2() %>%
nnf_relu() %>%
self$fc3()
}
)
So first I specify my layers in the initialize
section, things like layer type, shape, etc. Then I specify the network structure and place those layers within that network in the forward
section. This is combined in tensorflow
, which may be a stumbling block for some.
I’ll specify the optimizer and assign the network to a model
model <- net()
optimizer <- optim_adam(model$parameters)
Now comes a part that will likely look different to R users. We’re used to a nice tidy (no pun intended) fit()
function, or that function being wrapped up in something like lm()
, randomForest()
etc. With the package in it’s infancy (and being a port of PyTorch and borrowing syntax), the fitting is a little more involved. I’m going to set a for loop over the epochs, and explicitly update the model’s weights with each pass. This is what is happening under the hood anyway in the functions mentioned above (perhaps without the batching), so it is useful insight into how models in general, and deep learning models in particular, are built.
for (epoch in 1:10) {
l <- c()
for (b in enumerate(dl_train)) {
optimizer$zero_grad()
output <- model(b[[1]])
loss <- nnf_mse_loss(output,b[[2]])
loss$backward()
optimizer$step()
l <- c(l, loss$item())
}
cat(sprintf("Loss at epoch %d: %3f\n", epoch, mean(l)))
}
## Loss at epoch 1: 17954407.166667
## Loss at epoch 2: 17854906.791667
## Loss at epoch 3: 17636592.708333
## Loss at epoch 4: 17588233.166667
## Loss at epoch 5: 17171742.041667
## Loss at epoch 6: 16631909.208333
## Loss at epoch 7: 16209465.166667
## Loss at epoch 8: 15292539.625000
## Loss at epoch 9: 14541538.583333
## Loss at epoch 10: 13544888.833333
Notice how I specified the loss function within that loop (nnf_mse_loss()
)? Keep an eye on how that changes as we work through the classification models.
So I have my crappy model, now I want to evaluate it on the test set, and pull predictions out so I can make dope visualizations.
First thing to do is to put the model object in evaluation mode, meaning it won’t update the weights anymore and stay as a statis object (i.e. you don’t want your linear regression model changing coefficients as you eval on the test set.) That’s a simple function
model$eval()
For evaluation, I take a similar approach to training, where I loop through the test set, get my loss function, and then aggregate at the end. With a continuous outcome, I’m really only looking at MSE here.
test_losses <- c()
for (b in enumerate(dl_test)) {
output <- model(b[[1]])
loss <- nnf_mse_loss(output, b[[2]])
test_losses <- c(test_losses, loss$item())
}
mean(test_losses)
## [1] 13268394
As I go through the classification examples, I’ll show how to specify different loss functions.
And as with any model, it’s useless without the output to pass into your production system, visualization models, etc. Extracting the output is simple, even though we have to do some workarounds compared to other packages due to the batching. First, I create an empty prediction vector, and as our cases pass through I populate that vector with the subsequent predictions.
preds = c()
for (b in enumerate(dl_test)) {
# get predictions
output <- model(b[[1]])
# convert to vector and append
predicted = output$data() %>% as.array() %>% .[,1]
preds <- c(preds, predicted)
}
head(preds)
## [1] 645.8435 656.9200 674.9702 592.6207 609.0219 629.3991
As we can see here, we now have a nice clean vector we can use to in prediction and visualization systems.
That’s end-to-end for regression, now let’s move onto binary classification.
Binary Classification
Keeping with the penguins dataset, let’s re-use the data loading function from before, and transform the data we want into torch tensors. As there isn’t a natural binary variable in this dataset, the outcome is going to be is_adelie
variable that I created up above.
features <- c('bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g')
response <- 'is_adelie'
penguins_train <- df_dataset(penguins[train_idx,],
feature_variables = features,
response_variable = response)
penguins_test <- df_dataset(penguins[-train_idx,],
feature_variables = features,
response_variable = response)
Now to take a look at a sample case to make sure the data looks correct
penguins_train$.getitem(100)
## $x
## torch_tensor
## 45.5000
## 15.0000
## 220.0000
## 5000.0000
## [ CPUFloatType{4} ]
##
## $y
## torch_tensor
## 0
## [ CPUFloatType{1} ]
Looking good! On to the data loaders…
dl_train <- dataloader(penguins_train, batch_size = 10, shuffle = TRUE)
dl_test <- dataloader(penguins_test, batch_size = 10)
As this is a classification model, our model structure is going to be mostly the same.
net <- nn_module(
"PenguinNet",
initialize = function() {
self$fc1 <- nn_linear(length(features), 16)
self$fc2 <- nn_linear(16, 8)
self$fc3 <- nn_linear(8, 1)
},
forward = function(x) {
x %>%
self$fc1() %>%
nnf_relu() %>%
self$fc2() %>%
nnf_relu() %>%
self$fc3()
}
)
model <- net()
optimizer <- optim_adam(model$parameters)
Some may be wondering why I don’t have a sigmoid activation at the end of the network. Torch is able to handle that through the loss function. As we see below, I use the nnf_binary_cross_entropy_with_logits()
loss function, which handles that transformation. Another way to run this model would be to add the sigmoid activation function and use the nnf_binary_cross_entropy()
function. As is true in all of coding, there are a lot of way to do the same thing.
for (epoch in 1:10) {
l <- c()
for (b in enumerate(dl_train)) {
optimizer$zero_grad()
output <- model(b[[1]])
loss <- nnf_binary_cross_entropy_with_logits(output,b[[2]])
loss$backward()
optimizer$step()
l <- c(l, loss$item())
}
cat(sprintf("Loss at epoch %d: %3f\n", epoch, mean(l)))
}
## Loss at epoch 1: 43.617383
## Loss at epoch 2: 4.836457
## Loss at epoch 3: 1.533206
## Loss at epoch 4: 0.722353
## Loss at epoch 5: 0.947575
## Loss at epoch 6: 0.820873
## Loss at epoch 7: 0.828812
## Loss at epoch 8: 0.794824
## Loss at epoch 9: 0.713368
## Loss at epoch 10: 0.790455
Next, the model goes into evaluation mode and we get the test loss
model$eval()
test_losses <- c()
for (b in enumerate(dl_test)) {
output <- model(b[[1]])
loss <- nnf_binary_cross_entropy_with_logits(output, b[[2]])
test_losses <- c(test_losses, loss$item())
}
mean(test_losses)
## [1] 1.793357
Evaluation with classification models need more response vectors than just one (even though they can all be derived from log-odds). The model itself will return log odds, but we can add another vector that returns a class prediction.
# Placeholder vector for predictions
preds = c()
# Placeholder vector for probabilities
out_log_odds = c()
for (b in enumerate(dl_test)) {
# get log odds
output <- model(b[[1]])
# convert to df and append
log_odds = output$data() %>% as.array() %>% .[,1]
out_log_odds <- c(out_log_odds, log_odds)
# get class prediction from log odds and append
predicted <- as.numeric(log_odds>0)
preds <- c(preds, predicted)
}
head(preds)
## [1] 1 1 1 1 1 1
head(out_log_odds)
## [1] 5.062830 2.667231 2.070307 3.278193 3.250284 3.637262
All that puts out log odds, which you can convert into odds ratios and/or probabilities, as well as class predictions at a 50% cutoff. All that can be fed into even more evaluation, confusion matrices, etc.
Multi-Class Classification
Predicting multiple classes is (unsurprisingly) trickier and has more holdups than either of the two previous examples. In this example, I’ll be predicting the penguin’s species. One thing important to note with multi-class classification is that, contrary to the past two examples, the data type of the outcome variable has to be long, not float. Re-examining the data transformation function from above, we can easily add dtype = torch_long()
when specifying the outcome variable to account for this
features <- c('bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g')
response <- 'species'
df_dataset <- dataset(
name = "Penguins",
initialize = function(df, feature_variables, response_variable) {
df$species <- as.numeric(df$species)
df$island <- as.numeric(df$island)
self$df <- df[, feature_variables]
self$response_variable <- df[[response_variable]]
},
.getitem = function(index) {
response <- torch_tensor(self$response_variable[index], dtype = torch_long())
x <- torch_tensor(as.numeric(self$df[index,]))
list(x = x, y = response)
},
.length = function() {
length(self$response_variable)
}
)
penguins_train <- df_dataset(penguins[train_idx,],
feature_variables = features,
response_variable = response)
penguins_test <- df_dataset(penguins[-train_idx,],
feature_variables = features,
response_variable = response)
Now, for a look at the data to make sure the outcome is properly coded.
penguins_train$.getitem(100)
## $x
## torch_tensor
## 45.5000
## 15.0000
## 220.0000
## 5000.0000
## [ CPUFloatType{4} ]
##
## $y
## torch_tensor
## 3
## [ CPULongType{1} ]
With that looking good, the next step is to prep the dataloaders and specify the model structure. Those familiar with deep learning will recognize that I have three nodes on my last layer, which is equal to the number of classes I’m trying to predict.
dl_train <- dataloader(penguins_train, batch_size = 10, shuffle = TRUE)
dl_test <- dataloader(penguins_test, batch_size = 10)
net <- nn_module(
"PenguinNet",
initialize = function() {
self$fc1 <- nn_linear(length(features), 16)
self$fc2 <- nn_linear(16, 8)
self$fc3 <- nn_linear(8, 3)
},
forward = function(x) {
x %>%
self$fc1() %>%
nnf_relu() %>%
self$fc2() %>%
nnf_relu() %>%
self$fc3()
}
)
As with the binary classification, I don’t have any activations after the last layer, as torch handles those when specifying nnf_cross_entropy()
into the loss function.
Another important thing to note is that torch_squeeze()
has to be applied to the labels, or else this loop will error out. As we’re working with a multi-class problem, there is an issue with shape, as the model is outputting 3 vectors. What torch_squeeze()
is gets those vectors into the right format in order to be run with the batching, in our case creating a 10x3 matrix, as our batch size is 10 here.
model <- net()
optimizer <- optim_adam(model$parameters)
for (epoch in 1:10) {
l <- c()
for (b in enumerate(dl_train)) {
optimizer$zero_grad()
output <- model(b[[1]])
loss <- nnf_cross_entropy(output, torch_squeeze(b[[2]]))
loss$backward()
optimizer$step()
l <- c(l, loss$item())
}
cat(sprintf("Loss at epoch %d: %3f\n", epoch, mean(l)))
}
## Loss at epoch 1: 28.549027
## Loss at epoch 2: 4.123320
## Loss at epoch 3: 2.022033
## Loss at epoch 4: 1.748306
## Loss at epoch 5: 1.788059
## Loss at epoch 6: 1.755952
## Loss at epoch 7: 1.757479
## Loss at epoch 8: 1.783664
## Loss at epoch 9: 1.789628
## Loss at epoch 10: 1.757293
After the model is trained, evaluation is the next step. Again, torch_squeeze()
is necessary to get the output in the right shape. I also pull from Sigrid’s into to torch post to add an accuracy metric as well.
# Put the model into eval mode
model$eval()
test_losses <- c()
total <- 0
correct <- 0
for (b in enumerate(dl_test)) {
output <- model(b[[1]])
labels <- torch_squeeze(b[[2]])
loss <- nnf_cross_entropy(output, labels)
test_losses <- c(test_losses, loss$item())
# torch_max returns a list, with position 1 containing the values
# and position 2 containing the respective indices
predicted <- torch_max(output$data(), dim = 2)[[2]]
total <- total + labels$size(1)
# add number of correct classifications in this batch to the aggregate
correct <- correct + (predicted == labels)$sum()$item()
}
mean(test_losses)
## [1] 2.361512
Moving on to pulling out the correct output, some adjustments have to be made due to the nature of the multi-class model. Rather than pulling a single vector and working with that, I pull each of the three vectors representing each class’s log-odds into a data frame. From that data frame, I extract the class with the highest probability, and use that for our class prediction.
# Placeholder vector for predictions
preds = c()
# Placeholder df for log odds
out_log_odds = data.frame()
for (b in enumerate(dl_test)) {
# get log odds
output <- model(b[[1]])
# convert to df and append
output_df = output$data() %>% as.array() %>% as.data.frame
out_log_odds <- rbind(out_log_odds, output_df)
# get class prediction from log odds
predicted <- torch_max(output$data(), dim = 2)[[2]]
preds <- c(preds, predicted)
}
head(preds)
## [[1]]
## torch_tensor
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## [ CPULongType{10} ]
##
## [[2]]
## torch_tensor
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## [ CPULongType{10} ]
##
## [[3]]
## torch_tensor
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## [ CPULongType{10} ]
##
## [[4]]
## torch_tensor
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## [ CPULongType{10} ]
##
## [[5]]
## torch_tensor
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## [ CPULongType{10} ]
##
## [[6]]
## torch_tensor
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## 3
## [ CPULongType{10} ]
head(out_log_odds)
## V1 V2 V3
## 1 -4.518219 -6.365453 4.822133
## 2 -4.919380 -9.388348 6.034894
## 3 -5.663982 -7.024784 5.442845
## 4 -4.573833 -5.845218 4.552447
## 5 -4.689006 -6.470817 4.848561
## 6 -4.673595 -7.396751 5.219960
And with that, we’ve gone through some basic examples of ML model types using tabular data!
Final Thoughts
Hopefully this post demystified some of the code necessary to get Torch up and running, and that the reader will be more comfortable using torch in day to day work, and even build more awesome stuff!
I hope to make this somewhat of a series, where I make examples of making binary/continuous predictions with text/image data, etc. I’ve used tensorflow
quite a bit, and often ran into seemingly simple errors. My goal with torch is to have a resource online that will help others overcome these.
I’m very excited to see Torch develop within the R community. Binding directly to the C++ libraries is going to pay dividends, as it removed the step of managing python environments within R (which was a necessary pain). First, it’s going to be fun to see libraries like torchtext
and pyro
get ported over and used within R. Second, I think this setup makes it likely that we’ll see some R torch libraries get created that will have to be ported over to Python. The RStudio team absolutely crushed it with this port, and I’m excited to see how they and the R community at large continue to build on this.