Code
library(tidyverse)
library(tidymodels)
data(bivariate)
ggplot(bivariate_train, aes(x = A, y = B, color = Class)) +
geom_point()
Ryan Zomorrodi
February 15, 2024
We are going to use a toy dataset called bivariate. There is a training, testing, and validation dataset provided.
Use logistic_reg
and glm
to make a classification model of Class ~ A * B. Then use tidy and glance to see some summary information on our model. Anything stand out to you?
# A tibble: 4 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.115 0.404 0.284 7.76e- 1
2 A 0.00433 0.000434 9.97 2.01e-23
3 B -0.0553 0.00633 -8.74 2.32e-18
4 A:B -0.0000101 0.00000222 -4.56 5.04e- 6
# A tibble: 1 × 8
null.deviance df.null logLik AIC BIC deviance df.residual nobs
<dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 1329. 1008 -549. 1106. 1126. 1098. 1005 1009
Use augment to get predictions. Look at the predictions.
# A tibble: 710 × 6
.pred_class .pred_One .pred_Two A B Class
<fct> <dbl> <dbl> <dbl> <dbl> <fct>
1 One 0.730 0.270 742. 68.8 One
2 Two 0.491 0.509 709. 50.4 Two
3 One 0.805 0.195 1006. 89.9 One
4 Two 0.431 0.569 1983. 112. Two
5 Two 0.169 0.831 1698. 81.0 Two
6 One 0.900 0.0996 948. 98.9 One
7 One 0.521 0.479 751. 54.8 One
8 Two 0.347 0.653 1254. 72.2 Two
9 Two 0.00568 0.994 4243. 136. One
10 One 0.910 0.0898 713. 88.2 One
# ℹ 700 more rows
Visually inspect the predictions using the code below
preds <- expand.grid(
A = seq(min(bivariate_train$A), max(bivariate_train$A), length.out = 100),
B = seq(min(bivariate_train$B), max(bivariate_train$B), length.out = 100)) %>%
augment(log_model, .)
ggplot(preds, aes(x = A, y = B)) +
geom_contour(aes(z = .pred_One), breaks = .5, col = "black") +
geom_point(data = bivariate_val, aes(col = Class), alpha = 0.3)
Evaluate your model using the following functions (which dataset(s) should you use to do this train, test, or validation). See if you can provide a basic interpretation of the measures.
# A tibble: 3 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 roc_auc binary 0.790
2 accuracy binary 0.76
3 f_meas binary 0.827
Recall Table 8.4 from the textbook. If necessary, class one can be positive and class two can be negative. Using the output from conf_mat, visually verify you know how to calculate the following: