Coding Warmup 4

Coding Warmup
Author

Ryan Zomorrodi

Published

February 15, 2024

Part A

We are going to use a toy dataset called bivariate. Training, testing, and validation datasets are provided.

Code
library(tidyverse)
library(tidymodels)

data(bivariate)

ggplot(bivariate_train, aes(x = A, y = B, color = Class)) +
    geom_point()

Use logistic_reg and glm to make a classification model of Class ~ A * B. Then use tidy and glance to see some summary information on our model. Anything stand out to you?

Code
log_model <- logistic_reg() %>%
    set_engine('glm') %>%
    set_mode('classification') %>%
    fit(Class ~ A * B, data = bivariate_train)

log_model %>% tidy()
# A tibble: 4 × 5
  term          estimate  std.error statistic  p.value
  <chr>            <dbl>      <dbl>     <dbl>    <dbl>
1 (Intercept)  0.115     0.404          0.284 7.76e- 1
2 A            0.00433   0.000434       9.97  2.01e-23
3 B           -0.0553    0.00633       -8.74  2.32e-18
4 A:B         -0.0000101 0.00000222    -4.56  5.04e- 6
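One thing that stands out in the tidy() output: every term except the intercept is highly significant, including the A:B interaction. The estimates are on the log-odds scale, so exponentiating them gives odds ratios. A quick illustrative sketch using the printed estimates (plain numbers, not the fitted model object):

```r
# Odds ratios from the printed log-odds estimates above.
# exp(beta) is the multiplicative change in the odds per one-unit increase.
or <- exp(c(A = 0.00433, B = -0.0553, `A:B` = -0.0000101))
round(or, 5)
```

The per-unit odds ratios look tiny only because A and B span hundreds of units in this dataset.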
Code
log_model %>% glance()
# A tibble: 1 × 8
  null.deviance df.null logLik   AIC   BIC deviance df.residual  nobs
          <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1         1329.    1008  -549. 1106. 1126.    1098.        1005  1009
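One way to read the glance() output: the drop from the null deviance to the residual deviance measures how much the predictors improve on an intercept-only model. A rough deviance-based pseudo-R² sketch, using the printed numbers (a quick check, not a substitute for evaluating on held-out data):

```r
# Deviance-based (McFadden-style) pseudo-R^2 from the glance() output.
null_dev  <- 1329   # deviance of the intercept-only model
resid_dev <- 1098   # deviance of the fitted model
pseudo_r2 <- 1 - resid_dev / null_dev
round(pseudo_r2, 3)   # ~0.174
```

So the predictors account for roughly 17% of the null deviance.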

Part B

Use augment to get predictions. Look at the predictions.

Code
log_model %>% augment(bivariate_test)
# A tibble: 710 × 6
   .pred_class .pred_One .pred_Two     A     B Class
   <fct>           <dbl>     <dbl> <dbl> <dbl> <fct>
 1 One           0.730      0.270   742.  68.8 One  
 2 Two           0.491      0.509   709.  50.4 Two  
 3 One           0.805      0.195  1006.  89.9 One  
 4 Two           0.431      0.569  1983. 112.  Two  
 5 Two           0.169      0.831  1698.  81.0 Two  
 6 One           0.900      0.0996  948.  98.9 One  
 7 One           0.521      0.479   751.  54.8 One  
 8 Two           0.347      0.653  1254.  72.2 Two  
 9 Two           0.00568    0.994  4243. 136.  One  
10 One           0.910      0.0898  713.  88.2 One  
# ℹ 700 more rows

Part C

Visually inspect the predictions using the code below.

Code
preds <- expand.grid(
    A = seq(min(bivariate_train$A), max(bivariate_train$A), length.out = 100),
    B = seq(min(bivariate_train$B), max(bivariate_train$B), length.out = 100)) %>%
    augment(log_model, .)

ggplot(preds, aes(x = A, y = B)) +
    geom_contour(aes(z = .pred_One), breaks = .5, col = "black") + 
    geom_point(data = bivariate_val, aes(col = Class), alpha = 0.3)

Part D

Evaluate your model using the following functions (which dataset(s) should you use to do this: train, test, or validation?). See if you can provide a basic interpretation of each measure.

  • roc_auc
  • accuracy
  • roc_curve and autoplot
  • f_meas
Code
val_preds <- log_model %>% 
    augment(bivariate_val)

metrics <- list(
    val_preds %>%
        roc_auc(Class, .pred_One),
    val_preds %>%
        accuracy(Class, .pred_class),
    val_preds %>%
        f_meas(Class, .pred_class))

metrics %>% bind_rows()
# A tibble: 3 × 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 roc_auc  binary         0.790
2 accuracy binary         0.76 
3 f_meas   binary         0.827
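For context, the accuracy of 0.76 is best compared against the no-information baseline of always predicting the majority class. Using the validation truth counts visible in the conf_mat output in Part E (202 One, 98 Two), a quick check:

```r
# Majority-class baseline accuracy on the validation set,
# using the truth counts from the conf_mat output in Part E.
n_one <- 202; n_two <- 98
baseline <- max(n_one, n_two) / (n_one + n_two)
round(baseline, 3)   # ~0.673, so 0.76 beats always guessing "One"
```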
Code
val_preds %>% 
    roc_curve(Class, .pred_One) %>%
    autoplot()

Part E

Recall Table 8.4 from the textbook. Here, treat class One as the positive class and class Two as the negative class. Using the output from conf_mat, visually verify that you know how to calculate the following:

  • True Positive Rate (TPR), Sensitivity, or Recall
  • True Negative Rate (TNR) or Specificity
  • False Positive Rate (FPR) or Type I error
  • False Negative Rate (FNR) or Type II error
  • Positive Predictive Value (PPV) or Precision
Code
val_preds %>% 
    conf_mat(truth = Class, estimate = .pred_class)
          Truth
Prediction One Two
       One 172  42
       Two  30  56
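As a check, the rates above can be computed by hand from the conf_mat cells, treating One as positive (so TP = 172, FN = 30, FP = 42, TN = 56):

```r
# Hand-computed rates from the conf_mat output above (One = positive).
tp <- 172; fn <- 30; fp <- 42; tn <- 56

tpr <- tp / (tp + fn)  # sensitivity / recall
tnr <- tn / (tn + fp)  # specificity
fpr <- fp / (fp + tn)  # Type I error rate = 1 - TNR
fnr <- fn / (fn + tp)  # Type II error rate = 1 - TPR
ppv <- tp / (tp + fp)  # precision

round(c(TPR = tpr, TNR = tnr, FPR = fpr, FNR = fnr, PPV = ppv), 3)

# Consistency check against Part D: accuracy and F1 from these cells
# match the yardstick values (0.76 and 0.827).
(tp + tn) / (tp + tn + fp + fn)
2 * ppv * tpr / (ppv + tpr)
```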