library(mall)
library(classmap)
library(dplyr)
data(data_bookReviews)
data_bookReviews |>
  glimpse()
#> Rows: 1,000
#> Columns: 2
#> $ review    <chr> "i got this as both a book and an audio file. i had waited t…
#> $ sentiment <fct> 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 1, …
Performance
We will briefly cover this method's performance from two perspectives:
How long the analysis takes to run locally
How well it predicts
To do so, we will use the data_bookReviews data set, provided by the classmap package. For this exercise, only the first 100 of the total 1,000 reviews are going to be part of this analysis.
As per the docs, sentiment is a factor indicating the sentiment of the review: negative (1) or positive (2).
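If you want to confirm how the two levels are distributed, a quick tally works; a minimal check using dplyr's count(), with the output omitted here:
# Quick tally of how many reviews fall in each sentiment level
data_bookReviews |>
  count(sentiment)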
length(strsplit(paste(head(data_bookReviews$review, 100), collapse = " "), " ")[[1]])
#> [1] 20470
Just to get an idea of how much data we’re processing, I’m using a very, very simple word count. So we’re analyzing a bit over 20 thousand words.
reviews_llm <- data_bookReviews |>
  head(100) |>
  llm_sentiment(
    col = review,
    options = c("positive" ~ 2, "negative" ~ 1),
    pred_name = "predicted"
  )
#> ! There were 2 predictions with invalid output, they were coerced to NA
As far as time, on my Apple M3 machine, it took about 1.5 minutes to process 100 rows containing 20 thousand words. Setting temp to 0 in llm_use() made the model run faster.
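For reference, here is roughly what that setup call looks like; a minimal sketch, assuming an Ollama backend and a llama3.2 model (swap in whatever model you have pulled locally, and note the option is spelled out as temperature here):
library(mall)

# Assumed backend and model; additional arguments are passed along
# to the model as options
llm_use("ollama", "llama3.2", temperature = 0)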
The package uses purrr to send each prompt individually to the LLM. But I did try a few different ways to speed up the process, unsuccessfully:
I used furrr to send multiple requests at a time. This did not work, because either the LLM or Ollama processed all my requests serially, so there was no improvement (see the sketch after this list).
I also tried sending more than one row's text at a time. This caused instability in the number of results. For example, sending 5 at a time sometimes returned 7 or 8 results. Even sending 2 was not stable.
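As an illustration of the first attempt, here is roughly what the furrr version looked like; a sketch only, assuming mall's vector helper llm_vec_sentiment() and four local workers, and it yielded no speedup because the requests were still served one at a time:
library(furrr)
library(mall)

plan(multisession, workers = 4)

# One request per review; Ollama still handled these serially,
# so the wall-clock time did not improve
preds <- future_map(
  head(data_bookReviews$review, 100),
  \(x) llm_vec_sentiment(x, options = c("positive" ~ 2, "negative" ~ 1))
) |>
  unlist()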
This is what the new table looks like:
reviews_llm
#> # A tibble: 100 × 3
#>    review                                        sentiment predicted
#>    <chr>                                         <fct>         <dbl>
#>  1 "i got this as both a book and an audio file… 1                 1
#>  2 "this book places too much emphasis on spend… 1                 1
#>  3 "remember the hollywood blacklist? the holly… 2                 2
#>  4 "while i appreciate what tipler was attempti… 1                 1
#>  5 "the others in the series were great, and i … 1                 1
#>  6 "a few good things, but she's lost her edge … 1                 1
#>  7 "words cannot describe how ripped off and di… 1                 1
#>  8 "1. the persective of most writers is shaped… 1                NA
#>  9 "i have been a huge fan of michael crichton … 1                 1
#> 10 "i saw dr. polk on c-span a month or two ago… 2                 2
#> # ℹ 90 more rows
I used yardstick to see how well the model performed. Of course, the accuracy is not measured against an absolute “truth”, but rather against the labels that the classmap package records in sentiment.
library(forcats)

reviews_llm |>
  mutate(predicted = as.factor(predicted)) |>
  yardstick::accuracy(sentiment, predicted)
#> # A tibble: 1 × 3
#>   .metric  .estimator .estimate
#>   <chr>    <chr>          <dbl>
#> 1 accuracy binary         0.980