library(mall)
llm_use("ollama", "llama3.1", seed = 100)
#>
#> ── mall session object
#> Backend: ollama
#> LLM session: model:llama3.1
#> seed:100
#> R session: cache_folder:_mall_cache
Caching results
Data preparation and model preparation are usually iterative processes. Because models in R are normally quite fast, it is not a problem to re-run the entire script to confirm that all of the results are reproducible. But in the case of LLMs, re-running things may be a problem. Locally, running the LLM is processor intensive and typically slow. If running against a remote LLM, the issue would be the cost per token.
To ameliorate this, mall is able to cache results in a folder. That way, running the same analysis over and over is much quicker, because instead of calling the LLM again, mall returns the previously recorded result.
By default, this functionality is turned on. The results are saved to a folder named "_mall_cache". The name of the folder can be easily changed: simply set the .cache argument in llm_use(). To disable this functionality, set the argument to an empty character, meaning .cache = "".
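For example, here is a quick sketch of both options (the folder name "my_cache" is just an illustration):

# save cached results to a custom folder
llm_use("ollama", "llama3.1", .cache = "my_cache")

# disable caching entirely
llm_use("ollama", "llama3.1", .cache = "")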
How it works
mall uses all of the values involved in the LLM query as the "fingerprint" to confidently identify when the same query is being made again. This includes:
- The value in the particular row
- The additional prompting built by the llm_ function
- Any other arguments/options set in llm_use()
- The name of the backend used for the call
A file is created that contains the request and the response. The key to the process is the name of the file itself: it is the hash of the combined values of the items listed above. This hash becomes the "fingerprint" that allows mall to know whether there is an existing cached result.
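To illustrate the idea, a fingerprint like this can be built by hashing the combined request values, for example with the digest package. This is a conceptual sketch, not mall's internal code, and the field names below are hypothetical:

library(digest)

# Hypothetical sketch: collect the values that define the request...
request <- list(
  row_value = "I am happy",
  prompt    = "You are a helpful sentiment engine...",
  backend   = "ollama",
  model     = "llama3.1",
  seed      = 100
)

# ...and hash the combination; the result becomes the file name
digest(request, algo = "md5")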
Walk-through
We initialized the LLM session specifying a seed (the llm_use() call at the top of this article).
Using the tictoc package, we will measure how long it takes to make a simple sentiment call.
library(tictoc)
tic()
llm_vec_sentiment("I am happy")
#> [1] "positive"
toc()
#> 1.266 sec elapsed
This creates the "_mall_cache" folder and, inside a sub-folder named after the first two characters of the hash, a file with the cached result. The name of the file is the hash of the combination mentioned in the previous section.
library(fs)

dir_ls("_mall_cache", recurse = TRUE, type = "file")
#> _mall_cache/08/086214f2638f60496fd0468d7de37c59.json
The cache is a JSON file that contains both the request and the response. As mentioned in the previous section, the name of the file is derived from combining the values in the request ($request).
jsonlite::read_json(
  "_mall_cache/08/086214f2638f60496fd0468d7de37c59.json",
  simplifyVector = TRUE,
  flatten = TRUE
)
#> $request
#> $request$messages
#> role
#> 1 user
#> content
#> 1 You are a helpful sentiment engine. Return only one of the following answers: positive, negative, neutral. No capitalization. No explanations. The answer is based on the following text:\nI am happy
#>
#> $request$output
#> [1] "text"
#>
#> $request$model
#> [1] "llama3.1"
#>
#> $request$seed
#> [1] 100
#>
#>
#> $response
#> [1] "positive"
Re-running the same mall call will complete significantly faster:
tic()
llm_vec_sentiment("I am happy")
#> [1] "positive"
toc()
#> 0.001 sec elapsed
If a slightly different query is made, mall will recognize that this is a different call and send it to the LLM. The results are then saved in a new JSON file:
llm_vec_sentiment("I am very happy")
#> [1] "positive"
dir_ls("_mall_cache", recurse = TRUE, type = "file")
#> _mall_cache/08/086214f2638f60496fd0468d7de37c59.json
#> _mall_cache/7c/7c7cfcfddc43a90b4deb9d7e60e88291.json
During the same R session, if we change something in llm_use() that impacts the request to the LLM, a new cache file is created:
llm_use(seed = 101)
#>
#> ── mall session object
#> Backend: ollama
#> LLM session: model:llama3.1
#> seed:101
#> R session: cache_folder:_mall_cache
llm_vec_sentiment("I am very happy")
#> [1] "positive"
dir_ls("_mall_cache", recurse = TRUE, type = "file")
#> _mall_cache/08/086214f2638f60496fd0468d7de37c59.json
#> _mall_cache/7c/7c7cfcfddc43a90b4deb9d7e60e88291.json
#> _mall_cache/f1/f1c72c2bf22e22074cef9c859d6344a6.json
The only argument that does not trigger a new cache file is .silent:
llm_use(seed = 101, .silent = TRUE)
llm_vec_sentiment("I am very happy")
#> [1] "positive"
dir_ls("_mall_cache", recurse = TRUE, type = "file")
#> _mall_cache/08/086214f2638f60496fd0468d7de37c59.json
#> _mall_cache/7c/7c7cfcfddc43a90b4deb9d7e60e88291.json
#> _mall_cache/f1/f1c72c2bf22e22074cef9c859d6344a6.json
Performance improvements
To drive home the usefulness of this feature, we will use the same data set used in the README. To start, we will change the cache folder to make it easy to track the new files:
llm_use(.cache = "_performance_cache", .silent = TRUE)
As mentioned, we will use the data_bookReviews data frame from the classmap package:
library(classmap)
data(data_bookReviews)
The individual reviews in this data set are quite long, so they take a while to process. To run this test, we will use the first 5 rows:
tic()
data_bookReviews |>
  head(5) |>
  llm_sentiment(review)
#> # A tibble: 5 × 3
#> review sentiment .sentiment
#> <chr> <fct> <chr>
#> 1 "i got this as both a book and an audio file… 1 negative
#> 2 "this book places too much emphasis on spend… 1 negative
#> 3 "remember the hollywood blacklist? the holly… 2 negative
#> 4 "while i appreciate what tipler was attempti… 1 negative
#> 5 "the others in the series were great, and i … 1 negative
toc()
#> 10.223 sec elapsed
The analysis took about 10 seconds on my laptop, so around 2 seconds per record. That may not seem like much, but during model or workflow development, having to wait this long every time takes its toll on our time and patience.
The new cache folder now has the 5 records cached in their corresponding JSON files:
dir_ls("_performance_cache", recurse = TRUE, type = "file")
#> _performance_cache/23/23ea4fff55a6058db3b4feefe447ddeb.json
#> _performance_cache/60/60a0dbb7d3b8133d40e2f74deccdbf47.json
#> _performance_cache/76/76f1b84b70328b1b3533436403914217.json
#> _performance_cache/c7/c7cf6e0f9683ae29eba72b0a4dd4b189.json
#> _performance_cache/e3/e375559b424833d17c7bcb067fe6b0f8.json
Re-running the exact same call will now take only a fraction of the original time!
tic()
data_bookReviews |>
  head(5) |>
  llm_sentiment(review)
#> # A tibble: 5 × 3
#> review sentiment .sentiment
#> <chr> <fct> <chr>
#> 1 "i got this as both a book and an audio file… 1 negative
#> 2 "this book places too much emphasis on spend… 1 negative
#> 3 "remember the hollywood blacklist? the holly… 2 negative
#> 4 "while i appreciate what tipler was attempti… 1 negative
#> 5 "the others in the series were great, and i … 1 negative
toc()
#> 0.01 sec elapsed
Running an additional record will only cost the time it takes to process that new record. The other 5 will still be scored using their cached results:
tic()
data_bookReviews |>
  head(6) |>
  llm_sentiment(review)
#> # A tibble: 6 × 3
#> review sentiment .sentiment
#> <chr> <fct> <chr>
#> 1 "i got this as both a book and an audio file… 1 negative
#> 2 "this book places too much emphasis on spend… 1 negative
#> 3 "remember the hollywood blacklist? the holly… 2 negative
#> 4 "while i appreciate what tipler was attempti… 1 negative
#> 5 "the others in the series were great, and i … 1 negative
#> 6 "a few good things, but she's lost her edge … 1 negative
toc()
#> 0.624 sec elapsed
Set the seed!
If, at the end of your analysis, you plan to re-run all of the code and want to take advantage of the caching functionality, then set the model seed. This allows the LLM to return the exact same results.
If no seed is set during development, the results will still come back the same, because the cache is being read rather than the model being called. But once the cache is removed, in order to run everything from scratch, you will get different results. The invariability of the cached results masks the fact that the model itself has variability.
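To verify reproducibility from scratch, one approach (a quick sketch; note that deleting the folder is destructive, and the path shown assumes the default cache folder) is to remove the cache and re-run the code:

# delete the cache folder so every call goes back to the LLM
fs::dir_delete("_mall_cache")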
llm_use("ollama", "llama3.1", seed = 999)
#>
#> ── mall session object
#> Backend: ollama
#> LLM session: model:llama3.1
#> seed:999
#> R session: cache_folder:_performance_cache