Extract entities from text

R/llm-extract.R

llm_extract

Description

Use a Large Language Model (LLM) to extract a specific entity, or entities, from the provided text.

Usage

 
llm_extract( 
  .data, 
  col, 
  labels, 
  expand_cols = FALSE, 
  additional_prompt = "", 
  pred_name = ".extract" 
) 
 
llm_vec_extract(x, labels = c(), additional_prompt = "", preview = FALSE) 

Arguments

Argument Description
.data A data.frame or tbl object that contains the text to be analyzed
col The name of the field to analyze; supports tidy-eval
labels A vector with the entities to extract from the text
expand_cols If multiple labels are passed, this flag tells the function to create a new column per item in labels. If labels is a named vector, the function uses those names as the new column names; otherwise, it uses a sanitized version of each label's content as the name.
additional_prompt Inserts this text into the prompt sent to the LLM
pred_name A character vector with the name of the new column where the prediction will be placed
x A vector that contains the text to be analyzed
preview If TRUE, returns the R call that would have been used to run the prediction, instead of running it. Only the call for the first record in x is returned. Defaults to FALSE. Applies to the vector function only.

Value

llm_extract returns a data.frame or tbl object. llm_vec_extract returns a vector that is the same length as x.

Examples

 
library(mall) 
 
data("reviews") 
 
llm_use("ollama", "llama3.2", seed = 100, .silent = TRUE) 
 
# Use 'labels' to let the function know what to extract 
llm_extract(reviews, review, labels = "product") 
#> # A tibble: 3 × 2
#>   review                                        .extract       
#>   <chr>                                         <chr>          
#> 1 This has been the best TV I've ever used. Gr… tv             
#> 2 I regret buying this laptop. It is too slow … laptop         
#> 3 Not sure how to feel about my new washing ma… washing machine
 
# Use 'pred_name' to customize the new column's name 
llm_extract(reviews, review, "product", pred_name = "prod") 
#> # A tibble: 3 × 2
#>   review                                        prod           
#>   <chr>                                         <chr>          
#> 1 This has been the best TV I've ever used. Gr… tv             
#> 2 I regret buying this laptop. It is too slow … laptop         
#> 3 Not sure how to feel about my new washing ma… washing machine
 
# Pass a vector to request multiple things; the results will be pipe-delimited
# in a single column
llm_extract(reviews, review, c("product", "feelings")) 
#> # A tibble: 3 × 2
#>   review                                        .extract                   
#>   <chr>                                         <chr>                      
#> 1 This has been the best TV I've ever used. Gr… tv | great                 
#> 2 I regret buying this laptop. It is too slow … laptop|frustration         
#> 3 Not sure how to feel about my new washing ma… washing machine | confusion
 
# To get multiple columns, use 'expand_cols' 
llm_extract(reviews, review, c("product", "feelings"), expand_cols = TRUE) 
#> # A tibble: 3 × 3
#>   review                                        product            feelings     
#>   <chr>                                         <chr>              <chr>        
#> 1 This has been the best TV I've ever used. Gr… "tv "              " great"     
#> 2 I regret buying this laptop. It is too slow … "laptop"           "frustration"
#> 3 Not sure how to feel about my new washing ma… "washing machine " " confusion"
 
# Pass a named vector to set the resulting column names 
llm_extract( 
  .data = reviews, 
  col = review, 
  labels = c(prod = "product", feels = "feelings"), 
  expand_cols = TRUE 
) 
#> # A tibble: 3 × 3
#>   review                                        prod               feels        
#>   <chr>                                         <chr>              <chr>        
#> 1 This has been the best TV I've ever used. Gr… "tv "              " great"     
#> 2 I regret buying this laptop. It is too slow … "laptop"           "frustration"
#> 3 Not sure how to feel about my new washing ma… "washing machine " " confusion"
 
# For character vectors, instead of a data frame, use this function 
llm_vec_extract("bob smith, 123 3rd street", c("name", "address")) 
#> [1] "bob smith | 123 3rd street"
 
# To preview the first call that will be made to the downstream R function 
llm_vec_extract( 
  "bob smith, 123 3rd street", 
  c("name", "address"), 
  preview = TRUE 
) 
#> ollamar::chat(messages = list(list(role = "user", content = "You are a helpful text extraction engine. Extract the name, address being referred to on the text. I expect 2 items exactly. No capitalization. No explanations. Return the response exclusively in a pipe separated list, and no headers.   The answer is based on the following text:\nbob smith, 123 3rd street")), 
#>     output = "text", model = "llama3.2", seed = 100)
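
The pipe-delimited results shown above are not spaced consistently ("tv | great" versus "laptop|frustration"), and the expanded columns keep the stray whitespace. The following is a minimal base R sketch of one way to split and trim such strings after the fact. It is not part of mall: the `raw` vector is copied by hand from the example output above, the column names are chosen for illustration, and the code assumes every element contains exactly one value per label.

```r
# Strings copied from the example output above (not a live LLM call)
raw <- c("tv | great", "laptop|frustration", "washing machine | confusion")

# Split each element on the literal pipe character
parts <- strsplit(raw, "|", fixed = TRUE)

# Remove leading and trailing whitespace from every piece
parts <- lapply(parts, trimws)

# Guard the assumption that each record produced exactly two pieces
stopifnot(all(lengths(parts) == 2))

# Bind into one row per input and one column per label
out <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(out) <- c("product", "feelings")

out
```

If the model returns more or fewer items than requested for some record, the `stopifnot()` guard fails rather than letting `rbind()` silently recycle values, so it may be worth inspecting such records before splitting.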