Extract entities from text

R/llm-extract.R

llm_extract

Description

Use a Large Language Model (LLM) to extract a specific entity, or entities, from the provided text.

Usage


llm_extract(
  .data,
  col,
  labels,
  expand_cols = FALSE,
  additional_prompt = "",
  pred_name = ".extract"
)

llm_vec_extract(x, labels = c(), additional_prompt = "", preview = FALSE)

Arguments

Arguments Description
.data A data.frame or tbl object that contains the text to be analyzed
col The name of the column to analyze; supports tidy-eval
labels A vector with the entities to extract from the text
expand_cols If multiple labels are passed, this flag tells the function to create a new column for each item in labels. If labels is a named vector, the function uses those names as the new column names; otherwise, it uses a sanitized version of each label as the name.
additional_prompt Inserts this text into the prompt sent to the LLM
pred_name A character vector with the name of the new column where the prediction will be placed
x A vector that contains the text to be analyzed
preview If TRUE, returns the R call that would have been used to run the prediction, for the first record in x only. Defaults to FALSE. Applies to the vector function only.

Value

llm_extract returns a data.frame or tbl object. llm_vec_extract returns a vector that is the same length as x.

Examples



library(mall)

data("reviews")

llm_use("ollama", "llama3.2", seed = 100, .silent = TRUE)

# Use 'labels' to let the function know what to extract
llm_extract(reviews, review, labels = "product")
#> # A tibble: 3 × 2
#>   review                                        .extract       
#>   <chr>                                         <chr>          
#> 1 This has been the best TV I've ever used. Gr… tv             
#> 2 I regret buying this laptop. It is too slow … laptop         
#> 3 Not sure how to feel about my new washing ma… washing machine

# Use 'pred_name' to customize the new column's name
llm_extract(reviews, review, "product", pred_name = "prod")
#> # A tibble: 3 × 2
#>   review                                        prod           
#>   <chr>                                         <chr>          
#> 1 This has been the best TV I've ever used. Gr… tv             
#> 2 I regret buying this laptop. It is too slow … laptop         
#> 3 Not sure how to feel about my new washing ma… washing machine

# Pass a vector to request multiple things; the results will be pipe delimited
# in a single column
llm_extract(reviews, review, c("product", "feelings"))
#> # A tibble: 3 × 2
#>   review                                        .extract                   
#>   <chr>                                         <chr>                      
#> 1 This has been the best TV I've ever used. Gr… tv | great                 
#> 2 I regret buying this laptop. It is too slow … laptop|frustration         
#> 3 Not sure how to feel about my new washing ma… washing machine | confusion
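The single-column output above is pipe delimited, and the spacing around the pipe is not consistent between rows ("tv | great" versus "laptop|frustration"). The following is a minimal base R sketch, not part of mall, showing one way to split such values into separate, trimmed fields after the fact. The helper name split_pipes() is hypothetical, and the sketch assumes every row contains exactly two items.

```r
# Values copied from the single-column output shown above
extracted <- c(
  "tv | great",
  "laptop|frustration",
  "washing machine | confusion"
)

# Hypothetical helper (not a mall function): split on the literal "|"
# and trim the whitespace that the LLM may leave around each item
split_pipes <- function(x) {
  lapply(strsplit(x, "|", fixed = TRUE), trimws)
}

parts <- split_pipes(extracted)

# Assumes each row returned exactly 2 items; rbind() recycles
# (with a warning) if the LLM returns a different number of items
parsed <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(parsed) <- c("product", "feelings")
parsed
```

Because an LLM is not guaranteed to return the requested number of items, checking lengths(parts) before binding is a reasonable safeguard.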

# To get multiple columns, use 'expand_cols'
llm_extract(reviews, review, c("product", "feelings"), expand_cols = TRUE)
#> # A tibble: 3 × 3
#>   review                                        product            feelings     
#>   <chr>                                         <chr>              <chr>        
#> 1 This has been the best TV I've ever used. Gr… "tv "              " great"     
#> 2 I regret buying this laptop. It is too slow … "laptop"           "frustration"
#> 3 Not sure how to feel about my new washing ma… "washing machine " " confusion"

# Pass a named vector to set the resulting column names
llm_extract(
  .data = reviews,
  col = review,
  labels = c(prod = "product", feels = "feelings"),
  expand_cols = TRUE
)
#> # A tibble: 3 × 3
#>   review                                        prod               feels        
#>   <chr>                                         <chr>              <chr>        
#> 1 This has been the best TV I've ever used. Gr… "tv "              " great"     
#> 2 I regret buying this laptop. It is too slow … "laptop"           "frustration"
#> 3 Not sure how to feel about my new washing ma… "washing machine " " confusion"
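Both expand_cols = TRUE outputs above contain stray leading or trailing spaces (for example "tv " and " great"). A minimal base R sketch for cleaning them up afterwards follows; it rebuilds only the two new columns from the printed output and omits the review column for brevity. This is a post-processing suggestion, not something llm_extract() does itself.

```r
# Values copied from the named-vector output shown above
res <- data.frame(
  prod  = c("tv ", "laptop", "washing machine "),
  feels = c(" great", "frustration", " confusion"),
  stringsAsFactors = FALSE
)

# Trim leading/trailing whitespace only in the extracted columns,
# leaving any other columns (such as the original text) untouched
new_cols <- c("prod", "feels")
res[new_cols] <- lapply(res[new_cols], trimws)
res
```

With dplyr loaded, mutate(res, across(c(prod, feels), trimws)) is an equivalent alternative.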

# For character vectors instead of a data frame, use llm_vec_extract()
llm_vec_extract("bob smith, 123 3rd street", c("name", "address"))
#> [1] "bob smith | 123 3rd street"

# To preview the first call that will be made to the downstream R function
llm_vec_extract(
  "bob smith, 123 3rd street",
  c("name", "address"),
  preview = TRUE
)
#> ollamar::chat(messages = list(list(role = "user", content = "You are a helpful text extraction engine. Extract the name, address being referred to on the text. I expect 2 items exactly. No capitalization. No explanations. Return the response exclusively in a pipe separated list, and no headers.   The answer is based on the following text:\nbob smith, 123 3rd street")), 
#>     output = "text", model = "llama3.2", seed = 100)