Databricks

This brief example shows that the same mall functions can be used, unchanged, against a remote database connection. Today, this works with the functions demonstrated below: llm_sentiment(), llm_summarize(), and llm_classify().

Examples

We will start by connecting to the Databricks Warehouse:

library(mall)
library(DBI)

# The warehouse's HTTP path is read from an environment variable
con <- dbConnect(
  odbc::databricks(),
  HTTPPath = Sys.getenv("DATABRICKS_PATH")
)

Next, we will create a small reviews table and copy it to the warehouse:

library(dplyr)

reviews <- tribble(
  ~review,
  "This has been the best TV I've ever used. Great screen, and sound.",
  "I regret buying this laptop. It is too slow and the keyboard is too noisy",
  "Not sure how to feel about my new washing machine. Great color, but hard to figure"
)

tbl_reviews <- copy_to(con, reviews, overwrite = TRUE)

Using llm_sentiment() in Databricks will call that vendor’s SQL AI function directly:

tbl_reviews |>
  llm_sentiment(review)
#> # Source:   SQL [3 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                              .sentiment
#>   <chr>                                                               <chr>     
#> 1 This has been the best TV Ive ever used. Great screen, and sound.   positive  
#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative  
#> 3 Not sure how to feel about my new washing machine. Great color, bu… mixed

There are some differences in the arguments and in the output of the LLMs. Notice that instead of “neutral”, the prediction is “mixed”. The AI Sentiment function does not allow changing the possible options.
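
If downstream code expects the label “neutral”, one option is to recode the value after the fact. The sketch below is only an illustration: local_sentiment is a hypothetical stand-in for results already collected into R, using the three labels from the output above, and nothing in it calls Databricks.

```r
library(dplyr)

# Hypothetical stand-in for collected llm_sentiment() output
local_sentiment <- tibble(
  review = c("TV review", "Laptop review", "Washing machine review"),
  .sentiment = c("positive", "negative", "mixed")
)

# Recode "mixed" to "neutral"; all other labels pass through unchanged
harmonized <- local_sentiment |>
  mutate(.sentiment = if_else(.sentiment == "mixed", "neutral", .sentiment))

harmonized
```

The same mutate() step would presumably also work on the remote table before collecting, since dbplyr translates if_else() to SQL, but treat that as an assumption.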

Next, we will try llm_summarize(). The max_words argument maps to the same argument in the AI Summarize function:

tbl_reviews |>
  llm_summarize(review, max_words = 5) |> 
  show_query()
#> <SQL>
#> SELECT `reviews`.*, ai_summarize(`review`, CAST(5.0 AS INT)) AS `.summary`
#> FROM `reviews`
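
To get a feel for how an R argument can end up inside a vendor SQL function, the following sketch uses dbplyr alone against a simulated connection. It is not a claim about how mall is implemented: it only relies on dbplyr passing function calls it does not recognize straight through to the generated SQL. The table lf and the literal 5L are illustrative assumptions.

```r
library(dplyr)
library(dbplyr)

# Simulated lazy table; no database connection is opened
lf <- lazy_frame(review = "some text", con = simulate_dbi())

# ai_summarize() is not an R function; dbplyr leaves unknown calls as-is
query <- lf |>
  mutate(.summary = ai_summarize(review, 5L)) |>
  sql_render()

query
```

Because the call is forwarded as written, any limits on the arguments are those of the database function, not of R.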

For this back-end, llm_classify() only accepts unnamed options:

tbl_reviews |> 
  llm_classify(review, c("appliance", "computer"))
#> # Source:   SQL [3 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                               .classify
#>   <chr>                                                                <chr>    
#> 1 This has been the best TV Ive ever used. Great screen, and sound.    appliance
#> 2 I regret buying this laptop. It is too slow and the keyboard is too… computer 
#> 3 Not sure how to feel about my new washing machine. Great color, but… appliance
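
If the reason for naming options would have been to map each label to a different value, a similar result can be approximated after the classification with a lookup vector. This is a minimal sketch under assumptions: local_classes is a hypothetical stand-in for collected results, and the codes 1 and 2 are arbitrary.

```r
library(dplyr)

# Hypothetical stand-in for collected llm_classify() output
local_classes <- tibble(.classify = c("appliance", "computer", "appliance"))

# Named vector mapping each label to the value wanted downstream
lookup <- c(appliance = 1, computer = 2)

# Index the lookup by label; unname() drops the label names from the result
coded <- local_classes |>
  mutate(.code = unname(lookup[.classify]))

coded
```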