MallFrame

MallFrame(self, df)

Extension to Polars that adds the ability to use an LLM to run batch predictions over a data frame.

We will start by loading the needed libraries and setting up the data frame that will be used in the examples:

import mall
import polars as pl
pl.Config(fmt_str_lengths=100)
pl.Config.set_tbl_hide_dataframe_shape(True)
pl.Config.set_tbl_hide_column_data_types(True)
data = mall.MallData
reviews = data.reviews
reviews.llm.use(options = dict(seed = 100))

Methods

Name Description
classify Classify text into specific categories.
custom Provide the full prompt that the LLM will process.
extract Pull a specific label from the text.
sentiment Use an LLM to run a sentiment analysis.
summarize Summarize the text down to a specific number of words.
translate Translate text into another language.
use Define the model, backend, and other options to use to interact with the LLM.
verify Check to see if something is true about the text.

classify

MallFrame.classify(col, labels='', additional='', pred_name='classify')

Classify text into specific categories.

Parameters

Name Type Description Default
col str The name of the text field to process required
labels list A list or a DICT object that defines the categories to classify the text as. It will return one of the provided labels. ''
pred_name str The name of the new column where the prediction will be placed 'classify'
additional str Inserts this text into the prompt sent to the LLM ''
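
The list and dict forms of labels differ only in what ends up in the new column. The sketch below is not mall's implementation. It is a minimal, standard-library illustration of one way the two forms could be resolved, assuming the model replies with one of the category names. The helper name resolve_label is hypothetical.

```python
def resolve_label(raw_answer, labels):
    """Map a raw model answer to the value that should be stored.

    Hypothetical helper, not part of mall. `labels` is either a list of
    category names or a dict of {category: value_to_return}.
    """
    answer = raw_answer.strip().lower()
    if isinstance(labels, dict):
        # Dict form: the key is the category, the value is what gets stored.
        return labels.get(answer)
    # List form: the category name itself is stored, if it is a known one.
    return answer if answer in labels else None


print(resolve_label("Appliance", ["appliance", "computer"]))
print(resolve_label("computer ", {"appliance": "1", "computer": "2"}))
print(resolve_label("furniture", ["appliance", "computer"]))
```

Returning None for an unknown answer is a design choice of this sketch. It keeps unexpected model replies visible instead of silently coercing them into one of the categories.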

Examples

reviews.llm.classify("review", ["appliance", "computer"])
review classify
"This has been the best TV I've ever used. Great screen, and sound." "computer"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "appliance"
# Use 'pred_name' to customize the new column's name
reviews.llm.classify("review", ["appliance", "computer"], pred_name="prod_type")
review prod_type
"This has been the best TV I've ever used. Great screen, and sound." "computer"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "appliance"
# Pass a DICT to set custom values for each classification
reviews.llm.classify("review", {"appliance" : "1", "computer" : "2"})
review classify
"This has been the best TV I've ever used. Great screen, and sound." "1"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "2"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "1"

custom

MallFrame.custom(col, prompt='', valid_resps='', pred_name='custom')

Provide the full prompt that the LLM will process.

Parameters

Name Type Description Default
col str The name of the text field to process required
prompt str The prompt to send to the LLM along with the col ''
pred_name str The name of the new column where the prediction will be placed 'custom'
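
A long prompt is often written as several adjacent string literals inside parentheses. Python joins such literals with no separator, so the fragments only read as separate sentences if each one carries its own trailing space. The short check below uses illustrative fragments to show the difference.

```python
# Adjacent string literals are concatenated exactly as written.
without_spaces = (
    "Answer a question."
    "Return only the answer, no explanation."
)
with_spaces = (
    "Answer a question. "
    "Return only the answer, no explanation."
)

print(without_spaces)
print(with_spaces)
```

Printing the assembled prompt before sending it is a cheap way to confirm that the LLM will receive the text as intended.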

Examples

my_prompt = (
    "Answer a question. "
    "Return only the answer, no explanation. "
    "Acceptable answers are 'yes', 'no'. "
    "Answer this about the following text, is this a happy customer?:"
)

reviews.llm.custom("review", prompt = my_prompt)
review custom
"This has been the best TV I've ever used. Great screen, and sound." "Yes"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "No"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "No"

extract

MallFrame.extract(col, labels='', expand_cols=False, additional='', pred_name='extract')

Pull a specific label from the text.

Parameters

Name Type Description Default
col str The name of the text field to process required
labels list A list or a DICT object that tells the LLM what to look for and return ''
expand_cols bool If True, splits multiple labels into individual columns False
pred_name str The name of the new column where the prediction will be placed 'extract'
additional str Inserts this text into the prompt sent to the LLM ''
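
When several labels are requested and expand_cols is left at False, the predictions come back in a single pipe-delimited column, and the spacing around the pipe can vary from row to row. The sketch below is not mall's code. It shows one way to split and trim such values with the standard library. The helper name split_labels is hypothetical.

```python
def split_labels(value, names, sep="|"):
    """Split one pipe-delimited prediction into a {name: value} dict."""
    # Trim each piece so "tv | great" and "tv|great" give the same result.
    parts = [part.strip() for part in value.split(sep)]
    # Pad with None so a short answer still yields every column.
    parts += [None] * (len(names) - len(parts))
    return dict(zip(names, parts))


rows = ["tv | great", "laptop|frustration", "washing machine | confusion"]
for row in rows:
    print(split_labels(row, ["product", "feelings"]))
```

Trimming matters when the split values are later used for grouping or joins, since "tv " and "tv" would otherwise count as different keys.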

Examples

# Use 'labels' to let the function know what to extract
reviews.llm.extract("review", labels = "product")
review extract
"This has been the best TV I've ever used. Great screen, and sound." "tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "washing machine"
# Use 'pred_name' to customize the new column's name
reviews.llm.extract("review", "product", pred_name = "prod")
review prod
"This has been the best TV I've ever used. Great screen, and sound." "tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "washing machine"
# Pass a list to request multiple things; the results will be pipe-delimited
# in a single column
reviews.llm.extract("review", ["product", "feelings"])
review extract
"This has been the best TV I've ever used. Great screen, and sound." "tv | great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "laptop|frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "washing machine | confusion"
# Set 'expand_cols' to True to split multiple labels
# into individual columns
reviews.llm.extract(
    col="review",
    labels=["product", "feelings"],
    expand_cols=True
    )
review product feelings
"This has been the best TV I've ever used. Great screen, and sound." "tv " " great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "laptop" "frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "washing machine " " confusion"
# Set custom names to the resulting columns
reviews.llm.extract(
    col="review",
    labels={"prod": "product", "feels": "feelings"},
    expand_cols=True
    )
review prod feels
"This has been the best TV I've ever used. Great screen, and sound." "tv " " great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "laptop" "frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "washing machine " " confusion"

sentiment

MallFrame.sentiment(col, options=['positive', 'negative', 'neutral'], additional='', pred_name='sentiment')

Use an LLM to run a sentiment analysis.

Parameters

Name Type Description Default
col str The name of the text field to process required
options list or dict A list of the sentiment options to use, or a named DICT object ['positive', 'negative', 'neutral']
pred_name str The name of the new column where the prediction will be placed 'sentiment'
additional str Inserts this text into the prompt sent to the LLM ''
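
When options is a dict such as {"positive": "1", "negative": "0"}, the values written to the new column are the strings given in the dict. The sketch below assumes three such predictions, copied by hand into a plain list, and shows how they could be converted to integers for a quick aggregate. It uses only the standard library and does not call mall.

```python
# Sample predictions in the shape the dict form returns (strings, not ints).
predictions = ["1", "0", "0"]

# Convert to integers so they can be summed or averaged.
scores = [int(value) for value in predictions]
positive_share = sum(scores) / len(scores)

print(scores)
print(round(positive_share, 2))
```

Passing the numeric codes as strings and converting afterwards keeps the conversion step explicit in the analysis code.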

Examples

reviews.llm.sentiment("review")
review sentiment
"This has been the best TV I've ever used. Great screen, and sound." "positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "neutral"
# Use 'pred_name' to customize the new column's name
reviews.llm.sentiment("review", pred_name="review_sentiment")
review review_sentiment
"This has been the best TV I've ever used. Great screen, and sound." "positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "neutral"
# Pass custom sentiment options
reviews.llm.sentiment("review", ["positive", "negative"])
review sentiment
"This has been the best TV I've ever used. Great screen, and sound." "positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "negative"
# Use a DICT object to specify values to return per sentiment
reviews.llm.sentiment("review", {"positive" : "1", "negative" : "0"})
review sentiment
"This has been the best TV I've ever used. Great screen, and sound." "1"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "0"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "0"

summarize

MallFrame.summarize(col, max_words=10, additional='', pred_name='summary')

Summarize the text down to a specific number of words.

Parameters

Name Type Description Default
col str The name of the text field to process required
max_words int Maximum number of words to use for the summary 10
pred_name str The name of the new column where the prediction will be placed 'summary'
additional str Inserts this text into the prompt sent to the LLM ''
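
This page does not state whether max_words is enforced or only requested in the prompt. Assuming it is a request that the model may occasionally exceed, a simple check like the one below can flag summaries that run long. The helper name within_limit is hypothetical, and words are counted by splitting on whitespace.

```python
def within_limit(summary, max_words):
    """Return True when the summary has at most `max_words` words."""
    return len(summary.split()) <= max_words


summaries = [
    "great tv with good features",
    "laptop purchase was a mistake",
    "feeling uncertain about new purchase",
]
for text in summaries:
    print(text, "->", within_limit(text, 5))
```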

Examples

# Use max_words to set the maximum number of words to use for the summary
reviews.llm.summarize("review", max_words = 5)
review summary
"This has been the best TV I've ever used. Great screen, and sound." "great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "feeling uncertain about new purchase"
# Use 'pred_name' to customize the new column's name
reviews.llm.summarize("review", 5, pred_name = "review_summary")
review review_summary
"This has been the best TV I've ever used. Great screen, and sound." "great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "feeling uncertain about new purchase"

translate

MallFrame.translate(col, language='', additional='', pred_name='translation')

Translate text into another language.

Parameters

Name Type Description Default
col str The name of the text field to process required
language str The target language to translate to. For example ‘French’. ''
pred_name str The name of the new column where the prediction will be placed 'translation'
additional str Inserts this text into the prompt sent to the LLM ''

Examples

reviews.llm.translate("review", "spanish")
review translation
"This has been the best TV I've ever used. Great screen, and sound." "Esta ha sido la mejor televisión que he utilizado hasta ahora. Gran pantalla y sonido."
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "Me arrepiento de comprar este portátil. Es demasiado lento y la tecla es demasiado ruidosa."
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "No estoy seguro de cómo sentirme con mi nueva lavadora. Un color maravilloso, pero muy difícil de en…
reviews.llm.translate("review", "french")
review translation
"This has been the best TV I've ever used. Great screen, and sound." "Ceci était la meilleure télévision que j'ai jamais utilisée. Écran et son excellent."
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "Je me regrette d'avoir acheté ce portable. Il est trop lent et le clavier fait trop de bruit."
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "Je ne sais pas comment réagir à mon nouveau lave-linge. Couleur superbe, mais difficile à comprendre…

use

MallFrame.use(backend='', model='', _cache='_mall_cache', **kwargs)

Define the model, backend, and other options to use to interact with the LLM.

Parameters

Name Type Description Default
backend str The name of the backend to use. At the beginning of the session it defaults to “ollama”. If passing "", it will remain unchanged ''
model str The name of the model that the backend should use. At the beginning of the session it defaults to “llama3.2”. If passing "", it will remain unchanged ''
_cache str The path of where to save the cached results. Passing "" disables the cache '_mall_cache'
**kwargs Arguments to pass to the downstream Python call. In this case, the chat function in ollama {}

Examples

# Additional arguments will be passed 'as-is' to the
# downstream Python function; in this example, to the chat function in ollama
reviews.llm.use("ollama", "llama3.2", seed = 100, temp = 0.1)
{'backend': 'ollama',
 'model': 'llama3.2',
 '_cache': '_mall_cache',
 'options': {'seed': 100},
 'seed': 100,
 'temp': 0.1}
# During the Python session, you can change any argument
# individually and it will retain all of the previously
# used arguments
reviews.llm.use(temp = 0.3)
{'backend': 'ollama',
 'model': 'llama3.2',
 '_cache': '_mall_cache',
 'options': {'seed': 100},
 'seed': 100,
 'temp': 0.3}
# Use _cache to modify the target folder for caching
reviews.llm.use(_cache = "_my_cache")
{'backend': 'ollama',
 'model': 'llama3.2',
 '_cache': '_my_cache',
 'options': {'seed': 100},
 'seed': 100,
 'temp': 0.3}
# Leave _cache empty to turn off this functionality
reviews.llm.use(_cache = "")
{'backend': 'ollama',
 'model': 'llama3.2',
 '_cache': '',
 'options': {'seed': 100},
 'seed': 100,
 'temp': 0.3}
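
The way settings persist between calls can be pictured as a dictionary merge. The sketch below is an assumption about the behavior shown in the outputs above, not mall's actual code. The function name update_settings is hypothetical, and the _cache argument is left out to keep the example short.

```python
def update_settings(current, backend="", model="", **kwargs):
    """Return a new settings dict; empty strings leave a value unchanged."""
    updated = dict(current)
    if backend != "":
        updated["backend"] = backend
    if model != "":
        updated["model"] = model
    # Extra keyword arguments are stored as-is and overwrite earlier values.
    updated.update(kwargs)
    return updated


settings = {"backend": "ollama", "model": "llama3.2"}
settings = update_settings(settings, seed=100, temp=0.1)
settings = update_settings(settings, temp=0.3)
print(settings)
```

Only temp changes in the second call. The backend, the model, and seed carry over from the earlier state.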

verify

MallFrame.verify(col, what='', yes_no=[1, 0], additional='', pred_name='verify')

Check to see if something is true about the text.

Parameters

Name Type Description Default
col str The name of the text field to process required
what str The statement or question that needs to be verified against the provided text ''
yes_no list A positional list of size 2, which contains the values to return if true and false. The first position will be used as the ‘true’ value, and the second as the ‘false’ value [1, 0]
pred_name str The name of the new column where the prediction will be placed 'verify'
additional str Inserts this text into the prompt sent to the LLM ''
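
The positional meaning of yes_no can be sketched in a few lines. This is an illustration of the rule described in the table, not mall's implementation, and the helper name to_yes_no is hypothetical.

```python
def to_yes_no(is_true, yes_no=(1, 0)):
    """Pick the first value when the statement holds, else the second."""
    if len(yes_no) != 2:
        raise ValueError("yes_no must contain exactly two values")
    return yes_no[0] if is_true else yes_no[1]


print(to_yes_no(True))
print(to_yes_no(False))
print(to_yes_no(True, ["y", "n"]))
print(to_yes_no(False, ["y", "n"]))
```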

Examples

reviews.llm.verify("review", "is the customer happy")
review verify
"This has been the best TV I've ever used. Great screen, and sound." 1
"I regret buying this laptop. It is too slow and the keyboard is too noisy" 0
"Not sure how to feel about my new washing machine. Great color, but hard to figure" 0
# Use 'yes_no' to modify the 'true' and 'false' values to return
reviews.llm.verify("review", "is the customer happy", ["y", "n"])
review verify
"This has been the best TV I've ever used. Great screen, and sound." "y"
"I regret buying this laptop. It is too slow and the keyboard is too noisy" "n"
"Not sure how to feel about my new washing machine. Great color, but hard to figure" "n"