MallFrame

MallFrame(self, df)

Extension to Polars that adds the ability to use an LLM to run batch predictions over a data frame.

We start by loading the needed libraries and setting up the data frame used in the examples:

import mall
import polars as pl
pl.Config(fmt_str_lengths=100)
pl.Config.set_tbl_hide_dataframe_shape(True)
pl.Config.set_tbl_hide_column_data_types(True)
data = mall.MallData
reviews = data.reviews
reviews.llm.use(options = dict(seed = 100))

Methods

Name	Description
classify	Classify text into specific categories.
custom	Provide the full prompt that the LLM will process.
extract	Pull a specific label from the text.
sentiment	Use an LLM to run a sentiment analysis
summarize	Summarize the text down to a specific number of words.
translate	Translate text into another language.
use	Define the model, backend, and other options to use to interact with the LLM.
verify	Check to see if something is true about the text.

classify

MallFrame.classify(col, labels='', additional='', pred_name='classify')

Classify text into specific categories.

Parameters

Name	Type	Description	Default
`col`	str	The name of the text field to process	required
`labels`	list or dict	A list or a dict that defines the categories to classify the text as. The prediction will be one of the provided labels.	`''`
`pred_name`	str	The name of the new column where the prediction will be placed	`'classify'`
`additional`	str	Inserts this text into the prompt sent to the LLM	`''`

Examples

reviews.llm.classify("review", ["appliance", "computer"])

review	classify
"This has been the best TV I've ever used. Great screen, and sound."	"computer"
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"appliance"

# Use 'pred_name' to customize the new column's name
reviews.llm.classify("review", ["appliance", "computer"], pred_name="prod_type")

review	prod_type
"This has been the best TV I've ever used. Great screen, and sound."	"computer"
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"appliance"

# Pass a dict to set custom values for each classification
reviews.llm.classify("review", {"appliance" : "1", "computer" : "2"})

review	classify
"This has been the best TV I've ever used. Great screen, and sound."	"1"
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"2"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"1"
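The effect of passing a dict as `labels` can be pictured with a small sketch. This is not mall's implementation: `resolve_label` is a hypothetical helper, and the model reply is passed in as a plain string so that no LLM is needed. It assumes the dict keys are the categories offered to the model and the dict values are what ends up in the new column.

```python
def resolve_label(reply, labels):
    """Map a raw model reply to the value stored in the prediction column.

    Hypothetical helper: `labels` is a list of categories, or a dict of
    {category: value_to_return}. Unrecognized replies give None.
    """
    # Normalize list input into a dict that maps each category to itself
    mapping = labels if isinstance(labels, dict) else {lab: lab for lab in labels}
    # Models may add whitespace or change case, so compare normalized text
    cleaned = reply.strip().lower()
    for category, value in mapping.items():
        if cleaned == category.lower():
            return value
    return None


print(resolve_label(" Computer ", ["appliance", "computer"]))            # computer
print(resolve_label("appliance", {"appliance": "1", "computer": "2"}))   # 1
print(resolve_label("furniture", ["appliance", "computer"]))             # None
```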

custom

MallFrame.custom(col, prompt='', valid_resps='', pred_name='custom')

Provide the full prompt that the LLM will process.

Parameters

Name	Type	Description	Default
`col`	str	The name of the text field to process	required
`prompt`	str	The prompt to send to the LLM along with the `col`	`''`
`pred_name`	str	The name of the new column where the prediction will be placed	`'custom'`

Examples

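# Adjacent string literals are concatenated without a separator,
# so each fragment ends with a space
my_prompt = (
    "Answer a question. "
    "Return only the answer, no explanation. "
    "Acceptable answers are 'yes', 'no'. "
    "Answer this about the following text, is this a happy customer?:"
)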

reviews.llm.custom("review", prompt = my_prompt)

review	custom
"This has been the best TV I've ever used. Great screen, and sound."	"Yes"
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"No"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"No"
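Python joins adjacent string literals with no separator, so every fragment of a multi-line prompt needs either its own trailing space or an explicit join. A minimal sketch of the second approach follows; `build_prompt` is a hypothetical helper and not part of mall.

```python
def build_prompt(*parts):
    """Join prompt fragments with single spaces (hypothetical helper)."""
    # Strip each fragment and skip empty ones so spacing stays uniform
    return " ".join(part.strip() for part in parts if part.strip())


my_prompt = build_prompt(
    "Answer a question.",
    "Return only the answer, no explanation.",
    "Acceptable answers are 'yes', 'no'.",
    "Answer this about the following text, is this a happy customer?:",
)
print(my_prompt)
```

The resulting string can then be passed as the `prompt` argument of `custom`.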

extract

MallFrame.extract(col, labels='', expand_cols=False, additional='', pred_name='extract')

Pull a specific label from the text.

Parameters

Name	Type	Description	Default
`col`	str	The name of the text field to process	required
`labels`	list or dict	A list or a dict that tells the LLM what to look for and return	`''`
`pred_name`	str	The name of the new column where the prediction will be placed	`'extract'`
`additional`	str	Inserts this text into the prompt sent to the LLM	`''`

Examples

# Use 'labels' to let the function know what to extract
reviews.llm.extract("review", labels = "product")

review	extract
"This has been the best TV I've ever used. Great screen, and sound."	"tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"washing machine"

# Use 'pred_name' to customize the new column's name
reviews.llm.extract("review", "product", pred_name = "prod")

review	prod
"This has been the best TV I've ever used. Great screen, and sound."	"tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"washing machine"

# Pass a list to request multiple things; the results will be pipe-delimited
# in a single column
reviews.llm.extract("review", ["product", "feelings"])

review	extract
"This has been the best TV I've ever used. Great screen, and sound."	"tv | great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"laptop|frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"washing machine | confusion"

# Set 'expand_cols' to True to split multiple labels
# into individual columns
reviews.llm.extract(
    col="review",
    labels=["product", "feelings"],
    expand_cols=True
    )

review	product	feelings
"This has been the best TV I've ever used. Great screen, and sound."	"tv "	" great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"laptop"	"frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"washing machine "	" confusion"

# Set custom names to the resulting columns
reviews.llm.extract(
    col="review",
    labels={"prod": "product", "feels": "feelings"},
    expand_cols=True
    )

review	prod	feels
"This has been the best TV I've ever used. Great screen, and sound."	"tv "	" great"
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"laptop"	"frustration"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"washing machine "	" confusion"
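The expanded columns above keep the whitespace that surrounded the pipe in the model's reply (for example `"tv "` and `" great"`). The sketch below shows one way such pipe-delimited values could be split and trimmed after the fact. `split_labels` is a hypothetical post-processing helper written for illustration; it is not how mall performs the split.

```python
def split_labels(value, names):
    """Split a pipe-delimited extraction into a {name: text} dict.

    Hypothetical helper: pads with None when the reply contains fewer
    pieces than the number of requested names.
    """
    # Trim the whitespace the model left around each piece
    pieces = [piece.strip() for piece in value.split("|")]
    # Pad so that every requested name gets a value
    pieces += [None] * (len(names) - len(pieces))
    return dict(zip(names, pieces))


rows = ["tv | great", "laptop|frustration", "washing machine | confusion"]
for row in rows:
    print(split_labels(row, ["product", "feelings"]))
```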

sentiment

MallFrame.sentiment(col, options=['positive', 'negative', 'neutral'], additional='', pred_name='sentiment')

Use an LLM to run a sentiment analysis

Parameters

Name	Type	Description	Default
`col`	str	The name of the text field to process	required
`options`	list or dict	A list of the sentiment options to use, or a dict that maps each option to the value to return	`['positive', 'negative', 'neutral']`
`pred_name`	str	The name of the new column where the prediction will be placed	`'sentiment'`
`additional`	str	Inserts this text into the prompt sent to the LLM	`''`

Examples

reviews.llm.sentiment("review")

review	sentiment
"This has been the best TV I've ever used. Great screen, and sound."	"positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"neutral"

# Use 'pred_name' to customize the new column's name
reviews.llm.sentiment("review", pred_name="review_sentiment")

review	review_sentiment
"This has been the best TV I've ever used. Great screen, and sound."	"positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"neutral"

# Pass custom sentiment options
reviews.llm.sentiment("review", ["positive", "negative"])

review	sentiment
"This has been the best TV I've ever used. Great screen, and sound."	"positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"negative"

# Use a dict to specify the value to return for each sentiment
reviews.llm.sentiment("review", {"positive" : 1, "negative" : 0})

review	sentiment
"This has been the best TV I've ever used. Great screen, and sound."	1
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	0
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	0

summarize

MallFrame.summarize(col, max_words=10, additional='', pred_name='summary')

Summarize the text down to a specific number of words.

Parameters

Name	Type	Description	Default
`col`	str	The name of the text field to process	required
`max_words`	int	Maximum number of words to use for the summary	`10`
`pred_name`	str	The name of the new column where the prediction will be placed	`'summary'`
`additional`	str	Inserts this text into the prompt sent to the LLM	`''`

Examples

# Use max_words to set the maximum number of words to use for the summary
reviews.llm.summarize("review", max_words = 5)

review	summary
"This has been the best TV I've ever used. Great screen, and sound."	"great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"feeling uncertain about new purchase"

# Use 'pred_name' to customize the new column's name
reviews.llm.summarize("review", 5, pred_name = "review_summary")

review	review_summary
"This has been the best TV I've ever used. Great screen, and sound."	"great tv with good features"
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"laptop purchase was a mistake"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"feeling uncertain about new purchase"
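Because the summary is written by an LLM, it can be worth checking afterwards that the word count stays within `max_words`; this page does not state whether the limit is enforced. A minimal sketch of such a check, using the summaries shown above and a hypothetical helper named `within_limit`:

```python
def within_limit(summary, max_words):
    """Return True when the summary has at most `max_words` words."""
    # Words are counted by splitting on whitespace
    return len(summary.split()) <= max_words


summaries = [
    "great tv with good features",
    "laptop purchase was a mistake",
    "feeling uncertain about new purchase",
]
print([within_limit(text, 5) for text in summaries])  # [True, True, True]
```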

translate

MallFrame.translate(col, language='', additional='', pred_name='translation')

Translate text into another language.

Parameters

Name	Type	Description	Default
`col`	str	The name of the text field to process	required
`language`	str	The target language to translate to. For example ‘French’.	`''`
`pred_name`	str	The name of the new column where the prediction will be placed	`'translation'`
`additional`	str	Inserts this text into the prompt sent to the LLM	`''`

Examples

reviews.llm.translate("review", "spanish")

review	translation
"This has been the best TV I've ever used. Great screen, and sound."	"Esta ha sido la mejor televisión que he utilizado hasta ahora. Gran pantalla y sonido."
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"Me arrepiento de comprar este portátil. Es demasiado lento y la tecla es demasiado ruidosa."
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"No estoy seguro de cómo sentirme con mi nueva lavadora. Un color maravilloso, pero muy difícil de en…

reviews.llm.translate("review", "french")

review	translation
"This has been the best TV I've ever used. Great screen, and sound."	"Ceci était la meilleure télévision que j'ai jamais utilisée. Écran et son excellent."
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	"Je me regrette d'avoir acheté ce portable. Il est trop lent et le clavier fait trop de bruit."
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	"Je ne sais pas comment réagir à mon nouveau lave-linge. Couleur superbe, mais difficile à comprendre…

use

MallFrame.use(backend='', model='', _cache='_mall_cache', **kwargs)

Define the model, backend, and other options to use to interact with the LLM.

Parameters

Name	Type	Description	Default
`backend`	str | Chat	The name of the backend to use, or a `chatlas` chat object. At the beginning of the session it defaults to “ollama”. If passing `""`, it will remain unchanged	`''`
`model`	str	The name of the model that the backend should use. At the beginning of the session it defaults to “llama3.2”. If passing `""`, it will remain unchanged	`''`
`_cache`	str	The path of where to save the cached results. Passing `""` disables the cache	`'_mall_cache'`
`**kwargs`		Arguments to pass to the downstream Python call. In this case, the `chat` function in `ollama`	`{}`

Examples

# Additional arguments will be passed 'as-is' to the
# downstream Python function; in this example, to ollama.chat()
reviews.llm.use("ollama", "llama3.2", options = dict(seed = 100, temperature = 0.1))

{'backend': 'ollama',
 'model': 'llama3.2',
 '_cache': '_mall_cache',
 'options': {'seed': 100, 'temperature': 0.1}}

# During the Python session, you can change any argument
# individually; arguments that are not passed keep their
# previous values. As the output shows, 'options' is replaced
# as a whole, so 'seed' is no longer set
reviews.llm.use(options = dict(temperature = 0.3))

{'backend': 'ollama',
 'model': 'llama3.2',
 '_cache': '_mall_cache',
 'options': {'temperature': 0.3}}

# Use _cache to modify the target folder for caching
reviews.llm.use(_cache = "_my_cache")

{'backend': 'ollama',
 'model': 'llama3.2',
 '_cache': '_my_cache',
 'options': {'temperature': 0.3}}

# Leave _cache empty to turn off this functionality
reviews.llm.use(_cache = "")

{'backend': 'ollama',
 'model': 'llama3.2',
 '_cache': '',
 'options': {'temperature': 0.3}}

# Use a `chatlas` object 
from chatlas import ChatOpenAI
chat = ChatOpenAI()
reviews.llm.use(chat)

{'backend': 'chatlas',
 'model': 'llama3.2',
 '_cache': '_mall_cache',
 'options': {'temperature': 0.3},
 'chat': <Chat turns=0 tokens=0>}
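The update behavior visible in the printed dictionaries can be summarized with a toy model. This sketch is inferred only from the outputs above and is not mall's code: `update_settings` is a hypothetical function, it covers only `backend`, `model`, and the extra keyword arguments, and it leaves `_cache` handling out.

```python
def update_settings(current, backend="", model="", **kwargs):
    """Return a new settings dict (toy model, not mall's implementation)."""
    updated = dict(current)
    # An empty string means "leave this setting unchanged"
    if backend != "":
        updated["backend"] = backend
    if model != "":
        updated["model"] = model
    # Extra keyword arguments replace earlier values key by key, so a new
    # 'options' dict replaces the old one instead of being merged into it
    updated.update(kwargs)
    return updated


settings = {"backend": "ollama", "model": "llama3.2", "_cache": "_mall_cache"}
settings = update_settings(settings, options={"seed": 100, "temperature": 0.1})
settings = update_settings(settings, options={"temperature": 0.3})
print(settings)
```

To keep `seed` while changing `temperature` under this model, both keys would have to be passed again in the same `options` dict.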

verify

MallFrame.verify(col, what='', yes_no=[1, 0], additional='', pred_name='verify')

Check to see if something is true about the text.

Parameters

Name	Type	Description	Default
`col`	str	The name of the text field to process	required
`what`	str	The statement or question that needs to be verified against the provided text	`''`
`yes_no`	list	A positional list of size 2, which contains the values to return if true and false. The first position will be used as the ‘true’ value, and the second as the ‘false’ value	`[1, 0]`
`pred_name`	str	The name of the new column where the prediction will be placed	`'verify'`
`additional`	str	Inserts this text into the prompt sent to the LLM	`''`

Examples

reviews.llm.verify("review", "is the customer happy")

review	verify
"This has been the best TV I've ever used. Great screen, and sound."	null
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	null
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	null

# Use 'yes_no' to modify the 'true' and 'false' values to return
reviews.llm.verify("review", "is the customer happy", ["y", "n"])

review	verify
"This has been the best TV I've ever used. Great screen, and sound."	null
"I regret buying this laptop. It is too slow and the keyboard is too noisy"	null
"Not sure how to feel about my new washing machine. Great color, but hard to figure"	null
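The positional meaning of `yes_no` can be illustrated with a short sketch. `to_yes_no` is a hypothetical helper, not mall's implementation, and the set of replies it recognizes is an assumption made for the example. Unrecognized replies map to `None` in this sketch.

```python
def to_yes_no(reply, yes_no=(1, 0)):
    """Map a model reply to the configured 'true'/'false' values.

    Hypothetical helper: yes_no[0] is returned for an affirmative reply
    and yes_no[1] for a negative one.
    """
    # Normalize case, surrounding whitespace, and a trailing period
    cleaned = reply.strip().lower().rstrip(".")
    if cleaned in ("yes", "true", "1"):
        return yes_no[0]
    if cleaned in ("no", "false", "0"):
        return yes_no[1]
    return None


print(to_yes_no("Yes."))             # 1
print(to_yes_no("no", ["y", "n"]))   # n
print(to_yes_no("maybe"))            # None
```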