lang
Use an LLM to translate a function’s help documentation on-the-fly. lang overrides the ? and help() functions in your R session. If you are using RStudio or Positron, the translated help page will appear in the Help pane of the IDE.
Installing
To install the GitHub version of lang, use:
install.packages("pak")
pak::pak("mlverse/lang")
Setup lang
lang can be initialized in one of two ways. The first way is to pass an ellmer chat object:
library(lang)
lang_use(ellmer::chat_openai(model = "gpt-4o"))
Or, call Ollama directly by passing "ollama" as the backend argument and specifying the model to be used:
lang_use("ollama", "llama3.2", seed = 100)
As a convenience feature, lang can automatically establish a connection with the LLM at the beginning of the R session. To do this, set the .lang_chat option:
options(.lang_chat = ellmer::chat_openai(model = "gpt-4o"))
Add this line to your .Rprofile file so that it runs every time you start R. You can call usethis::edit_r_profile() to open your .Rprofile file and add the option.
Using lang
After setup, simply use ? to trigger and display the translated documentation. During translation, lang reports its progress by showing which section of the documentation is currently being translated:
> ?lm
Translating: Title
If your environment is set to use the Spanish language, the Help pane should display the help page in Spanish.
R enforces the printed name of each section, so section names cannot be translated. Titles such as Description, Usage and Arguments will always remain untranslated.
How it works
The language that the help documentation is translated into is determined by one of the following two environment variables. In order of priority, the variables are:
LANGUAGE
LANG
It is likely that your LANG variable already defaults to your locale. For example, mine is set to en_US.UTF-8 (that means English, United States). For someone in France, the locale would be something such as fr_FR.UTF-8. Llama3.2 recognizes these UTF-8 locale strings, so with lang set up, calling ? will translate the function’s help documentation into French.
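The priority order described above can be sketched in base R. This is only an illustration under stated assumptions: resolve_language() and is_english() are hypothetical helpers written for this example, not functions exported by lang, and the English check is a simple pattern match that may differ from what lang does internally.

```r
# Hypothetical helper (not part of lang): return the first non-empty value,
# checking LANGUAGE before LANG, or NA if neither is set.
resolve_language <- function(env = Sys.getenv(c("LANGUAGE", "LANG"))) {
  candidates <- env[nzchar(env)]
  if (length(candidates) == 0) {
    return(NA_character_)
  }
  unname(candidates[[1]])
}

# Hypothetical helper (not part of lang): TRUE when the hint looks like
# English, e.g. "en", "en_US.UTF-8" or "english".
is_english <- function(lang_hint) {
  !is.na(lang_hint) &&
    grepl("^(en([_.-]|$)|english$)", tolower(lang_hint))
}

# LANGUAGE is empty here, so LANG decides.
hint <- resolve_language(c(LANGUAGE = "", LANG = "fr_FR.UTF-8"))
is_english(hint)
```

Calling resolve_language() with no arguments reads the two variables from the current session, which can be a quick way to see which value would take priority.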
lang uses the mall package as the integration point with the LLM. Under the hood, it runs llm_vec_translate() multiple times to translate the most common sections of the help documentation (e.g., Title, Description, Details, Arguments). If lang determines that your environment is set to use English, it simply displays the original documentation.
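The "one translation call per section" idea can be sketched as follows. This is not lang's internal code: translate_sections() and fake_translate() are hypothetical names, the section text is sample data, and a stand-in translator is used so the example runs without an LLM.

```r
# Hypothetical sketch (not lang's implementation): apply a translator to each
# documentation section in turn. `translate` is any function(x, language)
# that returns a character vector of the same length as `x`.
translate_sections <- function(sections, language, translate) {
  lapply(sections, function(text) translate(text, language = language))
}

# Stand-in translator that only tags the text, so no LLM is needed here.
fake_translate <- function(x, language) {
  paste0("[", language, "] ", x)
}

# Sample section text, for illustration only.
help_sections <- list(
  title = "Fitting Linear Models",
  description = "lm is used to fit linear models."
)

translated <- translate_sections(help_sections, "spanish", fake_translate)
translated$title
```

With mall installed and an LLM configured, mall::llm_vec_translate could presumably be passed as the translate argument, since it also takes a character vector and a language.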
Considerations
Translations are not perfect
As you can imagine, the quality of the translation depends mostly on the LLM being used. This solution is meant to be as helpful as possible, while acknowledging that, at this stage of LLMs, a human-curated translation is still the best solution. Having said that, I believe that even an imperfect translation can go a long way for someone who is struggling to understand how to use a specific function in a package and who may also struggle with the English language.
Debug
If the original English help page displays, check your environment variables:
Sys.getenv("LANG")
#> [1] "en_US.UTF-8"
Sys.getenv("LANGUAGE")
#> [1] ""
In my case, lang recognizes that the environment is set to English because of the en code in the variable. If your LANG variable is set to en_... then no translation will occur.
If this is your case, set the LANGUAGE variable to your preference. You can use the full language name, such as ‘spanish’ or ‘french’. You can use Sys.setenv(LANGUAGE = "[my language]") or, for a more permanent solution, add the entry to your .Renviron file (usethis::edit_r_environ()).
Interaction with mall
lang uses the mall package to produce the translations. To avoid conflicts in the setup and use of both packages during the R session, lang runs mall in a separate R process, which is only alive while the documentation is being translated. This means that you can have one LLM setup for lang and a different one for mall during your R session.
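To see why a separate process keeps the two setups apart, here is a minimal base R sketch. It does not reproduce how lang launches its process; it assumes only that Rscript is available next to the running R, and demo.backend is a made-up option name used for illustration.

```r
# Set an option in the current (parent) session only.
# "demo.backend" is a made-up option name for this example.
options(demo.backend = "parent-only")

# Start a short-lived child R process and ask whether it can see the option.
rscript <- file.path(R.home("bin"), "Rscript")
child_code <- 'writeLines(format(is.null(getOption("demo.backend"))))'
child_output <- system2(
  rscript,
  c("--vanilla", "-e", shQuote(child_code)),
  stdout = TRUE
)

# The child starts with a fresh state, so it should report TRUE
# (the option is NULL there) while the parent still has its own value.
child_output
getOption("demo.backend")
```

The same isolation applies in the other direction: whatever the child process configures disappears when it exits, so the parent session's settings are left untouched.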