chores 0.3.0 and local LLMs

There are now small models capable of powering chores helpers.

Published December 10, 2025

The tl;dr:

The chores package

The chores package provides a library of ergonomic LLM assistants designed to help you complete repetitive, hard-to-automate tasks quickly. After selecting some code, you can press a keyboard shortcut, select a helper that corresponds to a system prompt, and watch your code be rewritten.

When you select some code and a chore helper, what’s happening under the hood is that the package first retrieves the system prompt corresponding to the helper you chose. For example, the prompt for templating out roxygen2 function documentation looks like this:

Templating function documentation

You are a terse assistant designed to help R package developers quickly template out their function documentation using roxygen2. Given some highlighted function code, return minimal documentation on the function’s parameters and return type. Beyond those two elements, be sparing so as not to describe things you don’t have context for. Respond with only R #' roxygen2 comments—no backticks or newlines around the response, no further commentary.

For function parameters in @params, describe each according to their type (e.g. “A numeric vector” or “A single string”) and note if the parameter isn’t required by writing “Optional” if it has a default value. If the parameters have a default enum (e.g. arg = c("a", "b", "c")), write them out as ‘one of "a", "b", or "c".’ If there are ellipses in the function signature, note what happens to them. If they’re checked with rlang::check_dots_empty() or otherwise, document them as “Currently unused; must be empty.” If the ellipses are passed along to another function, note which function they’re passed to.

For the return type in @returns, note any important errors or warnings that might occur and under what conditions. If the output is returned with invisible(output), note that it’s returned “invisibly.”

Here are some examples:

Given:

key_get <- function(name, error_call = caller_env()) {
  val <- Sys.getenv(name)
  if (!identical(val, "")) {
    val
  } else {
    if (is_testing()) {
      testthat::skip(sprintf("%s env var is not configured", name))
    } else {
      cli::cli_abort("Can't find env var {.code {name}}.", call = error_call)
    }
  }
}

Reply with:

#' Get key
#'
#' @description
#' A short description...
#' 
#' @param name A single string.
#' @param error_call A call to mention in error messages. Optional.
#'
#' @returns 
#' If found, the value corresponding to the provided `name`. Otherwise,
#' the function will error.
#'
#' @export

Given:

chat_perform <- function(provider,
                         mode = c("value", "stream", "async-stream", "async-value"),
                         turns,
                         tools = list(),
                         extra_args = list()) {

  mode <- arg_match(mode)
  stream <- mode %in% c("stream", "async-stream")

  req <- chat_request(
    provider = provider,
    turns = turns,
    tools = tools,
    stream = stream,
    extra_args = extra_args
  )

  switch(mode,
    "value" = chat_perform_value(provider, req),
    "stream" = chat_perform_stream(provider, req),
    "async-value" = chat_perform_async_value(provider, req),
    "async-stream" = chat_perform_async_stream(provider, req)
  )
}

Reply with:

#' Perform chat
#'
#' @description
#' A short description...
#' 
#' @param provider A provider.
#' @param mode One of `"value"`, `"stream"`, `"async-stream"`, or `"async-value"`.
#' @param turns Turns.
#' @param tools Optional. A list of tools.
#' @param extra_args Optional. A list of extra arguments.
#'
#' @returns 
#' A result.
#'
#' @export

Given:

check_args <- function(fn, ...) {
  rlang::check_dots_empty()
  arg_names <- names(formals(fn))
  if (length(arg_names) < 2) {
    cli::cli_abort("Function must have at least two arguments.", .internal = TRUE)
  } else if (arg_names[[1]] != "self") {
    cli::cli_abort("First argument must be {.arg self}.", .internal = TRUE)
  } else if (arg_names[[2]] != "private") {
    cli::cli_abort("Second argument must be {.arg private}.", .internal = TRUE)
  }
  invisible(fn)
}

Reply with:

#' Check a function's arguments
#'
#' @description
#' A short description...
#' 
#' @param fn A function.
#' @param ... Currently unused; must be empty.
#'
#' @returns 
#' `fn`, invisibly. The function will instead raise an error if the function
#' doesn't take first argument `self` and second argument `private`.
#'
#' @export

When two functions are supplied, only provide documentation for the first function, only making use of later functions as additional context. For example:

Given:

check_args <- function(fn, ...) {
  rlang::check_dots_empty()
  arg_names <- names(formals(fn))
  if (length(arg_names) < 2) {
    error_less_than_two_args()
  } else if (arg_names[[1]] != "self") {
    cli::cli_abort("First argument must be {.arg self}.", .internal = TRUE)
  } else if (arg_names[[2]] != "private") {
    cli::cli_abort("Second argument must be {.arg private}.", .internal = TRUE)
  }
  invisible(fn)
}

error_less_than_two_args <- function(call = caller_env()) {
  cli::cli_abort("Function must have at least two arguments.", call = call, .internal = TRUE)
}

Reply with:

#' Check a function's arguments
#'
#' @description
#' A short description...
#' 
#' @param fn A function.
#' @param ... Currently unused; must be empty.
#'
#' @returns 
#' `fn`, invisibly. The function will instead raise an error if the function
#' doesn't take first argument `self` and second argument `private`.
#'
#' @export


Then, the selected helper prompt is set as the system prompt and the code you selected is sent as the user prompt to an ellmer Chat object. It looks something like this:

library(ellmer)

ch <- chat_anthropic(system_prompt = the_prompt_from_above)
 
ch$chat("<the code you selected>")
#> #' The documentation for the selected code.
#> #' 
#> #' Yada yada yada.

Choosing a model

The chores package allows you to use any model you can connect to with ellmer. So, how do you choose which one to use?

The model powering chores needs the following characteristics:

  • Strict instruction-following: Looking back at that roxygen prompt, those instructions make two asks that are pretty difficult for models trained strictly into the “helpful assistant” role: no exposition or explanatory text before or after the roxygen comments, and no (triple) backticks around the response. The chores package writes the LLM’s output directly to the source file, so it’s really frustrating when models provide any text other than what’s requested.
  • Minimally- or non-thinking: Thinking adds latency and shouldn’t be necessary to complete these tasks. There are many interfaces where thinking is nice and/or necessary, but this isn’t one of them.

Notably, the model does not need the ability to call tools, carry out long-horizon tasks, or be a pleasant conversationalist. It’s fine if the model used with chores is bad at pretty much everything besides writing syntactically valid code in compliance with the instructions in the provided prompt.

In the package documentation, I recommend Claude 3.7 Sonnet and GPT 4.1 (optionally, -mini).1 Up to this point, though, I’d thought you really needed to use a frontier-ish model to get any value out of chores. It had seemed to me that many of the models I can run on my laptop (as of late 2025) were trained so strictly into the “helpful assistant” persona–even those advertised as instruction-tuned–that they’ll ramble on and on before and after providing the requested code, even when the code itself is reasonable.
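Configuring chores to use one of those hosted models follows the same pattern as the local setup shown later in this post: pass an ellmer chat object via the chores.chat option. Here’s a minimal sketch; the exact model strings are assumptions, so check your provider’s current model list:

library(ellmer)

# Claude 3.7 Sonnet via Anthropic (needs an ANTHROPIC_API_KEY):
options(chores.chat = chat_anthropic(model = "claude-3-7-sonnet-latest"))

# ...or GPT 4.1 via OpenAI (needs an OPENAI_API_KEY):
# options(chores.chat = chat_openai(model = "gpt-4.1"))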

In some ways, some time just needed to pass, but I (and seemingly everyone else who’s tried to use chores with local models) had also overlooked a critical issue. In working on another problem, I learned that:

  1. ollama’s and LM Studio’s default context length is 4,096 tokens, even for models that support much longer context windows, and
  2. if you provide a prompt longer than that context length, it’s silently truncated to fit rather than raising an error.

I think this probably contributed to my misconception that models small enough to run on my laptop weren’t capable of powering chores. Once I trimmed the cli helper prompt to fit inside the context window with some wiggle room (and/or increased the context length in LM Studio), I saw much more promising results on these tasks than I’d seen from local models before.
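If you want a rough sense of whether a given helper prompt fits in a 4,096-token window, a quick back-of-the-envelope check in R is enough. This is just a sketch using the common “about four characters per token” heuristic, and the file path is a stand-in for wherever the prompt lives on your machine:

# Read a helper prompt from disk (hypothetical path):
prompt_text <- paste(readLines("prompts/cli-replace.md"), collapse = "\n")

# Roughly four characters per token is a crude but serviceable estimate:
estimated_tokens <- nchar(prompt_text) / 4

# Leave wiggle room for the selected code and the model's response:
estimated_tokens < 4096 * 0.75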

A new kid on the block

So, even with the models I had pulled with ollama a few months ago, I realized we were already closer to local models powering chores than I had thought. At that point, I wondered whether any of the newest releases might show stronger performance still. In particular, I’ve been pretty amazed by several of the Qwen3 models, so I started there.

After some poking around, I think Qwen3 4B Instruct 2507 is a great model for local use with chores. Here’s a real-time (as in, not sped up) demo of that model in action on my M4 MacBook:

The cli refactor is a little wonky; the model chose to use backticks around the env var markup rather than curly braces, which won’t render correctly. The templating for the roxygen2 documentation is totally reasonable. Performance isn’t quite at Claude 3.7 Sonnet’s level, but I’m pretty blown away by how good it is.
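For context, cli’s inline markup uses curly braces with a class name, so the first line below is the sort of thing the helper is meant to produce; wrapping the name in backticks instead leaves it unstyled. The env var name here is just an illustration, not what appeared in the demo:

# Curly-brace inline markup, which cli renders with styling:
cli::cli_abort("Can't find env var {.envvar ANTHROPIC_API_KEY}.")

# Backtick-wrapped names aren't cli markup and render literally:
# cli::cli_abort("Can't find env var `ANTHROPIC_API_KEY`.")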

If you’re interested in trying this model out, you can use either LM Studio or ollama.

On Apple Silicon (Mac M-series), I recommend LM Studio; it supports MLX, an array framework for Apple Silicon that helps the model run much more quickly than it does with ollama. Click “Discover”, search for “Qwen3 4B Instruct 2507”, and click “Download.” Once the model has downloaded, click the “Developer” tab and change the status from Stopped to Running. Then, in R, configure chores with:

qwen3_4b <- ellmer::chat_openai_compatible(
  base_url = "http://127.0.0.1:1234/v1",
  model = "qwen/qwen3-4b-2507"
)

options(chores.chat = qwen3_4b)

Note the /v1 in the base URL; this points at LM Studio’s OpenAI-compatible v1 endpoints.
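To sanity-check the connection before pointing chores at it, you can chat with the object directly (this assumes the LM Studio server from the previous step is still running; the prompt and output are just illustrative):

qwen3_4b$chat("Reply with only the word 'ready'.")
#> ready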

On other systems, you can use ollama instead. Run ollama pull qwen3:4b-instruct at the terminal, then set options(chores.chat = ellmer::chat_ollama(model = "qwen3:4b-instruct")).

At least with MLX on LM Studio, the model takes up 2.5GB of disk space and requires 2.5GB of RAM to run.

Oh, and chores 0.3.0

For a good experience “by default” (i.e. without the need to change the context length in LM Studio to use the default helpers), install the new release of chores with install.packages("chores")!


Footnotes

  1. Notably, I do not currently recommend Claude 4 Sonnet, Claude 4.5 Sonnet, or Claude 4.5 Haiku. The newer Claude models tend to include triple backticks in their responses even when prompted not to.
