Skip to contents

Embedd Text

Usage

embed_ollama(
  x,
  base_url = "http://localhost:11434",
  model = "all-minilm",
  batch_size = 10L
)

embed_openai(
  x,
  model = "text-embedding-3-small",
  base_url = "https://api.openai.com/v1",
  api_key = get_envvar("OPENAI_API_KEY"),
  dims = NULL,
  user = get_ragnar_username(),
  batch_size = 20L
)

Arguments

x

x can be:

  • A character vector, in which case a matrix of embeddings is returned.

  • A data frame with a column named text, in which case the dataframe is returned with an additional column named embedding.

  • Missing or NULL, in which case a function is returned that can be called to get embeddings. This is a convenient way to partial in additional arguments like model, and is the most convenient way to produce a function that can be passed to the embed argument of ragnar_store_create().

base_url

string, url where the service is available.

model

string; model name

batch_size

split x into batches when embedding. Integer, limit of strings to include in a single request.

api_key

resolved using env var OPENAI_API_KEY

dims

An integer, can be used to truncate the embedding to a specific size.

user

User name passed via the API.

Value

If x is a character vector, then a numeric matrix is returned, where nrow = length(x) and ncol = <model-embedding-size>. If x is a data.frame, then a new embedding matrix "column" is added, containing the matrix described in the previous sentence.

A matrix of embeddings with 1 row per input string, or a dataframe with an 'embedding' column.

Examples

text <- c("a chunk of text", "another chunk of text", "one more chunk of text")
if (FALSE) { # \dontrun{
text |>
  embed_ollama() |>
  str()

text |>
  embed_openai() |>
  str()
} # }