# Eval

## Synopsis

```sh
kbolt eval run [--file <path>]
kbolt eval import beir --dataset <name> --source <dir> --output <dir> [--collection <name>]
```
## What eval does

`eval` is the benchmark surface for retrieval quality.

Use it to:

- run an evaluation from an `eval.toml` manifest
- import a BEIR dataset into a local benchmark corpus plus manifest
## run

Use `run` to evaluate the current index against an eval manifest:
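```sh
kbolt eval run --file eval.toml
```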
Without `--file`, kbolt loads `eval.toml` from the config directory.
Important rules:

- the top-level `--space` flag is rejected for `eval`; set scope inside each eval case instead
- each case must include a non-empty `query`
- each case must include at least one `judgments` entry
- each case must include at least one judgment with `relevance > 0`
- judgment paths must be unique within each case
- referenced collections must already exist and have indexed chunks
### Minimal manifest shape

```toml
[[cases]]
query = "trait object vs generic"
space = "bench"
collections = ["rust"]
judgments = [
  { path = "rust/traits.md", relevance = 2 },
  { path = "rust/generics.md", relevance = 1 },
]
```
Each run reports metrics per search mode, including:

- `keyword`
- `auto`
- `auto+rerank`
- `semantic` (when an embedder is configured)
- `deep-norerank`
- `deep`
## import beir

Use `import beir` to turn an extracted BEIR dataset into:

- a `corpus/` directory with materialized Markdown documents
- an `eval.toml` manifest
Example, assuming the extracted SciFact dataset lives at `/tmp/beir/scifact` (an illustrative path):
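```sh
# --source points at the extracted dataset; the path here is illustrative
kbolt eval import beir --dataset scifact --source /tmp/beir/scifact --output /tmp/scifact-bench
```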
### Required source layout

The source directory must contain:
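A minimal sketch, assuming kbolt consumes the standard extracted BEIR layout (file names per the upstream BEIR distribution):

```text
corpus.jsonl   # documents, one JSON object per line
queries.jsonl  # queries, one JSON object per line
qrels/
  test.tsv     # relevance judgments for the test split
```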
This command always imports the BEIR test split.
### Import rules

- `--output` must point to an empty directory, or to a directory that does not exist yet
- `--collection` defaults to the dataset name
- imported corpus files are written as `<document-id>.md`
- the generated eval cases use the default benchmark space `bench`
After import, the usual path is:

```sh
kbolt space add bench
kbolt --space bench collection add /tmp/scifact-bench/corpus --name scifact --no-index
kbolt --space bench update --collection scifact
kbolt eval run --file /tmp/scifact-bench/eval.toml
```