This repository is a comparator-based Lean benchmark for formal mathematics.
Benchmark authors write trusted problem statements once in shared Lean modules, and the
tooling generates one comparator workspace per problem under generated/.
A submission is scored entirely by comparator results: a problem counts as solved iff the comparator accepts the submitted solution.
The main user-facing entrypoint is:

lake exe lean-eval --help

Use this path if you are adding or editing benchmark problems. Start by
fetching the Mathlib cache and building the repo:

lake exe cache get
lake build

Put the statement in one of the shared modules under LeanEval/ and mark it with
@[eval_problem]:
@[eval_problem]
theorem my_new_problem : ... := by
  sorry

Current source modules live in topic folders such as:

- LeanEval/NumberTheory/
- LeanEval/Topology/
- LeanEval/ComplexAnalysis/
- LeanEval/EasyProblems.lean
Each tagged declaration must be listed in exactly one entry in
manifests/problems.toml. The holes array names every
@[eval_problem]-tagged declaration that the problem owns in its module;
in the common single-theorem case it has one element.
[[problem]]
id = "my_new_problem"
title = "My new problem"
test = false
module = "LeanEval.SomeModule"
holes = ["my_new_problem"]
submitter = "Your Name"
notes = "Optional notes."
source = "Optional citation or URL."
informal_solution = "Optional proof sketch or reference."

The required fields are id, title, test, module, holes, and submitter.
A problem may bundle several defs, instances, and theorems; list them all in
holes. The comparator then asks the participant to fill every listed
declaration in their Submission.lean. Two conventions:

- Name every instance. The generator addresses holes by their declaration
  name, and Lean's auto-generated names for anonymous instances
  (e.g. instTopologicalSpaceJacobian) are not stable. Use
  instance instAddCommGroup : ... := sorry rather than
  instance : ... := sorry.
- Use holes even for one declaration. There is no theorem = "..." shorthand;
  a singleton hole is just holes = ["my_thm"].
See LeanEval/Sandbox/DefHoleExample.lean
and LeanEval/Sandbox/InstanceHoleExample.lean
for the smallest working examples, or
LeanEval/Geometry/JacobianChallenge.lean
for a realistic multi-hole problem.
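As a hedged sketch (every declaration name below is made up for illustration,
not taken from the repo), a multi-hole problem bundling a def, a named
instance, and a theorem could look like:

-- All names here are hypothetical.
@[eval_problem]
def Widget : Type := sorry

@[eval_problem]
instance instAddWidget : Add Widget := sorry

@[eval_problem]
theorem widget_add_comm (a b : Widget) : a + b = b + a := sorry

The matching manifest entry would then list
holes = ["Widget", "instAddWidget", "widget_add_comm"].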
Then run the quick local checks:

lake exe lean-eval validate-manifest
lake exe lean-eval check-problem-build

validate-manifest checks that @[eval_problem] declarations and manifest entries
match. check-problem-build builds the problem modules so warning-producing Lean
changes do not slip through. Both are cheap and catch the most common mistakes
before a CI roundtrip.
That's it — push your branch and open a PR. CI regenerates the comparator workspaces
under generated/ and verifies they build, so you do not need to commit anything
under generated/ yourself. Once the PR merges, a separate workflow regenerates
generated/ on main and pushes the result.
If CI fails with a generation or build error, you'll need to fix the source. The fastest local equivalent is:
lake exe lean-eval generate --problem my_new_problem
lake exe lean-eval check-generated-builds --problem my_new_problem

Use this path if you want to prove benchmark problems without touching the
trusted benchmark files.
Generated workspaces live under generated/, one directory per problem. The current
catalog is summarized in generated/index.json.
Examples:
- generated/two_plus_two/
- generated/list_append_singleton_length/
- generated/cyclotomic_integer_house_le_two/
Copy a clean starter workspace:
lake exe lean-eval start-problem two_plus_two

That creates workspaces/two_plus_two/ by default. You can also pass a destination:

lake exe lean-eval start-problem two_plus_two /tmp/two_plus_two

Then enter the workspace and fetch its dependencies:

cd workspaces/two_plus_two
lake update

Solver-owned files are:
- Submission.lean
- Submission/Helpers.lean
- any additional Lean files you add under Submission/
Trusted files you should not edit in the normal solver workflow are:
- Challenge.lean
- Solution.lean
- config.json
- lakefile.toml
Challenge.lean contains the benchmark statement. Solution.lean is the fixed
bridge that tells the comparator to check your theorem from the Submission
namespace.
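For orientation, a completed Submission.lean for the starter problem might look
roughly like the sketch below. This assumes the statement really is 2 + 2 = 4;
the actual expected theorem name and imports are fixed by the generated
workspace, so defer to its Challenge.lean:

import Mathlib

namespace Submission

-- Hypothetical sketch; the statement to prove comes from Challenge.lean.
theorem two_plus_two : 2 + 2 = 4 := by
  rfl

end Submission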
To check your submission against the comparator, run:

lake test

If the comparator binary is not on your PATH, set it explicitly:

COMPARATOR_BIN=/path/to/comparator lake test

You can verify the installation against the starter problem from the repo root
with:

lake exe lean-eval check-comparator-installation

Comparator setup also requires the upstream external tools, including landrun and
lean4export. Install landrun from its git main branch
(go install github.com/zouuup/landrun/cmd/landrun@main); the latest tagged
release (v0.1.15) is missing fixes that comparator's sandbox relies on.
CI pins lean4export to tag v4.30.0-rc2 and comparator to commit
71b52ec29e06d4b7d882726553b1ceb99a2499e0 (which adds support for
def-shaped holes).
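If you want your local tools to match CI's pins, one plausible recipe is the
following; the lean4export repository URL is an assumption, and the comparator
repository location is not given here, so check the CI configuration for the
canonical sources:

# landrun from git main, as described above
go install github.com/zouuup/landrun/cmd/landrun@main

# lean4export at the tag CI pins (repository URL assumed)
git clone https://github.com/leanprover/lean4export
git -C lean4export checkout v4.30.0-rc2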
To score submissions, run from the repo root:
lake exe lean-eval run-eval
lake exe lean-eval run-eval --json

The scorer prefers workspaces/<problem-id>/ when present and falls back to
generated/<problem-id>/ otherwise.
Participants may use Mathlib freely.
If a proof needs helper code that is not already in Mathlib, that code must be included
inside the submission workspace itself. Multi-file submissions are allowed through
Submission.lean and extra local modules under Submission/.
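As a minimal sketch of that layout (the lemma name and statement are invented
for illustration), a helper module might look like:

-- Submission/Helpers.lean: an illustrative stand-in for project-local helpers.
namespace Submission.Helpers

theorem double_eq_two_mul (n : Nat) : n + n = 2 * n := by
  omega

end Submission.Helpers

Submission.lean can then import Submission.Helpers and use the lemma in its
proof.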
For benchmark-repo submissions, validate changed paths with:
lake exe lean-eval validate-submission --file generated/two_plus_two/Solution.lean

The current validator accepts:

- modifications to generated/<problem-id>/Solution.lean and
  generated/<problem-id>/Submission.lean
- additions, modifications, deletions, renames, or copies of .lean files under
  generated/<problem-id>/Submission/
- additions (only) of markdown files anywhere inside generated/<problem-id>/,
  other than the generated README.md
- additions (only) of a top-level generated/<problem-id>/LICENCE or
  generated/<problem-id>/LICENSE file
In practice, solvers should normally work in Submission.lean and Submission/.
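For example, assuming --file takes one changed path per invocation as in the
command above, you might validate each edited file in turn:

lake exe lean-eval validate-submission --file generated/two_plus_two/Submission.lean
lake exe lean-eval validate-submission --file generated/two_plus_two/Submission/Helpers.lean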
Repository layout:

- LeanEval/: trusted authored problem statements
- manifests/problems.toml: problem metadata
- generated/: generated comparator workspaces
- scripts/: generation, validation, and scoring helpers
- PLAN.md: deferred design and roadmap notes
For a local health pass over the repository:
lake exe lean-eval validate-manifest
lake exe lean-eval check-problem-build
lake exe lean-eval generate --check
lake exe lean-eval check-generated-builds
lake exe lean-eval run-eval

There is also an end-to-end workflow self-check:
lake exe lean-eval check-eval-workflow