ToolSmith

Test, lint, and report on AI agent tool use before shipping.

View the Project on GitHub landon-personal/toolsmith

CI Mode

ToolSmith can fail a build when the eval score falls below a threshold set with --fail-under:

npm run dev -- eval examples/calendar-email --fail-under 80

If the score is below the threshold, the command exits non-zero and prints:

Fail-under threshold: 80%
CI result: failed
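Because the threshold check is reported through the exit status, a CI script only needs to look at that status. The following is a minimal bash sketch of that idea, not part of ToolSmith: `gate` is a hypothetical helper name, and the only ToolSmith command it refers to is the one shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by ToolSmith): run any command and
# report pass/fail based only on its exit status.
gate() {
  local label="$1"
  local status
  shift
  "$@"            # run the wrapped command with its arguments unchanged
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "$label: failed (exit $status)" >&2
    return "$status"
  fi
  echo "$label: passed"
}

# Intended use in a CI step, wrapping the eval command from this page:
# gate "eval gate" npm run dev -- eval examples/calendar-email --fail-under 80 || exit 1
```

The helper returns the wrapped command's own exit status, so the CI system still sees the original failure code rather than a generic one.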

Compare a baseline run with a current run:

npm run dev -- compare baseline.json .toolsmith/runs/latest.json

Fail the command when the score regresses relative to the baseline:

npm run dev -- compare baseline.json .toolsmith/runs/latest.json --fail-on-regression
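A CI job may run before any baseline has been saved. The bash sketch below shows one way to handle that, under assumptions: `check_regression` and the `TOOLSMITH_CMD` override are hypothetical names introduced here for illustration, and skipping the check when the baseline file is missing is a design choice of this sketch, not documented ToolSmith behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by ToolSmith): run the regression
# comparison only when a baseline file exists.
check_regression() {
  local baseline="$1"
  local current="$2"
  if [ ! -f "$baseline" ]; then
    # Assumption of this sketch: a missing baseline is not treated as a failure.
    echo "No baseline at $baseline; skipping regression check"
    return 0
  fi
  # TOOLSMITH_CMD is a hypothetical override for local experiments; by default
  # this runs the compare command shown above. The expansion is intentionally
  # unquoted so the default splits into separate words.
  ${TOOLSMITH_CMD:-npm run dev --} compare "$baseline" "$current" --fail-on-regression
}

# Intended use in a CI step:
# check_regression baseline.json .toolsmith/runs/latest.json || exit 1
```

A stricter pipeline could return non-zero when the baseline is missing instead, so that a deleted or misplaced baseline.json is noticed rather than silently skipped.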

A documentation-only GitHub Actions example is available at:

docs/examples/github-actions.md

No GitHub Actions workflow is enabled in this repo. GitHub Pages is used for documentation only and does not run CI checks.

Before using CI checks publicly, review docs/RELEASE_CHECKLIST.md and verify macOS/Windows expectations in docs/CROSS_PLATFORM.md.