ToolSmith

Test, lint, and report on AI agent tool use before shipping.

View the Project on GitHub landon-personal/toolsmith

Writing Tasks

Task definitions live in tasks.json.

The starter fixture is examples/calendar-email/tasks.json. Each task includes fields such as id, prompt, expectedTool, and successCriteria.

Set expectedTool to the name of the tool the agent should choose. Set it to none when the agent should not select any tool.
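As a rough illustration of the expectedTool rule, the sketch below compares an agent's choice against a task's expected value. It is not ToolSmith's actual scoring code: the function name matches_expected and the use of Python's None to represent "the agent picked nothing" are assumptions made for this example.

```python
from typing import Optional

NO_TOOL = "none"  # sentinel described in the docs: no tool should be selected


def matches_expected(expected_tool: str, chosen_tool: Optional[str]) -> bool:
    """Return True when the agent's choice agrees with the task's expectedTool.

    Assumption: an agent that selects nothing is reported as None or as the
    literal string "none". ToolSmith's real representation may differ.
    """
    normalized = NO_TOOL if chosen_tool is None else chosen_tool
    return normalized == expected_tool
```

With this shape, a task whose expectedTool is none passes only when the agent declines to call a tool, which makes "do nothing" cases testable alongside ordinary ones.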

Good eval sets include:

Tags are not a first-class task field yet. Use clear id values and successCriteria until a future schema adds tagging.

Example:

{
  "id": "email-status-update",
  "prompt": "Email Jordan a short status update about the release.",
  "expectedTool": "send_email"
}
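
Before running an eval, it can help to sanity-check a task file. The following is a minimal sketch, not part of ToolSmith itself: it assumes tasks.json holds a top-level JSON array of task objects, treats the three fields from the example above as required, and flags duplicate id values. The helper names validate_tasks and validate_tasks_file are hypothetical.

```python
import json

# Fields that appear in the example task above; treating them as required
# is an assumption of this sketch, not a documented ToolSmith rule.
REQUIRED_FIELDS = ("id", "prompt", "expectedTool")


def validate_tasks(tasks: list) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    seen_ids = set()
    for index, task in enumerate(tasks):
        if not isinstance(task, dict):
            problems.append(f"task {index}: not a JSON object")
            continue
        for field in REQUIRED_FIELDS:
            value = task.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"task {index}: missing or empty '{field}'")
        task_id = task.get("id")
        if isinstance(task_id, str):
            if task_id in seen_ids:
                problems.append(f"task {index}: duplicate id '{task_id}'")
            else:
                seen_ids.add(task_id)
    return problems


def validate_tasks_file(path: str) -> list:
    # Assumption: the file contains a top-level JSON array of task objects.
    with open(path, encoding="utf-8") as handle:
        return validate_tasks(json.load(handle))
```

Calling validate_tasks_file("examples/calendar-email/tasks.json") would return a list of problem strings if the file matches the assumed layout. Since ids currently stand in for tags, catching duplicates early keeps reports unambiguous.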