ToolSmith

Test, lint, and report on AI agent tool use before shipping.


Schema

This document describes the ToolSmith v1.0.0 local file shapes at a high level.

ToolSmith validates JSON files locally. It does not require external schema downloads and does not call remote services during validation.

tools.json

Top-level shape:

{
  "name": "calendar-email",
  "version": "1.0.0",
  "description": "Optional description.",
  "safety": {
    "network": false,
    "realSideEffects": false
  },
  "tools": []
}
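As a rough illustration of what a local, offline check of this shape could look like, the sketch below compares a parsed tools.json against the types visible in the example. It is not ToolSmith's own validator. The function name check_tools_file and the ASSUMED_REQUIRED tuple are hypothetical, and which keys are actually required is an assumption made only for the example.

```python
import json

# Python types for the top-level keys shown in the example above.
TOP_LEVEL_TYPES = {
    "name": str,
    "version": str,
    "description": str,
    "safety": dict,
    "tools": list,
}

# Assumption for illustration only: these keys are treated as required.
ASSUMED_REQUIRED = ("name", "version", "tools")


def check_tools_file(doc):
    """Return a list of problems; an empty list means the shape looks fine."""
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    problems = []
    for key in ASSUMED_REQUIRED:
        if key not in doc:
            problems.append(f"missing field: {key}")
    for key, expected in TOP_LEVEL_TYPES.items():
        if key in doc and not isinstance(doc[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    # The safety flags in the example are plain booleans.
    safety = doc.get("safety")
    if isinstance(safety, dict):
        for flag in ("network", "realSideEffects"):
            if flag in safety and not isinstance(safety[flag], bool):
                problems.append(f"safety.{flag} should be a boolean")
    return problems


example = json.loads("""
{
  "name": "calendar-email",
  "version": "1.0.0",
  "description": "Optional description.",
  "safety": {"network": false, "realSideEffects": false},
  "tools": []
}
""")
print(check_tools_file(example))
```

Everything here runs on the already-parsed document, so no network access or schema download is involved.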

Required fields:

Optional fields:

Each tool:

{
  "name": "send_email",
  "description": "Send a mocked email without contacting an email service.",
  "sideEffects": "mock-only",
  "requiresConfirmation": true,
  "examples": ["Email Jordan a short status update."],
  "inputSchema": {
    "type": "object",
    "properties": {
      "to": { "type": "string" },
      "subject": { "type": "string" },
      "body": { "type": "string" }
    },
    "required": ["to", "subject", "body"]
  },
  "outputSchema": {
    "type": "object"
  }
}
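One consistency check that follows naturally from this shape is that every name in inputSchema.required is also declared under inputSchema.properties. The sketch below shows that idea. The helper name undeclared_required is hypothetical, and whether ToolSmith performs this particular check is an assumption.

```python
def undeclared_required(tool):
    """Return names in inputSchema.required that inputSchema.properties lacks."""
    schema = tool.get("inputSchema", {})
    declared = set(schema.get("properties", {}))
    return [name for name in schema.get("required", []) if name not in declared]


send_email = {
    "name": "send_email",
    "inputSchema": {
        "type": "object",
        "properties": {
            "to": {"type": "string"},
            "subject": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["to", "subject", "body"],
    },
}

# A deliberately inconsistent tool: "cc" is required but never declared.
broken = {
    "name": "broken_tool",
    "inputSchema": {
        "type": "object",
        "properties": {"to": {"type": "string"}},
        "required": ["to", "cc"],
    },
}

print(undeclared_required(send_email))
print(undeclared_required(broken))
```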

Required tool fields:

Optional tool fields:

Risk metadata:

Anti-examples are not a first-class field in v1.0.0. Until a future schema adds a dedicated field, describe when a tool should not be used in its description.

tasks.json

Top-level shape:

{
  "name": "calendar-email",
  "version": "1.0.0",
  "tasks": []
}

Required fields:

Each task:

{
  "id": "email-status-update",
  "prompt": "Email Jordan a short status update about the release.",
  "expectedTool": "send_email",
  "successCriteria": ["The mock agent chooses the local email tool."]
}

Required task fields:

Optional task fields:

Use expectedTool: "none" when no tool should be selected.

Clarification behavior is represented through failure categories in v1.0.0. A task with unclear wording may produce categories such as should_have_asked_clarifying_question.
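The expectedTool convention can be cross-checked against tools.json with a few lines of code. This is a minimal sketch, assuming both files are already parsed into dictionaries. The function name unknown_expected_tools is hypothetical and is not part of the ToolSmith CLI.

```python
def unknown_expected_tools(tools_doc, tasks_doc):
    """Return ids of tasks whose expectedTool is neither "none" nor a declared tool."""
    known = {tool["name"] for tool in tools_doc.get("tools", [])}
    known.add("none")  # sentinel: no tool should be selected
    return [
        task["id"]
        for task in tasks_doc.get("tasks", [])
        if task.get("expectedTool") not in known
    ]


tools_doc = {"name": "calendar-email", "tools": [{"name": "send_email"}]}
tasks_doc = {
    "name": "calendar-email",
    "tasks": [
        {"id": "email-status-update", "expectedTool": "send_email"},
        {"id": "small-talk", "expectedTool": "none"},
        {"id": "typo-task", "expectedTool": "send_emial"},
    ],
}

print(unknown_expected_tools(tools_doc, tasks_doc))
```

The small-talk and typo-task entries are made-up tasks for the example. Only the misspelled tool name is reported.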

Eval Run Result

Eval runs are saved to:

.toolsmith/runs/latest.json

High-level shape:

{
  "id": "run-...",
  "version": "1.0.0",
  "createdAt": "2026-06-03T00:00:00.000Z",
  "examplePath": "examples/calendar-email",
  "toolsPath": "examples/calendar-email/tools.json",
  "tasksPath": "examples/calendar-email/tasks.json",
  "agent": {
    "name": "keyword-mock-agent",
    "version": "1.0.0"
  },
  "provider": {
    "name": "mock"
  },
  "summary": {
    "total": 5,
    "passed": 3,
    "failed": 2,
    "score": 60,
    "scoreBreakdown": {},
    "failureCategories": {}
  },
  "results": []
}
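The summary numbers in the example are internally consistent: 3 passed plus 2 failed equals 5 total, and 3 of 5 is 60 percent. The sketch below checks a summary block for that kind of consistency. The score formula (rounded pass rate as a percentage) is an assumption inferred from the example values only. The presence of scoreBreakdown suggests the real score may be computed differently, so treat a mismatch as a prompt to look closer rather than as an error. The function name summary_problems is hypothetical.

```python
def summary_problems(summary):
    """Return a list of inconsistencies found in a run summary."""
    problems = []
    total = summary["total"]
    passed = summary["passed"]
    failed = summary["failed"]
    if passed + failed != total:
        problems.append("passed + failed does not equal total")
    # Assumption: score is the pass rate as a rounded percentage.
    if total > 0 and summary["score"] != round(100 * passed / total):
        problems.append("score does not match the pass rate")
    return problems


summary = {
    "total": 5,
    "passed": 3,
    "failed": 2,
    "score": 60,
    "scoreBreakdown": {},
    "failureCategories": {},
}
print(summary_problems(summary))
```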

v1.1.0 saved runs include provider.name and may include provider.model when a real model provider is used. agent.model may also be present for compatibility with earlier run metadata.

Each result includes the task id, prompt, expected tool, actual tool, pass/fail status, failure category, reason, recommendation, and, when available, tool call details. v1.1.0 results may also include textResponse when a provider returns model text instead of, or alongside, a tool call.

Failure Categories

v1.0.0 reports these failure category names:

Passed tasks use passed.
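A per-category tally of results can be sketched as follows. The key name failureCategory is a hypothetical stand-in, since the exact key used in each result is not given here. Whether the saved summary.failureCategories counts passed entries is also not stated, so this sketch simply counts every category it sees.

```python
from collections import Counter


def tally_categories(results, key="failureCategory"):
    """Count results per category; the key name is an assumption."""
    return dict(Counter(result.get(key, "unknown") for result in results))


results = [
    {"failureCategory": "passed"},
    {"failureCategory": "passed"},
    {"failureCategory": "should_have_asked_clarifying_question"},
]
print(tally_categories(results))
```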

Stability Notes

ToolSmith v1.0.0 treats these local JSON shapes as the stable baseline for the public local CLI.

Future versions may add optional fields, richer JSON schema files, tags, anti-examples, or deeper argument validation. Future additions should remain backward-compatible where practical.