Test, lint, and report on AI agent tool use before shipping.
This document describes the ToolSmith v1.0.0 local file shapes at a high level.
ToolSmith validates JSON files locally. It does not require external schema downloads and does not call remote services during validation.
tools.json

Top-level shape:
{
  "name": "calendar-email",
  "version": "1.0.0",
  "description": "Optional description.",
  "safety": {
    "network": false,
    "realSideEffects": false
  },
  "tools": []
}
Required fields:
name: non-empty string
version: non-empty string
tools: non-empty array

Optional fields:
description: string
safety.network: boolean
safety.realSideEffects: boolean

Each tool:
{
  "name": "send_email",
  "description": "Send a mocked email without contacting an email service.",
  "sideEffects": "mock-only",
  "requiresConfirmation": true,
  "examples": ["Email Jordan a short status update."],
  "inputSchema": {
    "type": "object",
    "properties": {
      "to": { "type": "string" },
      "subject": { "type": "string" },
      "body": { "type": "string" }
    },
    "required": ["to", "subject", "body"]
  },
  "outputSchema": {
    "type": "object"
  }
}
Required tool fields:
name: non-empty string, unique within the file

Optional tool fields:
description: string
sideEffects: string
requiresConfirmation: boolean
examples: array of strings
inputSchema: JSON object
outputSchema: JSON object

Risk metadata:
sideEffects is free text in v1.0.0. Use values such as "mock-only", "external side effect if connected to a real API", or "destructive external side effect if connected to a real API".
requiresConfirmation is optional and useful for tools that would need human approval before any future real execution.
Anti-examples are not a first-class v1.0.0 field. Add non-use guidance in the description until a future schema supports a dedicated field.
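
The required-field rules for tools.json can be checked in a few lines. The following is a minimal sketch in Python, not ToolSmith's own validator: the function names `is_non_empty_string` and `validate_tools_file` are hypothetical, and treating "non-empty" as "not the empty string" is an assumption.

```python
def is_non_empty_string(value):
    # Assumption: "non-empty" means any string other than "".
    return isinstance(value, str) and value != ""


def validate_tools_file(doc):
    """Return a list of problem strings; an empty list means the checks passed."""
    if not isinstance(doc, dict):
        return ["top level: must be a JSON object"]

    problems = []
    for field in ("name", "version"):
        if not is_non_empty_string(doc.get(field)):
            problems.append(f"{field}: must be a non-empty string")

    tools = doc.get("tools")
    if not isinstance(tools, list) or not tools:
        problems.append("tools: must be a non-empty array")
        return problems

    seen_names = set()
    for index, tool in enumerate(tools):
        if not isinstance(tool, dict):
            problems.append(f"tools[{index}]: must be a JSON object")
            continue
        name = tool.get("name")
        if not is_non_empty_string(name):
            problems.append(f"tools[{index}].name: must be a non-empty string")
        elif name in seen_names:
            problems.append(f"tools[{index}].name: duplicate name {name!r}")
        else:
            seen_names.add(name)
        # requiresConfirmation is optional, but must be a boolean when present.
        if "requiresConfirmation" in tool and not isinstance(
            tool["requiresConfirmation"], bool
        ):
            problems.append(f"tools[{index}].requiresConfirmation: must be a boolean")
    return problems
```

To try it, parse a file with `json.load` and pass the result to `validate_tools_file`. Collecting every problem, rather than stopping at the first, suits a lint-style report.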
tasks.json

Top-level shape:
{
  "name": "calendar-email",
  "version": "1.0.0",
  "tasks": []
}
Required fields:
name: non-empty string
version: non-empty string
tasks: non-empty array

Each task:
{
  "id": "email-status-update",
  "prompt": "Email Jordan a short status update about the release.",
  "expectedTool": "send_email",
  "successCriteria": ["The mock agent chooses the local email tool."]
}
Required task fields:
id: non-empty string, unique within the file
prompt: non-empty string
expectedTool: non-empty string

Optional task fields:
successCriteria: array of strings

Use expectedTool: "none" when no tool should be selected.
Clarification behavior is represented through failure categories in v1.0.0. A task with unclear wording may produce categories such as should_have_asked_clarifying_question.
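
The task rules above can be sketched the same way in Python. This is an illustration rather than ToolSmith's implementation: `validate_tasks_file` and its `known_tool_names` parameter are hypothetical, and the cross-check of expectedTool against the tool names in tools.json is an assumption that this document does not state as a v1.0.0 rule.

```python
def is_non_empty_string(value):
    # Assumption: "non-empty" means any string other than "".
    return isinstance(value, str) and value != ""


def validate_tasks_file(doc, known_tool_names=None):
    """Return a list of problem strings for a parsed tasks.json document."""
    if not isinstance(doc, dict):
        return ["top level: must be a JSON object"]

    problems = []
    for field in ("name", "version"):
        if not is_non_empty_string(doc.get(field)):
            problems.append(f"{field}: must be a non-empty string")

    tasks = doc.get("tasks")
    if not isinstance(tasks, list) or not tasks:
        problems.append("tasks: must be a non-empty array")
        return problems

    seen_ids = set()
    for index, task in enumerate(tasks):
        if not isinstance(task, dict):
            problems.append(f"tasks[{index}]: must be a JSON object")
            continue
        for field in ("id", "prompt", "expectedTool"):
            if not is_non_empty_string(task.get(field)):
                problems.append(f"tasks[{index}].{field}: must be a non-empty string")

        task_id = task.get("id")
        if is_non_empty_string(task_id):
            if task_id in seen_ids:
                problems.append(f"tasks[{index}].id: duplicate id {task_id!r}")
            seen_ids.add(task_id)

        criteria = task.get("successCriteria")
        if criteria is not None and not (
            isinstance(criteria, list) and all(isinstance(c, str) for c in criteria)
        ):
            problems.append(f"tasks[{index}].successCriteria: must be an array of strings")

        # Assumed cross-check: "none" is always allowed; any other value
        # should name a tool when the caller supplies the known tool names.
        expected = task.get("expectedTool")
        if (
            known_tool_names is not None
            and is_non_empty_string(expected)
            and expected != "none"
            and expected not in known_tool_names
        ):
            problems.append(f"tasks[{index}].expectedTool: unknown tool {expected!r}")
    return problems
```

Passing `known_tool_names=None` skips the cross-check, so the function can also validate a tasks file on its own.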
Eval runs are saved to:
.toolsmith/runs/latest.json
High-level shape:
{
  "id": "run-...",
  "version": "1.0.0",
  "createdAt": "2026-06-03T00:00:00.000Z",
  "examplePath": "examples/calendar-email",
  "toolsPath": "examples/calendar-email/tools.json",
  "tasksPath": "examples/calendar-email/tasks.json",
  "agent": {
    "name": "keyword-mock-agent",
    "version": "1.0.0"
  },
  "provider": {
    "name": "mock"
  },
  "summary": {
    "total": 5,
    "passed": 3,
    "failed": 2,
    "score": 60,
    "scoreBreakdown": {},
    "failureCategories": {}
  },
  "results": []
}
v1.1.0 saved runs include provider.name and may include provider.model when a real model provider is used. agent.model may also be present for compatibility with earlier run metadata.
Each result includes task id, prompt, expected tool, actual tool, pass/fail, failure category, reason, recommendation, and tool call details when available. v1.1.0 results may include textResponse when a provider returns model text without or alongside a tool call.
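
A saved run can be read back with any JSON parser. The sketch below, in Python, loads the latest run and formats a one-line summary from the top-level fields shown above. The helper names `load_latest_run` and `describe_run` are hypothetical, and the sketch avoids the per-result entries because their exact key names are not listed in this document.

```python
import json
from pathlib import Path


def load_latest_run(root="."):
    """Parse .toolsmith/runs/latest.json under the given project root."""
    path = Path(root) / ".toolsmith" / "runs" / "latest.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def describe_run(run):
    """Build a one-line summary from the documented top-level run fields."""
    summary = run["summary"]
    provider = run.get("provider", {})
    label = provider.get("name", "unknown")
    # provider.model is optional and only expected with a real model provider.
    if "model" in provider:
        label += f" ({provider['model']})"
    return (
        f"{run['id']}: {summary['passed']}/{summary['total']} passed, "
        f"{summary['failed']} failed, score {summary['score']}, provider {label}"
    )
```

For example, `print(describe_run(load_latest_run()))` run from a project root would print the summary for the most recent eval, assuming a run has already been saved there.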
v1.0.0 reports these failure category names:
wrong_tool
missing_tool_call
hallucinated_tool
invalid_arguments
missing_required_argument
unnecessary_tool_call
unsafe_tool_attempt
should_have_asked_clarifying_question
should_not_have_asked_clarifying_question
unknown_error

Passed tasks use passed.
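
When post-processing runs, it can help to keep the category names in one place. The following Python sketch tallies a list of category strings and separates out any name that is not in the v1.0.0 list. The constant and function names are hypothetical, and the input is a plain list of strings because this document does not give the key under which each result stores its category.

```python
from collections import Counter

# The ten failure category names reported by v1.0.0, as listed above.
FAILURE_CATEGORIES = frozenset({
    "wrong_tool",
    "missing_tool_call",
    "hallucinated_tool",
    "invalid_arguments",
    "missing_required_argument",
    "unnecessary_tool_call",
    "unsafe_tool_attempt",
    "should_have_asked_clarifying_question",
    "should_not_have_asked_clarifying_question",
    "unknown_error",
})
PASSED = "passed"


def tally_categories(categories):
    """Count recognized category names and collect unrecognized ones."""
    counts = Counter()
    unknown = []
    for category in categories:
        if category == PASSED or category in FAILURE_CATEGORIES:
            counts[category] += 1
        else:
            unknown.append(category)
    return counts, unknown
```

Returning unrecognized names separately, instead of raising, keeps the tally usable if a later version adds categories.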
ToolSmith v1.0.0 treats these local JSON shapes as the stable baseline for the public local CLI.
Future versions may add optional fields, richer JSON schema files, tags, anti-examples, or deeper argument validation. Future additions should remain backward-compatible where practical.