Automatically generates test scenarios and analyzes agent execution to quantify how often skills and rules are followed.
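Copy the install instructions and let the AI complete the setup automatically · Recommended for beginners
Please help me install the "skill-comply" skill from askskill: 1. Download https://raw.githubusercontent.com/affaan-m/ECC/main/skills/skill-comply/SKILL.md 2. Save it as ~/.claude/skills/skill-comply/SKILL.md 3. Once it is installed, reload skills and tell me it is ready to use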
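Please run a compliance test on this customer service agent: generate prompt scenarios at three strictness levels (lenient, standard, strict) covering refunds, complaints, and escalation handling; run the agent; identify whether it follows the tone guidelines, the escalation rules, and the ban on prohibited promises; and output the compliance rate per scenario, failure examples, and a complete tool call timeline.
A compliance report containing results for the three scenario tiers, behavior classification, violation reasons, compliance rate statistics, and tool call details.
For this agent with search and database query capabilities, automatically generate test scenarios and check whether it searches before querying as defined, whether it wrongly skips tools, and whether it calls unauthorized tools. Provide the behavior sequence classification, a step-by-step timeline, and the overall compliance rate.
A workflow compliance analysis showing tool call order, anomalous paths, unauthorized calls, and the overall compliance ratio.
Please generate and run tasks for this agent at the lenient, standard, and strict prompt strictness levels, compare the differences in rule adherence, output consistency, and tool use, and output tiered statistics and a summary of visualized conclusions.
A tiered comparison describing how behavior changes across prompt strictness levels, the main risk points, and an overall stability conclusion.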
Measures whether coding agents actually follow skills, rules, or agent definitions by:
Running claude -p and capturing tool call traces via stream-json

Supported targets:
Skills (skills/*/SKILL.md): Workflow skills like search-first, TDD guides
Rules (rules/common/*.md): Mandatory rules like testing.md, security.md, git-workflow.md
Agent definitions (agents/*.md): Whether an agent gets invoked when expected (internal workflow verification not yet supported)

Invocation: /skill-comply <path>

# Full run
uv run python -m scripts.run ~/.claude/rules/common/testing.md
# Dry run (no cost, spec + scenarios only)
uv run python -m scripts.run --dry-run ~/.claude/skills/search-first/SKILL.md
# Custom models
uv run python -m scripts.run --gen-model haiku --model sonnet <path>
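To make the trace-capture step more concrete, here is a minimal Python sketch of pulling tool call names out of stream-json output. It is not the skill's own code. It assumes each output line is a JSON object and that assistant events carry a message whose content list holds blocks of type "tool_use" with a "name" field. The function name extract_tool_calls and the sample events are hypothetical.

```python
import json


def extract_tool_calls(lines):
    """Return tool names, in order of appearance, from stream-json lines.

    Assumption: assistant events look like
    {"type": "assistant", "message": {"content": [{"type": "tool_use", "name": ...}]}}
    """
    calls = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if not isinstance(event, dict) or event.get("type") != "assistant":
            continue
        message = event.get("message") or {}
        content = message.get("content") or []
        if not isinstance(content, list):
            continue
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                calls.append(block.get("name"))
    return calls


# Hypothetical sample trace, for illustration only.
sample = [
    '{"type": "system", "subtype": "init"}',
    '{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Grep", "input": {}}]}}',
    '{"type": "assistant", "message": {"content": [{"type": "text", "text": "done"}]}}',
    '{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Edit", "input": {}}]}}',
]
print(extract_tool_calls(sample))  # ['Grep', 'Edit']
```

The resulting ordered list of names is the kind of input an ordering check or a classifier could consume afterwards.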
Measures whether a skill/rule is followed even when the prompt doesn't explicitly support it.
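One way to picture this measurement is to check each run's observed steps against the expected sequence and compute a pass rate per prompt strictness tier. The sketch below is illustrative and not the skill's implementation. The tier names (strict, standard, lenient), the step labels, and the traces are hypothetical, and it assumes observed tool calls have already been mapped to step labels.

```python
def follows_order(expected_steps, observed_steps):
    """True if every expected step occurs in observed_steps in the same relative order."""
    remaining = iter(observed_steps)
    # `step in remaining` consumes the iterator up to the match,
    # so later steps must appear after earlier ones.
    return all(step in remaining for step in expected_steps)


def compliance_by_tier(expected_steps, runs):
    """Map each tier to the fraction of its traces that follow the expected order."""
    rates = {}
    for tier, traces in runs.items():
        passed = sum(follows_order(expected_steps, trace) for trace in traces)
        rates[tier] = passed / len(traces) if traces else 0.0
    return rates


# Hypothetical expected sequence and traces, for illustration only.
expected = ["search", "query"]
runs = {
    "strict": [["search", "query"], ["search", "read", "query"]],
    "standard": [["search", "query"], ["query"]],
    "lenient": [["query", "search"], ["query"]],
}
print(compliance_by_tier(expected, runs))
# {'strict': 1.0, 'standard': 0.5, 'lenient': 0.0}
```

A rate that stays high as the prompt gives less explicit support suggests the skill or rule is being followed on its own merits rather than because the prompt asked for it.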
Reports are self-contained and include:
For users familiar with hooks, reports also include hook promotion recommendations for steps with low compliance. This is informational — the main value is the compliance visibility itself.
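As a rough illustration of how low-compliance steps might be surfaced, the sketch below lists the steps whose compliance rate falls below a cutoff. The 0.5 threshold, the function name hook_candidates, and the step names are assumptions made for this example. The source does not say how the reports choose which steps to recommend.

```python
def hook_candidates(step_rates, threshold=0.5):
    """Return steps whose compliance rate is below threshold, lowest rate first.

    threshold is a hypothetical cutoff, not a value taken from the tool.
    """
    low = [(rate, step) for step, rate in step_rates.items() if rate < threshold]
    return [step for rate, step in sorted(low)]


# Hypothetical per-step compliance rates, for illustration only.
step_rates = {"write_test_first": 0.33, "run_tests": 0.9, "check_coverage": 0.1}
print(hook_candidates(step_rates))  # ['check_coverage', 'write_test_first']
```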