Helps you write, validate, and run eval.yaml-based agent eval suites.
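Copy the install instructions and let the AI complete the setup automatically · Recommended for beginners
Please help me install the "vally-eval" skill from askskill: 1. Download https://raw.githubusercontent.com/microsoft/GitHub-Copilot-for-Azure/main/.github/skills/vally-eval/SKILL.md 2. Save it as ~/.claude/skills/vally-eval/SKILL.md 3. Once it is installed, reload the skills and tell me it is ready to use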
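Please write a vally eval.yaml eval suite for a customer-service agent, including 5 test cases, input stimuli, expected behaviors, and grading rules.
A structurally complete draft eval.yaml file, with test cases and grading configuration.
Please check this eval.yaml for missing fields, formatting errors, or grader configuration problems, and suggest fixes.
A list of configuration problems, an explanation of each cause, and fixes that can be applied directly.
Migrate the existing 10 manual test cases to the vally eval.yaml format, and map each one to suitable grading criteria.
The reorganized eval.yaml content, including the test mapping results and the corresponding grading logic.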
Skills in the azure-skills plugin are required to have integration tests that run prompts against an LLM agent to evaluate whether they help the agent accomplish goals in target scenarios. Such integration tests are written as vally eval suites, using vally as the underlying tool for running tests and grading the agent outcome.
Vally eval suites are written as YAML documents. All eval suites share a single eval spec.
For the schema of the spec and the schema of the eval suites, refer to the official documentation (writing-eval-specs).
Vally eval suites for the azure-skills plugin use the following file layout. The shared eval spec is located at <repo-root>/.vally.yaml. The eval suites are grouped by skill: the suites for each skill are located at <repo-root>/evals/<skill-name>/eval.yaml, e.g. <repo-root>/evals/azure-ai/eval.yaml. If a skill needs fixture files for its eval suites, it should keep them in a fixture directory under its own directory, e.g. <repo-root>/evals/azure-ai/fixture/.
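The layout above can be sketched as a small path helper. This is only an illustration: the function name skillEvalPaths and the field names on its return value are hypothetical and are not part of vally or the azure-skills repository. Only the path segments themselves come from the layout described above.

```typescript
// Hypothetical helper: the names below are illustrative, not an existing API.
interface SkillEvalPaths {
  sharedSpec: string; // <repo-root>/.vally.yaml
  evalSuite: string;  // <repo-root>/evals/<skill-name>/eval.yaml
  fixtureDir: string; // <repo-root>/evals/<skill-name>/fixture
}

function skillEvalPaths(repoRoot: string, skillName: string): SkillEvalPaths {
  // Drop trailing slashes so the joined paths never contain "//".
  const root = repoRoot.replace(/\/+$/, "");
  const skillDir = `${root}/evals/${skillName}`;
  return {
    sharedSpec: `${root}/.vally.yaml`,
    evalSuite: `${skillDir}/eval.yaml`,
    fixtureDir: `${skillDir}/fixture`,
  };
}

// Usage with a made-up repository root.
const azureAiPaths = skillEvalPaths("/repo/", "azure-ai");
console.log(azureAiPaths.evalSuite);
```

A helper like this is mostly useful in scripts that need to locate every skill's suite without hard-coding paths in several places.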
The azure-skills plugin has implemented JavaScript integration tests using Jest as the underlying test runner. All such integration tests live in tests/**/integration.test.ts files.
To migrate a skill's integration test to vally suites, create its eval suite spec at <repo-root>/evals/<skill-name>/eval.yaml and add a suite that runs the same prompt and uses vally's built-in graders to grade the trajectory of the agent run. If the integration test grades the agent run in a way that vally's built-in graders don't support, refer to the official documentation on how to create a custom grader (writing-custom-grader).
The legacy Jest-based integration test framework implemented features that vally doesn't support yet, such as early termination, follow-up, system prompt modification, and screenshot taking. In addition, the azure-skills plugin runs automated integration tests, collects their exported data, and feeds that data to a dashboard web app under <repo-root>/dashboard/ to monitor skill integration test results.
If you intend to have your vally suites use any of the extended features or have their results be consumed by the dashboard, you MUST use the custom executor in your vally suites.
The custom executor in the azure-skills plugin is controlled through special tag values. See tag-helpers.ts to learn which special tags are supported.
Note: If an eval suite specifies an earlyTerminate condition, the suite MUST NOT use the completed grader, because early-terminated runs always fail the completed grader by design.
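The rule in the note above can be expressed as a simple check. The sketch below assumes a simplified, hypothetical shape for a suite (a name, a map of tags, and a list of grader names), and assumes the early termination condition is carried as a tag named earlyTerminate. The real eval.yaml schema and the tag handling in tag-helpers.ts may differ, so treat this as an illustration of the rule rather than the repository's validator.

```typescript
// Hypothetical, simplified view of a suite; not the real vally schema.
interface SuiteSketch {
  name: string;
  tags: Record<string, string>;
  graders: string[];
}

// Returns the names of suites that combine an earlyTerminate tag with the
// completed grader, which the note above forbids.
function findEarlyTerminateConflicts(suites: SuiteSketch[]): string[] {
  return suites
    .filter(
      (suite) =>
        Object.prototype.hasOwnProperty.call(suite.tags, "earlyTerminate") &&
        suite.graders.indexOf("completed") !== -1
    )
    .map((suite) => suite.name);
}

// Usage with made-up suites; tag values and the second grader name are placeholders.
const conflicts = findEarlyTerminateConflicts([
  { name: "ok-suite", tags: { earlyTerminate: "{}" }, graders: ["some-other-grader"] },
  { name: "bad-suite", tags: { earlyTerminate: "{}" }, graders: ["completed"] },
  { name: "plain-suite", tags: {}, graders: ["completed"] },
]);
console.log(conflicts);
```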
Vally eval suites for the azure-skills plugin follow certain conventions. For example, every eval suite must have type, tier, cost, and area tags so that it can be run as part of the corresponding target group. To enforce these conventions, a script validates the eval suites and reports an error for each violation it finds. To run the script, execute this command from the scripts/ directory.
# cwd as <repo-root>/scripts/
npm run vally validate-stimulus
Extended features such as early termination are implemented using tags and many of them use serialized JSON objects as input. This validation script also validates the values of these special tags.
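To make the two kinds of checks concrete, here is a minimal sketch of what such a validation might look like. It is not the repository's validation script. The function name validateTags, the representation of tags as a string map, and the idea of passing in the list of JSON-valued tag keys are all assumptions for illustration. Only the four required tag names and the fact that some special tags carry serialized JSON come from the text above.

```typescript
// Required tags named in the conventions above.
const REQUIRED_TAGS = ["type", "tier", "cost", "area"];

// Hypothetical validator: checks required tags and that selected tags hold valid JSON.
function validateTags(
  tags: Record<string, string>,
  jsonTagKeys: string[]
): string[] {
  const errors: string[] = [];
  for (const key of REQUIRED_TAGS) {
    if (!tags[key]) {
      errors.push(`missing required tag: ${key}`);
    }
  }
  for (const key of jsonTagKeys) {
    const raw = tags[key];
    if (raw === undefined) continue; // special tags are optional
    try {
      JSON.parse(raw);
    } catch (_err) {
      errors.push(`tag ${key} is not valid JSON`);
    }
  }
  return errors;
}

// Usage with made-up tag values: "area" is missing and earlyTerminate is malformed.
const problems = validateTags(
  { type: "integration", tier: "1", cost: "low", earlyTerminate: "{not json" },
  ["earlyTerminate"]
);
console.log(problems);
```

Collecting all errors into a list, rather than throwing on the first one, lets a single run report every violation in a suite.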
Use vally-cli to run vally eval suites. In most cases, a command like the following is what you want.
# In tests/
npm run test:vally -- --skill $SKILL
…