vally-eval

Helps you write, validate, and run agent eval suites based on eval.yaml.

Source

GitHub

Updated

2026-06-07

// Security assessment: low risk

Prompt-only; executes no code
Open source and auditable

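// Install

Copy the install instructions and let the AI complete the setup automatically (recommended for beginners)

Please install the "vally-eval" skill from askskill for me:
1. Download https://raw.githubusercontent.com/microsoft/GitHub-Copilot-for-Azure/main/.github/skills/vally-eval/SKILL.md
2. Save it as ~/.claude/skills/vally-eval/SKILL.md
3. Once it is installed, reload the skills and tell me it is ready to use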

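// Usage examples

Create an eval suite

Input

Please write a vally eval.yaml eval suite for a customer-service agent, including 5 test cases, input stimuli, expected behavior, and grading rules.

Expected output

A structurally complete eval.yaml draft with test cases and grader configuration.

Validate an eval configuration

Input

Please check this eval.yaml for missing fields, formatting errors, or grader configuration problems, and suggest fixes.

Expected output

A list of configuration problems, the reason for each, and fixes that can be applied directly.

Migrate tests to evals

Input

Convert the 10 existing manual test cases to the vally eval.yaml format, and map each one to suitable grading criteria.

Expected output

The reorganized eval.yaml content, including the test mapping and the corresponding grading logic.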

// Documentation

Vally eval suites

Skills in the azure-skills plugin are required to have integration tests that run prompts against an LLM agent to evaluate whether they help the agent accomplish goals in target scenarios. Such integration tests are written as vally eval suites, using vally as the underlying tool for running tests and grading the agent outcome.

Write vally eval suites

Vally eval suites are written as YAML documents. All eval suites share one eval spec.

For the schema of the spec and the schema of the eval suites, refer to the official documentation (writing-eval-specs).

Vally eval suites for the azure-skills plugin use the following file layout. The shared eval spec is located at <repo-root>/.vally.yaml. The eval suites are grouped by skill: the suites for each skill are located at <repo-root>/evals/<skill-name>/eval.yaml, e.g. <repo-root>/evals/azure-ai/eval.yaml. If a skill needs fixture files for its eval suites, it should keep them in a fixture directory under its own directory, e.g. <repo-root>/evals/azure-ai/fixture/.
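The layout above can be sketched as a few path helpers. This is only an illustration of the convention described here; the function names are hypothetical and are not part of vally or the azure-skills repository.

```python
from pathlib import Path


def shared_spec_path(repo_root: Path) -> Path:
    """Location of the shared eval spec at the repository root."""
    return repo_root / ".vally.yaml"


def eval_suite_path(repo_root: Path, skill_name: str) -> Path:
    """Location of the eval suites for one skill."""
    return repo_root / "evals" / skill_name / "eval.yaml"


def fixture_dir(repo_root: Path, skill_name: str) -> Path:
    """Directory holding fixture files for one skill's eval suites."""
    return repo_root / "evals" / skill_name / "fixture"
```

For example, eval_suite_path(Path("repo"), "azure-ai") yields the path repo/evals/azure-ai/eval.yaml.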

Migrate integration tests

The azure-skills plugin has implemented JavaScript integration tests using Jest as the underlying test runner. All such integration tests are in tests/**/integration.test.ts files.

To migrate the integration test for a skill to vally suites, create its eval suite spec at <repo-root>/evals/<skill-name>/eval.yaml and add a suite that runs the same prompt and uses vally's built-in graders to grade the trajectory of the agent run. If the integration test grades the agent run in a way that vally's built-in graders don't support, refer to the official documentation on how to create a custom grader (writing-custom-grader).

Why is there a custom executor

The legacy Jest-based integration test framework implemented features that vally doesn't support yet, such as early termination, follow-up, system prompt modification, and screenshot taking. In addition, the azure-skills plugin runs automated integration tests, collects their exported data, and feeds that data to a dashboard web app under <repo-root>/dashboard/ to monitor skill integration test results.

If you intend to have your vally suites use any of the extended features or have their results be consumed by the dashboard, you MUST use the custom executor in your vally suites.

Use tags to control the custom executor

The custom executor in the azure-skills plugin is controlled through special tag values. See tag-helpers.ts to learn which special tags are supported.

Note: If an eval suite specifies an earlyTerminate condition, the suite MUST NOT use the completed grader, because early-terminated runs always fail the completed grader by design.
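The rule in the note can be expressed as a simple check. The sketch below assumes a suite has already been reduced to a set of tag names and a set of grader names; how tags and graders are actually spelled inside eval.yaml is defined by tag-helpers.ts and the vally schema, neither of which is shown here, so the function name and the input shapes are hypothetical.

```python
def has_early_terminate_conflict(tag_names: set, grader_names: set) -> bool:
    """Return True when a suite combines earlyTerminate with the completed grader.

    Assumption: "earlyTerminate" and "completed" appear verbatim as a tag
    name and a grader name; the real encoding may differ.
    """
    return "earlyTerminate" in tag_names and "completed" in grader_names
```

A suite for which this returns True would break the rule above, since its early-terminated runs can never pass the completed grader.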

Validate vally eval suites

Vally eval suites for the azure-skills plugin follow certain conventions. For example, every eval suite must have a type, tier, cost and area tag so that it can be run as part of the corresponding target group. To enforce these conventions, a script validates the eval suites and reports an error for each violation it finds. To run the script, execute this command from the scripts/ directory.

# cwd as <repo-root>/scripts/
npm run vally validate-stimulus
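To make the tag convention concrete, here is a minimal sketch of the kind of check such a script might perform. It is not the repository's validation script. It assumes that a suite's tags have been parsed into a mapping from tag name to value, which is an assumption about the representation, and the tag values in the test are hypothetical.

```python
REQUIRED_TAGS = ("type", "tier", "cost", "area")


def missing_required_tags(tags: dict) -> list:
    """Return the required tag names that are absent, in a fixed order.

    Assumption: `tags` maps tag name to tag value for a single eval suite.
    """
    return [name for name in REQUIRED_TAGS if name not in tags]
```

An empty result means the suite carries all four required tags.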

Extended features such as early termination are implemented using tags, and many of them take serialized JSON objects as input. The validation script also checks the values of these special tags.
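As an illustration of what validating a serialized JSON object involves, the sketch below parses a raw tag value and rejects anything that is not a JSON object. The function name is hypothetical, and the keys that each special tag actually expects are defined in tag-helpers.ts, which is not reproduced here.

```python
import json


def parse_json_tag_value(raw: str) -> dict:
    """Parse a tag value that is expected to hold a serialized JSON object."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"tag value is not valid JSON: {exc.msg}") from exc
    if not isinstance(value, dict):
        # Arrays, strings and numbers are valid JSON but not JSON objects.
        raise ValueError("tag value must be a JSON object")
    return value
```

A real validator would go one step further and check the parsed object against the fields each tag supports.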

Run vally eval suites

Use vally-cli to run vally eval suites. In most cases, a command like this is what you need.

# In tests/
npm run test:vally -- --skill $SKILL
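If you want to run the suites for several skills in a row, the command above can be wrapped in a small script. The sketch below only uses the test:vally script and the --skill option shown above; the helper names and the tests_dir parameter are hypothetical.

```python
import subprocess


def vally_test_command(skill: str) -> list:
    """Build the argument list for the command shown above for one skill."""
    return ["npm", "run", "test:vally", "--", "--skill", skill]


def run_skill_evals(skill: str, tests_dir: str = "tests") -> None:
    """Run the eval suites for one skill from the tests/ directory.

    check=True raises CalledProcessError when npm exits with a non-zero code.
    """
    subprocess.run(vally_test_command(skill), cwd=tests_dir, check=True)
```

Calling run_skill_evals("azure-ai") from the repository root would then be equivalent to running the command above in tests/ with SKILL set to azure-ai.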

…