
SKILL

verification-discipline

Use when verifying that completed work actually works. Auto-surface during /verify mode, post-implementation review, or before claiming a task is done. Teaches the discipline of testing outcomes vs implementation, the unit/integration/smoke gradient, and what "done" actually means.

Source: GitHub

Updated: 2026-06-06

// Security assessment: low risk

Prompt only; does not execute code
Open source and auditable


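// Install

Copy the install instructions and let the AI complete the setup automatically · Recommended for beginners

Please help me install the "verification-discipline" skill from askskill:
1. Download https://raw.githubusercontent.com/microsoft/amplifier-bundle-skills/main/skills/verification-discipline/SKILL.md
2. Save it as ~/.claude/skills/verification-discipline/SKILL.md
3. Once installed, reload the skills and tell me it is ready to use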


// Documentation

Verification Discipline

The Principle

Unit tests verify that code-as-written behaves as-written. Smoke and integration tests verify that the system achieves the intended outcome. Those are different questions. You need both.

"All unit tests pass" is necessary. It is rarely sufficient. A finding from the field: four consecutive integration-blocking bugs, all of which passed unit tests, all of which would have been caught by a five-minute smoke test on a fresh environment. The bugs were not exotic — they were the cost of declaring "done" too early.

The Four Failure Modes

1. Tests written from the implementation outward miss scenarios the code doesn't anticipate. The engineer writes code, then writes tests that exercise the code as written. The tests ask "does this code do what I wrote it to do?" They don't ask "what scenarios does the system need to handle?"
2. Mocks verify shape, not behavior. A mocked dependency returns the value you told it to return. That tells you nothing about whether the real dependency would have behaved that way.
3. Tests in isolation miss integration boundaries. Component A passes. Component B passes. Their interaction at the seam fails. The seam was never tested.
4. Happy-path tests pass while activated code paths fail. A golden-file test verifies that the default (un-activated) configuration renders correctly. The activated configuration, the one production actually uses, was never exercised.
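
The second failure mode can be shown in a few lines. This is a minimal sketch, not taken from the skill itself: `parse_port` and `load_port` are hypothetical names invented for illustration, with `parse_port` standing in for a real dependency that enforces a rule the mock knows nothing about.

```python
from unittest.mock import Mock


def parse_port(raw: str) -> int:
    """Hypothetical real dependency: rejects ports outside 1-65535."""
    port = int(raw)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def load_port(raw: str, parser=parse_port) -> str:
    """Hypothetical code under test: formats whatever the parser returns."""
    return f"listening on :{parser(raw)}"


def run_with_mock() -> str:
    # The mock returns exactly what it was told to return, so this call
    # succeeds even though the real parser would reject the input.
    fake_parser = Mock(return_value=70000)
    return load_port("70000", parser=fake_parser)


def run_with_real_dependency() -> str:
    # The same input through the real parser raises ValueError.
    return load_port("70000")
```

The mocked call confirms only that `load_port` formats an integer. The call through the real dependency is the one that exercises the behavior at the seam.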

The Verification Gradient

Treat verification as a ladder. Skip a rung and you discover its bugs in production.

| Tier | What it verifies | Example |
|---|---|---|
| 1. Unit | Code does what I wrote it to do | `pytest tests/unit/` |
| 2. Integration | Component pairs interact correctly | `pytest tests/integration/`, real DB |
| 3. Smoke / E2E | System achieves the user-visible outcome | Fresh DTU launch, run real pipeline, observe artifacts |
| 4. Production-equivalent | Real environment, real load, real data | Staging deployment, canary, replay traces |

Each tier catches bugs the tier below it cannot. Each tier costs more time than the tier below it. The economic choice is not "skip the expensive tiers." The economic choice is "spend five minutes on tier 3 to avoid five hours of rollback."
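
One way to make the ladder mechanical is to run the tiers in order and stop at the first failing rung. The sketch below assumes the two `pytest` commands from the table; the smoke command `./scripts/smoke.sh` is a hypothetical placeholder, and `climb` and `run_command` are illustrative names rather than part of any tool.

```python
import subprocess
from typing import Callable, Sequence

# The first two commands come from the gradient table; the smoke command
# is a hypothetical placeholder for whatever launches the real pipeline.
LADDER: list[tuple[str, list[str]]] = [
    ("unit", ["pytest", "tests/unit/"]),
    ("integration", ["pytest", "tests/integration/"]),
    ("smoke", ["./scripts/smoke.sh"]),  # hypothetical script name
]


def run_command(cmd: Sequence[str]) -> bool:
    """Run one tier's command; True means it exited with code 0."""
    return subprocess.run(list(cmd)).returncode == 0


def climb(
    ladder: list[tuple[str, list[str]]] = LADDER,
    runner: Callable[[Sequence[str]], bool] = run_command,
) -> list[str]:
    """Run tiers in order and stop at the first failing rung.

    Returns the names of the tiers that passed.
    """
    passed: list[str] = []
    for name, cmd in ladder:
        if not runner(cmd):
            break
        passed.append(name)
    return passed
```

The `runner` parameter is injectable so the ordering logic can be checked without launching real processes; in normal use `climb()` would be called with the defaults.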

What "Done" Actually Means

Before claiming a task is done, satisfy this checklist:

- [ ] Code does what I wrote it to do (unit tests pass).
- [ ] Code interacts correctly with other components (integration tests pass or, if no integration tests exist for this code path, a manual integration check is documented).
- [ ] The system achieves the user-visible outcome (smoke or E2E test passed, ideally on a fresh environment).
- [ ] Repo-specific gates from AGENTS.md and .github/PULL_REQUEST_TEMPLATE.md are satisfied.
- [ ] Evidence is observable: log file, screenshot, output excerpt, events.jsonl analysis. Not "tests pass." Not "looks right."

If any box is unchecked, the work is not done. Say so, explicitly.
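
The checklist can be held as data so that an unfinished task reports itself explicitly. This is a rough sketch under stated assumptions: the class and field names are invented for illustration and map one-to-one onto the five items above.

```python
from dataclasses import dataclass, fields


@dataclass
class DoneChecklist:
    """One boolean per checklist item; the field names are illustrative."""

    unit_tests_pass: bool = False
    integration_verified: bool = False
    outcome_verified: bool = False
    repo_gates_satisfied: bool = False
    evidence_recorded: bool = False

    def unchecked(self) -> list[str]:
        """Names of the items that are still open, in checklist order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def status(self) -> str:
        """Say explicitly whether the work is done and, if not, what is missing."""
        missing = self.unchecked()
        if not missing:
            return "done"
        return "not done: " + ", ".join(missing)
```

For example, `DoneChecklist(unit_tests_pass=True).status()` names the four remaining items instead of reporting success.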

Tests-From-Outcomes Pattern

This pattern differs from classic TDD in what gets written first: TDD starts with unit tests, while tests-from-outcomes starts with the outcome assertion.

1. Before writing implementation, write down the user-observable outcome.
   "After running this pipeline, events.jsonl contains a `branch_completed`
    event for each branch and no `contract_violation` events."

2. Write a test asserting that outcome. The test runs the real pipeline,
   inspects the real events.jsonl, checks the real conditions.

3. Implement code until the test passes.

Both patterns are valuable. Unit-level TDD verifies internal correctness. Outcome-level testing verifies that the system behaves as the user expects. Use both.
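
The outcome assertion from step 1 might be checked with a helper like the one below. It is a sketch only: the function name `outcome_violations` is hypothetical, and the events.jsonl schema (one JSON object per line with an `event` key and, for branch events, a `branch` key) is an assumption, since the source does not describe the file format.

```python
import json
from pathlib import Path


def outcome_violations(events_path: Path, branches: list[str]) -> list[str]:
    """Check the step 1 outcome against a real events.jsonl file.

    Assumed schema (not from the source): one JSON object per line with
    an "event" key and, for branch events, a "branch" key.
    Returns a list of problems; an empty list means the outcome holds.
    """
    completed: set[str] = set()
    problems: list[str] = []
    for line in events_path.read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("event") == "branch_completed":
            completed.add(record.get("branch"))
        elif record.get("event") == "contract_violation":
            problems.append("contract_violation event present")
    for branch in branches:
        if branch not in completed:
            problems.append(f"no branch_completed event for {branch}")
    return problems
```

A test written before the implementation would run the real pipeline, then assert that `outcome_violations(path, branches) == []`.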

Anti-Patterns

"All unit tests pass, so we're done." Usually wrong. The unit tests verified the code you wrote. They did not verify the system you shipped.

…


// From the same source

Skill

module-development

Guide for creating new Amplifier modules including protocol implementation, entry points, mount functions, and testing patterns. Use when creating new modules or understanding module architecture.

microsoft

Skill

python-standards

Python coding standards for Amplifier including type hints, async patterns, error handling, and formatting. Use when writing Python code for Amplifier modules.

microsoft

Skill

adapt-skill

Adapt a skill written for another AI coding assistant (Claude Code, Cursor, etc.) into a properly structured Amplifier SKILL.md file. Reads the source skill, identifies platform-specific conventions, researches the source platform if needed, and produces an Amplifier-native skill conforming to the Agent Skills specification with Amplifier extensions. Use when the user wants to adapt a skill, port a skill, convert a skill to amplifier, translate a skill, or has a SKILL.md from another platform they want to bring into Amplifier.

microsoft

Skill

auth-tls-patterns

Use when your service needs authentication that works without friction locally but secures remote access, automatic TLS certificate setup, or token-based auth with auto-generation and localhost bypass.

microsoft

Skill

cli-packaging-patterns

Use when building a new CLI tool that needs one-line install via uv or npm, subcommand dispatch with a default action, or 3-tier config resolution (CLI flags, config file, hardcoded defaults).

microsoft

Skill

amplifier-philosophy

Amplifier design philosophy using Linux kernel metaphor. Covers mechanism vs policy, module architecture, event-driven design, and kernel principles. Use when designing new modules or making architectural decisions.

microsoft
