Quickly diagnose Spark job failures, session anomalies, and performance bottlenecks in Fabric using read-only command-line operations.
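Copy the install prompt and let the AI handle the setup automatically · recommended for beginners
Please install the "spark-operations-cli" skill from askskill for me:
1. Download https://raw.githubusercontent.com/microsoft/skills-for-fabric/main/plugins/fabric-operations/skills/spark-operations-cli/SKILL.md
2. Save it as ~/.claude/skills/spark-operations-cli/SKILL.md
3. Reload skills once it is installed and tell me it is ready to use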
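Please help me diagnose this failed Spark Notebook job in Microsoft Fabric. Focus on the failure cause, Driver/Executor logs, and Spark Advisor recommendations, check for OOM, data skew, or shuffle spill, and suggest next troubleshooting steps.
Returns a root-cause analysis of the failure, a summary of key logs, performance leads, and actionable fix and troubleshooting recommendations.
My Livy session is stuck in the starting state. Please run a read-only diagnosis: check session health, related logs, and possible resource or initialization problems, and explain how to recover.
Outputs a session state assessment, signs of anomalies, likely causes, and recommended recovery or retry steps.
Please diagnose all Spark activities in this failed Fabric Pipeline run: identify which step failed, summarize the exceptions for each activity, and report whether log errors, insufficient resources, or performance bottlenecks are involved.
Provides diagnostic results grouped by activity, the location of the failing step, a summary of common issues, and recommended priorities for follow-up fixes.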
Update Check - ONCE PER SESSION (mandatory)
The first time this skill is used in a session, run the check-updates skill before proceeding.
- GitHub Copilot CLI / VS Code: invoke the `check-updates` skill.
- Claude Code / Cowork / Cursor / Windsurf / Codex: compare local vs remote package.json version.
- Skip if the check was already performed earlier in this session.
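
The local-versus-remote comparison can be sketched roughly as follows. This is a minimal illustration, not the procedure the check-updates skill actually uses: the helper names `parse_version` and `update_available` are hypothetical, the source does not say where the two package.json files live, and the sketch assumes plain dotted numeric versions such as `1.4.0` (no pre-release tags).

```python
import json


def parse_version(text):
    """Turn '1.4.0' into (1, 4, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.strip().split("."))


def update_available(local_package_json, remote_package_json):
    """Return True when the remote package.json declares a newer version.

    Both arguments are the raw JSON text of a package.json file; how they are
    fetched (local path, remote URL) is left to the caller and is an assumption.
    """
    local = parse_version(json.loads(local_package_json)["version"])
    remote = parse_version(json.loads(remote_package_json)["version"])
    return remote > local
```

Comparing tuples of integers rather than raw strings matters because `"1.2.10" > "1.2.9"` is false as a string comparison even though 1.2.10 is the newer release.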
CRITICAL NOTES
- To find the workspace details (including its ID) from a workspace name: list all workspaces, then filter the result with JMESPath
- To find the item details (including its ID) from a workspace ID, item type, and item name: list all items of that type in that workspace, then filter the result with JMESPath
- Skill disambiguation: `spark-operations-cli` is for read-only triage and diagnosis of existing jobs and sessions. For creating notebooks, running new jobs, or Spark development, use `spark-authoring-cli`. For interactive PySpark analysis and Livy session creation, use `spark-consumption-cli`.
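
The workspace and item lookups described in the notes above (list, then filter with JMESPath) might look like the following sketch. It only builds the `az rest` argument lists; it does not call Azure. The base URL, the `--resource` audience, the `type` query parameter, and the `value[].displayName` / `value[].id` response shape are assumptions based on the public Fabric REST API, and the function names are hypothetical. COMMON-CLI.md § Finding Workspaces and Items in Fabric remains the authoritative recipe. The sketch also ignores pagination, so a lookup could miss entries on later pages.

```python
# Assumed Fabric REST base URL and token audience; verify against COMMON-CORE.md.
FABRIC_API = "https://api.fabric.microsoft.com/v1"
FABRIC_RESOURCE = "https://api.fabric.microsoft.com"


def id_by_name_query(display_name):
    """JMESPath expression selecting the id of the first entry with a matching displayName."""
    escaped = display_name.replace("'", "\\'")  # escape quotes inside the raw string literal
    return f"value[?displayName=='{escaped}'].id | [0]"


def az_rest_args(url, query):
    """Argument list for a read-only GET through `az rest`, filtered client-side by --query."""
    return [
        "az", "rest",
        "--method", "get",
        "--url", url,
        "--resource", FABRIC_RESOURCE,
        "--query", query,
        "--output", "tsv",
    ]


def workspace_lookup_args(workspace_name):
    """List all workspaces, keep the ID of the one named `workspace_name`."""
    return az_rest_args(f"{FABRIC_API}/workspaces", id_by_name_query(workspace_name))


def item_lookup_args(workspace_id, item_type, item_name):
    """List items of one type in a workspace, keep the ID of the one named `item_name`."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items?type={item_type}"
    return az_rest_args(url, id_by_name_query(item_name))
```

The lists can be passed to `subprocess.run(args, capture_output=True, text=True)` once `az login` has succeeded; building an argument list instead of a shell string avoids quoting problems with the JMESPath expression.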
This skill provides diagnostics for Microsoft Fabric Spark job failures, Livy session health, and performance bottlenecks using Fabric REST APIs and CLI tools (`az rest`). All diagnostic operations are read-only; session cleanup (e.g., stopping zombie sessions) requires explicit user confirmation. For Spark development and notebook authoring, use `spark-authoring-cli`. For interactive PySpark analysis, use `spark-consumption-cli`.
The TOC is grouped by purpose. Start at Diagnostic Workflows when triaging an active failure; the earlier sections are foundational references.
| Task | Reference | Notes |
|---|---|---|
| Fabric Topology & Key Concepts | COMMON-CORE.md § Fabric Topology & Key Concepts | |
| Environment URLs | COMMON-CORE.md § Environment URLs | |
| Authentication & Token Acquisition | COMMON-CORE.md § Authentication & Token Acquisition | Wrong audience = 401; read before any auth issue |
| Core Control-Plane REST APIs | COMMON-CORE.md § Core Control-Plane REST APIs | |
| Pagination | COMMON-CORE.md § Pagination | |
| Long-Running Operations (LRO) | COMMON-CORE.md § Long-Running Operations (LRO) | |
| Rate Limiting & Throttling | COMMON-CORE.md § Rate Limiting & Throttling | |
| Job Execution | COMMON-CORE.md § Job Execution | |
| Capacity Management | COMMON-CORE.md § Capacity Management | |
| Gotchas & Troubleshooting | COMMON-CORE.md § Gotchas & Troubleshooting | |
| Best Practices | COMMON-CORE.md § Best Practices | |

| Task | Reference | Notes |
|---|---|---|
| Tool Selection Rationale | COMMON-CLI.md § Tool Selection Rationale | |
| Finding Workspaces and Items in Fabric | COMMON-CLI.md § Finding Workspaces and Items in Fabric | Mandatory — READ link first [needed for finding workspace id by its name or item id by its name, item type, and workspace id] |
| Authentication Recipes | COMMON-CLI.md § Authentication Recipes | az login flows and token acquisition |
…