Interactively analyze Lakehouse data and run advanced processing with PySpark and Livy sessions
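Copy the install prompt and let the AI complete the setup automatically (recommended for beginners)
Please install the "spark-consumption-cli" skill from askskill for me: 1. Download https://raw.githubusercontent.com/microsoft/skills-for-fabric/main/plugins/fabric-skills/skills/spark-consumption-cli/SKILL.md 2. Save it as ~/.claude/skills/spark-consumption-cli/SKILL.md 3. Once it is installed, reload the skills and tell me it is ready to use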
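Please use PySpark to analyze the sales_orders table in this Lakehouse: check for null values, duplicate records, and anomalous amount distributions, then output executable code and the analysis conclusions.
Returns PySpark code based on a Livy session, plus a summary of data quality issues and key statistics.
Please start a Livy session, use Spark DataFrames to read the customer and order tables from two Lakehouses, join them across Lakehouses, and find the spending trend of high-value customers over the last 90 days.
Returns PySpark code for the cross-Lakehouse read and join, plus the trend analysis results for high-value customers.
Please use Spark SQL or PySpark to run Delta time-travel on the inventory_delta table, compare the current version with the version from 7 days ago, and output the inventory difference details and a summary.
Returns executable time-travel query code, plus inventory change details, summary statistics, and an explanation of the differences.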
Update Check: ONCE PER SESSION (mandatory)
The first time this skill is used in a session, run the check-updates skill before proceeding.
- GitHub Copilot CLI / VS Code: invoke the check-updates skill.
- Claude Code / Cowork / Cursor / Windsurf / Codex: compare local vs remote package.json version.
- Skip if the check was already performed earlier in this session.
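The local-versus-remote package.json comparison could look roughly like the sketch below. The source does not say how the check-updates skill compares versions or where the remote package.json is fetched from, so the helper names `parse_version` and `update_available` are hypothetical, and the versions are assumed to be plain numeric `major.minor.patch` strings.

```python
import json


def parse_version(version: str) -> tuple:
    """Turn '1.4.0' into (1, 4, 0).

    Assumption: every dot-separated part is a plain integer
    (no pre-release suffixes such as '-beta').
    """
    return tuple(int(part) for part in version.split("."))


def update_available(local_package_json: str, remote_package_json: str) -> bool:
    """Return True when the remote package.json declares a newer version.

    Both arguments are the raw JSON text of a package.json file; how the
    remote copy is downloaded is left out because the source does not say.
    """
    local = parse_version(json.loads(local_package_json)["version"])
    remote = parse_version(json.loads(remote_package_json)["version"])
    # Tuples compare element by element, so (1, 10, 0) > (1, 2, 0).
    return remote > local
```

Comparing parsed tuples rather than raw strings avoids the lexical pitfall where "1.10.0" sorts before "1.2.0".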
CRITICAL NOTES
- To find a workspace's details (including its ID) from its name: list all workspaces, then filter the result with JMESPath
- To find an item's details (including its ID) from the workspace ID, item type, and item name: list all items of that type in that workspace, then filter the result with JMESPath
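The two lookups above follow the same list-then-filter shape, which can be sketched in Python as follows. This is an illustration under assumptions, not the procedure from COMMON-CLI.md: it assumes list responses carry a `value` array whose entries have `id` and `displayName` fields, the helper names are hypothetical, and the sample payload is made up. The filter is a plain-Python stand-in for a JMESPath expression such as `value[?displayName=='Sales'].id | [0]`.

```python
import json

# Token audience that the reference table says must be passed to `az rest`.
FABRIC_RESOURCE = "https://api.fabric.microsoft.com"


def az_rest_list_command(url: str) -> list:
    """Build (but do not run) an `az rest` GET command for a list endpoint."""
    return ["az", "rest", "--method", "get",
            "--resource", FABRIC_RESOURCE, "--url", url]


def find_id_by_name(payload: dict, display_name: str):
    """Return the id of the first entry whose displayName matches, else None.

    Assumption: the response body looks like
    {"value": [{"id": "...", "displayName": "..."}, ...]}.
    """
    for entry in payload.get("value", []):
        if entry.get("displayName") == display_name:
            return entry.get("id")
    return None


# Hypothetical response body, standing in for real `az rest` output.
sample_response = json.loads("""
{"value": [
  {"id": "11111111-1111-1111-1111-111111111111", "displayName": "Sales"},
  {"id": "22222222-2222-2222-2222-222222222222", "displayName": "Finance"}
]}
""")

workspace_id = find_id_by_name(sample_response, "Sales")
```

The same `find_id_by_name` call works for the item lookup once the list command targets the items of the chosen type in that workspace; the exact endpoint paths are described in the referenced COMMON-CLI.md section and are not reproduced here.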
| Task | Reference | Notes |
|---|---|---|
| Fabric Topology & Key Concepts | COMMON-CORE.md § Fabric Topology & Key Concepts |
| Environment URLs | COMMON-CORE.md § Environment URLs |
| Authentication & Token Acquisition | COMMON-CORE.md § Authentication & Token Acquisition | Wrong audience = 401; read before any auth issue |
| Core Control-Plane REST APIs | COMMON-CORE.md § Core Control-Plane REST APIs |
| Pagination | COMMON-CORE.md § Pagination |
| Long-Running Operations (LRO) | COMMON-CORE.md § Long-Running Operations (LRO) |
| Rate Limiting & Throttling | COMMON-CORE.md § Rate Limiting & Throttling |
| OneLake Data Access | COMMON-CORE.md § OneLake Data Access | Requires storage.azure.com token, not Fabric token |
| Job Execution | COMMON-CORE.md § Job Execution |
| Capacity Management | COMMON-CORE.md § Capacity Management |
| Gotchas & Troubleshooting | COMMON-CORE.md § Gotchas & Troubleshooting |
| Best Practices | COMMON-CORE.md § Best Practices |
| Tool Selection Rationale | COMMON-CLI.md § Tool Selection Rationale |
| Finding Workspaces and Items in Fabric | COMMON-CLI.md § Finding Workspaces and Items in Fabric | Mandatory — READ link first [needed for finding workspace id by its name or item id by its name, item type, and workspace id] |
| Authentication Recipes | COMMON-CLI.md § Authentication Recipes | az login flows and token acquisition |
| Fabric Control-Plane API via az rest | COMMON-CLI.md § Fabric Control-Plane API via az rest | Always pass --resource https://api.fabric.microsoft.com or az rest fails |
| Pagination Pattern | COMMON-CLI.md § Pagination Pattern |
| Long-Running Operations (LRO) Pattern | COMMON-CLI.md § Long-Running Operations (LRO) Pattern |
| OneLake Data Access via curl | COMMON-CLI.md § OneLake Data Access via curl | Use curl not az rest (different token audience) |
| SQL / TDS Data-Plane Access | COMMON-CLI.md § SQL / TDS Data-Plane Access | sqlcmd (Go) connect, query, CSV export |
…