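Helps migrate Azure HDInsight Spark, Hive, and Oozie workloads to Microsoft Fabric.
Copy the install instructions and let the AI complete the setup automatically · Recommended for beginners
Please help me install the "hdinsight-migration" skill from askskill: 1. Download https://raw.githubusercontent.com/microsoft/skills-for-fabric/main/skills/hdinsight-migration/SKILL.md 2. Save it as ~/.claude/skills/hdinsight-migration/SKILL.md 3. Once it is installed, reload the skills and tell me it is ready to use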
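Please migrate this Azure HDInsight-based Spark code to Microsoft Fabric: remove the HiveContext and standalone SparkContext initialization and use the pre-instantiated SparkSession instead; also rewrite its WASB/ABFS paths as OneLake abfss URLs, and explain the reason for each change.
Returns Spark code that runs in Fabric, along with notes on the key migration points.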
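The path rewrite requested in this prompt can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the skill's own logic: the helper name `to_onelake_path`, the workspace and lakehouse names, and the choice to place each legacy container under `Files/<container>` are hypothetical. It also assumes the data has already been copied or shortcut into the lakehouse at that location.

```python
from urllib.parse import urlparse

LEGACY_SCHEMES = {"wasb", "wasbs", "abfs", "abfss"}
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def to_onelake_path(legacy_url: str, workspace: str, lakehouse: str) -> str:
    """Rewrite a WASB/ABFS URL as a OneLake abfss URL (hypothetical helper).

    Assumption: the legacy container maps to Files/<container> in the lakehouse.
    """
    parsed = urlparse(legacy_url)
    if parsed.scheme not in LEGACY_SCHEMES:
        raise ValueError(f"not a WASB/ABFS URL: {legacy_url}")

    # Legacy URLs look like <scheme>://<container>@<account host>/<path>.
    container, sep, account_host = parsed.netloc.partition("@")
    if not sep or not container:
        raise ValueError(f"missing container in URL: {legacy_url}")
    if account_host == ONELAKE_HOST:
        return legacy_url  # already a OneLake path, leave it unchanged

    relative = parsed.path.lstrip("/")
    target = f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Files/{container}"
    return f"{target}/{relative}" if relative else target


if __name__ == "__main__":
    print(to_onelake_path(
        "wasbs://data@acct.blob.core.windows.net/raw/events.parquet",
        "MyWorkspace",
        "MyLakehouse",
    ))
```

The helper only rewrites strings. Whether the resulting path resolves depends on the lakehouse actually containing that folder or a shortcut to the original storage account.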
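Please migrate the following Hive DDL to a Microsoft Fabric Lakehouse: convert STORED AS ORC, external table definitions, and storage paths into table structures and locations suited to Delta Lake; if differences in Hive metastore semantics are involved, flag those as well.
Outputs the converted Delta Lake CREATE TABLE statements and lists the syntax and storage-mapping differences.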
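For the simplest case, the DDL conversion described in this prompt can be sketched as a few textual rewrites. This is a rough illustration only, not the skill's conversion logic: `hive_ddl_to_delta` is a hypothetical helper, the sample table is invented, and the sketch assumes the target is a managed Delta table, so the `EXTERNAL` keyword and the `LOCATION` clause are simply dropped.

```python
import re


def hive_ddl_to_delta(ddl: str) -> str:
    """Naively rewrite a simple Hive CREATE TABLE statement for Delta Lake.

    Handles only EXTERNAL, LOCATION '<path>', and STORED AS <format>.
    """
    # Managed lakehouse table: drop the EXTERNAL keyword.
    out = re.sub(r"\bCREATE\s+EXTERNAL\s+TABLE\b", "CREATE TABLE", ddl, flags=re.IGNORECASE)
    # Drop the storage path; a managed table gets its location from the lakehouse.
    out = re.sub(r"\s*\bLOCATION\s+'[^']*'", "", out, flags=re.IGNORECASE)
    # Replace the Hive file-format clause with the Delta provider clause.
    out = re.sub(r"\bSTORED\s+AS\s+\w+", "USING DELTA", out, flags=re.IGNORECASE)
    # Collapse whitespace so the result is a single line.
    return " ".join(out.split())


if __name__ == "__main__":
    hive_ddl = """
    CREATE EXTERNAL TABLE sales (id INT, amount DOUBLE)
    STORED AS ORC
    LOCATION 'wasb://data@acct.blob.core.windows.net/sales'
    """
    print(hive_ddl_to_delta(hive_ddl))
    # prints: CREATE TABLE sales (id INT, amount DOUBLE) USING DELTA
```

Clauses such as `PARTITIONED BY`, `ROW FORMAT`, and `TBLPROPERTIES` are left untouched by this sketch and would need manual review, since their syntax and ordering differ between Hive and Spark SQL.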
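I am preparing to decommission an HDInsight cluster. Please migrate this set of Oozie workflows/coordinators to Microsoft Fabric Pipelines: identify the corresponding activities for actions such as spark, hive, shell, and sqoop, propose a schedule trigger design, and explain which file and credential operations in the existing scripts need to switch to notebookutils.
Provides a Fabric Pipeline mapping plan, trigger configuration recommendations, and a script replacement checklist.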
Update Check: ONCE PER SESSION (mandatory)
The first time this skill is used in a session, run the `check-updates` skill before proceeding.
- GitHub Copilot CLI / VS Code: invoke the `check-updates` skill.
- Claude Code / Cowork / Cursor / Windsurf / Codex: compare local vs remote package.json version.
- Skip if the check was already performed earlier in this session.
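The local-versus-remote package.json comparison could look roughly like the sketch below. It is an illustration under assumptions rather than what the `check-updates` skill actually does: the function names are hypothetical, both files are assumed to carry a plain dotted numeric `version` field (no pre-release tags), and fetching the remote file is left out.

```python
import json


def parse_version(version: str) -> tuple:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def update_available(local_package_json: str, remote_package_json: str) -> bool:
    """Return True when the remote package.json has a newer version."""
    local = json.loads(local_package_json)["version"]
    remote = json.loads(remote_package_json)["version"]
    return parse_version(remote) > parse_version(local)


if __name__ == "__main__":
    local_text = '{"name": "example", "version": "1.2.0"}'    # hypothetical content
    remote_text = '{"name": "example", "version": "1.10.0"}'  # hypothetical content
    print(update_available(local_text, remote_text))  # prints: True
```

Comparing integer tuples instead of raw strings avoids treating "1.10.0" as older than "1.2.0".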
CRITICAL NOTES
- To find workspace details (including its ID) from a workspace name: list all workspaces, then use JMESPath filtering
- To find item details (including its ID) from workspace ID, item type, and item name: list all items of that type in that workspace, then use JMESPath filtering
- HDInsight has no `mssparkutils` or `dbutils` equivalent; `notebookutils` is a net-new capability being introduced
- `HiveContext` and `SQLContext` are legacy Spark 1.x/2.x APIs; Fabric uses the Spark 3.x `SparkSession` exclusively
- `wasb://` paths are deprecated and require a Storage Account key or SAS; replace with OneLake shortcuts
Read these companion documents before executing migration tasks:
- `az rest`, `az login`, token acquisition, Fabric REST via CLI
For notebook and Lakehouse creation, see spark-authoring-cli. For Fabric Warehouse DDL/DML authoring, see sqldw-authoring-cli.
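The name-to-ID lookups described in the critical notes (list everything, then filter by name) can be illustrated without any CLI call. The sketch below is a plain-Python equivalent of a JMESPath filter such as `value[?displayName=='Sales Analytics'].id | [0]`, not JMESPath itself. The helper name and the sample payload are hypothetical; the payload only mimics the `value` / `displayName` / `id` shape of a Fabric list response.

```python
from typing import Optional


def find_id_by_name(list_response: dict, display_name: str) -> Optional[str]:
    """Return the id of the first entry whose displayName matches, else None."""
    matches = [
        entry["id"]
        for entry in list_response.get("value", [])
        if entry.get("displayName") == display_name
    ]
    return matches[0] if matches else None


if __name__ == "__main__":
    # Hypothetical response from listing all workspaces.
    workspaces = {
        "value": [
            {"id": "11111111-1111-1111-1111-111111111111", "displayName": "Sandbox"},
            {"id": "22222222-2222-2222-2222-222222222222", "displayName": "Sales Analytics"},
        ]
    }
    print(find_id_by_name(workspaces, "Sales Analytics"))
    # prints: 22222222-2222-2222-2222-222222222222
```

The same helper works for the item lookup: list the items of one type in the workspace, then filter that response by item name.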
| Topic | Reference |
|---|---|
| Migration Workload Map | § Migration Workload Map |
| SparkSession & Context API Changes | § SparkSession API Changes |
| WASB / ABFS → OneLake Path Migration | path-migration.md |
| Hive DDL → Delta Lake / Lakehouse Schemas | hive-to-delta.md |
| Oozie → Fabric Pipelines | § Oozie → Fabric Pipelines |
| Introducing notebookutils | § Introducing notebookutils |
| Before/After Code Patterns | code-patterns.md |
| Spark Configuration Differences | § Spark Configuration Differences |
| Must / Prefer / Avoid | § Must / Prefer / Avoid |
| Authentication & Token Acquisition | COMMON-CORE.md § Authentication |
| Lakehouse Management | SPARK-AUTHORING-CORE.md § Lakehouse Management |
Migration Workload Map
| HDInsight Component | Fabric Target | Notes |
|---|---|---|
| Spark cluster (notebooks, scripts) | Fabric Spark (Lakehouse / Notebooks / SJD) | No persistent cluster — Starter Pool or Custom Pool provides on-demand Spark |
| Hive / HiveServer2 | Lakehouse SQL Endpoint + Lakehouse schemas | Delta Lake replaces Hive metastore; schemas provide namespace equivalent |
| HBase | Fabric Warehouse or Azure Cosmos DB (separate from Fabric) | HBase has no direct Fabric equivalent — assess workload access patterns |
| Oozie workflows | Fabric Data Pipelines | Map Oozie actions to Fabric activities; see § Oozie → Fabric Pipelines |
| YARN Resource Manager | Fabric Spark monitoring (Spark UI, Monitoring Hub) | No YARN — Fabric manages compute automatically |
| Ambari | Fabric Monitoring Hub + Admin Portal | Cluster health, capacity, and job monitoring |
…
Helps you develop Microsoft Fabric Spark workflows, write and debug notebook code, and manage lakehouses and resources.