hdinsight-migration

Name: hdinsight-migration
Author: microsoft

Helps migrate Azure HDInsight Spark, Hive, and Oozie workloads to Microsoft Fabric.

Stars

★ 821

Source

GitHub

Updated

2026-07-18

// Security assessment: low risk

Prompt only; does not execute code
Open source and auditable
Community verified · 821

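// Install

Copy the install instructions below and let the AI complete the setup (recommended for beginners).

Please install the "hdinsight-migration" skill from askskill for me:
1. Download https://raw.githubusercontent.com/microsoft/skills-for-fabric/main/skills/hdinsight-migration/SKILL.md
2. Save it as ~/.claude/skills/hdinsight-migration/SKILL.md
3. Reload skills once it is installed and tell me it is ready to use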

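// Usage examples

Migrate Spark code to Fabric

Input

Please migrate this Azure HDInsight Spark code to Microsoft Fabric: remove the HiveContext and standalone SparkContext initialization and use the pre-instantiated SparkSession instead; also rewrite its WASB/ABFS paths as OneLake abfss URLs, and explain the reason for each change.

Expected output

Spark code that runs in Fabric, accompanied by notes on the key migration points.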
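A first pass over the input code for a request like this can flag the constructs the prompt names before any rewriting starts. The sketch below is a plain-Python line scanner, not part of the skill itself; the pattern table and the function name `find_legacy_usages` are illustrative assumptions, and the patterns are not exhaustive.

```python
import re

# Legacy constructs named in the prompt and in the skill's critical notes.
# Each entry maps a label to (regex, suggested action). Illustrative only.
LEGACY_PATTERNS = {
    "HiveContext": (r"\bHiveContext\b", "use the pre-instantiated SparkSession"),
    "SQLContext": (r"\bSQLContext\b", "use the pre-instantiated SparkSession"),
    "SparkContext init": (r"\bSparkContext\s*\(", "do not build a standalone context"),
    "WASB path": (r"wasbs?://", "rewrite as a OneLake abfss URL"),
}


def find_legacy_usages(source: str):
    """Return (line_number, label, suggestion) for every match, in line order."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, (pattern, suggestion) in LEGACY_PATTERNS.items():
            if re.search(pattern, line):
                findings.append((lineno, label, suggestion))
    return findings


if __name__ == "__main__":
    sample = (
        "from pyspark import SparkContext\n"
        "from pyspark.sql import HiveContext\n"
        "sc = SparkContext(appName='etl')\n"
        "hc = HiveContext(sc)\n"
        "df = hc.read.csv('wasb://data@acct.blob.core.windows.net/in/')\n"
    )
    for lineno, label, suggestion in find_legacy_usages(sample):
        print(f"line {lineno}: {label}: {suggestion}")
```

The scanner only reports locations; the rewrite and the per-change explanation remain the job of the migration step itself.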

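Convert Hive DDL to Lakehouse

Input

Please migrate the following Hive DDL to a Microsoft Fabric Lakehouse: convert STORED AS ORC, the external table definitions, and the storage paths into a table structure and location suited to Delta Lake; also flag any Hive metastore semantic differences involved.

Expected output

The converted Delta Lake CREATE TABLE statements, plus a list of the syntax and storage-mapping differences.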
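For the simplest case, the conversion this example asks for can be sketched as a text rewrite. This is a minimal sketch under stated assumptions, not the skill's own procedure (that lives in hive-to-delta.md): it assumes the target is a managed Delta table, so it drops EXTERNAL and LOCATION, and it refuses any DDL with clauses it does not understand. The names `hive_ddl_to_delta` and `NeedsManualReview` are hypothetical.

```python
import re


class NeedsManualReview(ValueError):
    """Raised when the DDL uses clauses this sketch does not handle."""


# Clauses whose Hive and Delta semantics differ enough to need a human.
UNSUPPORTED = ("PARTITIONED BY", "CLUSTERED BY", "ROW FORMAT", "TBLPROPERTIES")


def hive_ddl_to_delta(ddl: str) -> str:
    """Rewrite a simple Hive CREATE TABLE into a managed Delta CREATE TABLE."""
    normalized = " ".join(ddl.split()).rstrip(";")
    upper = normalized.upper()
    for clause in UNSUPPORTED:
        if clause in upper:
            raise NeedsManualReview(f"{clause} requires manual conversion")

    # Assumption: the table becomes managed, so EXTERNAL is dropped.
    out = re.sub(r"\bCREATE\s+EXTERNAL\s+TABLE\b", "CREATE TABLE",
                 normalized, flags=re.IGNORECASE)
    # Assumption: the old storage location is not reused by the new table.
    out = re.sub(r"\s+LOCATION\s+'[^']*'", "", out, flags=re.IGNORECASE)
    # Swap the Hive storage format for the Delta provider.
    out, count = re.subn(r"\bSTORED\s+AS\s+\w+", "USING DELTA",
                         out, flags=re.IGNORECASE)
    if count == 0:
        raise NeedsManualReview("no STORED AS clause found")
    return out


if __name__ == "__main__":
    hive_ddl = """
    CREATE EXTERNAL TABLE sales (id INT, amount DOUBLE)
    STORED AS ORC
    LOCATION 'wasb://data@acct.blob.core.windows.net/sales';
    """
    print(hive_ddl_to_delta(hive_ddl))
```

The rewritten statement only defines an empty table; the existing ORC data still has to be loaded into it separately.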

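Convert Oozie workflows to Fabric Pipelines

Input

I am about to decommission an HDInsight cluster. Please migrate this set of Oozie workflows/coordinators to Microsoft Fabric Pipelines: identify the corresponding activity for each action (spark, hive, shell, sqoop, and so on), propose a schedule trigger design, and point out the file and credential operations in the existing scripts that need to switch to notebookutils.

Expected output

A Fabric Pipeline mapping plan, trigger configuration recommendations, and a script replacement checklist.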
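Identifying the action types is the mechanical first step of such a request. The sketch below inventories the actions in a workflow definition using only the standard library. It assumes the usual Oozie layout, where each `<action>` holds one body element (spark, hive, shell, sqoop, ...) followed by `<ok>` and `<error>` transitions. The function names and the sample workflow are illustrative, and the sketch deliberately stops short of choosing Fabric activities.

```python
import xml.etree.ElementTree as ET

# Transition elements inside <action> that are not the action body.
TRANSITIONS = {"ok", "error"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree adds to element tags."""
    return tag.rsplit("}", 1)[-1]


def inventory_actions(workflow_xml: str):
    """Return (action_name, action_type) pairs in document order."""
    root = ET.fromstring(workflow_xml.strip())
    actions = []
    for node in root.iter():
        if local_name(node.tag) != "action":
            continue
        for child in node:
            kind = local_name(child.tag)
            if kind not in TRANSITIONS:
                actions.append((node.get("name"), kind))
                break
    return actions


if __name__ == "__main__":
    # Hypothetical workflow used only to exercise the function.
    sample = """
    <workflow-app xmlns="uri:oozie:workflow:0.5" name="nightly-etl">
      <start to="load"/>
      <action name="load">
        <sqoop xmlns="uri:oozie:sqoop-action:0.4"><command>import</command></sqoop>
        <ok to="transform"/><error to="fail"/>
      </action>
      <action name="transform">
        <spark xmlns="uri:oozie:spark-action:0.2"><master>yarn</master></spark>
        <ok to="end"/><error to="fail"/>
      </action>
      <kill name="fail"><message>failed</message></kill>
      <end name="end"/>
    </workflow-app>
    """
    for name, kind in inventory_actions(sample):
        print(f"{name}: {kind}")
```

The resulting list gives the mapping step a concrete set of actions to assign to pipeline activities.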

// Documentation

Update Check: ONCE PER SESSION (mandatory)

The first time this skill is used in a session, run the check-updates skill before proceeding.

GitHub Copilot CLI / VS Code: invoke the check-updates skill.

Claude Code / Cowork / Cursor / Windsurf / Codex: compare local vs remote package.json version.

Skip if the check was already performed earlier in this session.

CRITICAL NOTES

To find workspace details (including its ID) from a workspace name: list all workspaces, then filter the result with JMESPath.

To find item details (including its ID) from workspace ID, item type, and item name: list all items of that type in that workspace, then filter the result with JMESPath.

HDInsight has no mssparkutils or dbutils equivalent, so notebookutils is a net-new capability introduced by the migration.

HiveContext and SQLContext are legacy Spark 1.x/2.x APIs — Fabric uses Spark 3.x SparkSession exclusively

wasb:// paths are deprecated and require a Storage Account key or SAS — replace with OneLake shortcuts
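The path replacement in the last note can be sketched as a pure string transformation. This is a minimal sketch, assuming a OneLake shortcut already exists under the Lakehouse `Files` area and points at the root of the old container. The function name and the `workspace`, `lakehouse`, and `shortcut` parameters are hypothetical, and path-migration.md remains the reference for the actual rules.

```python
from urllib.parse import urlparse

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"
LEGACY_SCHEMES = {"wasb", "wasbs", "abfs", "abfss"}


def to_onelake_path(legacy_url: str, workspace: str, lakehouse: str, shortcut: str) -> str:
    """Map a container-relative legacy path onto a OneLake shortcut.

    Assumes `shortcut` is a shortcut under Files that targets the
    container root, so the path inside the container is kept as-is.
    """
    parsed = urlparse(legacy_url)
    if parsed.scheme not in LEGACY_SCHEMES:
        raise ValueError(f"not a WASB/ABFS URL: {legacy_url}")
    relative = parsed.path.lstrip("/")
    base = f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Files/{shortcut}"
    return f"{base}/{relative}" if relative else base


if __name__ == "__main__":
    print(to_onelake_path(
        "wasb://data@acct.blob.core.windows.net/raw/2024/events.csv",
        workspace="Sales", lakehouse="Bronze", shortcut="legacy_data",
    ))
```

Because the shortcut carries the connection to the storage account, the rewritten path no longer depends on an account key or SAS in the Spark code.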

HDInsight → Microsoft Fabric Migration

Prerequisite Knowledge

Read these companion documents before executing migration tasks:

COMMON-CORE.md — Fabric REST API patterns, authentication, token audiences, item discovery
COMMON-CLI.md — az rest, az login, token acquisition, Fabric REST via CLI
SPARK-AUTHORING-CORE.md — Notebook deployment, lakehouse creation, Spark job execution
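The list-then-filter lookup from the critical notes combines naturally with the `az rest` usage that COMMON-CLI.md covers. The sketch below only builds the command lines and does not call Azure. It assumes the Fabric REST endpoints `/v1/workspaces` and `/v1/workspaces/{id}/items?type=...`, the `https://api.fabric.microsoft.com` token resource, and names without single quotes; the companion documents are authoritative on all of these. The helper names are hypothetical.

```python
import shlex

FABRIC_API = "https://api.fabric.microsoft.com"


def az_rest_lookup(url_path: str, display_name: str) -> list[str]:
    """Build an `az rest` call that lists a collection and filters it.

    The JMESPath query keeps entries whose displayName matches, then
    takes the id of the first match. Assumes no single quote in the name.
    """
    query = f"value[?displayName=='{display_name}'] | [0].id"
    return [
        "az", "rest", "--method", "get",
        "--resource", FABRIC_API,
        "--url", f"{FABRIC_API}{url_path}",
        "--query", query,
        "--output", "tsv",
    ]


def find_workspace_id_cmd(workspace_name: str) -> list[str]:
    """Command that resolves a workspace name to its ID."""
    return az_rest_lookup("/v1/workspaces", workspace_name)


def find_item_id_cmd(workspace_id: str, item_type: str, item_name: str) -> list[str]:
    """Command that resolves an item name to its ID within one workspace."""
    return az_rest_lookup(f"/v1/workspaces/{workspace_id}/items?type={item_type}", item_name)


if __name__ == "__main__":
    print(shlex.join(find_workspace_id_cmd("Sales")))
    print(shlex.join(find_item_id_cmd("<workspace-id>", "Lakehouse", "Bronze")))
```

Returning an argument list rather than a single string keeps the JMESPath expression intact when the command is passed to `subprocess.run` without a shell.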

For notebook and Lakehouse creation, see spark-authoring-cli. For Fabric Warehouse DDL/DML authoring, see sqldw-authoring-cli.

Topic	Reference
Migration Workload Map	§ Migration Workload Map
SparkSession & Context API Changes	§ SparkSession API Changes
WASB / ABFS → OneLake Path Migration	path-migration.md
Hive DDL → Delta Lake / Lakehouse Schemas	hive-to-delta.md
Oozie → Fabric Pipelines	§ Oozie → Fabric Pipelines
Introducing `notebookutils`	§ Introducing notebookutils
Before/After Code Patterns	code-patterns.md
Spark Configuration Differences	§ Spark Configuration Differences
Must / Prefer / Avoid	§ Must / Prefer / Avoid
Authentication & Token Acquisition	COMMON-CORE.md § Authentication
Lakehouse Management	SPARK-AUTHORING-CORE.md § Lakehouse Management

Migration Workload Map

HDInsight Component	Fabric Target	Notes
Spark cluster (notebooks, scripts)	Fabric Spark (Lakehouse / Notebooks / SJD)	No persistent cluster — Starter Pool or Custom Pool provides on-demand Spark
Hive / HiveServer2	Lakehouse SQL Endpoint + Lakehouse schemas	Delta Lake replaces Hive metastore; schemas provide namespace equivalent
HBase	Fabric Warehouse or Azure Cosmos DB (separate from Fabric)	HBase has no direct Fabric equivalent — assess workload access patterns
Oozie workflows	Fabric Data Pipelines	Map Oozie actions to Fabric activities; see § Oozie → Fabric Pipelines
YARN Resource Manager	Fabric Spark monitoring (Spark UI, Monitoring Hub)	No YARN — Fabric manages compute automatically
Ambari	Fabric Monitoring Hub + Admin Portal	Cluster health, capacity, and job monitoring