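Helps you complete AI Runway initialization, GPU checks, and your first model deployment on AKS.
Copy the install instructions and let the AI handle the setup automatically · Recommended for beginners
Please help me install the "airunway-aks-setup" skill from askskill:
1. Download https://raw.githubusercontent.com/microsoft/GitHub-Copilot-for-Azure/main/plugin/skills/airunway-aks-setup/SKILL.md
2. Save it as ~/.claude/skills/airunway-aks-setup/SKILL.md
3. Once it is installed, reload skills and tell me it is ready to use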
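Please provide the complete steps for installing AI Runway on an AKS cluster, including cluster verification, controller installation, basic dependency checks, and common prerequisites.
A step-by-step AI Runway installation guide covering verification, installation, and dependency preparation.
Help me assess whether my AKS cluster is suitable for GPU inference, including node GPU availability, driver/plugin status, resource quotas, and potential risks.
A GPU readiness assessment that identifies missing items, risks, and recommended follow-up actions.
Please guide me through deploying my first model service with AI Runway in an AKS environment where installation is already complete, and explain the provider configuration, deployment commands, and verification methods.
An executable first-model deployment workflow, including configuration examples, commands, and a verification checklist.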
This skill walks users from a bare Kubernetes cluster to a running AI model deployment. Follow each step in sequence unless the user provides skip-to-step N to resume from a specific phase.
Cost awareness: GPU node pools incur significant compute charges (A100-80GB can cost $3–5+/hr). Confirm the user understands cost implications before provisioning GPU resources.
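To make the cost conversation concrete, the arithmetic can be sketched as nodes × hours × hourly rate. The function name `estimate_gpu_cost` is hypothetical, and the rates below are just the low and high ends of the range quoted above, not Azure list prices.

```shell
#!/bin/sh
# Rough GPU cost estimate: nodes x hours x hourly rate (USD).
# Rates passed in are illustrative placeholders, not Azure list prices.
estimate_gpu_cost() {
  # $1 = node count, $2 = hours, $3 = hourly rate per node
  # awk handles the decimal arithmetic that plain shell arithmetic lacks.
  awk -v n="$1" -v h="$2" -v r="$3" 'BEGIN { printf "%.2f\n", n * h * r }'
}

# One node for an 8-hour day at the low and high ends of the quoted range.
estimate_gpu_cost 1 8 3   # prints 24.00
estimate_gpu_cost 1 8 5   # prints 40.00
```

A figure like this is only a starting point for the confirmation step; actual charges depend on VM size, region, and how long the node pool stays provisioned.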
This skill assumes an AKS cluster already exists. If the user does not have a cluster, hand off to the azure-kubernetes skill first to provision one (with a GPU node pool unless CPU-only inference is acceptable), then return here.
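One way to check whether an existing cluster already exposes GPU nodes is to list each node's allocatable `nvidia.com/gpu` resource. The sketch below assumes the NVIDIA device plugin is running (otherwise the resource is absent and every node shows `<none>`); the helper name `list_gpu_nodes` and the node names in the canned sample are hypothetical.

```shell
#!/bin/sh
# Reads "NODE GPU" rows (header included) on stdin and prints the names of
# nodes whose allocatable nvidia.com/gpu count is a positive integer.
# Nodes without the resource appear as "<none>" and are skipped.
list_gpu_nodes() {
  awk 'NR > 1 && $2 ~ /^[0-9]+$/ && $2 > 0 { print $1 }'
}

# Against a live cluster (requires kubectl and a current context):
#   kubectl get nodes \
#     -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
#     | list_gpu_nodes

# Offline demonstration with canned output (hypothetical node names).
printf '%s\n' \
  'NODE                GPU' \
  'aks-system-000000   <none>' \
  'aks-gpu-000001      1' | list_gpu_nodes
```

An empty result suggests either a CPU-only cluster or a GPU node pool whose device plugin is not yet advertising the resource, which is worth distinguishing before handing off to cluster provisioning.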
| Property | Value |
|---|---|
| Best for | End-to-end AI Runway onboarding on AKS |
| CLI tools | kubectl, make, curl |
| MCP tools | None |
| Related skills | azure-kubernetes (cluster setup), azure-diagnostics (troubleshooting) |
Use this skill when the user wants to:
This skill uses no MCP tools. All cluster operations are performed directly via kubectl and make.
If the user provides skip-to-step N, start at step N and assume prior steps are complete.

| # | Step | Reference |
|---|---|---|
| 1 | Cluster Verification — context check, node inventory, GPU detection | step-1-verify.md |
| 2 | Controller Installation — CRD + controller deployment | step-2-controller.md |
| 3 | GPU Assessment — detect GPU models, flag dtype/attention constraints | step-3-gpu.md |
| 4 | Provider Setup — recommend and install inference provider | step-4-provider.md |
| 5 | First Deployment — pick a model, deploy, verify Ready | step-5-deploy.md |
| 6 | Summary — recap, smoke test, next steps | step-6-summary.md |
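The resume behaviour can be sketched as a small helper that, given a starting step, lists what remains. This is only an illustration of the sequencing described here; `step_name` and `remaining_steps` are hypothetical names, not part of the skill.

```shell
#!/bin/sh
# Maps a step number to its name, mirroring the table above.
step_name() {
  case "$1" in
    1) echo "Cluster Verification" ;;
    2) echo "Controller Installation" ;;
    3) echo "GPU Assessment" ;;
    4) echo "Provider Setup" ;;
    5) echo "First Deployment" ;;
    6) echo "Summary" ;;
    *) return 1 ;;
  esac
}

# Prints the steps still to run when resuming at step N (default: 1).
# Earlier steps are assumed complete and are not listed.
remaining_steps() {
  _start="${1:-1}"
  if ! step_name "$_start" >/dev/null; then
    echo "step must be between 1 and 6" >&2
    return 1
  fi
  _n="$_start"
  while [ "$_n" -le 6 ]; do
    echo "$_n. $(step_name "$_n")"
    _n=$((_n + 1))
  done
}

remaining_steps 4
```

Calling `remaining_steps 4` lists Provider Setup, First Deployment, and Summary; an out-of-range value returns a non-zero status instead of silently starting from the beginning.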
| Error / Symptom | Likely Cause | Remediation |
|---|---|---|
| No kubeconfig context | Not connected to a cluster | Run az aks get-credentials or equivalent |
| Controller in CrashLoopBackOff | Config or RBAC issue | kubectl logs -n airunway-system -l control-plane=controller-manager --previous |
| Provider not ready | Image pull or RBAC issue | kubectl logs <pod-name> -n <namespace> for the provider pod |
| ModelDeployment stuck in Pending | GPU scheduling failure or provider not ready | kubectl describe modeldeployment <name> -n <namespace>, then check the Events section |
| bfloat16 errors at inference | T4 or V100 lacks bfloat16 support | Add --dtype float16 to serving args |
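The bfloat16 row can be turned into a small guard that picks the serving flag from a GPU model string. This is a minimal sketch: the function name `dtype_flag_for_gpu` is hypothetical, and the example model strings assume the style of the `nvidia.com/gpu.product` node label, which is only present when NVIDIA GPU feature discovery is deployed.

```shell
#!/bin/sh
# Prints "--dtype float16" for GPU models the table flags as lacking
# bfloat16 support (T4, V100). Prints nothing for any other model, which
# leaves the provider's default dtype untouched.
# Matching is a loose substring check on the model string.
dtype_flag_for_gpu() {
  case "$1" in
    *T4*|*V100*) echo "--dtype float16" ;;
    *) ;;
  esac
}

dtype_flag_for_gpu "Tesla-T4"                # prints --dtype float16
dtype_flag_for_gpu "NVIDIA-A100-SXM4-80GB"   # prints nothing
```

Emitting nothing for unrecognized models is a deliberate choice here: it avoids asserting bfloat16 support for hardware the table does not cover.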
For full error handling and rollback procedures, see troubleshooting.md.