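Helps you complete AI Runway initialization, GPU checks, and your first model deployment on AKS.
Copy the install instructions and let the AI complete the setup automatically · Recommended for beginners
Please help me install the "airunway-aks-setup" skill from askskill: 1. Download https://raw.githubusercontent.com/microsoft/GitHub-Copilot-for-Azure/main/plugin/skills/airunway-aks-setup/SKILL.md 2. Save it as ~/.claude/skills/airunway-aks-setup/SKILL.md 3. Once it is installed, reload the skills and tell me it is ready to use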
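Please provide the complete steps for installing AI Runway on an AKS cluster, including cluster verification, controller installation, basic dependency checks, and common prerequisites.
A step-by-step AI Runway installation guide covering verification, installation, and dependency preparation.
Help me assess whether my AKS cluster is suitable for GPU inference, including node GPU availability, driver/plugin status, resource quotas, and potential risks.
A GPU readiness assessment that identifies missing items, risks, and recommended follow-up actions.
Please guide me through deploying my first model service with AI Runway in an AKS environment where installation is already complete, and explain the provider configuration, deployment commands, and verification methods.
An actionable first-model deployment workflow with configuration examples, commands, and a verification checklist.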
This skill walks users from a bare Kubernetes cluster to a running AI model deployment. Follow each step in sequence unless the user provides skip-to-step N to resume from a specific phase.
Cost awareness: GPU node pools incur significant compute charges (A100-80GB can cost $3–5+/hr). Confirm the user understands cost implications before provisioning GPU resources.
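To make the cost note concrete, here is a minimal arithmetic sketch in Python. The `estimate_gpu_cost` helper is hypothetical (it is not part of the skill), and the $3 and $5 hourly figures are taken from the note above as a rough range, not from a current Azure price sheet.

```python
def estimate_gpu_cost(hourly_rate_usd: float, node_count: int, hours: float) -> float:
    """Return the estimated charge in USD for a GPU node pool.

    Hypothetical helper for illustration: rate x nodes x hours, nothing more.
    """
    if hourly_rate_usd < 0 or node_count < 0 or hours < 0:
        raise ValueError("inputs must be non-negative")
    return hourly_rate_usd * node_count * hours


# Illustrative range only, using the $3-5/hr figure quoted in the note above.
HOURS_IN_30_DAYS = 24 * 30
low = estimate_gpu_cost(3.0, node_count=1, hours=HOURS_IN_30_DAYS)
high = estimate_gpu_cost(5.0, node_count=1, hours=HOURS_IN_30_DAYS)
print(f"One A100-80GB node running for 30 days: ${low:,.0f} to ${high:,.0f}+")
```

Showing the user a figure like this before provisioning is one way to confirm that the cost implications are understood.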
This skill assumes an AKS cluster already exists. If the user does not have a cluster, hand off to the azure-kubernetes skill first to provision one (with a GPU node pool unless CPU-only inference is acceptable), then return here.
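Whether an existing cluster already has GPU capacity can be read from node data. The sketch below is an assumption about how such a check might look, not the skill's actual procedure (that lives in step-1-verify.md, which is not shown here). It parses the JSON shape returned by `kubectl get nodes -o json` and reads the `nvidia.com/gpu` allocatable resource, which is present only when the NVIDIA device plugin is running. The node names in the sample are made up.

```python
import json

# Extended resource name advertised by the NVIDIA device plugin.
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_inventory(node_list: dict) -> dict:
    """Map each node name to its allocatable GPU count (0 if none advertised)."""
    inventory = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        inventory[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return inventory


# Trimmed, hypothetical sample of `kubectl get nodes -o json` output.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "aks-system-0"},
     "status": {"allocatable": {"cpu": "4"}}},
    {"metadata": {"name": "aks-gpu-0"},
     "status": {"allocatable": {"cpu": "24", "nvidia.com/gpu": "1"}}}
  ]
}
""")

inventory = gpu_inventory(sample)
has_gpu = any(count > 0 for count in inventory.values())
print(inventory, "GPU nodes present:", has_gpu)
```

If `has_gpu` is false, the cluster is either CPU-only or the device plugin is not installed, and the two cases need to be told apart before choosing a provider.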
| Property | Value |
|---|---|
| Best for | End-to-end AI Runway onboarding on AKS |
| CLI tools | kubectl, make, curl |
| MCP tools | None |
| Related skills | azure-kubernetes (cluster setup), azure-diagnostics (troubleshooting) |
Use this skill when the user wants to:
This skill uses no MCP tools. All cluster operations are performed directly via kubectl and make.
If the user provides skip-to-step N, start at step N and assume prior steps are complete.

| # | Step | Reference |
|---|---|---|
| 1 | Cluster Verification — context check, node inventory, GPU detection | step-1-verify.md |
| 2 | Controller Installation — CRD + controller deployment | step-2-controller.md |
| 3 | GPU Assessment — detect GPU models, flag dtype/attention constraints | step-3-gpu.md |
| 4 | Provider Setup — recommend and install inference provider | step-4-provider.md |
| 5 | First Deployment — pick a model, deploy, verify Ready | step-5-deploy.md |
| 6 | Summary — recap, smoke test, next steps | step-6-summary.md |
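The resume behavior described above can be sketched as follows. This is an illustrative assumption about how a `skip-to-step N` request might be interpreted. The exact accepted syntax is not specified in the source, and `parse_start_step` and `remaining_steps` are hypothetical names.

```python
import re

# Step names and reference files copied from the table above.
STEPS = {
    1: ("Cluster Verification", "step-1-verify.md"),
    2: ("Controller Installation", "step-2-controller.md"),
    3: ("GPU Assessment", "step-3-gpu.md"),
    4: ("Provider Setup", "step-4-provider.md"),
    5: ("First Deployment", "step-5-deploy.md"),
    6: ("Summary", "step-6-summary.md"),
}


def parse_start_step(user_message: str) -> int:
    """Return the step to start from; default to 1 when no skip is requested."""
    match = re.search(r"skip-to-step\s+(\d+)", user_message)
    if match is None:
        return 1
    step = int(match.group(1))
    if step not in STEPS:
        raise ValueError(f"step must be between 1 and {len(STEPS)}, got {step}")
    return step


def remaining_steps(user_message: str) -> list:
    """List (number, name, reference) for every step still to run, in order."""
    start = parse_start_step(user_message)
    return [(n, *STEPS[n]) for n in sorted(STEPS) if n >= start]


for number, name, reference in remaining_steps("skip-to-step 4"):
    print(f"Step {number}: {name} ({reference})")
```

Rejecting out-of-range step numbers explicitly avoids silently skipping the whole workflow when the user mistypes N.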
| Error / Symptom | Likely Cause | Remediation |
|---|---|---|
| No kubeconfig context | Not connected to a cluster | Run az aks get-credentials or equivalent |
| Controller in CrashLoopBackOff | Config or RBAC issue | kubectl logs -n airunway-system -l control-plane=controller-manager --previous |
| Provider not ready | Image pull or RBAC issue | kubectl logs <pod-name> -n <namespace> for the provider pod |
| ModelDeployment stuck in Pending | GPU scheduling failure or provider not ready | Run kubectl describe modeldeployment <name> -n <namespace> and check the events |
| bfloat16 errors at inference | T4 or V100 lacks bfloat16 support | Add --dtype float16 to serving args |
For full error handling and rollback procedures, see troubleshooting.md.
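The bfloat16 row in the table above lends itself to a small pre-flight check. The sketch below is a hedged illustration: `serving_dtype_args` is a hypothetical helper, and the GPU model strings are assumed to look like typical node label values (for example `Tesla-T4`). Only the T4/V100 constraint and the `--dtype float16` argument come from the table.

```python
import re
from typing import List

# GPU families the table above lists as lacking bfloat16 support.
BF16_UNSUPPORTED = {"T4", "V100"}


def serving_dtype_args(gpu_model: str) -> List[str]:
    """Return extra serving args for a GPU model string.

    The model string is split on non-alphanumeric characters and compared
    token by token, so "Tesla-T4" matches "T4" while unrelated names that
    merely contain those characters do not.
    """
    tokens = set(re.split(r"[^A-Z0-9]+", gpu_model.upper()))
    if tokens & BF16_UNSUPPORTED:
        return ["--dtype", "float16"]
    return []


print(serving_dtype_args("Tesla-T4"))               # float16 override
print(serving_dtype_args("NVIDIA-A100-SXM4-80GB"))  # no override needed
```

Applying the override before the first deployment avoids discovering the dtype mismatch only at inference time.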
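Helps developers integrate Azure Application Insights into web applications and configure telemetry collection.
Calls Azure AI to perform search, speech-to-text, text-to-speech, and OCR.
Analyzes and trims Markdown content to reduce token consumption and improve AI processing efficiency.
Iteratively checks and fixes skill frontmatter compliance, scores, and token issues.
Helps you write, review, and standardize skill documents that conform to the agentskills.io specification.
Helps you write, validate, and run agent evaluation suites based on eval.yaml.
Lets an AI assistant connect to and operate AKS clusters, assisting with queries, operations, and troubleshooting.
Assesses the compatibility of AKS clusters and workloads with Automatic and provides migration and remediation advice.
Guides you step by step through customizing Azure OpenAI model deployment parameters and advanced options.
Helps users deploy models in Azure OpenAI based on intent and query regional capacity and availability.
Helps you configure Azure API Management as an AI gateway to centrally govern access to models, tools, and agents.
Intelligently selects the best available Azure OpenAI region and completes deployment quickly.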