SKILL

system-type-ml-serving

Name: system-type-ml-serving
Author: microsoft

Helps design and evaluate architectures for machine learning training, deployment, and experimentation platforms.

Source: GitHub

Updated: 2026-07-20

// Security assessment: low risk

Prompt only; no code is executed
Open source and auditable

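// Install

Copy the install instructions and let the AI handle the setup automatically (recommended for beginners)

Please help me install the "system-type-ml-serving" skill from askskill:
1. Download https://raw.githubusercontent.com/microsoft/amplifier-bundle-systems-design/main/skills/system-type-ml-serving/SKILL.md
2. Save it as ~/.claude/skills/system-type-ml-serving/SKILL.md
3. Once it is installed, reload the skills and let me know it is ready to use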

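// Usage examples

Design a model serving architecture

Input

Please design a model serving architecture for a recommendation system with one million daily active users. It should cover online inference, the feature store, canary releases, A/B testing, monitoring and alerting, and failure degradation, and explain the key trade-offs.

Expected output

A layered architecture proposal covering core components, data flow, and the capacity and stability trade-offs.

Review a training pipeline

Input

Please review this machine learning training pipeline design, focusing on data versioning, experiment tracking, feature consistency, GPU scheduling, the model registry, and failure recovery, and point out high-risk areas and suggested improvements.

Expected output

A structured review listing the issues found, their risk levels, and actionable optimization recommendations.

Analyze production failure modes

Input

Please analyze the common failure modes of an AI inference platform, including feature latency, model version drift, cache invalidation, GPU resource contention, and downstream dependency timeouts, and provide troubleshooting paths and mitigations.

Expected output

A list of failure modes, a root-cause analysis framework, and phased response and prevention measures.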

// Documentation

System Type: ML/AI Serving & Training

Patterns, failure modes, and anti-patterns for machine learning infrastructure and model serving systems.

Model Serving Patterns

Online Inference (Real-Time)

What it is. The model receives a request, runs inference synchronously, and returns a prediction. The model sits in the request path, so latency matters as much as accuracy.

When to use. User-facing predictions where the result must be immediate: search ranking, recommendation, fraud scoring at transaction time, autocomplete.

When to avoid. When predictions can be precomputed. When the model is too large to meet latency budgets. When the cost per prediction doesn't justify real-time serving.

Key concerns. Tail latency (P99, not just P50) dominates user experience. Model loading time creates cold start problems. Memory footprint determines how many models fit per node. Timeouts must be set aggressively: a slow prediction is worse than a fallback.
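The point about aggressive timeouts can be sketched as follows. This is a minimal illustration rather than a production pattern: `predict_with_timeout`, `fast_model`, `slow_model`, and `FALLBACK_SCORE` are hypothetical names, and the thread pool stands in for whatever serving runtime is actually used.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Hypothetical fallback value; a real system might serve a cached or
# popularity-based prediction instead.
FALLBACK_SCORE = 0.0
_executor = ThreadPoolExecutor(max_workers=4)


def predict_with_timeout(model_fn, features, timeout_s, fallback=FALLBACK_SCORE):
    """Return (prediction, used_fallback), giving up after timeout_s seconds."""
    future = _executor.submit(model_fn, features)
    try:
        return future.result(timeout=timeout_s), False
    except FutureTimeout:
        # Best effort only: a call that is already running is not interrupted,
        # so the worker stays busy until the model returns.
        future.cancel()
        return fallback, True


def fast_model(features):
    """Stand-in model: mean of the input features."""
    return sum(features) / len(features)


def slow_model(features):
    """Stand-in model that blows the latency budget."""
    time.sleep(0.5)
    return fast_model(features)


if __name__ == "__main__":
    print(predict_with_timeout(fast_model, [1.0, 3.0], timeout_s=0.2))
    print(predict_with_timeout(slow_model, [1.0, 3.0], timeout_s=0.05))
```

Note that the timed-out call still occupies a worker thread in this sketch, so a real deployment would also need to bound concurrency or cancel work at the model-server level.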

Batch Inference

What it is. Run predictions over a large dataset on a schedule (hourly, daily). Write results to a store; serve precomputed predictions at request time.

When to use. Recommendations that refresh periodically. Risk scoring where real-time freshness isn't required. Any case where the input space is bounded and enumerable.

When to avoid. When input features change faster than the batch interval. When the input space is too large to precompute (e.g., arbitrary user queries). When staleness directly harms the user.

Key concerns. Batch jobs that overrun their schedule. Incomplete batches that leave stale predictions for some entities. The join between precomputed predictions and request-time serving (cache misses for new entities). Monitoring must cover prediction freshness, not just job success.
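The freshness and cache-miss concerns can be made concrete with a small lookup helper. This is a sketch under stated assumptions: the store is modelled as a plain dict mapping entity ID to `(value, computed_at)`, and `lookup_prediction` and its status labels are hypothetical names, not part of any real library.

```python
from datetime import datetime, timedelta, timezone


def lookup_prediction(store, entity_id, now, max_age, default):
    """Return (value, status), where status is 'fresh', 'stale', or 'miss'.

    store maps entity_id -> (value, computed_at). A 'miss' covers entities
    the last batch never scored, such as newly created ones.
    """
    record = store.get(entity_id)
    if record is None:
        return default, "miss"
    value, computed_at = record
    if now - computed_at > max_age:
        # Still served, but flagged so monitoring can count stale reads.
        return value, "stale"
    return value, "fresh"


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    store = {
        "u1": (0.9, now - timedelta(hours=1)),
        "u2": (0.4, now - timedelta(hours=30)),  # left over from an older batch
    }
    for entity in ("u1", "u2", "u3"):
        print(entity, lookup_prediction(store, entity, now, timedelta(hours=24), 0.0))
```

Counting the `stale` and `miss` statuses per request gives a freshness signal that stays meaningful even when the batch job itself reports success.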

Streaming Inference

What it is. The model consumes events from a stream (Kafka, Kinesis) and produces predictions continuously. It sits between batch and real-time: lower latency than batch, lower cost than synchronous serving.

When to use. Event-driven predictions: fraud detection on transaction streams, anomaly detection on telemetry, real-time feature updates feeding downstream models.

When to avoid. When you need sub-100ms request-response latency. When the prediction consumer expects synchronous responses.

Key concerns. Consumer lag means predictions fall behind reality. Backpressure from slow models causes event queue growth. Exactly-once semantics for predictions that trigger side effects (e.g., blocking a transaction). Reprocessing on model update: do you recompute predictions for the backlog or only apply the new model forward?
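Two of these concerns, consumer lag and side effects under redelivery, can be sketched without any streaming library. The code below assumes at-least-once delivery and approximates exactly-once effects by deduplicating on an event ID; `consumer_lag`, `IdempotentBlocker`, and the event shape are all hypothetical, and the in-memory `seen` set stands in for a durable store.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: newest offset in the log minus the committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }


class IdempotentBlocker:
    """Scores events and records a block decision at most once per event ID."""

    def __init__(self, score_fn, threshold):
        self.score_fn = score_fn
        self.threshold = threshold
        self.seen = set()   # assumption: in production this would be durable
        self.blocked = []   # side effects actually performed

    def handle(self, event):
        """Return True if the event was processed, False if it was a duplicate."""
        if event["id"] in self.seen:
            return False
        self.seen.add(event["id"])
        if self.score_fn(event) >= self.threshold:
            self.blocked.append(event["id"])
        return True


if __name__ == "__main__":
    print(consumer_lag({0: 100, 1: 50}, {0: 90}))
    blocker = IdempotentBlocker(lambda e: e["amount"] / 1000.0, threshold=0.8)
    event = {"id": "tx-1", "amount": 900}
    print(blocker.handle(event), blocker.handle(event), blocker.blocked)
```

Deduplicating on the event ID means a redelivered transaction does not trigger a second block, which is the property the side-effect concern is really about.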

Model-as-a-Service vs. Embedded Models

Model-as-a-service. Centralized inference endpoint. Clear ownership, independent scaling, versioning decoupled from application deploys. But: network latency, one more service to operate, coupling on availability.

Embedded models. Model ships inside the application binary or container. No network hop. But: every application deploy includes the model, model updates require app redeployment, resource isolation is harder (the model competes with the app for memory and CPU).

The real question: How often does the model change independently of the application? If weekly or more, service extraction pays off. If quarterly, embedding avoids operational overhead.
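The rule of thumb above can be written down as a tiny helper, mainly to make the gap between the two thresholds explicit. `placement_hint` is a hypothetical function; the 7-day and 90-day cut-offs are one reading of "weekly" and "quarterly", and the text gives no guidance for cadences in between, so the sketch returns "unclear" there.

```python
def placement_hint(days_between_model_changes):
    """Map how often the model changes independently of the app to a hint.

    Assumed thresholds: 7 days for 'weekly or more', 90 days for 'quarterly'.
    """
    if days_between_model_changes <= 7:
        return "service"    # frequent independent changes: extract a service
    if days_between_model_changes >= 90:
        return "embedded"   # rare changes: avoid the operational overhead
    return "unclear"        # between the two: weigh latency, isolation, ownership


if __name__ == "__main__":
    for days in (1, 7, 30, 90, 180):
        print(days, placement_hint(days))
```

The hint covers only change cadence; the latency, availability, and resource-isolation trade-offs listed above still have to be weighed separately.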

Feature Engineering

Feature Stores

What it is. A system that manages feature computation, storage, and serving for ML models. Separates feature engineering from model training and serving.

Online store. Low-latency key-value lookups at serving time. Backed by Redis, DynamoDB, or similar. Optimized for point lookups by entity ID.

Offline store. Historical feature values for training. Backed by a data warehouse, object storage, or lakehouse. Optimized for bulk reads with time-range filters.
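The split between the two stores can be illustrated with an in-memory toy. This is a sketch only: `TinyFeatureStore` and its methods are hypothetical, the dict stands in for a key-value store such as Redis, and the list stands in for a warehouse table.

```python
from dataclasses import dataclass, field


@dataclass
class TinyFeatureStore:
    # Online view: entity_id -> (timestamp, latest features), for point lookups.
    online: dict = field(default_factory=dict)
    # Offline view: append-only (timestamp, entity_id, features) history.
    offline: list = field(default_factory=list)

    def write(self, ts, entity_id, features):
        """Record a feature row in history and refresh the online view."""
        self.offline.append((ts, entity_id, dict(features)))
        current = self.online.get(entity_id)
        if current is None or ts >= current[0]:
            self.online[entity_id] = (ts, dict(features))

    def get_online(self, entity_id):
        """Serving-time point lookup by entity ID; None if unknown."""
        record = self.online.get(entity_id)
        return None if record is None else record[1]

    def read_offline(self, start, end):
        """Training-time bulk read of rows with start <= timestamp < end."""
        return [row for row in self.offline if start <= row[0] < end]


if __name__ == "__main__":
    store = TinyFeatureStore()
    store.write(1, "u1", {"clicks": 1})
    store.write(5, "u1", {"clicks": 4})
    store.write(3, "u2", {"clicks": 2})
    print(store.get_online("u1"))
    print(store.read_offline(0, 4))
```

Writing both views from the same `write` call is one simple way to keep training and serving features consistent in the sketch; real systems usually populate the two stores through separate pipelines.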

…

