
data-throughput-accelerator — askskill


SKILL

data-throughput-accelerator

Speeds up large-scale data imports, backfills, and sync workflows while keeping results correct and consistent.

Stars

★ 210,546

Source

GitHub

Updated

2026-06-08

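// Security assessment: low risk

Prompt only, no code execution
Open source, auditable
Community verified · 210.5k

Security audit in progress…

Credentials and secrets
Network egress
Code execution
Data access
Source supply chain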

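// Install

Copy the install prompt and let the AI complete the setup · Recommended for beginners

Please install the "data-throughput-accelerator" skill from askskill for me:
1. Download https://raw.githubusercontent.com/affaan-m/ECC/main/skills/data-throughput-accelerator/SKILL.md
2. Save it as ~/.claude/skills/data-throughput-accelerator/SKILL.md
3. Once installed, reload skills and let me know it is ready to use

// Download

Download SKILL.md · Machine-readable install manifest ↗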

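// Usage examples

Accelerate a historical data backfill

Input

Design an acceleration plan for a data warehouse backfill job with 5 billion records. The goal is to shorten execution time while guaranteeing idempotency, verified completeness, and recoverability from failures. Provide the sharding strategy, concurrency control, retry mechanism, validation steps, and monitoring metrics.

Expected output

An executable backfill acceleration plan covering batched concurrent execution, correctness safeguards, and monitoring recommendations.

Optimize ETL import performance

Input

Analyze the performance bottlenecks in the current ETL import pipeline and propose optimizations to speed up loading large batches of CSV files into the data warehouse. Focus on how to design batch writes, compression, partitioning, parallel processing, and deduplication checks.

Expected output

A set of ETL performance recommendations explaining how to increase throughput while maintaining data quality.

Speed up table synchronization

Input

Create a plan to speed up a cross-database table sync job: the source table grows by 200 million rows per day, and incremental syncs must finish faster without missing rows, duplicates, or out-of-order data. Output the sync architecture, checkpoint mechanism, failure recovery, and consistency-validation plan.

Expected output

A high-throughput table sync plan covering incremental sync design, fault tolerance, and consistency verification.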

// Documentation

Data Throughput Accelerator

Use this skill when the bottleneck is moving, transforming, or saving large volumes of data. The goal is not just speed: it is getting correct data into the right place faster, with proof that it landed.

First Distinction

Separate these before optimizing:

source extraction speed;
network transfer speed;
warehouse/load speed;
transform speed;
serving-table freshness;
live tail growth while the job runs.

A pipeline can be "fast" and still appear behind if new data arrives faster than it can be absorbed during the final catch-up window.
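The catch-up condition can be made concrete with rates. If the pipeline lands P rows per second while the live tail adds A rows per second, a backlog of B rows clears after B / (P - A) seconds, and only when P > A. A minimal sketch; the function and parameter names are hypothetical and both rates are assumed constant:

```python
import math


def catch_up_seconds(backlog_rows: int, process_rate: float, arrival_rate: float) -> float:
    """Seconds until the backlog is drained, or math.inf if the tail outruns the job.

    backlog_rows: rows still waiting at measurement time.
    process_rate: rows per second the pipeline lands end to end (assumed constant).
    arrival_rate: rows per second of new live-tail data (assumed constant).
    """
    if backlog_rows <= 0:
        return 0.0
    net_rate = process_rate - arrival_rate
    if net_rate <= 0:
        # The tail grows at least as fast as it is drained: the job never catches up.
        return math.inf
    return backlog_rows / net_rate
```

For example, a 10,000,000-row backlog drained at 50,000 rows/s clears in 200 s on a static source, but takes 250 s if 10,000 rows/s keep arriving while the job runs.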

Fast Path Heuristics

Move compute to where the data already is.
Prefer warehouse-native scans, joins, and appends for large landed files.
Use manifests or checkpoints so completed files/partitions are skipped.
Use partitioning and clustering that match the read and append pattern.
Batch small files, requests, and writes.
Make writes idempotent through unique keys, manifests, or replaceable staging.
Keep raw, derived, and serving tables separately accountable.

Workflow

Read the current source, target, and manifest contracts.
Measure backlog: external files, manifest rows, raw rows, derived rows, min/max timestamps, and unprocessed counts.
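The backlog measurement step can be sketched as a single snapshot function. This is a minimal illustration under stated assumptions: the inputs (a listing of landed file names, a manifest mapping file name to row count, and the event timestamps present in the raw and derived tables) and every name here are hypothetical. Against a real warehouse, the counts and min/max values would come from COUNT, MIN, and MAX queries rather than Python lists.

```python
from __future__ import annotations

from datetime import datetime
from typing import Iterable


def measure_backlog(
    external_files: Iterable[str],
    manifest: dict[str, int],
    raw_ts: list[datetime],
    derived_ts: list[datetime],
) -> dict:
    """Summarize how far behind each layer is before changing anything."""
    files = set(external_files)
    raw_max = max(raw_ts, default=None)
    derived_max = max(derived_ts, default=None)
    return {
        "external_files": len(files),
        "manifest_rows": len(manifest),
        # Landed files with no manifest entry are the unprocessed backlog.
        "unprocessed_files": sorted(files - set(manifest)),
        "raw_rows": len(raw_ts),
        "derived_rows": len(derived_ts),
        "raw_min_ts": min(raw_ts, default=None),
        "raw_max_ts": raw_max,
        "derived_max_ts": derived_max,
        # How far the derived layer trails the raw layer, if both have data.
        "derived_lag": (
            raw_max - derived_max
            if raw_max is not None and derived_max is not None
            else None
        ),
    }
```

Taking this snapshot before and after a run gives the numbers needed to tell extraction, load, and transform lag apart.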

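// From the same source

Skill ★211k

lead-intelligence

Helps users identify high-value leads and generate multichannel outreach plans.

affaan-m · Install →

Skill ★211k

bun-runtime

Helps developers use Bun to run, bundle, test, and manage dependencies, and assess when to replace Node with it.

affaan-m · Install →

// Similar functionality

Skill ★420

sqldw-authoring-cli

Writes T-SQL, loads data, and applies changes to Fabric data warehouses and SQL endpoints from the command line.

microsoft · Install →

Skill

sql-queries

Helps you write, optimize, and translate high-quality SQL queries across multiple data warehouse dialects.

anthropics · Install →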

Data throughput result:
- Source files discovered: 294
- Files processed this run: 294
- Raw rows added: 9,683,598
- Derived rows added: 8,917,585
- Remaining tail: 24 files at readback time
- Runtime: 38.7s
- Correctness gate: manifest counts and table max timestamps match
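The correctness gate in this result can be expressed as a check that must pass before a run is reported as done. The source does not define the gate precisely, so this sketch assumes one plausible reading: the row total recorded in the manifest must equal the raw rows added, and the derived table's max timestamp must equal the raw table's. Derived rows are deliberately not compared with raw rows, because the example shows them differing (8,917,585 vs 9,683,598), presumably from filtering or deduplication. The function and parameter names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime


def correctness_gate(
    manifest_row_total: int,
    raw_rows_added: int,
    raw_max_ts: datetime | None,
    derived_max_ts: datetime | None,
) -> list[str]:
    """Return failure messages; an empty list means the run may be reported as done."""
    failures = []
    if manifest_row_total != raw_rows_added:
        failures.append(
            f"manifest rows {manifest_row_total:,} != raw rows added {raw_rows_added:,}"
        )
    # A missing raw max timestamp (empty table) also fails the gate.
    if raw_max_ts is None or raw_max_ts != derived_max_ts:
        failures.append(f"max timestamps differ: raw={raw_max_ts} derived={derived_max_ts}")
    return failures
```

Returning a list of reasons rather than a single boolean lets the run report say exactly which check failed.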



quarkus-verification

jira-integration

plankton-code-quality

rust-patterns

sqldw-operations-cli

sqldw-consumption-cli

General-Purpose Snowflake MCP Server

AWS Athena Cost MCP Server