$ loading_
帮助用户设计和优化 ClickHouse 数据库模式、查询与高性能分析数据流程。
复制安装指令,让 AI 自动完成配置 · 推荐新手
请帮我安装 askskill 上的 "clickhouse-io" 技能: 1. 下载 https://raw.githubusercontent.com/affaan-m/ECC/main/docs/zh-CN/skills/clickhouse-io/SKILL.md 2. 保存为 ~/.claude/skills/clickhouse-io/SKILL.md 3. 装好后重载技能,告诉我可以用了
请分析这条 ClickHouse 慢查询的性能瓶颈,并给出可执行的优化建议,包括表结构、分区键、排序键、索引和 SQL 改写方案:SELECT user_id, count() FROM events WHERE event_date >= '2024-01-01' AND event_type = 'purchase' GROUP BY user_id ORDER BY count() DESC LIMIT 100。
一份针对查询性能的诊断与优化方案,包含原因分析和改写建议。
我要用 ClickHouse 存储电商用户行为日志,每天约 5 亿条记录,查询场景包括按时间、渠道、地区和事件类型聚合分析。请为我设计表结构,并说明分区、主键、排序键、TTL 和物化视图的最佳实践。
一套适合高并发分析场景的 ClickHouse 建模方案与设计说明。
请评估我的 ClickHouse 集群和数据写入方案是否适合实时分析:3 分片 2 副本,Kafka 持续写入,查询延迟目标 1 秒内。请指出潜在风险,并给出扩缩容、数据分布、MergeTree 配置和摄取链路优化建议。
一份关于集群架构与实时写入链路的评估报告及优化建议。
用于高性能分析和数据工程的 ClickHouse 特定模式。
ClickHouse 是一个用于在线分析处理 (OLAP) 的列式数据库管理系统 (DBMS)。它针对大型数据集上的快速分析查询进行了优化。
关键特性:
CREATE TABLE markets_analytics (
date Date,
market_id String,
market_name String,
volume UInt64,
trades UInt32,
unique_traders UInt32,
avg_trade_size Float64,
created_at DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (date, market_id)
SETTINGS index_granularity = 8192;
-- For data that may have duplicates (e.g., from multiple sources)
CREATE TABLE user_events (
event_id String,
user_id String,
event_type String,
timestamp DateTime,
properties String
) ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (user_id, event_id, timestamp)
PRIMARY KEY (user_id, event_id);
-- For maintaining aggregated metrics
CREATE TABLE market_stats_hourly (
hour DateTime,
market_id String,
total_volume AggregateFunction(sum, UInt64),
total_trades AggregateFunction(count, UInt32),
unique_users AggregateFunction(uniq, String)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, market_id);
-- Query aggregated data
SELECT
hour,
market_id,
sumMerge(total_volume) AS volume,
countMerge(total_trades) AS trades,
uniqMerge(unique_users) AS users
FROM market_stats_hourly
WHERE hour >= toStartOfHour(now() - INTERVAL 24 HOUR)
GROUP BY hour, market_id
ORDER BY hour DESC;
-- PASS: GOOD: Use indexed columns first
SELECT *
FROM markets_analytics
WHERE date >= '2025-01-01'
AND market_id = 'market-123'
AND volume > 1000
ORDER BY date DESC
LIMIT 100;
-- FAIL: BAD: Filter on non-indexed columns first
SELECT *
FROM markets_analytics
WHERE volume > 1000
AND market_name LIKE '%election%'
AND date >= '2025-01-01';
-- PASS: GOOD: Use ClickHouse-specific aggregation functions
SELECT
toStartOfDay(created_at) AS day,
market_id,
sum(volume) AS total_volume,
count() AS total_trades,
uniq(trader_id) AS unique_traders,
avg(trade_size) AS avg_size
FROM trades
WHERE created_at >= today() - INTERVAL 7 DAY
GROUP BY day, market_id
ORDER BY day DESC, total_volume DESC;
-- PASS: Use quantile for percentiles (more efficient than percentile)
SELECT
quantile(0.50)(trade_size) AS median,
quantile(0.95)(trade_size) AS p95,
quantile(0.99)(trade_size) AS p99
FROM trades
WHERE created_at >= now() - INTERVAL 1 HOUR;
-- Calculate running totals
SELECT
date,
market_id,
volume,
sum(volume) OVER (
PARTITION BY market_id
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_volume
FROM markets_analytics
WHERE date >= today() - INTERVAL 30 DAY
ORDER BY market_id, date;
import { ClickHouse } from 'clickhouse'
const clickhouse = new ClickHouse({
url: process.env.CLICKHOUSE_URL,
port: 8123,
basicAuth: {
username: process.env.CLICKHOUSE_USER,
password: process.env.CLICKHOUSE_PASSWORD
}
})
// PASS: Batch insert (efficient)
async function bulkInsertTrades(trades: Trade[]) {
const values = trades.map(trade => `(
'${trade.id}',
'${trade.market_id}',
'${trade.user_id}',
${trade.amount},
'${trade.timestamp.toISOString()}'
)`).join(',')
await clickhouse.query(`
INSERT INTO trades (id, market_id, user_id, amount, timestamp)
VALUES ${values}
`).toPromise()
}
// FAIL: Individual inserts (slow)
async function insertTrade(trade: Trade) {
// Don't do this in a loop!
await clickhouse.query(`
…
帮助开发者为代码代理配置性能优化、安全防护与研究优先工作流。
提供数据库迁移、回滚与零停机发布的最佳实践指导,适用于多种 ORM 与 SQL 数据库。
通过双评审智能体对结果进行对抗式校验,提升输出发布前的可靠性
帮助你掌握地道 Rust 模式、所有权与并发实践,编写安全高性能应用。
基于 C++ Core Guidelines 编写、审查并重构更安全现代的 C++ 代码。
为 Claude Code 会话提供系统化校验流程,帮助检查结果正确性与质量。
帮助用户掌握 ClickHouse 查询优化、分析建模与数据工程最佳实践。
为 ClickHouse 提供只读查询优化、延迟分析与数据质量监控支持
让 AI 直接查询、管理 ClickHouse 数据库并查看元数据
用自然语言查询 ClickHouse 数据库,并按调用设置访问白名单限制。
提供 PostgreSQL 查询优化、模式设计、索引与安全实践建议
提供 PostgreSQL 查询优化、表结构设计、索引与安全实践建议