
SKILL

nextflow-development

Name: nextflow-development
Author: anthropics

Run nf-core/Nextflow pipelines for RNA-seq, variant calling, and ATAC-seq data analysis.

Stars

★ 22,528

Source

GitHub

Updated

2026-07-11

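// Security assessment: needs attention

Prompt-only, does not execute code
Open source, auditable

Overall assessment

The skill's materials indicate that it is an open-source, prompt/documentation-only skill that declares no required secrets and no fixed remote endpoint, so overall risk is low. Its target workflow, however, involves downloading public data, installing dependencies, and running bioinformatics pipelines; carrying out the instructions brings the usual operational risks of local execution, file access, and fetching external dependencies.

Credentials and secrets: low risk

Both the materials and the objective checks indicate that no secrets or environment variables are required, and nothing asks for an API token, cloud credentials, or account passwords; the credential exposure surface is small.

Outbound network: needs attention

The README explicitly includes steps for fetching public sequencing data from GEO/SRA and for installing external resources such as Docker and Nextflow. No fixed service endpoint is declared, but real use involves network access to public data sources and dependency sources, which may transmit query parameters or sample identifiers.

Code execution: needs attention

The documentation contains commands that run Python scripts and Nextflow pipelines and that install or update software, and it also mentions Docker, Java, and system-level commands. These are ordinary capabilities for executing code and starting processes on the local machine, and should be used in a controlled environment.

Data access: needs attention

The skill states that it processes local FASTQ files, generates a samplesheet, and verifies outputs, so it is expected to read local sequencing data and write result files. The current materials show no data permissions beyond the declared purpose, but the skill still involves local access to large-scale research data.

Source and supply chain: low risk

The source is an open-source repository on GitHub, and the system flags it as prompt-only and open-source, so auditability is good; there is no sign of closed-source exfiltration or obvious malice. Note that the repository has 0 stars, no declared license, and unknown maintenance status; its credibility is not high, but this alone does not warrant a high-risk rating.

Security recommendations

Run the suggested scripts and pipelines only in an isolated environment, and avoid executing the install commands directly on a production research workstation.
Verify the provenance of GEO/SRA downloads, content pulled by Nextflow, and container images, and pin their versions.
Before processing local sequencing data, confirm the input directory, output directory, and disk usage to avoid reading or overwriting important files by mistake.
Review the repository's scripts and dependency notes by hand first, and confirm the license and maintenance status before adopting the skill.

Audit model: gpt-5.4 · 2026-06-17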

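// Installation

Copy the install instructions and let the AI complete the setup automatically · recommended for beginners

Please help me install the "nextflow-development" skill from askskill:
1. Download https://raw.githubusercontent.com/anthropics/knowledge-work-plugins/main/bio-research/skills/nextflow-development/SKILL.md
2. Save it as ~/.claude/skills/nextflow-development/SKILL.md
3. Once it is installed, reload skills and tell me it is ready to use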
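
For readers who prefer to perform steps 1 and 2 by hand, the following is a minimal Python sketch under stated assumptions. The URL and destination path are taken from the install steps; the function names `skill_destination` and `install_skill` are hypothetical and not part of the skill. Reloading skills (step 3) is left to the client.

```python
from pathlib import Path
from urllib.request import urlopen

# URL copied from step 1 of the install instructions.
SKILL_URL = (
    "https://raw.githubusercontent.com/anthropics/knowledge-work-plugins/"
    "main/bio-research/skills/nextflow-development/SKILL.md"
)


def skill_destination(home: Path) -> Path:
    """Return the SKILL.md location named in step 2, relative to `home`."""
    return home / ".claude" / "skills" / "nextflow-development" / "SKILL.md"


def install_skill(home: Path, url: str = SKILL_URL) -> Path:
    """Download SKILL.md from `url` and save it under `home`; return the path."""
    dest = skill_destination(home)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=30) as response:
        dest.write_bytes(response.read())
    return dest
```

Calling `install_skill(Path.home())` writes the file to `~/.claude/skills/nextflow-development/SKILL.md`. Reviewing the downloaded file before reloading skills is in line with the security recommendations above.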


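// Usage examples

RNA-seq expression analysis

Input

Please use nf-core/rnaseq to analyze local FASTQ data, generate a samplesheet, and output gene expression quantification, QC results, and the files needed for differential expression analysis.

Expected output

Returns a runnable pipeline configuration, a description of the samplesheet structure, and a list of expression quantification and QC outputs.

WGS/WES variant calling

Input

Please use nf-core/sarek to call variants on WGS/WES sequencing data, and describe the required input format, reference genome configuration, and main output files.

Expected output

Returns a run plan for the Sarek pipeline and lists the key result files for SNPs, indels, structural variants, and so on.

Reproducing public GEO/SRA data

Input

I have GEO/SRA GSE, GSM, or SRR accession numbers. Please help me lay out the download and analysis steps, and choose a suitable nf-core pipeline to reproduce the analysis.

Expected output

Returns complete execution guidance, from the public accession numbers through the samplesheet and download procedure to the downstream analysis plan.

// Documentation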

nf-core Pipeline Deployment

Run nf-core bioinformatics pipelines on local or public sequencing data.

Target users: Bench scientists and researchers without specialized bioinformatics training who need to run large-scale omics analyses—differential expression, variant calling, or chromatin accessibility analysis.

Workflow Checklist

- [ ] Step 0: Acquire data (if from GEO/SRA)
- [ ] Step 1: Environment check (MUST pass)
- [ ] Step 2: Select pipeline (confirm with user)
- [ ] Step 3: Run test profile (MUST pass)
- [ ] Step 4: Create samplesheet
- [ ] Step 5: Configure & run (confirm genome with user)
- [ ] Step 6: Verify outputs

Step 0: Acquire Data (GEO/SRA Only)

Skip this step if user has local FASTQ files.

For public datasets, fetch from GEO/SRA first. See references/geo-sra-acquisition.md for the full workflow.

Quick start:

# 1. Get study info
python scripts/sra_geo_fetch.py info GSE110004

# 2. Download (interactive mode)
python scripts/sra_geo_fetch.py download GSE110004 -o ./fastq -i

# 3. Generate samplesheet
python scripts/sra_geo_fetch.py samplesheet GSE110004 --fastq-dir ./fastq -o samplesheet.csv

DECISION POINT: After fetching study info, confirm with user:

- Which sample subset to download (if multiple data types)
- Suggested genome and pipeline

Then continue to Step 1.
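
Before fetching anything, it can help to check what kind of accession the user supplied. The sketch below is illustrative only: it recognizes the three prefixes mentioned on this page (GSE, GSM, SRR), and the names `ACCESSION_KINDS` and `classify_accession` are hypothetical. It does not reflect how `scripts/sra_geo_fetch.py` handles accessions.

```python
import re

# Prefixes mentioned on this page; other accession types are deliberately rejected.
ACCESSION_KINDS = {
    "GSE": "GEO series",
    "GSM": "GEO sample",
    "SRR": "SRA run",
}
ACCESSION_PATTERN = re.compile(r"^(GSE|GSM|SRR)\d+$")


def classify_accession(accession: str) -> str:
    """Return a human-readable kind for a GSE/GSM/SRR accession."""
    match = ACCESSION_PATTERN.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized accession: {accession!r}")
    return ACCESSION_KINDS[match.group(1)]
```

For example, `classify_accession("GSE110004")` returns `"GEO series"`, while an unsupported identifier raises `ValueError` so the user can be asked for a valid one.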

Step 1: Environment Check

Run this first. The pipeline will fail unless the environment check passes.

python scripts/check_environment.py

All critical checks must pass. If any fail, provide fix instructions:

Docker issues

| Problem | Fix |
|---|---|
| Not installed | Install from https://docs.docker.com/get-docker/ |
| Permission denied | `sudo usermod -aG docker $USER` then re-login |
| Daemon not running | `sudo systemctl start docker` |

Nextflow issues

| Problem | Fix |
|---|---|
| Not installed | `curl -s https://get.nextflow.io \| bash && mv nextflow ~/bin/` |
| Version < 23.04 | `nextflow self-update` |

Java issues

| Problem | Fix |
|---|---|
| Not installed / < 11 | `sudo apt install openjdk-11-jdk` |

Do not proceed until all checks pass. For HPC/Singularity, see references/troubleshooting.md.
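
The checks in the tables above can be approximated as follows. This is a hedged sketch, not the contents of `scripts/check_environment.py`, which this page does not show. The minimum versions come from the tables; the helper names (`parse_version`, `java_major`, `meets_minimum`, `command_output`, `check_environment`) are hypothetical. It assumes `docker info` exits non-zero when the daemon is unreachable or permission is denied, and that `java -version` and `nextflow -version` print the word "version" followed by a version number.

```python
import re
import shutil
import subprocess
from typing import Optional

MIN_NEXTFLOW = (23, 4)  # from the "Version < 23.04" row
MIN_JAVA = 11           # from the "Not installed / < 11" row


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the number that follows the word 'version' as a tuple of ints."""
    match = re.search(r'version\s+"?(\d+(?:\.\d+)*)', text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def java_major(version: tuple[int, ...]) -> int:
    """Map legacy '1.8'-style versions to 8; newer schemes use the first field."""
    return version[1] if version[0] == 1 and len(version) > 1 else version[0]


def meets_minimum(version: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Compare only as many leading fields as the minimum specifies."""
    return version[: len(minimum)] >= minimum


def command_output(args: list[str]) -> Optional[str]:
    """Return stdout plus stderr, or None if the command is missing or fails."""
    if shutil.which(args[0]) is None:
        return None
    try:
        result = subprocess.run(args, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout + result.stderr if result.returncode == 0 else None


def check_environment() -> dict[str, bool]:
    """Report pass/fail for Docker, Nextflow, and Java."""
    docker = command_output(["docker", "info"])
    nextflow = command_output(["nextflow", "-version"])
    java = command_output(["java", "-version"])  # Java prints its version to stderr
    return {
        "docker": docker is not None,
        "nextflow": nextflow is not None
        and meets_minimum(parse_version(nextflow), MIN_NEXTFLOW),
        "java": java is not None and java_major(parse_version(java)) >= MIN_JAVA,
    }
```

A caller would treat any `False` value in the result as a failed critical check and apply the matching fix from the tables before continuing.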

Step 2: Select Pipeline

DECISION POINT: Confirm with user before proceeding.

| Data Type | Pipeline | Version | Goal |
|---|---|---|---|
| RNA-seq | `rnaseq` | 3.22.2 | Gene expression |
| WGS/WES | `sarek` | 3.7.1 | Variant calling |
| ATAC-seq | `atacseq` | 2.1.2 | Chromatin accessibility |

Auto-detect from data:

python scripts/detect_data_type.py /path/to/data

For pipeline-specific details:

- references/pipelines/rnaseq.md
- references/pipelines/sarek.md
- references/pipelines/atacseq.md

Step 3: Run Test Profile

Validates environment with small data. MUST pass before real data.

nextflow run nf-core/<pipeline> -r <version> -profile test,docker --outdir test_output

| Pipeline | Command |
|---|---|
| rnaseq | `nextflow run nf-core/rnaseq -r 3.22.2 -profile test,docker --outdir test_rnaseq` |
| sarek | `nextflow run nf-core/sarek -r 3.7.1 -profile test,docker --outdir test_sarek` |
| atacseq | `nextflow run nf-core/atacseq -r 2.1.2 -profile test,docker --outdir test_atacseq` |
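
The three commands follow one template, so they can be generated from the pinned versions rather than typed by hand. A minimal sketch, assuming the pipeline names and versions listed in this document; `PIPELINE_VERSIONS` and `build_test_command` are hypothetical names.

```python
import shlex

# Pipeline names and pinned versions as listed in this document.
PIPELINE_VERSIONS = {"rnaseq": "3.22.2", "sarek": "3.7.1", "atacseq": "2.1.2"}


def build_test_command(pipeline: str) -> list[str]:
    """Build the test-profile command for a supported pipeline as an argv list."""
    version = PIPELINE_VERSIONS[pipeline]  # KeyError for unsupported pipelines
    return [
        "nextflow", "run", f"nf-core/{pipeline}",
        "-r", version,
        "-profile", "test,docker",
        "--outdir", f"test_{pipeline}",
    ]


def as_shell(pipeline: str) -> str:
    """Render the same command as a single shell-safe string."""
    return shlex.join(build_test_command(pipeline))
```

Returning an argv list keeps the command safe to pass to `subprocess.run` without a shell, and pinning `-r` to the listed version keeps test runs reproducible.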

Verify:

ls test_output/multiqc/multiqc_report.html
grep "Pipeline completed successfully" .nextflow.log
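
The same two checks can be scripted so that a failure produces an explicit message. This is a sketch under the assumption that the report path and success string are exactly as shown in the commands above; `verify_test_run` is a hypothetical helper, not part of the skill.

```python
from pathlib import Path

# String searched for by the grep command above.
SUCCESS_MARKER = "Pipeline completed successfully"


def verify_test_run(outdir: Path, log_file: Path) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    report = outdir / "multiqc" / "multiqc_report.html"
    if not report.is_file():
        problems.append(f"missing MultiQC report: {report}")
    if not log_file.is_file():
        problems.append(f"missing log file: {log_file}")
    elif SUCCESS_MARKER not in log_file.read_text(errors="replace"):
        problems.append(f"success marker not found in {log_file}")
    return problems
```

For the generic command above this would be called as `verify_test_run(Path("test_output"), Path(".nextflow.log"))`, substituting whichever `--outdir` was actually used.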

If test fails, see references/troubleshooting.md.

Step 4: Create Samplesheet

Generate automatically

python scripts/generate_samplesheet.py /path/to/data <pipeline> -o samplesheet.csv

The script:

- Discovers FASTQ/BAM/CRAM files
- Pairs R1/R2 reads
- Infers sample metadata
- Validates before writing
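
To make the pairing and validation steps concrete, here is a minimal sketch for the nf-core/rnaseq case only. It is not the implementation of `scripts/generate_samplesheet.py`, which this page does not show. It assumes gzipped FASTQ files named with `_R1`/`_R2` (optionally followed by a numeric suffix such as `_001`), uses the nf-core/rnaseq samplesheet columns `sample,fastq_1,fastq_2,strandedness`, and sets strandedness to `auto`; the names `READ_PATTERN`, `pair_reads`, and `write_rnaseq_samplesheet` are hypothetical.

```python
import csv
import re
from pathlib import Path

# Assumed naming scheme: <sample>_R1.fastq.gz, <sample>_R2_001.fq.gz, and so on.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R(?P<read>[12])(?:_\d+)?\.(?:fastq|fq)\.gz$"
)


def pair_reads(fastq_dir: Path) -> dict[str, dict[str, Path]]:
    """Group FASTQ files by sample name, keyed by read number ('1' or '2')."""
    pairs: dict[str, dict[str, Path]] = {}
    for path in sorted(fastq_dir.iterdir()):
        match = READ_PATTERN.match(path.name)
        if match:
            pairs.setdefault(match["sample"], {})[match["read"]] = path
    return pairs


def write_rnaseq_samplesheet(fastq_dir: Path, output: Path) -> int:
    """Validate the pairs, write the samplesheet, and return the sample count."""
    pairs = pair_reads(fastq_dir)
    if not pairs:
        raise ValueError(f"no FASTQ files found in {fastq_dir}")
    missing_r1 = sorted(s for s, reads in pairs.items() if "1" not in reads)
    if missing_r1:
        raise ValueError(f"samples with R2 but no R1: {missing_r1}")
    with output.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
        for sample, reads in sorted(pairs.items()):
            # Single-end samples leave fastq_2 empty.
            fastq_2 = str(reads["2"].resolve()) if "2" in reads else ""
            writer.writerow([sample, str(reads["1"].resolve()), fastq_2, "auto"])
    return len(pairs)
```

Validation happens before the output file is opened, so a malformed input directory never leaves a partial samplesheet behind. Other pipelines expect different columns; see the pipeline-specific references.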

…
