transcribe

Name: transcribe
Author: OpenAI

Transcribes speech in audio or video to text, with optional speaker separation and interview transcript cleanup.

Stars: 23,132
Source: GitHub
Updated: 2026-07-21

// Security assessment: needs attention

Prompt only, no code execution
Open source and auditable
Community verified · 22.0k

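Overall

Overall risk is low. The skill comes from a highly trusted open-source repository with wide community adoption, but its documentation says it calls the OpenAI transcription CLI/API, reads local audio, and writes output. The affected dimensions are therefore better rated "needs attention" than "high risk".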

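Credentials and keys: needs attention

The metadata claims "no keys or environment variables", but the README explicitly requires OPENAI_API_KEY to be set for live API calls. The key is a sensitive credential and, if configured carelessly on the local machine, could be exposed to other processes or logs. The documentation also states that the full key must not be pasted into chat, which lowers the risk of direct leakage.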

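Network egress: needs attention

The materials state that the skill transcribes "using OpenAI", so audio content and any optional speaker reference audio are sent to OpenAI services when live calls are enabled. No other third-party endpoints are declared and the source repository is trusted, so this is routine egress consistent with the declared functionality.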

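Code execution: needs attention

The README instructs running the bundled CLI (transcribe_diarize.py) with python3 and installing the openai package when it is missing. This means a script runs on the local machine and a dependency install may be triggered. Local process execution of this kind is a routine tool capability, and the materials show no red flags such as requests for elevated system privileges.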

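Data access: needs attention

The skill needs to read the audio/video file paths the user provides and any optional known-speaker reference files, and it writes results to output/transcribe/ or a user-specified output path. The access scope matches its transcription and diarization functionality, with no requirement to scan unrelated directories broadly and no excessive authorization.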

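Source and supply chain: low risk

The source is the open-source openai/skills repository on GitHub, which has auditable source code and high community adoption (about 22k stars). These are clear positive signals. The license and maintenance status are not stated in the materials provided, but that is not enough to offset the low-risk judgment that follows from an official, high-trust source.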

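Security recommendations

Configure OPENAI_API_KEY only when live transcription is actually needed, and inject it safely through a local environment variable; avoid writing it into shared scripts or logs.
Before processing sensitive recordings, confirm that your organization permits sending audio and speaker reference samples to OpenAI services.
Prefer pinned dependency versions and review transcribe_diarize.py and the documents it references, to reduce supply-chain uncertainty at install time.
Restrict the output directory to a controlled workspace to avoid overwriting or mixing in data from other projects.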

Audit model: gpt-5.4 · 2026-06-16

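// Install

Copy the install instructions and let the AI complete the setup automatically · recommended for beginners

Please help me install the "transcribe" skill from askskill:
1. Download https://raw.githubusercontent.com/openai/skills/main/skills/.curated/transcribe/SKILL.md
2. Save it as ~/.claude/skills/transcribe/SKILL.md
3. Once it is installed, reload the skills and tell me it is ready to use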
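For readers who would rather script the first two steps than delegate them, a minimal Python sketch follows. The URL and destination path come from the instructions above; the function names `skill_destination` and `install_skill` are hypothetical, and reloading skills (step 3) is left to the host tool.

```python
import urllib.request
from pathlib import Path

# URL taken verbatim from the install instructions
SKILL_URL = (
    "https://raw.githubusercontent.com/openai/skills/main/"
    "skills/.curated/transcribe/SKILL.md"
)


def skill_destination(home=None):
    """Return <home>/.claude/skills/transcribe/SKILL.md (hypothetical helper)."""
    home = Path.home() if home is None else Path(home)
    return home / ".claude" / "skills" / "transcribe" / "SKILL.md"


def install_skill(home=None):
    """Download SKILL.md and save it at the destination path."""
    dest = skill_destination(home)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(SKILL_URL) as response:
        dest.write_bytes(response.read())
    return dest
```

These steps fetch only SKILL.md; the bundled script that the documentation refers to is not part of this download.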

// Documentation

Audio Transcribe

Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.

Workflow

1. Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
2. Verify OPENAI_API_KEY is set. If missing, ask the user to set it locally (do not ask them to paste the key).
3. Run the bundled transcribe_diarize.py CLI with sensible defaults (fast text transcription).
4. Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
5. Save outputs under output/transcribe/ when working in this repo.
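The workflow above can be sketched as a small Python wrapper around the bundled CLI. This is an illustration under stated assumptions, not the skill's own code: `build_command` and `run_transcription` are hypothetical names, and only the positional audio path and the `--out` flag shown in the quick start are used. Validating the transcript remains a manual step.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_command(cli_path, audio_path, out_path):
    """Build the argv list for a default (fast text) transcription run."""
    return [sys.executable, str(cli_path), str(audio_path), "--out", str(out_path)]


def run_transcription(cli_path, audio_path, out_dir="output/transcribe"):
    """Check the key, run the bundled CLI, and return the transcript path."""
    if not os.environ.get("OPENAI_API_KEY"):
        # Never prompt for the key here; the user exports it in their own shell.
        raise RuntimeError("OPENAI_API_KEY is not set; export it in your shell first.")
    out_path = Path(out_dir) / (Path(audio_path).stem + ".txt")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the CLI exits with a non-zero status
    subprocess.run(build_command(cli_path, audio_path, out_path), check=True)
    return out_path
```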

Decision rules

Default to gpt-4o-mini-transcribe with --response-format text for fast transcription.
If the user wants speaker labels or diarization, use --model gpt-4o-transcribe-diarize --response-format diarized_json.
If audio is longer than ~30 seconds, keep --chunking-strategy auto.
Prompting is not supported for gpt-4o-transcribe-diarize.
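As a rough illustration of these rules, the hypothetical helper below picks the flags for a run. The model names and flags are the ones listed above; passing the default model and `--chunking-strategy auto` explicitly is an assumption about what the CLI accepts, since the rules only say to keep that setting.

```python
def choose_flags(want_speakers=False):
    """Return CLI flags that follow the decision rules (hypothetical helper)."""
    if want_speakers:
        # Speaker labels or diarization requested
        flags = ["--model", "gpt-4o-transcribe-diarize",
                 "--response-format", "diarized_json"]
    else:
        # Fast text transcription is the default
        flags = ["--model", "gpt-4o-mini-transcribe",
                 "--response-format", "text"]
    # Keep automatic chunking, as advised for audio longer than ~30 seconds
    flags += ["--chunking-strategy", "auto"]
    return flags
```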

Output conventions

Use output/transcribe/<job-id>/ for evaluation runs.
Use --out-dir for multiple files to avoid overwriting.
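One way to apply these conventions is sketched below. `output_args` is a hypothetical helper, and the `.txt` file name it derives for the single-file case is an assumption; only the `output/transcribe/<job-id>/` layout and the `--out` / `--out-dir` flags come from the documentation.

```python
from pathlib import Path


def output_args(audio_paths, job_id, root="output/transcribe"):
    """Pick --out for a single file and --out-dir for several."""
    if not audio_paths:
        raise ValueError("at least one audio path is required")
    out_dir = Path(root) / job_id
    if len(audio_paths) == 1:
        # Assumed naming: <audio stem>.txt inside the job directory
        out_file = out_dir / (Path(audio_paths[0]).stem + ".txt")
        return ["--out", out_file.as_posix()]
    # Several inputs: a directory avoids one transcript overwriting another
    return ["--out-dir", out_dir.as_posix()]
```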

Dependencies (install if missing)

Prefer uv for dependency management.

uv pip install openai

If uv is unavailable:

python3 -m pip install openai

Environment

OPENAI_API_KEY must be set for live API calls.
If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
Never ask the user to paste the full key in chat.
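A minimal sketch of a key check that respects these rules: it reports whether the variable is set without ever echoing its value. The function name `api_key_status` and the message wording are illustrative assumptions.

```python
import os


def api_key_status(env=None):
    """Describe whether OPENAI_API_KEY is set, without revealing its value."""
    env = os.environ if env is None else env
    if not env.get("OPENAI_API_KEY"):
        return ("missing: create a key in the OpenAI platform UI "
                "and export OPENAI_API_KEY in your shell")
    # Deliberately return no part of the key
    return "set (value not shown)"
```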

Skill path (set once)

export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"

User-scoped skills install under $CODEX_HOME/skills (default: ~/.codex/skills).
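When the CLI is launched from Python rather than a shell, the same path can be resolved as in the sketch below. `resolve_cli` is a hypothetical helper; it mirrors the `${CODEX_HOME:-$HOME/.codex}` fallback, which applies when the variable is unset or empty.

```python
import os
from pathlib import Path


def resolve_cli(env=None):
    """Return the path to transcribe_diarize.py under CODEX_HOME."""
    env = os.environ if env is None else env
    # Fall back to ~/.codex when CODEX_HOME is unset or empty
    codex_home = env.get("CODEX_HOME") or str(Path.home() / ".codex")
    return Path(codex_home) / "skills" / "transcribe" / "scripts" / "transcribe_diarize.py"
```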

CLI quick start

Single file (fast text default):

python3 "$TRANSCRIBE_CLI" \
  path/to/audio.wav \
  --out transcript.txt

Diarization with known speakers (up to 4):

python3 "$TRANSCRIBE_CLI" \
  meeting.m4a \
  --model gpt-4o-transcribe-diarize \
  --known-speaker "Alice=refs/alice.wav" \
  --known-speaker "Bob=refs/bob.wav" \
  --response-format diarized_json \
  --out-dir output/transcribe/meeting
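
The repeated `--known-speaker "Name=path"` arguments can be generated from a mapping, with the limit of 4 checked up front. This is a sketch only: `known_speaker_args` is a hypothetical helper, and how the CLI itself reacts to more than four speakers is not stated here.

```python
def known_speaker_args(speakers, limit=4):
    """Turn {name: reference_path} into repeated --known-speaker flags."""
    if len(speakers) > limit:
        raise ValueError(f"at most {limit} known speakers are supported")
    args = []
    for name, ref_path in speakers.items():
        # Same "Name=path" form as in the command above
        args += ["--known-speaker", f"{name}={ref_path}"]
    return args
```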

Plain text output (explicit):

python3 "$TRANSCRIBE_CLI" \
  interview.mp3 \
  --response-format text \
  --out interview.txt

Reference map

references/api.md: supported formats, limits, response formats, and known-speaker notes.
