SKILL

image-vision

Name: image-vision
Author: microsoft

Analyze image content, extract text, and answer image-related questions using multi-model vision capabilities.

Stars: ★ 10
Source: GitHub
Updated: 2026-07-20

// Security assessment: low risk

Prompt only, does not execute code
Open source and auditable

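// Install

Copy the install instructions and let the AI complete the setup automatically · recommended for beginners

Please help me install the "image-vision" skill from askskill:
1. Download https://raw.githubusercontent.com/microsoft/amplifier-bundle-skills/main/skills/image-vision/SKILL.md
2. Save it as ~/.claude/skills/image-vision/SKILL.md
3. Once it is installed, reload skills and tell me it is ready to use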
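
The first two install steps can also be carried out by hand. The following is a minimal Python sketch that assumes only the URL and destination path quoted above; the helper names `install_skill` and `_download` and the `fetch` parameter are illustrative and not part of askskill or the skill itself. Reloading skills (step 3) happens inside the agent and is not covered.

```python
from pathlib import Path
from urllib.request import urlopen

# URL and destination copied from the install instructions above.
SKILL_URL = (
    "https://raw.githubusercontent.com/microsoft/amplifier-bundle-skills/"
    "main/skills/image-vision/SKILL.md"
)
DEST = Path.home() / ".claude" / "skills" / "image-vision" / "SKILL.md"


def _download(url: str) -> bytes:
    """Fetch the raw bytes at `url` with a standard-library HTTP GET."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def install_skill(url: str = SKILL_URL, dest: Path = DEST, fetch=_download) -> Path:
    """Hypothetical helper: download SKILL.md and save it at `dest`."""
    data = fetch(url)
    # Create the skill directory if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    return dest
```

Calling `install_skill()` with no arguments performs the real download; passing a different `fetch` callable makes the helper usable offline.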

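// Usage examples

Image content analysis

Input

Please analyze the main content of this image and list the key objects, the scene and setting, any visible text, and the theme it may convey.

Expected output

Returns a structured image description covering the main elements, the text content, and an assessment of the theme.

Image text extraction

Input

Please extract all visible text from this image, output it in reading order, and flag any parts that may be unclear.

Expected output

Outputs the OCR text in reading order and flags content whose recognition is uncertain.

Multi-image comparison

Input

Please compare these two images, summarize the differences in four areas (layout, color, text, and object changes), and point out the most obvious difference.

Expected output

Produces a clear comparison of the differences, useful for reviewing version changes or design revisions.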

// Documentation

Image Vision Analysis

Overview

Analyze images using state-of-the-art LLM vision models. Use the provided scripts for most tasks; custom code is only needed for advanced scenarios.

Workflow Decision Tree

First time using this skill?

→ Read setup.md for one-time environment and API key setup

Simple image analysis (most common)

→ Use "Quick Start" canned scripts below

Batch processing or multi-turn conversations

→ Read patterns.md for advanced patterns

Something failing?

→ Check setup.md for troubleshooting

Quick Start (Use Wrapper Scripts)

ALWAYS use the wrapper scripts - they handle venv setup automatically:

# Simple analysis (auto-creates venv on first use)
./vision-analyze.sh <provider> <image_path> <prompt>

# Robust analysis (auto-fallback if provider times out)
./vision-analyze-robust.sh <image_path> <prompt> [timeout_seconds]

The wrapper scripts automatically:

Create venv if it doesn't exist
Install required SDKs
Use venv Python (no manual activation needed)
Handle errors gracefully
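
The wrapper scripts themselves are not reproduced here. As a rough illustration of the "create venv if it doesn't exist" and "use venv Python" behaviours, the sketch below uses Python's standard `venv` module. `ensure_venv` is a hypothetical name, and the `.venv` location and POSIX `bin/python` layout are assumptions based on the direct-usage example in this document. SDK installation is left out because the package list is not given here.

```python
import venv
from pathlib import Path


def ensure_venv(skill_dir: Path, with_pip: bool = True) -> Path:
    """Create <skill_dir>/.venv if it is missing and return its Python path.

    Hypothetical helper: the real wrapper scripts may differ in detail.
    """
    venv_dir = skill_dir / ".venv"
    python = venv_dir / "bin" / "python"  # POSIX layout (assumption)
    if not python.exists():
        # with_pip=True bootstraps pip so SDKs could be installed afterwards.
        venv.create(venv_dir, with_pip=with_pip)
    return python
```

Because the function returns the interpreter path, callers never need to activate the environment; they simply run that interpreter.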

Example usage:

# Analyze a UI screenshot (Anthropic Claude)
./vision-analyze.sh anthropic screenshot.png "Describe any UI bugs or issues you see"

# Extract text (Google Gemini - fastest)
./vision-analyze.sh gemini document.jpg "Extract all text from this image"

# Robust analysis with auto-fallback (tries Gemini → Anthropic → OpenAI)
./vision-analyze-robust.sh photo.png "Describe this image in detail"

# With custom timeout (default is 60 seconds)
./vision-analyze-robust.sh large-image.png "Analyze this" 120
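
The fallback behaviour described in the comments above (try Gemini, then Anthropic, then OpenAI, with a timeout that defaults to 60 seconds) can be pictured roughly as follows. This is a sketch under assumptions, not the contents of `vision-analyze-robust.sh`: the function names, the `run` parameter, the per-provider application of the timeout, and the choice to shell out to `./vision-analyze.sh` are all illustrative.

```python
import subprocess

# Order taken from the example comment above.
FALLBACK_ORDER = ("gemini", "anthropic", "openai")


def run_wrapper(provider: str, image_path: str, prompt: str, timeout: float) -> str:
    """Call ./vision-analyze.sh once; raises on timeout or non-zero exit."""
    result = subprocess.run(
        ["./vision-analyze.sh", provider, image_path, prompt],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


def analyze_with_fallback(image_path, prompt, timeout=60.0,
                          providers=FALLBACK_ORDER, run=run_wrapper):
    """Return (provider, output) from the first provider that succeeds."""
    errors = []
    for provider in providers:
        try:
            return provider, run(provider, image_path, prompt, timeout)
        except (subprocess.TimeoutExpired,
                subprocess.CalledProcessError, OSError) as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{provider}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Keeping the runner injectable means the loop can be exercised without API keys or network access.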

Advanced: Direct Script Usage (Not Recommended)

If you need to call the Python scripts directly, you MUST use the venv Python:

# ❌ WRONG - uses system Python, will fail
python examples/anthropic-vision.py image.png "prompt"

# ✅ CORRECT - uses venv Python
./.venv/bin/python examples/anthropic-vision.py image.png "prompt"

For agents: Always use the wrapper scripts to avoid setup issues.
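
Where a direct call is unavoidable, a small guard can make the venv requirement explicit. The sketch below builds the command from the ✅ example above and refuses to fall back to system Python. `direct_command` is a hypothetical helper, and the `examples/<provider>-vision.py` naming is extrapolated from the single `anthropic-vision.py` example, so treat it as an assumption.

```python
from pathlib import Path


def direct_command(skill_dir: Path, provider: str,
                   image_path: str, prompt: str) -> list[str]:
    """Build the argv for a direct script call using the venv interpreter."""
    python = skill_dir / ".venv" / "bin" / "python"
    if not python.exists():
        # Never fall back to system Python: the SDKs live in the venv.
        raise FileNotFoundError(
            f"{python} not found; run a wrapper script first to create the venv"
        )
    # Script naming pattern is an assumption based on anthropic-vision.py.
    script = skill_dir / "examples" / f"{provider}-vision.py"
    return [str(python), str(script), image_path, prompt]
```

The returned list can be passed straight to `subprocess.run`.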

Provider Comparison

Provider	Model	Best For	Speed	Cost
Anthropic	claude-sonnet-4-5	Latest, balanced quality/speed	Fast	$$
Anthropic	claude-3-opus	Highest quality (older)	Slow	$$$
Anthropic	claude-3-haiku	Fastest, simple tasks	Very Fast	$
OpenAI	gpt-5	Latest flagship model	Fast	$$$
OpenAI	gpt-4.1	High-volume production	Fast	$$
Gemini	gemini-2.5-flash	Latest, excellent balance	Very Fast	$
Gemini	gemini-2.5-pro	Large images, best quality	Medium	$$
Azure	(deployment-based)	Enterprise, compliance	Varies	Varies
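
To reason about the trade-offs programmatically, the table can be held as plain data. The sketch below copies the speed and cost columns as given and filters by cost tier. `ModelInfo` and `models_within_budget` are hypothetical names, and Azure is omitted because its row is deployment-based. The wrapper scripts shown in this document take a provider argument, not a model name, so this is an aid for choosing rather than a way to select a model.

```python
from typing import NamedTuple


class ModelInfo(NamedTuple):
    provider: str
    model: str
    speed: str
    cost: str  # "$", "$$", or "$$$", as in the table above


MODELS = [
    ModelInfo("Anthropic", "claude-sonnet-4-5", "Fast", "$$"),
    ModelInfo("Anthropic", "claude-3-opus", "Slow", "$$$"),
    ModelInfo("Anthropic", "claude-3-haiku", "Very Fast", "$"),
    ModelInfo("OpenAI", "gpt-5", "Fast", "$$$"),
    ModelInfo("OpenAI", "gpt-4.1", "Fast", "$$"),
    ModelInfo("Gemini", "gemini-2.5-flash", "Very Fast", "$"),
    ModelInfo("Gemini", "gemini-2.5-pro", "Medium", "$$"),
]


def models_within_budget(max_cost: str) -> list[str]:
    """Models whose cost tier is at or below `max_cost` (for example "$$")."""
    return [m.model for m in MODELS if len(m.cost) <= len(max_cost)]
```

For instance, `models_within_budget("$")` yields the two lowest-cost entries, `claude-3-haiku` and `gemini-2.5-flash`.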

Supported Image Formats

JPEG/JPG - Most common
PNG - With transparency
GIF - Static or animated
WEBP - Modern format

Max sizes:

Anthropic: 5MB per image
OpenAI: 20MB (auto-resizes)
Gemini: Varies by model (1.5 Pro handles very large images)
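
A pre-flight check can catch unsupported formats and oversized files before any API call is made. The sketch below encodes only the limits listed above (5MB for Anthropic, 20MB for OpenAI) and deliberately leaves Gemini unchecked because no figure is given. `check_image` is a hypothetical helper, and treating 1MB as 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path
from typing import Optional

# Formats listed above; the extension spellings are the usual ones (assumption).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

# Per-image limits from the list above. Gemini is omitted on purpose:
# the source only says its limit varies by model.
MAX_BYTES = {
    "anthropic": 5 * 1024 * 1024,
    "openai": 20 * 1024 * 1024,
}


def check_image(path: Path, provider: str) -> Optional[str]:
    """Return None if the file looks acceptable, else a short reason."""
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        return f"unsupported format: {path.suffix or '(none)'}"
    limit = MAX_BYTES.get(provider)
    size = path.stat().st_size
    if limit is not None and size > limit:
        return f"{size} bytes exceeds the {limit}-byte limit for {provider}"
    return None
```

Returning a reason string rather than raising lets a caller decide whether to resize the image, switch provider, or stop.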

Common Use Cases

# UI/UX Analysis - High-level layout and spacing
./vision-analyze.sh anthropic app-screenshot.png \
  "Analyze this UI for accessibility issues and suggest improvements"

# Bug Identification (use robust for auto-fallback)
./vision-analyze-robust.sh error-state.png \
  "What's wrong with this interface? Describe any visual bugs."

# Content Moderation
./vision-analyze.sh openai user-upload.jpg \
  "Does this image contain inappropriate content? Yes or no, and explain."

# Document Understanding (Gemini is fastest)
./vision-analyze.sh gemini invoice.png \
  "Extract the total amount, date, and vendor name from this invoice"

# Design Review - Layout, color, hierarchy (not typography details)
./vision-analyze-robust.sh mockup.png \
  "Provide design feedback on this mockup. Consider layout, color hierarchy, and spacing."