Analyze images using LLM vision APIs (Anthropic Claude, OpenAI GPT-4, Google Gemini, Azure OpenAI). Use when tasks require: (1) Understanding image content, (2) Describing visual elements, (3) Answering questions about images, (4) Comparing images, (5) Extracting text from images (OCR). Provides ready-to-use scripts - no custom code needed for simple cases.
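Copy the install instructions below and let the AI complete the setup automatically (recommended for beginners):
Please help me install the "image-vision" skill from askskill: 1. Download https://raw.githubusercontent.com/microsoft/amplifier-bundle-skills/main/skills/image-vision/SKILL.md 2. Save it as ~/.claude/skills/image-vision/SKILL.md 3. Once it is installed, reload the skills and tell me it is ready to use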
Analyze images using state-of-the-art LLM vision models. Use the provided scripts for most tasks; custom code is only needed for advanced scenarios.
→ Read setup.md for one-time environment and API key setup
→ Use "Quick Start" canned scripts below
→ Read patterns.md for advanced patterns
→ Check setup.md for troubleshooting
ALWAYS use the wrapper scripts - they handle venv setup automatically:
```bash
# Simple analysis (auto-creates venv on first use)
./vision-analyze.sh <provider> <image_path> <prompt>

# Robust analysis (auto-fallback if provider times out)
./vision-analyze-robust.sh <image_path> <prompt> [timeout_seconds]
```
The wrapper scripts automatically handle venv setup, creating the venv on first use.
Example usage:
```bash
# Analyze a UI screenshot (Anthropic Claude)
./vision-analyze.sh anthropic screenshot.png "Describe any UI bugs or issues you see"

# Extract text (Google Gemini - fastest)
./vision-analyze.sh gemini document.jpg "Extract all text from this image"

# Robust analysis with auto-fallback (tries Gemini → Anthropic → OpenAI)
./vision-analyze-robust.sh photo.png "Describe this image in detail"

# With custom timeout (default is 60 seconds)
./vision-analyze-robust.sh large-image.png "Analyze this" 120
```
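The fallback behaviour described in the examples above can be sketched in Python. This is a minimal illustration, not the actual contents of `vision-analyze-robust.sh`: the provider order and the 60-second default come from the comments above, while `run_wrapper`, `analyze_with_fallback`, and the injectable `runner` parameter are hypothetical names introduced here.

```python
import subprocess
from typing import Callable, Sequence, Tuple

# Fallback order as stated in the example comments; the real script may differ.
PROVIDERS = ("gemini", "anthropic", "openai")


def run_wrapper(provider: str, image_path: str, prompt: str, timeout: float) -> str:
    """Call ./vision-analyze.sh once and return its stdout.

    Raises subprocess.TimeoutExpired on timeout and
    subprocess.CalledProcessError on a non-zero exit code.
    """
    result = subprocess.run(
        ["./vision-analyze.sh", provider, image_path, prompt],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


def analyze_with_fallback(
    image_path: str,
    prompt: str,
    timeout: float = 60.0,
    providers: Sequence[str] = PROVIDERS,
    runner: Callable[[str, str, str, float], str] = run_wrapper,
) -> Tuple[str, str]:
    """Try each provider in order; return (provider, output) for the first success."""
    errors = []
    for provider in providers:
        try:
            return provider, runner(provider, image_path, prompt, timeout)
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError) as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{provider}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Passing the runner as a parameter keeps the fallback loop testable without network access or API keys; in normal use the default `run_wrapper` shells out to the wrapper script.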
If you need to call the Python scripts directly, you MUST use the venv Python:
```bash
# ❌ WRONG - uses system Python, will fail
python examples/anthropic-vision.py image.png "prompt"

# ✅ CORRECT - uses venv Python
./.venv/bin/python examples/anthropic-vision.py image.png "prompt"
```
For agents: Always use the wrapper scripts to avoid setup issues.
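When the example scripts are launched from other Python code, the same rule applies: resolve the venv interpreter explicitly instead of relying on whichever `python` is on the PATH. A minimal sketch, assuming the POSIX `.venv/bin/python` layout shown above; `venv_python` and `build_command` are hypothetical helper names, not part of the skill.

```python
from pathlib import Path
from typing import List


def venv_python(skill_dir: str = ".") -> Path:
    """Return the venv interpreter path (POSIX layout, as in the example above)."""
    return Path(skill_dir) / ".venv" / "bin" / "python"


def build_command(script: str, image_path: str, prompt: str, skill_dir: str = ".") -> List[str]:
    """Build an argv list that runs `script` with the venv Python, not the system one."""
    python = venv_python(skill_dir)
    if not python.exists():
        # The venv has not been created yet; the wrapper scripts create it on first use.
        raise FileNotFoundError(f"venv interpreter not found: {python}")
    return [str(python), script, image_path, prompt]
```

The returned list can be passed directly to `subprocess.run`. Failing early when the interpreter is missing avoids silently falling back to the system Python.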
| Provider | Model | Best For | Speed | Cost |
|---|---|---|---|---|
| Anthropic | claude-sonnet-4-5 | Latest, balanced quality/speed | Fast | $$ |
| Anthropic | claude-3-opus | Highest quality (older) | Slow | $$$ |
| Anthropic | claude-3-haiku | Fastest, simple tasks | Very Fast | $ |
| OpenAI | gpt-5 | Latest flagship model | Fast | $$$ |
| OpenAI | gpt-4.1 | High-volume production | Fast | $$ |
| Gemini | gemini-2.5-flash | Latest, excellent balance | Very Fast | $ |
| Gemini | gemini-2.5-pro | Large images, best quality | Medium | $$ |
| Azure | (deployment-based) | Enterprise, compliance | Varies | Varies |
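The table can also be read as data when a script has to choose a model. The sketch below copies the rows above verbatim (cost is the number of `$` signs) and leaves out Azure, whose speed and cost vary by deployment. `ModelOption`, `SPEED_RANK`, and `pick_models` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelOption:
    provider: str
    model: str
    speed: str  # speed label from the table
    cost: int   # number of "$" signs in the table


# Rows copied from the table above (Azure omitted: deployment-based).
MODELS = [
    ModelOption("anthropic", "claude-sonnet-4-5", "Fast", 2),
    ModelOption("anthropic", "claude-3-opus", "Slow", 3),
    ModelOption("anthropic", "claude-3-haiku", "Very Fast", 1),
    ModelOption("openai", "gpt-5", "Fast", 3),
    ModelOption("openai", "gpt-4.1", "Fast", 2),
    ModelOption("gemini", "gemini-2.5-flash", "Very Fast", 1),
    ModelOption("gemini", "gemini-2.5-pro", "Medium", 2),
]

# Ordering of the speed labels, slowest to fastest.
SPEED_RANK = {"Slow": 0, "Medium": 1, "Fast": 2, "Very Fast": 3}


def pick_models(max_cost: int, min_speed: str = "Slow") -> List[str]:
    """Return model names within the cost ceiling and at or above the speed floor."""
    floor = SPEED_RANK[min_speed]
    return [
        m.model
        for m in MODELS
        if m.cost <= max_cost and SPEED_RANK[m.speed] >= floor
    ]
```

For example, `pick_models(1, "Very Fast")` selects the cheapest, fastest tier in the table.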
Max sizes:
```bash
# UI/UX Analysis - High-level layout and spacing
./vision-analyze.sh anthropic app-screenshot.png \
  "Analyze this UI for accessibility issues and suggest improvements"

# Bug Identification (use robust for auto-fallback)
./vision-analyze-robust.sh error-state.png \
  "What's wrong with this interface? Describe any visual bugs."

# Content Moderation
./vision-analyze.sh openai user-upload.jpg \
  "Does this image contain inappropriate content? Yes or no, and explain."

# Document Understanding (Gemini is fastest)
./vision-analyze.sh gemini invoice.png \
  "Extract the total amount, date, and vendor name from this invoice"

# Design Review - Layout, color, hierarchy (not typography details)
./vision-analyze-robust.sh mockup.png \
  "Provide design feedback on this mockup. Consider layout, color hierarchy, and spacing."
```
…