Helps triage failures and restarts of the systemd-managed EdenFS service on Linux
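The skill's materials show it to be an open-source, prompt/documentation-only skill that requires no secrets and declares no remote endpoints, so overall risk is low. The content mainly provides EdenFS/systemd troubleshooting guidance, but it includes local diagnostic commands and log paths, so apply least privilege and take care not to run anything by mistake in production.
The materials explicitly state that no secrets or environment variables are required. Nothing asks for tokens, cookies, or long-lived credentials, and nothing describes collecting or exfiltrating credentials.
No remote endpoints are declared, and the system check also marks the skill as prompt-only. The README only mentions links and monitoring queries for humans to open; there is no evidence that the skill itself sends user data to external services.
As skill material, its core is troubleshooting guidance and command suggestions. Although the document lists local commands such as systemctl, eden status, and machinectl shell, there is no evidence that the skill ships an executable component or runs code automatically.
The document directs the reader to inspect local system state and logs, such as `/var/facebook/logs/edenfs_upgrade.log`, `/var/log/messages`, and systemd/eden status information. This is routine local data access for troubleshooting, but it may expose details of the system and user environment, so access should be kept to the minimum needed.
The source is the open-source GitHub repository `facebook/sapling`, which makes it easy to audit, and the system has flagged it as open-source. Star count and maintenance status are insufficiently documented and no license is declared, but the current materials show no closed-source exfiltration, spoofed provenance, or obvious supply-chain red flags.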
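Copy the install instructions and let the AI complete the setup · Recommended for beginners
Please install the "edenfs-systemd-triage" skill from askskill for me:
1. Download https://raw.githubusercontent.com/facebook/sapling/main/eden/.llms/skills/edenfs-systemd-triage/SKILL.md
2. Save it as ~/.claude/skills/edenfs-systemd-triage/SKILL.md
3. Reload skills once it is installed and tell me it is ready to use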
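Please help me figure out why EdenFS went down on this Linux devserver. Check the systemd management state, the most recent failure reason, and whether an automatic restart occurred, and suggest next troubleshooting steps.
Outputs the current EdenFS service status, key systemd events, likely root causes, and actionable troubleshooting suggestions.
Based on systemd and EdenFS information, build a lifecycle timeline for EdenFS on this machine, covering starts, stops, restarts, upgrades, and abnormal events, and explain how it reached its current state.
Outputs a chronologically ordered event timeline and summarizes the key turning points that led to the current state.
Please check whether edenfs_upgrade, edenfs_restarter, and the related systemd timers/services are behaving abnormally. Analyze recent trigger records and failure messages, and whether they caused EdenFS to restart or become unavailable.
Outputs health-check results for upgrades and scheduled jobs, anomaly records, and an analysis of their impact on EdenFS availability.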
This skill helps EdenFS team members understand, monitor, and triage systemd-managed EdenFS.
Systemd manages the EdenFS daemon lifecycle on Linux (devservers and OnDemands). When EdenFS exits unexpectedly (crash, OOM-kill), systemd automatically restarts it — no user intervention needed.
Two systemd components to know:
| Component | What it does | Scope |
|---|---|---|
| `edenfs@.service` | User-scoped service managing the edenfs daemon lifecycle (auto-restart on failure) | Per-user (`systemctl --user`) |
| `edenfs_upgrade.timer` | System-scoped hourly timer that runs `edenfs_restarter` to upgrade edenfs gracefully | System-wide (`systemctl`) |
Config gate: `[experimental] systemd-managed-lifecycle = true` in the eden config.
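The gate check can be scripted so that later triage steps only run on systemd-managed hosts. The sketch below is a minimal example, not part of the skill itself: `is_systemd_managed` is a hypothetical helper name, and it assumes `eden config` prints the key as `systemd-managed-lifecycle = true` (optionally quoted) on its own line.

```sh
# Reads `eden config` output on stdin and exits 0 if the lifecycle gate is on.
# Assumption: the key is printed as `systemd-managed-lifecycle = true`,
# possibly with the value quoted; adjust the pattern if your output differs.
is_systemd_managed() {
    grep -Eq '^[[:space:]]*systemd-managed-lifecycle[[:space:]]*=[[:space:]]*"?true"?[[:space:]]*$'
}

# Example usage on a real host:
#   if eden config | is_systemd_managed; then
#       echo "EdenFS lifecycle is systemd-managed"
#   else
#       echo "EdenFS lifecycle is NOT systemd-managed"
#   fi
```

Reading from stdin keeps the helper usable on saved output, for example a config dump attached to a bug report.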
Read the reference file that matches your need:
| Need | Reference File | When to Read |
|---|---|---|
| Understand the architecture | references/architecture.md | How systemd-managed EdenFS works, service unit file, lifecycle operations |
| Check health & monitor | references/monitoring.md | Scuba queries, dashboards, success rate metrics, rollout monitoring |
| Build a lifecycle timeline | references/timeline.md | Reconstruct chronological EdenFS events from Scuba + local logs + systemd properties to understand how the system reached its current state |
| Triage a specific failure | references/triage-playbook.md | Step-by-step procedures for common failure scenarios |
| Identify known failure patterns | references/common-failures.md | Error signatures, root causes, and fixes for known issues |
When triaging an EdenFS systemd issue, start here:
1. Check if systemd-managed: `eden config | grep systemd-managed`
2. Check the full service status: `eden status --debug`
3. Check the logs: `/var/facebook/logs/edenfs_upgrade.log` and `/var/log/messages`

If you need to run commands on a user's machine via sush, you cannot use `su`; you must use:
machinectl shell <username>@.host /usr/local/bin/eden status --debug
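When several eden commands need to run as the same user, the `machinectl shell` invocation above can be wrapped in a small helper. This is a sketch under assumptions: `run_eden_as_user` and the `DRY_RUN` variable are hypothetical names introduced here, and the eden path is the one shown above.

```sh
# Runs an eden subcommand as another user through `machinectl shell`.
# With DRY_RUN set to a non-empty value, the command is printed, not executed.
# `run_eden_as_user` and `DRY_RUN` are illustrative names, not part of eden.
run_eden_as_user() {
    if [ "$#" -lt 1 ]; then
        echo "usage: run_eden_as_user <username> [eden args...]" >&2
        return 2
    fi
    user=$1
    shift
    ${DRY_RUN:+echo} machinectl shell "${user}@.host" /usr/local/bin/eden "$@"
}

# Example usage:
#   DRY_RUN=1 run_eden_as_user alice status --debug   # print only
#   run_eden_as_user alice status --debug             # actually run
```

Printing the command first makes it easy to review what will run on someone else's machine before executing it.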
These are read-only and safe to execute automatically:
# Check if systemd-managed
eden config | grep systemd-managed
# Full service status with systemd details
eden status --debug
# Check eden version mismatch
eden version
# Check systemd service properties (restart policy, crash counters, timestamps)
systemctl --user show edenfs@home-$(whoami)-local-.eden.service \
--property=Id,Type,Restart,RestartUSec,StartLimitIntervalUSec,StartLimitBurst,NRestarts,ExecMainStartTimestamp,ExecMainPID,ExecMainCode,ExecMainStatus,ActiveState,SubState,Result,InvocationID,ActiveEnterTimestamp,ActiveExitTimestamp,InactiveEnterTimestamp,InactiveExitTimestamp
# Check eden logs for recent errors
eden debug log | tail -50
# Check startup log
cat <state_dir>/.edenfs_startup.log
# Check system messages for edenfs service events
grep 'edenfs@' /var/log/messages | tail -20
# Check edenfs_upgrade logs
tail -50 /var/facebook/logs/edenfs_upgrade.log
# Check kernel OOM kills
dmesg | grep -i -E '(edenfs|oom|killed)' | tail -20
# Check dbus connectivity (needed for systemctl --user)
python3 -c "import socket; s=socket.socket(socket.AF_UNIX,socket.SOCK_STREAM); s.settimeout(1); s.connect('/run/user/$(id -u)/bus'); print('alive'); s.close()"
# Check linger (needed for user services to persist)
loginctl show-user $(whoami) --property=Linger
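Both `systemctl show` and `loginctl show-user` print one `Key=Value` pair per line, so their output can be reduced to the few fields that matter during triage. The helpers below are a minimal sketch; `get_prop` and `restart_summary` are hypothetical names introduced for illustration.

```sh
# Prints the value of one property from `Key=Value` lines on stdin.
# Only the first `=` separates key from value, so values may contain `=`.
get_prop() {
    awk -F= -v key="$1" '$1 == key { sub(/^[^=]*=/, ""); print; exit }'
}

# Condenses `systemctl --user show` output (stdin) into one line covering
# the active state, the last result, and the restart counter.
restart_summary() {
    props=$(cat)
    state=$(printf '%s\n' "$props" | get_prop ActiveState)
    result=$(printf '%s\n' "$props" | get_prop Result)
    restarts=$(printf '%s\n' "$props" | get_prop NRestarts)
    printf 'state=%s result=%s restarts=%s\n' \
        "${state:-unknown}" "${result:-unknown}" "${restarts:-unknown}"
}

# Example usage on a real host:
#   systemctl --user show "edenfs@home-$(whoami)-local-.eden.service" | restart_summary
#   loginctl show-user "$(whoami)" --property=Linger | get_prop Linger
```

A non-zero `restarts` value alongside `state=active` suggests the daemon exited at least once and systemd brought it back, which is a cue to look at the logs and OOM checks listed above.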
CRITICAL: Scuba uses the short hostname (e.g., devvm21611.cco0), NOT the FQDN returned by hostname (e.g., devvm21611.cco0.facebook.com). Always strip the .facebook.com suffix:
hostname | sed 's/\.facebook\.com$//'
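The same stripping can be done with shell parameter expansion, which is convenient when the hostname comes from somewhere other than the local machine. In this sketch, `scuba_hostname` is a hypothetical helper name; it falls back to the local `hostname` when no argument is given.

```sh
# Prints the short hostname form that Scuba expects by removing a trailing
# `.facebook.com`; names that are already short pass through unchanged.
scuba_hostname() {
    fqdn=${1:-$(hostname)}
    printf '%s\n' "${fqdn%.facebook.com}"
}

# Example usage:
#   scuba_hostname                                # local machine
#   scuba_hostname devvm21611.cco0.facebook.com   # a reported FQDN
```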
…