
sync.py

Idempotent host synchronization/status script for the devbox workstation.

This dialog is the literate source for host/sync.py: each section explains the shape of the program first, then provides the corresponding function signatures and interface skeletons.

Overview

host/sync.py is the host-side control plane for the devbox workstation. It is not a general provisioning framework and not a Python package installer. It is a small reconciliation script whose job is to compare a real Ubuntu host with the desired devbox state, report the delta, and optionally converge the host toward that state.

The desired workstation has two cooperating identities:

dev@box  -> durable host work user; owns files, secrets, Docker daemon access, SSH agent state
dev@dev  -> disposable Docker container shell; consumes host state through mounts and volumes

The central design rule is: the host owns durable state; the container is a disposable execution layer. sync.py therefore prepares the host user, host directories, host dotfiles, local config, container runtime files, and the systemd unit that manages the Compose service.

The command surface is intentionally small:

python host/sync.py                 # same as status
python host/sync.py status          # report only; no mutation
python host/sync.py enable          # converge host state and enable systemd service
python host/sync.py start           # converge enable-level state, then start service
python host/sync.py restart         # converge enable-level state, then restart service

Later, the same implementation can sit behind:

devbox sync [status|enable|start|restart]

The implementation is organized as a tiny desired-state engine:

%%{init: {"theme": "dark"}}%%
flowchart LR
    cli[CLI<br>args/env/git<br>identity] --> cfg[config<br>dict]
    cfg --> target[target<br>groups]
    target --> plan[operation<br>plan]
    plan --> runner[operation<br>runner]
    runner --> recon[ensure_*<br>reconcilers]
    recon --> report[report<br>+ next steps]

Each operation follows the same idempotent pattern:

%%{init: {"theme": "dark"}}%%
flowchart LR
    A[inspect<br>current<br>state] --> B{desired state<br>already true?}
    B -- yes --> OK[ok]
    B -- no --> C{apply?}
    C -- no --> W[would_change]
    C -- yes --> M[mutate]
    M --> V[report<br>changed / warn / error]

The most important engineering constraint is that status and mutating commands use the same planning and reconciliation path. The only difference is apply=False versus apply=True. This avoids the classic drift where dry-run logic says one thing and real setup logic does another.

constants

This section defines the stable names and paths used throughout the script. These constants are intentionally few and domain-specific: they make later tables and operations readable without introducing a configuration framework.

The defaults describe the canonical MVP workstation: host user dev, UID/GID 1111, container hostname dev, installed runtime directory /home/dev/.config/dev-env, and systemd service dev-container.service.

imports and module constants

The script should use only the Python standard library. These imports cover argument parsing, filesystem operations, subprocess execution, user/group inspection, atomic writes, and small template rendering.

from pathlib import Path
import argparse, grp, os, pwd, shutil, subprocess, sys, tempfile
from string import Template

The constants below are defaults, not hidden magic: most can be overridden from CLI or environment during config construction, but the rest of the program can refer to these names when building the desired-state tables.

ROOT = Path(__file__).resolve().parents[1]
DEV = "dev"
ID = 1111
CTHOST = "dev"
HOME = Path("/home/dev")
RUN = HOME/".config/dev-env"
APP = "dev-container.service"

desired-state tables

This section is the declarative heart of the script. Instead of scattering desired state through imperative setup code, we keep it in short tables: packages, commands, directories, managed files, rendered files, and create-once local files.

These tables are not applied directly. They are later converted into operation dictionaries by plan(c, groups). Keeping the data separate from the operation runner makes the script easier to inspect, easier to test, and naturally idempotent.

packages and commands

PKGS lists apt packages that should be present on the host. CMDS lists command names that should resolve on PATH after package installation.

The distinction is useful because packages and commands are related but not identical: for example, the Debian package may be named fd-find, while the command may be fdfind.

PKGS = [
    "ca-certificates", "curl", "git", "gh", "make", "zsh", "openssh-client", "sudo",
    "docker.io", "docker-compose-v2", "python3", "fzf", "ripgrep", "fd-find",
    "bat", "tree",
]
CMDS = [
    "curl", "git", "gh", "make", "zsh", "ssh", "sudo", "docker", "python3",
    "fzf", "rg", "fdfind", "batcat", "tree",
]
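
A minimal sketch of the package/command distinction, assuming the command check is a plain PATH lookup through shutil.which. The script's own need helper is not shown in this section, so missing_commands and PKG_TO_CMD are illustrative names, not part of sync.py:

```python
import shutil

# Illustrative subset: Debian/Ubuntu package name -> command it provides.
PKG_TO_CMD = {
    "fd-find": "fdfind",
    "bat": "batcat",
    "ripgrep": "rg",
    "openssh-client": "ssh",
}

def missing_commands(names):
    """Return the command names that do not resolve on PATH."""
    return [n for n in names if shutil.which(n) is None]

# Which of the mapped commands are absent on this machine?
print(missing_commands(PKG_TO_CMD.values()))
```

Keeping the two lists separate means a package can be installed while its command is still reported as missing, which is exactly the case a renamed binary produces.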

directories

Directories are represented as small dictionaries so ownership and permissions are part of desired state, not incidental cleanup. The reconciler should create missing directories and repair ownership/mode when needed.

All paths here are host paths for the dev identity.

DIRS = [
    {"path": HOME/"git", "owner": "dev:dev", "mode": 0o755},
    {"path": HOME/"pj", "owner": "dev:dev", "mode": 0o755},
    {"path": HOME/".config", "owner": "dev:dev", "mode": 0o755},
    {"path": RUN, "owner": "dev:dev", "mode": 0o755},
    {"path": HOME/".oh-my-zsh/custom/dev", "owner": "dev:dev", "mode": 0o755},
    {"path": HOME/".ssh", "owner": "dev:dev", "mode": 0o700},
]

shell framework repos

Oh My Zsh, Powerlevel10k, and Zsh plugins are installed on the host as ordinary cloned upstream distributions. The sync script only ensures they exist at the expected locations for the dev user. It does not update them on every run; updates remain an explicit user action through normal Oh My Zsh/plugin workflows.

OMZ_REPOS = [
    {"url": "https://github.com/ohmyzsh/ohmyzsh.git", "dst": HOME/".oh-my-zsh", "owner": "dev:dev"},
    {"url": "https://github.com/romkatv/powerlevel10k.git", "dst": HOME/".oh-my-zsh/custom/themes/powerlevel10k", "owner": "dev:dev"},
    {"url": "https://github.com/zsh-users/zsh-autosuggestions", "dst": HOME/".oh-my-zsh/custom/plugins/zsh-autosuggestions", "owner": "dev:dev"},
    {"url": "https://github.com/zsh-users/zsh-syntax-highlighting.git", "dst": HOME/".oh-my-zsh/custom/plugins/zsh-syntax-highlighting", "owner": "dev:dev"},
]

managed, rendered, and create-once files

Files are split into three classes:

  • FILES: managed exact copies from the repo into the host/runtime locations;
  • RENDER: managed files whose content is produced from a template and config values;
  • ONCE: local files created only if missing and never overwritten afterward.

This split is one of the main idempotency safeguards. The script can be strict about files it owns, while being conservative around local state and secrets.

FILES = [
    {"src": ROOT/"dotfiles/dev/.zshrc", "dst": HOME/".zshrc", "owner": "dev:dev", "mode": 0o644},
    {"src": ROOT/"dotfiles/dev/.p10k.zsh", "dst": HOME/".p10k.zsh", "owner": "dev:dev", "mode": 0o644},
    {"src": ROOT/"container/Dockerfile", "dst": RUN/"Dockerfile", "owner": "dev:dev", "mode": 0o644},
    {"src": ROOT/"container/docker-compose.yml", "dst": RUN/"docker-compose.yml", "owner": "dev:dev", "mode": 0o644},
    {"src": ROOT/"container/README.md", "dst": RUN/"README.md", "owner": "dev:dev", "mode": 0o644},
]
ZSH_FILES = [
    "10-rc.zsh", "aliases.zsh", "dev.zsh", "history.zsh",
    "_docker.zsh", "_git.zsh", "_python.zsh", "i.zsh",
    "dev/10-rc.zsh",
]
RENDER = [
    {"src": ROOT/"dotfiles/dev/.gitconfig", "dst": HOME/".gitconfig", "owner": "dev:dev", "mode": 0o644},
]
ONCE = [
    {"src": ROOT/"templates/env.zsh.example", "dst": HOME/".oh-my-zsh/custom/env.zsh", "owner": "dev:dev", "mode": 0o600},
    {"src": ROOT/"templates/ssh_config.example", "dst": HOME/".ssh/config", "owner": "dev:dev", "mode": 0o600},
    {"src": None, "dst": HOME/".zsh_history", "owner": "dev:dev", "mode": 0o600, "content": ""},
]

cli/config

This section turns command-line arguments, environment variables, optional global Git identity, and interactive prompts into one plain config dictionary.

The config dictionary is the only object later stages need. It carries both user choices, such as name and email, and derived values, such as apply, home, run, and the selected command.

command-line interface

The CLI has one optional positional command. If omitted, it defaults to status. Options provide explicit identity/config values and small execution controls.

In the future, this implementation may sit behind devbox sync; during MVP development it remains directly runnable as python host/sync.py.

def cli(argv=None) -> argparse.Namespace:
    """Parse command-line arguments."""
    p = argparse.ArgumentParser(
        prog="devbox sync",
        description="Check or converge the devbox host workstation state.",
    )
    p.add_argument(
        "cmd", nargs="?", default="status",
        choices=["status", "enable", "start", "restart"],
        help="sync action; default: status",
    )
    p.add_argument("--name", help="Git/user display name for rendered config")
    p.add_argument("--email", help="Git/user email for rendered config")
    p.add_argument("--dev", default=os.environ.get("DEV", DEV), help=f"work user name; default: {DEV}")
    p.add_argument("--id", type=int, default=int(os.environ.get("ID", ID)), help=f"UID/GID; default: {ID}")
    p.add_argument("--cthost", default=os.environ.get("CTHOST", CTHOST), help=f"container hostname; default: {CTHOST}")
    p.add_argument("--root", type=Path, default=ROOT, help="repo root; default: inferred from this file")
    p.add_argument("-y", "--yes", action="store_true", help="assume yes for package/system mutations")
    p.add_argument("-v", "--verbose", action="store_true", help="print extra command details")
    p.add_argument("--ssh-sign-key", default=None,
        help="path to existing SSH signing private key (sign); sign.pub inferred")
    p.add_argument("--ssh-private-key", default=None,
        help="path to existing SSH auth private key to copy")
    p.add_argument("--generate-ssh-keys", action="store_true",
        help="force ssh-keygen instead of importing")
    return p.parse_args(argv)

config construction

cfg resolves all inputs into a single dictionary. Human-specific values come from CLI args, environment variables, global Git config, or an interactive prompt when available. Non-human values use stable defaults unless explicitly overridden.

If name or email cannot be found in a noninteractive context, config construction should fail clearly rather than guessing.

def cfg(a: argparse.Namespace, env=os.environ) -> dict:
    """Build the effective sync configuration."""
    dev_home = Path(f"/home/{a.dev}")
    gu = git_user(home=dev_home)
    # Priority: CLI flag, then environment variable, then the dev user's Git config.
    name  = a.name  or env.get("NAME")  or gu.get("name")
    email = a.email or env.get("EMAIL") or gu.get("email")

    if not name and sys.stdin.isatty():
        name = input("Name for Git config: ").strip() or None
    if not email and sys.stdin.isatty():
        email = input("Email for Git config: ").strip() or None
    if not name or not email:
        raise SystemExit("NAME and EMAIL are required: pass --name/--email, set env vars, or configure global git user.name/user.email")

    home = dev_home
    # Read from the injected env mapping (not os.environ) so callers and tests can override it.
    ssh_sign_key = a.ssh_sign_key or env.get("SSH_SIGN_KEY")
    ssh_private_key = a.ssh_private_key or env.get("SSH_PRIVATE_KEY")
    ssh_generate = a.generate_ssh_keys

    if not ssh_generate and not ssh_sign_key and not ssh_private_key:
        if not (home / ".ssh" / "sign.pub").exists() and sys.stdin.isatty():
            ans = input("SSH signing private key path (or empty to generate new): ").strip()
            if ans:
                ssh_sign_key = ans
        if not (home / ".ssh" / "dev").exists() and sys.stdin.isatty():
            ans = input("SSH auth private key path (or empty to generate new): ").strip()
            if ans:
                ssh_private_key = ans

    return {
        "cmd": a.cmd,
        "apply": a.cmd != "status",
        "name": name,
        "email": email,
        "dev": a.dev,
        "id": a.id,
        "gid": a.id,
        "cthost": a.cthost,
        "home": home,
        "root": a.root,
        "run": home/".config/dev-env",
        "app": APP,
        "yes": a.yes,
        "verbose": a.verbose,
        "ssh_sign_key": ssh_sign_key,
        "ssh_private_key": ssh_private_key,
        "generate_ssh_keys": ssh_generate,
    }
def git_user(home: Path | None = None) -> dict:
    """Return Git user.name/user.email from global or explicit home config."""
    r = {}
    for key, out_key in [("user.name", "name"), ("user.email", "email")]:
        if home:
            cp = cmd(["git", "config", "--file", str(home/".gitconfig"), key])
        else:
            cp = cmd(["git", "config", "--global", key])
        if cp.returncode == 0 and cp.stdout.strip():
            r[out_key] = cp.stdout.strip()
    return r
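
The resolution order described for name and email (CLI args, environment variables, Git config, then an interactive prompt) can be sketched as a single helper. This is a hypothetical illustration: resolve and its parameters are not part of sync.py, and the prompt is passed in as a callable so the sketch runs without a TTY.

```python
def resolve(cli_value, env, env_key, git_value, prompt=None):
    """Return the first non-empty value in priority order, or None."""
    for candidate in (cli_value, env.get(env_key), git_value):
        if candidate:
            return candidate
    # Only prompt when a callable is supplied (for example, when stdin is a TTY).
    if prompt is not None:
        return prompt().strip() or None
    return None

# CLI wins over environment, environment wins over Git config.
print(resolve(None, {"NAME": "Env Name"}, "NAME", "Git Name"))
```

Returning None rather than a guessed default lets the caller fail clearly in a noninteractive context, as the section requires.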

planning

Planning maps a command target to resource groups, then expands those groups into operation dictionaries. This is where high-level intent becomes concrete desired state.

The planner should not inspect the machine or mutate anything. It should only build a clear list of operations for the operation runner.

target groups

Each command selects a set of resource groups. status uses the same planning machinery as mutating commands, but later runs with apply=False. start and restart deliberately include enable-level convergence before touching runtime service state.

def target(cmd: str) -> list[str]:
    """Return resource groups selected by a sync command."""
    return {
        "status":  ["host", "systemd", "validate"],
        "enable":  ["host", "systemd", "enable", "validate"],
        "start":   ["host", "systemd", "enable", "validate", "start"],
        "restart": ["host", "systemd", "enable", "validate", "restart"],
    }[cmd]
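
The claim that start and restart include enable-level convergence can be checked mechanically. The sketch below copies the table above into a standalone GROUPS dict; includes_enable_level is a hypothetical helper used only for this check.

```python
# Copy of the command -> resource-group table, kept standalone for the check.
GROUPS = {
    "status":  ["host", "systemd", "validate"],
    "enable":  ["host", "systemd", "enable", "validate"],
    "start":   ["host", "systemd", "enable", "validate", "start"],
    "restart": ["host", "systemd", "enable", "validate", "restart"],
}

def includes_enable_level(cmd: str) -> bool:
    """True if cmd selects every group that enable selects."""
    return set(GROUPS["enable"]) <= set(GROUPS[cmd])

for cmd in ("start", "restart"):
    assert includes_enable_level(cmd)

# status deliberately omits the enable group: it only reports.
assert not includes_enable_level("status")
```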

operation plan

plan expands the selected groups into operation dictionaries. The exact operation format is intentionally simple: each dict has a kind, a human-readable name or enough fields to derive one, and kind-specific parameters.

The operation runner dispatches by kind, so adding new resource types later should not disturb existing reconcilers.

def plan(c: dict, groups: list[str]) -> list[dict]:
    """Build operation dictionaries for the selected resource groups."""
    ops = []
    home, root, run = c["home"], c["root"], c["run"]
    dev, id = c["dev"], c["id"]
    own = f"{dev}:{dev}"

    if "host" in groups:
        ops.append({"kind": "pkgs", "name": "apt packages", "pkgs": PKGS})
        ops += [{"kind": "cmd", "name": f"command {x}", "cmd": x} for x in CMDS]
        ops.append({"kind": "group", "name": f"group {dev}", "group": dev, "gid": id})
        ops.append({"kind": "user", "name": f"user {dev}", "user": dev, "uid": id, "gid": id, "home": home, "shell": "/usr/bin/zsh", "groups": ["sudo", "docker"]})

        omz_repos = [{**r, "dst": Path(str(r["dst"]).replace(str(HOME), str(home))), "owner": own} for r in OMZ_REPOS]
        ops += [{"kind": "git_repo", "name": f"repo {r['dst']}", **r} for r in omz_repos]

        dirs = [{**d, "path": Path(str(d["path"]).replace(str(HOME), str(home))), "owner": own} for d in DIRS]
        ops += [{"kind": "dir", "name": f"dir {d['path']}", **d} for d in dirs]
        files = [
            {"src": root/"dotfiles/dev/.zshrc", "dst": home/".zshrc", "owner": own, "mode": 0o644},
            {"src": root/"dotfiles/dev/.p10k.zsh", "dst": home/".p10k.zsh", "owner": own, "mode": 0o644},
            {"src": root/"container/Dockerfile", "dst": run/"Dockerfile", "owner": own, "mode": 0o644},
            {"src": root/"container/docker-compose.yml", "dst": run/"docker-compose.yml", "owner": own, "mode": 0o644},
            {"src": root/"container/README.md", "dst": run/"README.md", "owner": own, "mode": 0o644},
        ]
        zsrc = root/"dotfiles/dev/.oh-my-zsh/custom"
        zdst = home/".oh-my-zsh/custom"
        files += [{"src": zsrc/x, "dst": zdst/x, "owner": own, "mode": 0o644} for x in ZSH_FILES]
        ops += [{"kind": "file", "name": f"file {f['dst']}", **f} for f in files]

        renders = [{"src": root/"dotfiles/dev/.gitconfig", "dst": home/".gitconfig", "owner": own, "mode": 0o644}]
        ops += [{"kind": "render", "name": f"render {r['dst']}", **r} for r in renders]

        once = [
            {"src": root/"templates/env.zsh.example", "dst": home/".oh-my-zsh/custom/env.zsh", "owner": own, "mode": 0o600},
            {"src": root/"templates/ssh_config.example", "dst": home/".ssh/config", "owner": own, "mode": 0o600},
            {"src": None, "dst": home/".zsh_history", "owner": own, "mode": 0o600, "content": ""},
        ]
        ops += [{"kind": "once", "name": f"create-once {o['dst']}", **o} for o in once]

        ops.append({"kind": "ssh_keys", "name": "SSH keys for dev user"})
        ops.append({"kind": "repo_copy", "name": "copy repo to dev home"})

    if "systemd" in groups:
        ops.append({"kind": "unit", "name": f"unit {c['app']}", "path": Path("/etc/systemd/system")/c["app"]})
    if "enable" in groups:
        ops.append({"kind": "svc", "name": f"enable {c['app']}", "svc": c["app"], "state": "enabled"})
    if "validate" in groups:
        ops.append({"kind": "compose", "name": "docker compose config", "cwd": run})
    if "start" in groups:
        ops.append({"kind": "svc", "name": f"start {c['app']}", "svc": c["app"], "state": "started"})
    if "restart" in groups:
        ops.append({"kind": "svc", "name": f"restart {c['app']}", "svc": c["app"], "state": "restarted"})
    return ops

operation runner

The operation runner is the small engine of the script. It receives operation dictionaries from the planner, dispatches each one to the appropriate reconciler, and returns result dictionaries for reporting.

The key rule is that status and mutating commands use the same operation path. The only behavioral difference is the apply boolean passed through to reconcilers.

result helpers

Result helpers keep the reconcilers short and uniform. Every reconciler returns the same result shape: a name, a state, and optional detail. chg(apply) is small sugar for the common changed versus would_change distinction.

def res(name: str, state: str = "ok", detail: str = "") -> dict:
    """Return a standard operation result dictionary."""
    return {"name": name, "state": state, "detail": detail}
def chg(apply: bool) -> str:
    """Return the mutation state appropriate for apply/dry-run mode."""
    return "changed" if apply else "would_change"

operation dispatch

Dispatch is deliberately table-driven. Each operation dict carries a kind, and op uses that kind to select the matching reconciler. Unknown operation kinds should produce an error result, because that indicates a programming mistake in the planner.

def op(o: dict, c: dict, apply: bool) -> dict:
    """Run one operation by dispatching on o['kind']."""
    f = {
        "pkgs": ensure_pkgs,
        "cmd": ensure_cmd,
        "group": ensure_group,
        "user": ensure_user,
        "dir": ensure_dir,
        "git_repo": ensure_git_repo,
        "file": ensure_file,
        "render": ensure_render,
        "once": ensure_once,
        "unit": ensure_unit,
        "svc": ensure_svc,
        "compose": check_compose,
        "ssh_keys": ensure_ssh_keys,
        "repo_copy": ensure_repo_copy,
    }.get(o.get("kind"))
    if not f:
        return res(o.get("name", "operation"), "error", f"unknown operation kind: {o.get('kind')}")
    return f(o, c, apply)

running operations

run_ops is the only loop that executes planned operations. It should catch unexpected exceptions around each operation and convert them into error results, so one bad step does not hide the rest of the status report.

def run_ops(ops: list[dict], c: dict, apply: bool) -> list[dict]:
    """Run all operations and return result dictionaries."""
    rs = []
    for o in ops:
        try:
            rs.append(op(o, c, apply))
        except Exception as e:
            rs.append(res(o.get("name", "operation"), "error", f"{type(e).__name__}: {e}"))
    return rs
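
A standalone sketch of the same isolation idea, with toy handlers in place of the real reconcilers. The names run_all, boom, and handlers are hypothetical, and the exit-code policy at the end is an assumption about how a report step might use the results, not something the script defines in this section.

```python
from collections import Counter

def res(name, state="ok", detail=""):
    """Same result shape as the script's res helper."""
    return {"name": name, "state": state, "detail": detail}

def run_all(ops, handlers):
    """Run each op; convert any exception into an error result."""
    results = []
    for o in ops:
        try:
            results.append(handlers[o["kind"]](o))
        except Exception as e:
            results.append(res(o["name"], "error", f"{type(e).__name__}: {e}"))
    return results

def boom(o):
    raise RuntimeError("disk on fire")

handlers = {"fine": lambda o: res(o["name"]), "boom": boom}
ops = [
    {"kind": "fine", "name": "first"},
    {"kind": "boom", "name": "second"},
    {"kind": "fine", "name": "third"},
]
results = run_all(ops, handlers)

# Assumed reporting policy: count states, exit non-zero if anything errored.
summary = Counter(r["state"] for r in results)
exit_code = 1 if summary["error"] else 0
print(dict(summary), exit_code)
```

The third operation still runs after the second one raises, so a status report stays complete even when one step fails.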

ensure_* reconcilers

Reconcilers are the resource-specific pieces of the program. Each one follows the same idempotent pattern:

  1. inspect current state;
  2. return ok if desired state already holds;
  3. return would_change in status mode if a mutation would be needed;
  4. mutate only when apply=True;
  5. report changed, warn, or error with useful detail.

This section intentionally keeps the functions narrow. Files, users, packages, services, and Docker Compose validation each have different inspection/mutation mechanics, but they all share the same result contract.
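
The five steps above can be sketched on a toy resource. ensure_marker is a hypothetical reconciler (not part of sync.py) that ensures a file holds a given text; it takes a path and text directly instead of the script's (o, c, apply) signature so the example stays self-contained.

```python
import tempfile
from pathlib import Path

def ensure_marker(path: Path, text: str, apply: bool) -> dict:
    """Toy reconciler: ensure `path` contains exactly `text`."""
    current = path.read_text() if path.exists() else None        # 1. inspect
    if current == text:                                           # 2. already true
        return {"name": str(path), "state": "ok", "detail": ""}
    if not apply:                                                 # 3. status mode
        return {"name": str(path), "state": "would_change", "detail": "write marker"}
    path.write_text(text)                                         # 4. mutate
    return {"name": str(path), "state": "changed", "detail": ""}  # 5. report

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "marker"
    states = [
        ensure_marker(p, "v1", apply=False)["state"],  # dry run
        ensure_marker(p, "v1", apply=True)["state"],   # first converge
        ensure_marker(p, "v1", apply=True)["state"],   # second converge
    ]
print(states)
```

The second converge returning ok rather than changed is the idempotency property each real reconciler is expected to preserve.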

packages and commands

Package reconciliation is batch-oriented: inspect all required apt packages, then install the missing set in one apt invocation when applying. Command reconciliation is lighter: it checks whether a command resolves on PATH and reports missing commands as warnings rather than hard errors.

def ensure_pkgs(o: dict, c: dict, apply: bool) -> dict:
    """Ensure required apt packages are installed."""
    miss = apt_missing(o["pkgs"])
    if not miss:
        return res(o["name"])
    if not apply:
        return res(o["name"], "would_change", "install: " + " ".join(miss))
    return apt_install(miss)
def ensure_cmd(o: dict, c: dict, apply: bool) -> dict:
    """Ensure a command is available on PATH."""
    return res(o["name"]) if need(o["cmd"]) else res(o["name"], "warn", "missing from PATH")

users and groups

User and group reconciliation handles the host dev identity. These functions must be careful around conflicts: if another user or group already owns the desired UID/GID, that is not something the script should silently repair. It should report a warning or error with clear detail.

Group membership repair belongs here too: the host dev user should be in sudo and docker.

def ensure_group(o: dict, c: dict, apply: bool) -> dict:
    """Ensure the dev group exists with the desired GID."""
    name, want = o["group"], o["gid"]
    try:
        g = grp.getgrnam(name)
        if g.gr_gid == want: return res(o["name"])
        return res(o["name"], "warn", f"exists with gid {g.gr_gid}, expected {want}")
    except KeyError:
        pass
    try:
        other = grp.getgrgid(want).gr_name
        return res(o["name"], "warn", f"gid {want} already used by group {other}")
    except KeyError:
        pass
    if not apply: return res(o["name"], "would_change", f"groupadd --gid {want} {name}")
    cp = cmd(["groupadd", "--gid", str(want), name])
    return res(o["name"], "changed" if cp.returncode == 0 else "error", cp.stderr.strip())
def ensure_user(o: dict, c: dict, apply: bool) -> dict:
    """Ensure the dev user exists with desired UID, group, home, shell, and memberships."""
    name, want = o["user"], o["uid"]
    groups = set(o.get("groups", []))
    try:
        u = pwd.getpwnam(name)
        details = []
        if u.pw_uid != want: details.append(f"uid {u.pw_uid} != {want}")
        if u.pw_gid != o["gid"]: details.append(f"gid {u.pw_gid} != {o['gid']}")
        if Path(u.pw_dir) != o["home"]: details.append(f"home {u.pw_dir} != {o['home']}")
        if u.pw_shell != o["shell"]: details.append(f"shell {u.pw_shell} != {o['shell']}")
        have = {g.gr_name for g in grp.getgrall() if name in g.gr_mem}
        missing = sorted(groups - have)
        if details: return res(o["name"], "warn", "; ".join(details))
        if not missing: return res(o["name"])
        if not apply: return res(o["name"], "would_change", "add groups: " + ",".join(missing))
        cp = cmd(["usermod", "-aG", ",".join(missing), name])
        return res(o["name"], "changed" if cp.returncode == 0 else "error", cp.stderr.strip())
    except KeyError:
        pass
    try:
        other = pwd.getpwuid(want).pw_name
        return res(o["name"], "warn", f"uid {want} already used by user {other}")
    except KeyError:
        pass
    xs = ["useradd", "--uid", str(want), "--gid", str(o["gid"]), "--create-home", "--home-dir", str(o["home"]), "--shell", o["shell"], "--groups", ",".join(o["groups"]), name]
    if not apply: return res(o["name"], "would_change", " ".join(xs))
    cp = cmd(xs)
    return res(o["name"], "changed" if cp.returncode == 0 else "error", cp.stderr.strip())

shell framework repositories

Shell framework repositories are upstream distributions installed into the dev user's Oh My Zsh tree. The reconciler only clones them if missing. If a target path exists but is not a Git checkout, it reports a warning instead of deleting or overwriting local state.

def ensure_git_repo(o: dict, c: dict, apply: bool) -> dict:
    """Ensure an upstream Git repository exists at the desired path."""
    dst = o["dst"]
    if dst.exists():
        if (dst/".git").is_dir():
            return res(o["name"])
        return res(o["name"], "warn", "exists but is not a git checkout; inspect manually")
    if not apply:
        return res(o["name"], "would_change", f"clone {o['url']}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    u, g = ids(o["owner"])
    os.chown(dst.parent, u, g)
    cp = cmd(["sudo", "-H", "-u", o["owner"].partition(":")[0], "git", "clone", "--depth=1", o["url"], str(dst)])
    if cp.returncode != 0:
        return res(o["name"], "error", cp.stderr.strip())
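    # Repair ownership without following symlinks, so a dangling link inside the checkout cannot raise.
    for p in [dst, *dst.rglob("*")]:
        os.chown(p, u, g, follow_symlinks=False)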
    return res(o["name"], "changed")

directories

Directory reconciliation creates missing directories and repairs owner/mode when needed. It should treat an existing non-directory path as an error, because replacing arbitrary user data would be destructive.

def ensure_dir(o: dict, c: dict, apply: bool) -> dict:
    """Ensure a directory exists with desired ownership and mode."""
    p = o["path"]
    if p.exists() and not p.is_dir():
        return res(o["name"], "error", "exists but is not a directory")
    ok = p.is_dir() and same_meta(p, o["owner"], o["mode"])
    if ok: return res(o["name"])
    if not apply: return res(o["name"], "would_change", "create/repair directory")
    p.mkdir(parents=True, exist_ok=True)
    u, g = ids(o["owner"])
    os.chown(p, u, g); os.chmod(p, o["mode"])
    return res(o["name"], "changed")
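
The same_meta and ids helpers used above are not shown in this section. A minimal sketch of the metadata comparison follows, under the assumption that it boils down to comparing uid, gid, and permission bits. meta_matches is a hypothetical name, and it takes numeric ids rather than a "user:group" string so that it runs without a dev user on the machine.

```python
import os, stat, tempfile
from pathlib import Path

def meta_matches(p: Path, uid: int, gid: int, mode: int) -> bool:
    """True if p exists with the given owner ids and permission bits."""
    try:
        st = p.stat()
    except FileNotFoundError:
        return False
    return (st.st_uid, st.st_gid, stat.S_IMODE(st.st_mode)) == (uid, gid, mode)

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "ssh"
    p.mkdir()
    os.chmod(p, 0o700)
    st = p.stat()
    checks = [
        meta_matches(p, st.st_uid, st.st_gid, 0o700),              # exact match
        meta_matches(p, st.st_uid, st.st_gid, 0o755),              # mode drift
        meta_matches(p / "missing", st.st_uid, st.st_gid, 0o700),  # absent path
    ]
print(checks)
```

A name-based variant would presumably resolve "dev:dev" through pwd.getpwnam and grp.getgrnam before making the same comparison.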

exact managed files

Exact managed files are owned by the repo. The reconciler reads source bytes, compares them with destination bytes, and writes atomically only when content differs or metadata needs repair.

Because these files are managed, replacing stale destination content is expected. This is different from local create-once files.

def ensure_file(o: dict, c: dict, apply: bool) -> dict:
    """Ensure an exact managed file matches source content, owner, and mode."""
    src, dst = o["src"], o["dst"]
    if not src.exists(): return res(o["name"], "error", f"missing source: {src}")
    b = src.read_bytes()
    ok = read(dst) == b and same_meta(dst, o["owner"], o["mode"])
    if ok: return res(o["name"])
    if not apply: return res(o["name"], "would_change", "copy/repair managed file")
    write(dst, b, o["mode"], o["owner"])
    return res(o["name"], "changed")
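
The read and write helpers are not shown in this section. One common way to get the atomic behavior described above is to write to a temporary file in the destination directory and rename it into place. The sketch below assumes that approach; read_or_none and atomic_write are hypothetical names, and ownership repair is omitted because changing owner normally requires root.

```python
import os, tempfile
from pathlib import Path

def read_or_none(p: Path):
    """Return file bytes, or None if the path is missing."""
    try:
        return p.read_bytes()
    except FileNotFoundError:
        return None

def atomic_write(p: Path, data: bytes, mode: int) -> None:
    """Write data to a temp file beside p, then rename it into place."""
    p.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=p.parent, prefix=f".{p.name}.")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.chmod(tmp, mode)
        os.replace(tmp, p)  # atomic on POSIX when both paths share a filesystem
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise

with tempfile.TemporaryDirectory() as d:
    dst = Path(d) / "sub" / "file.txt"
    before = read_or_none(dst)
    atomic_write(dst, b"hello\n", 0o644)
    after = read_or_none(dst)
    file_mode = dst.stat().st_mode & 0o777
    leftovers = sorted(x.name for x in dst.parent.iterdir())
print(before, after, oct(file_mode), leftovers)
```

Creating the temp file in the destination directory matters: a rename across filesystems is not atomic, so a temp file under /tmp would not give the same guarantee.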

rendered managed files

Rendered managed files are also owned by the repo, but their destination content is produced from a template plus config values. The main MVP example is .gitconfig, where $NAME and $EMAIL are replaced with the resolved local identity.

Once rendered, the same exact-file logic applies: compare desired bytes, then atomically replace only when needed.

def ensure_render(o: dict, c: dict, apply: bool) -> dict:
    """Ensure a rendered managed file matches desired content, owner, and mode."""
    src, dst = o["src"], o["dst"]
    if not src.exists(): return res(o["name"], "error", f"missing source: {src}")
    b = render(src.read_text(), c).encode()
    ok = read(dst) == b and same_meta(dst, o["owner"], o["mode"])
    if ok: return res(o["name"])
    if not apply: return res(o["name"], "would_change", "render/repair managed file")
    write(dst, b, o["mode"], o["owner"])
    return res(o["name"], "changed")
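
The render helper is not shown in this section. Since the script imports string.Template and the template uses $NAME and $EMAIL, a plausible minimal version looks like the sketch below. render_text and GITCONFIG_TEMPLATE are hypothetical, and the choice of safe_substitute is an assumption.

```python
from string import Template

# Hypothetical template content; the real .gitconfig template lives in the repo.
GITCONFIG_TEMPLATE = "[user]\n\tname = $NAME\n\temail = $EMAIL\n"

def render_text(text: str, c: dict) -> str:
    """Fill $NAME and $EMAIL from config; leave any other $ text untouched."""
    return Template(text).safe_substitute(NAME=c["name"], EMAIL=c["email"])

c = {"name": "Ada Lovelace", "email": "ada@example.com"}
out = render_text(GITCONFIG_TEMPLATE, c)
print(out)

# Shell-style positional arguments in aliases survive rendering unchanged.
alias = render_text("lg = !f() { git log $1; }; f", c)
```

safe_substitute tolerates literal dollar signs, which Git aliases often contain; substitute would instead raise on them, which is stricter but catches misspelled placeholders.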

create-once local files

Create-once files are local state. If missing, the script may create them from a template or literal content. If present, the script must never overwrite their content; at most it may report metadata drift or repair ownership/mode if we decide that is safe for the specific file.

This is the right category for env.zsh, SSH config templates, and the zsh history file.

def ensure_once(o: dict, c: dict, apply: bool) -> dict:
    """Create a local file only if missing; never overwrite existing content."""
    dst = o["dst"]
    if dst.exists():
        if same_meta(dst, o["owner"], o["mode"]): return res(o["name"])
        if not apply: return res(o["name"], "would_change", "repair metadata only")
        u, g = ids(o["owner"])
        os.chown(dst, u, g); os.chmod(dst, o["mode"])
        return res(o["name"], "changed", "metadata only")
    if o.get("src"):
        if not o["src"].exists(): return res(o["name"], "error", f"missing source: {o['src']}")
        b = o["src"].read_bytes()
    else:
        b = o.get("content", "").encode()
    if not apply: return res(o["name"], "would_change", "create local file")
    write(dst, b, o["mode"], o["owner"])
    return res(o["name"], "changed")

systemd unit and service

Systemd reconciliation is split between the unit file and the service state. The unit file is managed rendered content under /etc/systemd/system; enable/start/restart are service actions.

If the unit file changes, the script should run systemctl daemon-reload before enabling or starting. Runtime actions belong only to start and restart, not to enable.

def ensure_unit(o: dict, c: dict, apply: bool) -> dict:
    """Ensure the managed systemd unit file exists and is current."""
    p = o["path"]
    b = unit_text(c).encode()
    # Check existence first so a missing unit never reaches the content comparison.
    ok = p.exists() and read(p) == b
    if ok: return res(o["name"])
    if not apply: return res(o["name"], "would_change", "install/repair unit; daemon-reload needed")
    write(p, b, 0o644, None)
    cp = cmd(["systemctl", "daemon-reload"])
    if cp.returncode != 0: return res(o["name"], "error", cp.stderr.strip())
    return res(o["name"], "changed", "daemon-reload")
def ensure_svc(o: dict, c: dict, apply: bool) -> dict:
    """Ensure requested systemd service action/state."""
    svc, state = o["svc"], o["state"]
    if state == "enabled":
        cp = cmd(["systemctl", "is-enabled", svc])
        if cp.returncode == 0: return res(o["name"])
        if not apply: return res(o["name"], "would_change", "systemctl enable")
        cp = cmd(["systemctl", "enable", svc])
    elif state == "started":
        cp = cmd(["systemctl", "is-active", svc])
        if cp.returncode == 0: return res(o["name"])
        if not apply: return res(o["name"], "would_change", "systemctl start")
        cp = cmd(["systemctl", "start", svc])
    elif state == "restarted":
        if not apply: return res(o["name"], "would_change", "systemctl restart")
        cp = cmd(["systemctl", "restart", svc])
    else:
        return res(o["name"], "error", f"unknown service state: {state}")
    return res(o["name"], "changed" if cp.returncode == 0 else "error", cp.stderr.strip())

Docker Compose validation

Compose validation checks that the installed runtime directory contains a parseable Compose file. This is a validation step, not a build step. It should report configuration errors early without trying to rebuild the image.

def check_compose(o: dict, c: dict, apply: bool) -> dict:
    """Validate the installed Docker Compose configuration."""
    if not need("docker"):
        return res(o["name"], "warn", "docker command missing")
    yml = o["cwd"]/"docker-compose.yml"
    if not yml.exists():
        return res(o["name"], "warn", f"missing {yml}")
    env = os.environ.copy()
    env.setdefault("SSH_AUTH_SOCK", "/tmp/devbox-ci-ssh-agent.sock")
    cp = subprocess.run(["docker", "compose", "-f", str(yml), "--project-directory", str(o["cwd"]), "config"], env=env, text=True, capture_output=True)
    if cp.returncode != 0:
        return res(o["name"], "warn", cp.stderr.strip())
    return res(o["name"])

SSH keys

SSH key reconciliation handles three cases for each key pair, the signing pair (sign, sign.pub) and the auth pair (dev, dev.pub):

  • Import from file: user provides a path via CLI flag (--ssh-sign-key, --ssh-private-key), environment variable, or interactive prompt. The reconciler copies the key pair to the target location.
  • Generate fresh: if no source is provided, ssh-keygen -t ed25519 produces new keys. --generate-ssh-keys forces this non-interactively.
  • Skip if present: if both private and public key already exist at the target path, the reconciler moves on silently.

The resolution priority matches the name/email pattern: CLI flag → environment variable → interactive prompt (when TTY available) → generate.

def ensure_ssh_keys(o: dict, c: dict, apply: bool) -> dict:
    """Generate or import SSH key pair and signing key if missing."""
    ssh_dir = c["home"] / ".ssh"
    sign_file, sign_pub = ssh_dir / "sign", ssh_dir / "sign.pub"
    auth_file, auth_pub = ssh_dir / "dev", ssh_dir / "dev.pub"
    own = f"{c['dev']}:{c['dev']}"
    label_comment = f"dev@{c['cthost']}"
    force_gen = c.get("generate_ssh_keys", False)

    generated = []

    for label, priv, pub, src in [
        ("signing key", sign_file, sign_pub, c.get("ssh_sign_key")),
        ("auth key", auth_file, auth_pub, c.get("ssh_private_key")),
    ]:
        if pub.exists() and priv.exists():
            continue

        source = src if not force_gen else None
        if source:
            # Accept either half of the pair as the source and derive the other.
            src_path = Path(source)
            priv_src = src_path.with_suffix("") if src_path.suffix == ".pub" else src_path
            pub_src = priv_src.with_name(priv_src.name + ".pub")
            missing = [s for s in (priv_src, pub_src) if not s.exists()]
            if missing:
                return res(o["name"], "error", f"{label}: source missing: {missing[0]}")
        if not apply:
            how = f"would copy from {source}" if source else "would generate"
            generated.append(f"{label} ({how})")
            continue

        # Resolve ids only when mutating; the dev user may not exist during status.
        u, g = ids(own)
        if not ssh_dir.exists():
            ssh_dir.mkdir(parents=True, mode=0o700)
            os.chown(ssh_dir, u, g)
        if source:
            shutil.copy2(priv_src, priv)
            shutil.copy2(pub_src, pub)
        else:
            cp = cmd(["ssh-keygen", "-t", "ed25519", "-C", label_comment,
                      "-f", str(priv), "-N", ""])
            if cp.returncode != 0:
                return res(o["name"], "error", f"{label}: {cp.stderr.strip()}")
        os.chown(priv, u, g); os.chown(pub, u, g)
        os.chmod(priv, 0o600); os.chmod(pub, 0o644)
        # next_steps looks for the word "generated" in this detail string.
        generated.append(f"{label} ({'copied' if source else 'generated'})")

    if not generated:
        return res(o["name"])
    state = "changed" if apply else "would_change"
    return res(o["name"], state, ", ".join(generated))
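
The resolution priority described above (CLI flag, then environment variable, then interactive prompt, then generate) can be sketched as a small pure function. This is only an illustration: resolve_key_source, the env_name parameter, and the prompt wording are hypothetical names, not part of sync.py, and the real lookup may live in cfg.

```python
import os
import sys
from typing import Callable, Optional


def resolve_key_source(
    flag: Optional[str],
    env_name: str,
    prompt: Callable[[], str] = lambda: input("path to existing key (blank to generate): "),
    interactive: Optional[bool] = None,
) -> Optional[str]:
    """Return a key source path, or None when a fresh key should be generated."""
    if flag:                              # 1. an explicit CLI flag wins
        return flag
    from_env = os.environ.get(env_name)
    if from_env:                          # 2. then the environment variable
        return from_env
    if interactive is None:               # 3. prompt only when a TTY is attached
        interactive = sys.stdin.isatty()
    if interactive:
        answer = prompt().strip()
        if answer:
            return answer
    return None                           # 4. nothing supplied: caller generates
```

Returning None rather than generating inside the resolver keeps the decision separate from the mutation, so a status run can report "would generate" without touching the disk.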

Repo copy

After first enable, the repo is copied from the admin's clone to /home/dev/git/REPO_NAME. This means dev can immediately cd ~/git/<name> and use make targets without a manual clone step.

The copy uses create-once semantics: if the target already exists, it is skipped entirely. This keeps it idempotent and safe for subsequent enable runs. Symlinks are preserved; .git is included so version history travels with the copy.

def ensure_repo_copy(o: dict, c: dict, apply: bool) -> dict:
    """Copy the repo to ~dev/git/REPO_NAME once after first enable."""
    src = c["root"]
    dst = c["home"] / "git" / src.name
    own = f"{c['dev']}:{c['dev']}"
    if dst.exists():
        return res(o["name"])
    if not apply:
        return res(o["name"], "would_change", f"copy {src} -> {dst}")
    new_parent = not dst.parent.exists()
    # No ignore filter: .git is copied too, and symlinks stay symlinks.
    shutil.copytree(src, dst, symlinks=True)
    u, g = ids(own)
    # Chown the links themselves, never the files they point at.
    for p in [*([dst.parent] if new_parent else []), dst, *dst.rglob("*")]:
        os.chown(p, u, g, follow_symlinks=False)
    return res(o["name"], "changed")
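
The create-once behaviour can be tried in isolation. The sketch below uses throwaway temporary directories and a hypothetical copy_once helper; it skips ownership handling and only shows that a second run is a no-op and that symlinks survive the copy.

```python
import shutil
import tempfile
from pathlib import Path


def copy_once(src: Path, dst: Path) -> str:
    """Copy src to dst only when dst does not exist yet."""
    if dst.exists():
        return "ok"                           # create-once: never touch an existing copy
    shutil.copytree(src, dst, symlinks=True)  # keep symlinks as symlinks
    return "changed"


def demo() -> tuple[str, str, bool, bool]:
    """Run copy_once twice against a tiny fake repo in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "repo"
        dst = Path(tmp) / "home" / "git" / "repo"
        (src / ".git").mkdir(parents=True)
        (src / "Makefile").write_text("all:\n")
        (src / "mk").symlink_to("Makefile")
        first = copy_once(src, dst)
        second = copy_once(src, dst)
        return first, second, (dst / "mk").is_symlink(), (dst / ".git").is_dir()
```

Because the check is only "does the target exist", later edits made by dev inside the copy are never overwritten by a subsequent enable.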

low-level helpers

Low-level helpers are the small primitives used by reconcilers. Keeping them separate prevents each reconciler from reimplementing subprocess handling, identity lookup, file IO, template rendering, and ownership logic.

These helpers are also where most careful systems details live: no shell=True, no blind writes, atomic replacement for managed files, and clear conversion between names and numeric UIDs/GIDs.

command execution

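cmd is the shared subprocess wrapper. It should never use shell=True; callers pass argv lists. It captures stdout and stderr so failures can be reported clearly rather than disappearing into the terminal. The one direct subprocess.run call elsewhere, in check_compose, follows the same rules and exists only because it needs a custom environment.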

def cmd(xs: list[str], check: bool = False, input: str | None = None) -> subprocess.CompletedProcess:
    """Run a command without shell=True, capturing stdout and stderr."""
    return subprocess.run(xs, input=input, text=True, capture_output=True, check=check)
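
A short usage sketch shows what callers get back. It repeats the wrapper so the snippet runs on its own, and uses the current Python interpreter as a stand-in child process; neither example command is something sync.py itself runs.

```python
import subprocess
import sys


def cmd(xs, check=False, input=None):
    """Same shape as the wrapper above: argv list in, CompletedProcess out."""
    return subprocess.run(xs, input=input, text=True, capture_output=True, check=check)


# A failing child: exit status 3 with a message on stderr, both captured.
failing = cmd([sys.executable, "-c", "import sys; sys.stderr.write('boom\\n'); sys.exit(3)"])

# Text can be fed to stdin through `input`; this child echoes it upper-cased.
echoed = cmd([sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"], input="hello")
```

With check=False the caller inspects returncode and stderr itself, which is how the reconcilers turn a failure into an error result instead of a traceback.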

command/path lookup

Command lookup is used by command checks and by reconcilers that need to know whether tools such as docker, systemctl, or apt-get are available before invoking them. It stays tiny by delegating to shutil.which.

def need(x: str) -> bool:
    """Return True when command x is available on PATH."""
    return shutil.which(x) is not None

user and group lookup

These helpers convert user and group names into numeric IDs using the standard pwd and grp modules. Keeping them separate avoids repeating lookup logic in every ownership-aware reconciler.

def uid(name: str) -> int:
    """Return the UID for a user name."""
    return pwd.getpwnam(name).pw_uid
def gid(name: str) -> int:
    """Return the GID for a group name."""
    return grp.getgrnam(name).gr_gid
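
Both lookups raise KeyError when the name is unknown, which matters on a fresh host where the dev user has not been created yet. A reconciler that runs during status might guard the lookup along these lines; user_exists and group_exists are hypothetical helpers, not part of sync.py.

```python
import grp
import pwd


def user_exists(name: str) -> bool:
    """Return True when name resolves through the passwd database."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def group_exists(name: str) -> bool:
    """Return True when name resolves through the group database."""
    try:
        grp.getgrnam(name)
        return True
    except KeyError:
        return False
```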

file reading and atomic writing

File helpers are responsible for the most important local idempotency pattern: compare first, write only when needed, and replace managed content atomically. read returns None for missing files. write creates a temporary file beside the destination, flushes and fsyncs it, applies permissions and ownership to the temporary file, then atomically replaces the target with os.replace, so the managed path never appears with partial content or the wrong metadata.

def read(p: Path) -> bytes | None:
    """Read bytes from p, returning None when p does not exist."""
    return p.read_bytes() if p.exists() else None
def write(p: Path, b: bytes, mode: int = 0o644, owner: str | None = None) -> None:
    """Atomically write bytes to p with the given mode and optional owner."""
    p.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(prefix=f".{p.name}.", suffix=".tmp", dir=p.parent)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b); f.flush(); os.fsync(f.fileno())
        os.chmod(tmp, mode)
        if owner:
            os.chown(tmp, *ids(owner))
        os.replace(tmp, p)
    finally:
        if os.path.exists(tmp): os.unlink(tmp)
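
The compare-first pattern can be sketched as a single function that reports whether it wrote anything. write_if_changed is a hypothetical name, and the sketch leaves out ownership so it can run unprivileged; metadata drift would still need a separate check such as same_meta.

```python
import os
import tempfile
from pathlib import Path


def write_if_changed(p: Path, b: bytes, mode: int = 0o644) -> bool:
    """Return True when p was (re)written, False when it already matched."""
    if p.exists() and p.read_bytes() == b:
        return False                      # compare first: nothing to do
    p.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(prefix=f".{p.name}.", suffix=".tmp", dir=p.parent)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp, mode)
        os.replace(tmp, p)                # atomic swap within one directory
    finally:
        if os.path.exists(tmp):           # only true when the swap did not happen
            os.unlink(tmp)
    return True


def demo() -> list:
    """Write, rewrite identical content, then write changed content."""
    with tempfile.TemporaryDirectory() as d:
        p = Path(d) / "etc" / "managed.conf"
        first = write_if_changed(p, b"a=1\n")
        second = write_if_changed(p, b"a=1\n")
        third = write_if_changed(p, b"a=2\n")
        leftovers = [x.name for x in p.parent.iterdir() if x.name != p.name]
        return [first, second, third, leftovers]
```

The temporary file is created in the destination directory on purpose: os.replace is only atomic when source and target are on the same filesystem.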

template rendering

Rendering is intentionally simple. The repo templates use $NAME and $EMAIL style placeholders, so string.Template is enough. The renderer should build the uppercase placeholder mapping from the config dictionary and avoid introducing a heavier templating dependency.

def render(s: str, c: dict) -> str:
    """Render a $PLACEHOLDER template string from config values."""
    m = {k.upper(): str(v) for k, v in c.items() if isinstance(v, (str, int))}
    return Template(s).safe_substitute(m)
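
A small example makes the behaviour of safe_substitute concrete. The renderer is repeated so the snippet runs alone; the sample template text and config values are made up for illustration.

```python
from string import Template


def render(s: str, c: dict) -> str:
    """Same renderer as above, repeated so the example is self-contained."""
    m = {k.upper(): str(v) for k, v in c.items() if isinstance(v, (str, int))}
    return Template(s).safe_substitute(m)


sample = "[user]\n  name = $NAME\n  email = $EMAIL\n  signingkey = $UNKNOWN\n"
out = render(sample, {"name": "Ada", "email": "ada@example.com", "home": None})

# Known placeholders are filled; unknown ones are left in place, not an error.
assert out == "[user]\n  name = Ada\n  email = ada@example.com\n  signingkey = $UNKNOWN\n"

# A literal dollar sign is written as $$; shell-style $( ... ) passes through.
assert render("$$HOME and $(pwd)", {}) == "$HOME and $(pwd)"
```

Values that are neither str nor int (the None above, or Path objects in the real config) are skipped by the isinstance filter, so a template that needs a path placeholder would have to receive it as a string.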

ownership and modes

Ownership and mode helpers let reconcilers ask one clear question: does this path already have the desired metadata? ids parses user:group strings into numeric IDs, defaulting the group to the user name when no group is given; same_meta checks owner/group/mode without caring about file content.

def ids(spec: str) -> tuple[int, int]:
    """Parse 'user:group' into numeric uid/gid."""
    u, _, g = spec.partition(":")
    return uid(u), gid(g or u)
def same_meta(p: Path, owner: str, mode: int) -> bool:
    """Return True when p has the desired owner/group and permission mode."""
    if not p.exists(): return False
    st, (u, g) = p.stat(), ids(owner)
    return (st.st_uid, st.st_gid, st.st_mode & 0o777) == (u, g, mode)
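
Two details of these helpers are easy to check without real accounts: how the spec string splits, and why the mode is masked with 0o777 before comparison. split_owner and perm_bits are hypothetical stand-ins that skip the pwd/grp lookup so the sketch runs unprivileged.

```python
import os
import tempfile
from pathlib import Path


def split_owner(spec: str) -> tuple[str, str]:
    """Split 'user[:group]' into names; the group defaults to the user name."""
    u, _, g = spec.partition(":")
    return u, g or u


def perm_bits(p: Path) -> int:
    """Return only the rwx permission bits, dropping the file-type bits."""
    return p.stat().st_mode & 0o777


def demo() -> tuple[int, bool]:
    """Show that raw st_mode is not directly comparable to 0o600."""
    with tempfile.TemporaryDirectory() as d:
        p = Path(d) / "secret"
        p.write_bytes(b"x")
        os.chmod(p, 0o600)
        return perm_bits(p), p.stat().st_mode == 0o600
```

On a regular file st_mode also carries the file-type bits, so the unmasked comparison in demo is False even though the permissions are exactly 0o600.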

apt helpers

Apt helpers isolate Debian/Ubuntu package inspection and installation. apt_missing should use package status inspection, not command lookup. apt_install should run one explicit apt-get install -y ... command for the missing set and report captured failures clearly.

def apt_missing(pkgs: list[str]) -> list[str]:
    """Return apt packages from pkgs that are not installed."""
    miss = []
    for p in pkgs:
        cp = cmd(["dpkg-query", "-W", "-f=${Status}", p])
        if cp.returncode != 0 or "install ok installed" not in cp.stdout:
            miss.append(p)
    return miss
def apt_install(pkgs: list[str]) -> dict:
    """Install missing apt packages and return a result dictionary."""
    if not pkgs: return res("apt packages")
    cp = cmd(["apt-get", "update"])
    if cp.returncode != 0: return res("apt packages", "error", cp.stderr.strip())
    cp = cmd(["apt-get", "install", "-y", *pkgs])
    return res("apt packages", "changed" if cp.returncode == 0 else "error", cp.stderr.strip())
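
The ${Status} field printed by dpkg-query has three words: the wanted state, an error flag, and the current state. The sketch below looks only at the third word, which is an alternative to the substring test above rather than what sync.py does; is_installed is a hypothetical helper. One difference is that a held package, reported as "hold ok installed", counts as installed here.

```python
def is_installed(status: str) -> bool:
    """Interpret a dpkg ${Status} string such as 'install ok installed'."""
    words = status.split()
    # Expect exactly: <want> <error-flag> <current-state>.
    return len(words) == 3 and words[1] == "ok" and words[2] == "installed"


examples = {
    "install ok installed": True,        # normal installed package
    "hold ok installed": True,           # installed and held back from upgrades
    "deinstall ok config-files": False,  # removed, config files left behind
    "unknown ok not-installed": False,   # known to dpkg but not installed
    "": False,                           # dpkg-query printed nothing
}
```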

systemd unit rendering

The systemd unit is rendered from config because its user, group, and working directory belong to the selected devbox identity. It is still managed exact content: if the rendered unit differs from /etc/systemd/system/dev-container.service, the unit reconciler should replace it and trigger a daemon reload before service actions.

def unit_text(c: dict) -> str:
    """Render the dev-container.service systemd unit text."""
    return f"""[Unit]
Description=devbox Docker Compose workstation container
Requires=docker.service
After=docker.service network-online.target

[Service]
Type=oneshot
User={c['dev']}
Group={c['dev']}
WorkingDirectory={c['run']}
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
"""

report/main

The final section turns operation results into useful terminal output and connects all pieces through main. Reporting should be plain, readable, and honest: show what is OK, what would change, what changed, what needs manual attention, and what actually failed.

main is intentionally small. It parses arguments, builds config, chooses target groups, builds the plan, runs operations with the correct apply mode, prints next steps, and returns the process exit code.

report formatting

report is the human-facing summary of reconciliation results. Every operation returns the same small dictionary shape, so reporting can stay generic: print one compact line per result, then print a count summary.

The exit-code policy is intentionally simple. error means the command failed and returns 1; everything else returns 0. Warnings are visible but non-failing, because a fresh host may legitimately be missing manual prerequisites such as SSH keys or GitHub authentication.

def report(rs: list[dict]) -> int:
    """Print operation results and return a process exit code."""
    icons = {"ok": "✓", "would_change": "~", "changed": "+", "warn": "!", "error": "✗"}
    for r in rs:
        detail = f" — {r['detail']}" if r.get("detail") else ""
        print(f"{icons.get(r['state'], '?')} {r['state']:<12} {r['name']}{detail}")
    counts = {s: sum(r["state"] == s for r in rs) for s in ["ok", "would_change", "changed", "warn", "error"]}
    print("\n" + " ".join(f"{k}={v}" for k, v in counts.items() if v))
    return 1 if counts.get("error") else 0
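
The exit-code policy is small enough to isolate from printing. The sketch below is a hypothetical summarize helper that returns the summary line and exit code instead of printing them, which makes the "warnings do not fail" rule easy to test.

```python
from collections import Counter

STATES = ["ok", "would_change", "changed", "warn", "error"]


def summarize(rs: list[dict]) -> tuple[str, int]:
    """Return the count summary line and the process exit code."""
    counts = Counter(r["state"] for r in rs)
    line = " ".join(f"{s}={counts[s]}" for s in STATES if counts[s])
    return line, 1 if counts["error"] else 0


warn_only = [{"name": "a", "state": "ok"},
             {"name": "b", "state": "warn"},
             {"name": "c", "state": "warn"}]
with_error = [{"name": "a", "state": "ok"},
              {"name": "b", "state": "error"}]
```

Here summarize(warn_only) gives exit code 0 and summarize(with_error) gives 1, matching the policy described above.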

next steps

next_steps is deliberately not part of convergence. It prints short reminders for manual work that should remain explicit in the MVP: register freshly generated public keys with GitHub, create or copy the SSH signing key if it is still missing, check GitHub CLI authentication, and enter the container once the service is running.

This keeps sync.py honest. The script can install and verify machine state, and it reports any keys it generates, but it should not silently generate secrets, impersonate interactive auth flows, or hide important setup rituals.

def next_steps(c: dict, rs: list[dict]) -> None:
    """Print concise manual follow-up steps after the report."""
    home = c["home"]
    steps = []
    for r in rs:
        if r["name"] == "SSH keys for dev user" and "generated" in r.get("detail", ""):
            steps.append(f"add public keys to GitHub: cat {home}/.ssh/sign.pub {home}/.ssh/dev.pub")
            break
    try:
        if not (home/".ssh/sign.pub").exists(): steps.append(f"create/copy SSH signing public key: {home}/.ssh/sign.pub")
    except PermissionError:
        steps.append(f"create/copy SSH signing public key: {home}/.ssh/sign.pub")
    if any(r["name"] == "command gh" and r["state"] == "ok" for r in rs): steps.append(f"as {c['dev']}, run `gh auth status` or `gh auth login` if needed")
    steps.append("after starting the service, enter it with `make shell`")
    if steps:
        print("\nNext steps:")
        for s in steps: print(f"- {s}")

main entrypoint

main is the complete top-level flow. It should remain short enough to read at a glance: parse, configure, select targets, plan, run, report, print next steps, return exit code. The final if __name__ == "__main__" block makes the file directly executable.

def main(argv=None) -> int:
    """Run the devbox sync command."""
    a = cli(argv)
    c = cfg(a)
    ops = plan(c, target(c["cmd"]))
    rs = run_ops(ops, c, c["apply"])
    rc = report(rs)
    next_steps(c, rs)
    return rc
if __name__ == "__main__":
    raise SystemExit(main())

Manual

This manual is for programmers using or extending sync.py. The script is intentionally small, but it has a few conventions that keep it safe and predictable:

  • desired state lives in data tables;
  • commands select resource groups;
  • planning turns groups into operation dictionaries;
  • the runner dispatches operations by kind;
  • reconcilers inspect first and mutate only when apply=True;
  • local/secrets files are create-once, never blindly overwritten.
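
The "inspect first, mutate only when apply=True" convention can be shown with a minimal reconciler. Everything here is a sketch under assumptions: ensure_dir is a hypothetical operation kind, and res is a stand-in whose name/state/detail shape is inferred from how results are used in report.

```python
import tempfile
from pathlib import Path


def res(name: str, state: str = "ok", detail: str = "") -> dict:
    """Stand-in result constructor (assumed shape: name/state/detail)."""
    return {"name": name, "state": state, "detail": detail}


def ensure_dir(o: dict, c: dict, apply: bool) -> dict:
    """Ensure a directory exists, following the reconciler conventions."""
    p = o["path"]
    if p.is_dir():                        # 1. inspect: already converged
        return res(o["name"])
    if not apply:                         # 2. status mode: report, never mutate
        return res(o["name"], "would_change", f"mkdir {p}")
    p.mkdir(parents=True)                 # 3. apply mode: converge
    return res(o["name"], "changed", f"mkdir {p}")


def demo() -> list:
    """Run status, apply, apply against a temp path and collect the states."""
    with tempfile.TemporaryDirectory() as d:
        o = {"name": "demo dir", "path": Path(d) / "a" / "b"}
        status = ensure_dir(o, {}, apply=False)["state"]
        untouched = not o["path"].exists()
        first = ensure_dir(o, {}, apply=True)["state"]
        second = ensure_dir(o, {}, apply=True)["state"]
        return [status, untouched, first, second]
```

Running the same operation twice with apply=True ends in ok, which is the idempotency property every reconciler in the script is expected to have.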

Usage cheatsheet

Run from the repo root during MVP development:

python host/sync.py status
python host/sync.py enable --name "Your Name" --email you@example.com
python host/sync.py start
python host/sync.py restart

status is always non-mutating. enable, start, and restart all converge host state first. start and restart then perform the corresponding systemd action.

Extending packages or commands

To require another host package, add its apt package name to PKGS. To verify another executable is available after package installation, add the command name to CMDS.

Use package names for PKGS and executable names for CMDS; they are not always the same:

PKGS += ["fd-find"]   # apt package
CMDS += ["fdfind"]    # executable on Ubuntu

Missing packages are installable state. Missing commands are reported as warnings, because command availability can depend on shell aliases, alternatives, package versions, or PATH details.

Extending files and templates

Use the three file classes deliberately:

FILES   exact repo-managed copies; safe to replace
RENDER  repo-managed templates rendered from config; safe to replace
ONCE    local create-once files; never overwrite existing content

Examples:

FILES.append({"src": root/"container/new.conf", "dst": run/"new.conf", "owner": own, "mode": 0o644})
RENDER.append({"src": root/"templates/foo", "dst": home/".foo", "owner": own, "mode": 0o644})
ONCE.append({"src": root/"templates/local.example", "dst": home/".local", "owner": own, "mode": 0o600})

If a file might contain secrets or local machine choices, it belongs in ONCE, not FILES or RENDER.