docs: add deployment log with 2026-04-15 vps.lab migration entry

2026-04-15 15:46:53 +00:00
parent 5e601b1840
commit c5735f3129

docs/deployment-log.md

@@ -0,0 +1,96 @@
# Deployment Log
Running log of notable deployments, migrations, and incidents for
`HangmanLab.Server.T1`. Newest entries on top. Keep entries terse and
strictly non-sensitive (no secrets, tokens, passwords, or internal IPs).

---

## 2026-04-15 — Migrate vps.lab from legacy `HangmanLab` compose to T1
**Goal:** replace the old `~/HangmanLab` docker-compose deployment on
`vps.lab` with `HangmanLab.Server.T1`, preserving existing MySQL data,
and expose the HarborForge stack under two new domains.

**Steps taken**

1. Cloned `HangmanLab.Server.T1` to `/root/HangmanLab.Server.T1` on
`vps.lab`.
2. Authored site-local `.env` (not committed). Set
`COMPOSE_PROJECT_NAME=hangmanlab` so the new project resolves
named volumes to `hangmanlab_mysql_data` and
`hangmanlab_backend_dump`, matching the volumes created by the old
deployment. Added HarborForge vars (`HF_*`) and
`WIZARD_PORT=18080`.
3. Verified volume resolution with `docker compose config` before
touching anything live.
4. `docker compose pull` on the new project to pre-pull all images.
5. `docker compose down` in the old project directory — **without**
`-v`, so named volumes were preserved.
6. `docker compose up -d` in the new project directory. All services
came up: `mysql` healthy, `hf_db_init` exited successfully
(created the `harborforge` database idempotently), `hf_backend`
initially blocked on `Config not ready`, as expected while the setup
wizard had not yet been run.
7. Confirmed data preservation: 13 tables still present in the
`hangmanlab` database; `harborforge` database created fresh with
0 tables.
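
The order of steps 3 to 6 can be sketched as a small shell script. This is a dry-run sketch under stated assumptions, not the script that was actually run on `vps.lab`: the `run` and `migrate` helpers and the `DRY_RUN`, `OLD_DIR`, and `NEW_DIR` variables are illustrative names, and only the directory paths and `docker compose` subcommands come from the steps above. By default it prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the migration order. Prints each command by default;
# set DRY_RUN=0 to execute. `run`, `migrate`, DRY_RUN, OLD_DIR and NEW_DIR
# are illustrative names, not part of the repository.
set -eu

OLD_DIR="${OLD_DIR:-${HOME:-/root}/HangmanLab}"
NEW_DIR="${NEW_DIR:-/root/HangmanLab.Server.T1}"
DRY_RUN="${DRY_RUN:-1}"

# run DIR CMD...: print the command in dry-run mode, otherwise run it inside DIR.
run() {
  dir="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then
    printf '(cd %s && %s)\n' "$dir" "$*"
  else
    (cd "$dir" && "$@")
  fi
}

migrate() {
  # Step 3: render the resolved config; its volumes section should name
  # hangmanlab_mysql_data and hangmanlab_backend_dump.
  run "$NEW_DIR" docker compose config
  # Step 4: pre-pull images so the switchover window stays short.
  run "$NEW_DIR" docker compose pull
  # Step 5: stop the old project. Deliberately no -v, so named volumes survive.
  run "$OLD_DIR" docker compose down
  # Step 6: start the new project against the preserved volumes.
  run "$NEW_DIR" docker compose up -d
  # Quick look at service state after startup.
  run "$NEW_DIR" docker compose ps
}

migrate
```

Keeping `down` and `up -d` adjacent, with the pull done beforehand, limits downtime to container stop and start. The data check in step 7 is left out because it needs database credentials, which do not belong in this log.
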

**Nginx**

- Added two new server blocks to the existing site file:
  - `hf.hangman-lab.top` → `127.0.0.1:${HF_FRONTEND_PORT}`
  - `hf-api.hangman-lab.top` → `127.0.0.1:${HF_BACKEND_PORT}`
- `nginx -t` clean (pre-existing warnings about conflicting server
names are unrelated).
- Ran Certbot with `--nginx --expand` (HTTP-01 challenge) to add the
two new domains to the existing Let's Encrypt certificate covering
`hangman-lab.top` and `api.hangman-lab.top`. The Cloudflare proxy was
temporarily switched to DNS-only on the new subdomains during
validation, then re-enabled.
- Certbot left a backup of the old site file in `sites-enabled/`, which
caused duplicate `server_name` warnings on reload; moved it to
`/root/` and reloaded nginx again.
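
For reference, a minimal sketch of what the two proxy blocks and the certificate expansion could look like. The real site file is not reproduced in this log, so `render_proxy_block` and `certbot_expand_cmd` are hypothetical helpers, and the proxy headers are a common default rather than the exact ones used. Both helpers only print text; nothing is written to the nginx configuration and Certbot is not invoked.

```shell
#!/usr/bin/env bash
# Sketch only: the real site file on vps.lab is not shown in this log.
# render_proxy_block and certbot_expand_cmd are hypothetical helpers.

# render_proxy_block SERVER_NAME PORT: print a minimal reverse-proxy server block.
# Certbot's nginx plugin adds the TLS listener to a block like this afterwards.
render_proxy_block() {
  cat <<EOF
server {
    listen 80;
    server_name $1;

    location / {
        proxy_pass http://127.0.0.1:$2;
        proxy_set_header Host \$host;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto \$scheme;
    }
}
EOF
}

# certbot_expand_cmd DOMAIN...: print the certbot invocation covering every
# listed domain (names already on the certificate first, then the new ones).
certbot_expand_cmd() {
  printf 'certbot --nginx --expand'
  for d in "$@"; do printf ' -d %s' "$d"; done
  printf '\n'
}
```

Calling `render_proxy_block hf.hangman-lab.top "$HF_FRONTEND_PORT"` prints the block for the frontend domain. Calling `certbot_expand_cmd hangman-lab.top api.hangman-lab.top hf.hangman-lab.top hf-api.hangman-lab.top` prints the expansion command; with `--expand`, the domains already on the certificate have to be listed alongside the new ones.
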

**Frontend / setup flow**

- The setup wizard page in `HarborForge.Frontend` was rewritten to
ask at step 1 for the AbstractWizard port (reached through an SSH
tunnel), test `/health`, and persist the chosen port in
`localStorage`. The `VITE_WIZARD_PORT` build-time env var is no
longer used.
- Backend URL is collected at step 3 and written to `localStorage` as
`HF_BACKEND_BASE_URL`. For this deployment the browser was pointed
at `https://hf-api.hangman-lab.top`.
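
The tunnel and health probe from step 1 of the wizard can be reproduced from a workstation roughly as follows. This is a sketch under assumptions: the wizard is taken to listen on the host's loopback interface at `WIZARD_PORT` (18080 in this deployment), `SSH_TARGET` is a hypothetical login, and `tunnel_cmd`, `health_url`, and `check_wizard` are illustrative helper names.

```shell
#!/usr/bin/env bash
# Sketch of reaching the wizard over an SSH tunnel and probing /health.
# Assumptions: the wizard listens on the host's loopback at WIZARD_PORT, and
# SSH_TARGET is a hypothetical login such as root@vps.lab.
WIZARD_PORT="${WIZARD_PORT:-18080}"
SSH_TARGET="${SSH_TARGET:-root@vps.lab}"

# Print the ssh command that forwards the wizard port to the same local port.
tunnel_cmd() {
  printf 'ssh -N -L %s:127.0.0.1:%s %s\n' "$WIZARD_PORT" "$WIZARD_PORT" "$SSH_TARGET"
}

# URL the setup page tests once the tunnel is up.
health_url() {
  printf 'http://127.0.0.1:%s/health\n' "$WIZARD_PORT"
}

# Succeeds only if the wizard answers /health with a non-error HTTP status.
check_wizard() {
  curl -fsS --max-time 5 "$(health_url)" >/dev/null
}

tunnel_cmd
health_url
```

Run the printed `ssh` command in one terminal, then call `check_wizard` in another; a zero exit status means the port entered in the wizard page should pass its `/health` test as well.
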

**Incident: PUT /api/v1/config/harborforge.json → 500**

- Symptom: during the "Finish setup" step the wizard returned HTTP
500 on the config PUT.
- Root cause: the `abstract-wizard` image runs as `nonroot`
(uid/gid `65532`), but Docker creates the `wizard_config` named
volume with `root:root` ownership on first use, so the container
process had no write permission inside `/config`.
- Hot fix on vps.lab: `chown -R 65532:65532` on the host-side volume
path. Setup then completed successfully.
- Permanent fix (this commit's predecessor `5e601b1`): added a
`wizard_init` one-shot service using `busybox`, running as root,
that chowns `/config` to `65532:65532` every time the project comes
up. The `wizard` service now
`depends_on: { wizard_init: { condition: service_completed_successfully } }`.
The fix is idempotent and covers fresh installs on any host, not
just vps.lab.
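
The same ownership fix can be expressed as a one-off command, which is roughly what `wizard_init` does on every start. This is a sketch, not the service definition from `5e601b1`: `fix_volume_owner` is a hypothetical helper, the `DOCKER` override exists only to allow a preview, and the volume name `hangmanlab_wizard_config` assumes Compose's default `<project>_<volume>` naming with no explicit `name:` override.

```shell
#!/usr/bin/env bash
# Sketch of the ownership fix as a one-off command, mirroring the wizard_init idea.
# Assumptions: the volume is named <project>_wizard_config
# (hangmanlab_wizard_config here); set DOCKER=echo to preview the command.

# fix_volume_owner VOLUME UID:GID MOUNT_PATH
# Runs busybox (root by default) with the volume mounted and chowns it recursively.
fix_volume_owner() {
  ${DOCKER:-docker} run --rm -v "$1:$3" busybox chown -R "$2" "$3"
}

# Preview the arguments for this deployment without touching Docker.
DOCKER=echo fix_volume_owner hangmanlab_wizard_config 65532:65532 /config
```

Because `chown -R` to an owner that is already set changes nothing, repeating the command is harmless, which is what makes the init-container approach safe to run on every `up`.
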

**Outcome**

- `https://hangman-lab.top` and `https://api.hangman-lab.top`
unchanged and still serving the legacy stack from the same MySQL
data.
- `https://hf.hangman-lab.top` and `https://hf-api.hangman-lab.top`
serving the HarborForge stack from the same compose project.
- Setup wizard completed end-to-end; admin account created; backend
reachable from the frontend.

**Follow-ups**

- None blocking. If future deployments target a distroless service
that writes to a named volume, use the same `*_init` sidecar
pattern (`busybox` + `chown`) to avoid repeating this incident.