fix(seed-job): content-validating idempotency + Keycloak redirectUris patcher (unblock SSO over LAN) #53

Merged: ciprianiacobescu merged 4 commits into `main` from `fix/seed-job-idempotency-and-sso-2026-05-17` on May 17, 2026
@ciprianiacobescu
Contributor

Summary

Fixes two related bootstrap defects and one cascading configuration mismatch. After this lands, an operator can log in through Keycloak SSO to admin-ui, Grafana, and Jaeger over the LAN.

| Chunk | What |
|---|---|
| SF-1 (`_ensure_secret_file`) | seed-job's secret writers now validate content (size/format), not just existence. Catches PR #50-style format drift. Idempotent. |
| SSO-1 (`_patch_keycloak_client_redirect_uris`) | seed-job patches Keycloak client `redirectUris` after realm import, substituting public URLs derived from env vars. Literal `${...}` placeholders removed from `realm-mintkey.json`. Localhost URIs preserved. |
| Cascade | oauth2-proxy now passes `--code-challenge-method=S256` (the Keycloak realm enforces PKCE S256 on `mintkey-jaeger`). |

Verification (end-to-end, live local stack at 10.243.1.200)

All 3 SSO chains: `curl -sL http://10.243.1.200:{8081,3003,16686}/` returns 200 and lands on the Keycloak login page with the correct LAN-IP `redirect_uri`.

```
$ docker compose logs seed-job | grep redirectUris
Keycloak: mintkey-admin-api redirectUris already current.
Keycloak: mintkey-grafana redirectUris already current.
Keycloak: mintkey-jaeger redirectUris already current.
```

Keycloak admin REST query (curl, per client):

```
mintkey-admin-api redirectUris: [localhost:8080..., 10.243.1.200:8080...]
mintkey-grafana redirectUris: [localhost:3003..., 10.243.1.200:3003...]
mintkey-jaeger redirectUris: [10.243.1.200:16686..., localhost:16686...]
```
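The `redirect_uri` check above was done by eye. A minimal sketch of automating it, assuming the final URL of each chain is captured (for example with `curl -sL -o /dev/null -w '%{url_effective}' http://10.243.1.200:16686/`); the function names are hypothetical and not part of this PR:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def login_redirect_uri(login_url: str) -> Optional[str]:
    """Return the decoded redirect_uri query parameter of an OIDC authorize URL, if present."""
    values = parse_qs(urlparse(login_url).query).get("redirect_uri")
    return values[0] if values else None


def redirect_uri_uses_host(login_url: str, expected_host: str) -> bool:
    """True if the redirect_uri embedded in login_url points at expected_host."""
    redirect_uri = login_redirect_uri(login_url)
    return redirect_uri is not None and urlparse(redirect_uri).hostname == expected_host
```

Asserting `redirect_uri_uses_host(final_url, "10.243.1.200")` for each of the three ports turns the manual check into a repeatable smoke test.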

Idempotency: 4 live tests (stale file → regenerated, valid file → skipped) pass.

Change Type

  • Remediation session

Provenance

  • Session: team/remediation/2026-05-17-seed-job-idempotency-and-sso/
  • Intake: 9 fields complete
  • 10 offline unit tests for _ensure_secret_file validators

Verification

  • Idempotency: stale file regenerates; valid file preserves
  • Realm patcher: 3 clients updated; LAN IPs added; localhost preserved; idempotent
  • Full SSO chain: admin-ui ✓, Grafana ✓, Jaeger ✓
  • jaeger-auth healthy with PKCE S256 enabled
  • All 17 local containers healthy
  • CI on this PR

Agent/Automation Rules

  • No --no-verify
  • No accepted ADR edited
  • No Co-Authored-By trailer

CiprianSpot added 4 commits May 17, 2026 09:19
Two related bootstrap-pipeline defects:
1. seed-job `_ensure_*` functions check existence, not validity → stale
   secrets persist after format changes (jaeger cookie bug on 2026-05-17).
2. SSO over LAN broken because: (a) .env missing 5 PUBLIC_URL settings,
   (b) realm JSON has literal '${MINTKEY_*_PUBLIC_URL}' placeholders that
   neither Keycloak nor seed-job substitute.

Owner-locked: seed-job patches redirectUris/webOrigins after realm
import using env-var values. Idempotent. Localhost preserved for
local dev.
…redirectUris patcher

Two related defects in the bootstrap pipeline addressed in one commit
because both touch seed-job/main.py.

## SF-1 — idempotency footgun
Added `_ensure_secret_file()` helper that validates existing content
(size + format), not just file existence. Regenerates on validation
failure; preserves on success. Applied to:
- `_ensure_jaeger_cookie_secret`: validates 44-char base64url
- `_write_client_secrets`: validates non-empty + matches live Keycloak
  secret (catches operator rotation via Keycloak UI too)
- `_ensure_admin_password_file` (new wrapper): validates 12-128 chars
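A minimal sketch of the validate-then-regenerate pattern described above. The helper and validator names, the `generate` callable, and the exact rules (for example, treating the 44-char cookie secret as padded base64url of 32 random bytes) are assumptions for illustration, not the code in `seed-job/main.py`:

```python
import base64
import re
import secrets
from pathlib import Path
from typing import Callable

# Assumed format: 32 random bytes, base64url-encoded with one '=' pad (44 chars).
_COOKIE_SECRET_RE = re.compile(r"[A-Za-z0-9_-]{43}=")


def is_valid_jaeger_cookie_secret(value: str) -> bool:
    return _COOKIE_SECRET_RE.fullmatch(value) is not None


def is_valid_admin_password(value: str) -> bool:
    return 12 <= len(value) <= 128


def matches_live_secret(live_secret: str) -> Callable[[str], bool]:
    """Validator for client-secret files: non-empty and equal to Keycloak's current secret."""
    return lambda value: bool(value) and value == live_secret


def new_jaeger_cookie_secret() -> str:
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).decode("ascii")


def ensure_secret_file(
    path: Path, validate: Callable[[str], bool], generate: Callable[[], str]
) -> bool:
    """Keep path if its content validates; otherwise (re)write it. Returns True if written."""
    if path.exists():
        current = path.read_text().strip()  # tolerate a trailing newline
        if validate(current):
            print(f"{path.name}: valid, skipping")
            return False
        print(f"{path.name}: INVALID, regenerating")
    value = generate()
    if not validate(value):
        raise ValueError(f"generator produced an invalid value for {path.name}")
    path.write_text(value)
    return True
```

In this sketch, client secrets reuse the same helper with `matches_live_secret(live)` as the validator and `lambda: live` as the generator, so a secret rotated in the Keycloak UI is re-synced on the next run.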

Catches the PR #50 64-byte hex format bug AND any future format change
to any seed-managed secret. 10 offline unit tests pass.

## SSO-1 — Keycloak client redirectUris patcher
`seed-job/realm-mintkey.json` previously stored literal placeholder
strings like `${MINTKEY_ADMIN_API_PUBLIC_URL}/v1/auth/oidc/callback`
in client redirectUris. Keycloak does NOT substitute env vars; seed-job
did not patch them after import. Result: operator-facing login over
LAN (e.g. http://10.243.1.200:8081/) was rejected with
`invalid_redirect_uri` because no allow-listed URI matched.
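To see why the literal placeholder could never match, here is a deliberately simplified model of allow-list matching (exact match, or prefix match when the entry ends in `*`). Keycloak's real validation has more rules; this hypothetical helper only illustrates that a `${...}` entry is compared verbatim, never expanded:

```python
from typing import List


def redirect_uri_allowed(candidate: str, allow_list: List[str]) -> bool:
    """Simplified model: exact match, or prefix match for entries ending in '*'."""
    for pattern in allow_list:
        if pattern.endswith("*"):
            if candidate.startswith(pattern[:-1]):
                return True
        elif candidate == pattern:
            return True
    return False
```

With `["${MINTKEY_ADMIN_API_PUBLIC_URL}/v1/auth/oidc/callback", "http://localhost:8080/v1/auth/oidc/callback"]` as the allow-list (the localhost entry is illustrative), a LAN callback such as `http://10.243.1.200:8080/v1/auth/oidc/callback` matches neither entry, which is the `invalid_redirect_uri` case.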

Added `_patch_keycloak_client_redirect_uris(client_id_name,
callback_path, public_url_env)` called after realm import for:
- mintkey-admin-api → /v1/auth/oidc/callback / MINTKEY_ADMIN_API_PUBLIC_URL
- mintkey-grafana → /login/generic_oauth / MINTKEY_GRAFANA_PUBLIC_URL
- mintkey-jaeger → /oauth2/callback / MINTKEY_JAEGER_PUBLIC_URL

Algorithm: GET client → compute desired URIs (preserve localhost, drop
`${...}` placeholders, add public URL if env var set) → PUT only if
different. Idempotent (no-op if already correct). Logs
"patched <client> redirectUris: ..." or "<client> redirectUris already
current."
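The GET → compute → PUT-if-different algorithm can be sketched as follows. HTTP is injected as two callables so the logic runs offline; in practice they would wrap Keycloak's admin REST endpoints (`GET /admin/realms/{realm}/clients?clientId=...` and `PUT /admin/realms/{realm}/clients/{id}`). Treating every non-placeholder URI as "preserved" and comparing as sets are assumptions of this sketch, not necessarily what the PR's patcher does:

```python
import os
from typing import Callable, List, Optional

# Client -> (callback path, env var holding the public base URL), per the list above.
SEED_CLIENTS = [
    ("mintkey-admin-api", "/v1/auth/oidc/callback", "MINTKEY_ADMIN_API_PUBLIC_URL"),
    ("mintkey-grafana", "/login/generic_oauth", "MINTKEY_GRAFANA_PUBLIC_URL"),
    ("mintkey-jaeger", "/oauth2/callback", "MINTKEY_JAEGER_PUBLIC_URL"),
]


def desired_redirect_uris(
    current: List[str], public_url: Optional[str], callback_path: str
) -> List[str]:
    # Keep concrete URIs (localhost included); drop unsubstituted "${...}" literals.
    desired = [uri for uri in current if "${" not in uri]
    if public_url:
        public_uri = public_url.rstrip("/") + callback_path
        if public_uri not in desired:
            desired.append(public_uri)
    return desired


def patch_client_redirect_uris(
    get_client: Callable[[str], dict],
    put_client: Callable[[str, dict], None],
    client_id_name: str,
    callback_path: str,
    public_url_env: str,
) -> bool:
    client = get_client(client_id_name)  # Keycloak client representation
    current = client.get("redirectUris", [])
    desired = desired_redirect_uris(current, os.environ.get(public_url_env), callback_path)
    if set(desired) == set(current):
        print(f"Keycloak: {client_id_name} redirectUris already current.")
        return False
    put_client(client["id"], {**client, "redirectUris": desired})  # "id" is the internal UUID
    print(f"Keycloak: patched {client_id_name} redirectUris: {desired}")
    return True
```

Comparing sets rather than lists avoids a spurious PUT when Keycloak returns the same URIs in a different order (the REST output above lists LAN and localhost URIs in different orders per client). Once the placeholders are gone from the realm JSON, an unset env var leaves the desired set equal to the current one, so no PUT is issued.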

realm-mintkey.json updated: `${...}` placeholders removed from all 3
clients (the patcher now adds the correct values dynamically).

docker-compose.yml seed-job env extended to forward
MINTKEY_{ADMIN_API,GRAFANA,JAEGER}_PUBLIC_URL.

Verification (local):
- Test 1: existing-valid jaeger cookie preserved across re-runs.
- Test 2: 11-byte stale cookie → detected INVALID → regenerated 44 bytes.
- Test 3: idempotent ("valid — skipping") on second run.
- Test 4: jaeger-auth healthy after seed-job re-run.
- Patcher: 3 clients get LAN-IP URIs added; re-run logs "already current".
- Keycloak REST query confirms: no `${...}` literal remains; LAN IP +
  localhost both in redirectUris.

CI regression risk: low. seed-job's patcher is a no-op when the env vars
are unset (the CI integration-test path). The idempotency helper is
purely additive over the existing logic.

Out-of-scope follow-ups (documented in 99-report):
- `webOrigins` left at `+` (Keycloak then permits the origins of all valid
  redirect URIs, so the patched LAN URIs are already covered); explicit
  LAN origin patching deferred.
Keycloak realm enforces PKCE S256 on mintkey-jaeger client (per
seed-job _enforce_pkce_on_clients), but oauth2-proxy v7.6.0 doesn't
enable PKCE automatically. Without the flag, the auth flow reached
Keycloak but returned:

  error=invalid_request&error_description=Missing+parameter%3A+code_challenge_method

Added `--code-challenge-method=S256` to the oauth2-proxy exec line.
Verified via the live curl chain: `/oauth2/sign_in` now lands on the
Keycloak login page with `code_challenge=...` in the URL.
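For reference, the S256 transform that `--code-challenge-method=S256` enables is defined by RFC 7636: `code_challenge = BASE64URL(SHA256(code_verifier))` with the `=` padding stripped. A minimal Python sketch (not oauth2-proxy's Go implementation), checked against the RFC's Appendix B example:

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    # RFC 7636 allows 43-128 unreserved chars; 32 random bytes give 43 base64url chars.
    return base64.urlsafe_b64encode(secrets.token_bytes(num_bytes)).rstrip(b"=").decode("ascii")


def s256_code_challenge(code_verifier: str) -> str:
    # BASE64URL(SHA256(ASCII(code_verifier))) without '=' padding.
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The authorize request carries `code_challenge` and `code_challenge_method=S256`; the later token request sends the original `code_verifier`, which Keycloak hashes and compares. Omitting the method is what produced the `Missing parameter: code_challenge_method` error above.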

Same root-cause family as PR #48 (otel-collector config drift) and
PR #52 (oauth2-proxy cookie format): a mismatch between configuration
and runtime expectations that surfaced once the upstream constraint
became reachable.
ciprianiacobescu merged commit 5f397b7 into main on May 17, 2026
@github-actions

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@ciprianiacobescu ciprianiacobescu deleted the fix/seed-job-idempotency-and-sso-2026-05-17 branch May 30, 2026 06:56