Django-based participant dashboard for SFMIX IX users. Authenticates via Authentik SSO (OIDC) at login.sfmix.org, using PeeringDB-sourced ASN group memberships to control which networks a user can view.
- Host: web.sfmix.org (Ansible group sfmix_website), DNS CNAME portal.sfmix.org → web
- Runtime: Django + Gunicorn in Docker, port 8000, behind Nginx with Let's Encrypt TLS
- Auth: mozilla-django-oidc → Authentik OIDC at login.sfmix.org
- Data: NetBox API with proactive in-process cache (background thread per Gunicorn worker, 4-hour refresh cycle with exponential backoff on failure)
- Monitoring: Prometheus metrics at /metrics/ (restricted to trusted networks), admin health dashboard at /admin/netbox-status/
- ASN gating: Authentik groups like as64500 (sourced from PeeringDB federation) are extracted from the OIDC groups claim and stored in the Django session. Each user sees only their own networks.
- Install dir: /opt/ixp_portal/
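The ASN gating described in the list above can be illustrated with a small helper. This is a minimal sketch, not the portal's actual code: the function name `asns_from_groups` is hypothetical, and it assumes group names are exactly `as` followed by digits (as in `as64500`), ignoring every other group.

```python
import re
from typing import Iterable

# Assumption: ASN groups are named "as<digits>", e.g. "as64500".
ASN_GROUP_RE = re.compile(r"as(\d+)")


def asns_from_groups(groups: Iterable[str]) -> list[int]:
    """Return the sorted, de-duplicated ASNs encoded in OIDC group names."""
    asns = set()
    for name in groups:
        match = ASN_GROUP_RE.fullmatch(name.strip())
        if match:
            asns.add(int(match.group(1)))
    return sorted(asns)


# Example groups claim; the non-ASN group name is illustrative only.
claim = ["as64500", "IX Administrators", "as64501", "as64500"]
print(asns_from_groups(claim))
```

The resulting list is what a view could compare against a requested network before rendering it.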
Create an OIDC provider + application in Authentik for the portal:
| Parameter | Value |
|---|---|
| Client ID | portal |
| Client type | Confidential |
| Redirect URI | https://portal.sfmix.org/oidc/callback/ |
| Scopes | openid profile email groups |
| Authorization flow | default-provider-authorization-implicit-consent |
The groups scope must use the existing SFMIX: OpenID 'groups' ScopeMapping (same one Grafana uses).
Set the resulting client secret as OIDC_RP_CLIENT_SECRET in the portal's environment.
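As a rough illustration of how these provider parameters map onto the portal's configuration, the sketch below builds a settings dictionary from environment variables. `OIDC_RP_CLIENT_ID`, `OIDC_RP_CLIENT_SECRET`, and `OIDC_RP_SCOPES` are mozilla-django-oidc setting names; the helper `oidc_settings` and the `DISCOVERY_URL` key are hypothetical, and the discovery path assumes the standard OIDC `.well-known/openid-configuration` location under the provider URL.

```python
import os
from typing import Mapping


def oidc_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build OIDC-related settings from environment variables."""
    provider = env.get(
        "OIDC_PROVIDER_URL", "https://login.sfmix.org/application/o/portal"
    ).rstrip("/")
    return {
        "OIDC_RP_CLIENT_ID": env.get("OIDC_RP_CLIENT_ID", "portal"),
        "OIDC_RP_CLIENT_SECRET": env.get("OIDC_RP_CLIENT_SECRET", ""),
        # Must match the scopes configured on the Authentik provider.
        "OIDC_RP_SCOPES": "openid profile email groups",
        # Hypothetical key: where the endpoint URLs could be looked up.
        "DISCOVERY_URL": f"{provider}/.well-known/openid-configuration",
    }
```

An empty `OIDC_RP_CLIENT_SECRET` is worth treating as a configuration error in production, since the client type is Confidential.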
cd portal
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# For local dev without real OIDC, create a superuser:
python manage.py migrate
python manage.py createsuperuser
# Run dev server
python manage.py runserver

For full OIDC testing against login.sfmix.org, set these env vars:
export OIDC_RP_CLIENT_ID=portal
export OIDC_RP_CLIENT_SECRET=<from-authentik>
export OIDC_PROVIDER_URL=https://login.sfmix.org/application/o/portal
export IXP_NETBOX_URL=https://netbox.sfmix.org
export IXP_NETBOX_TOKEN=<read-only-netbox-token>

cd ansible
pipenv run ansible-playbook deploy_portal.playbook.yml --vault-password-file ~/.sfmix_ansible_vault

Or as part of the full push:

pipenv run ansible-playbook push_servers.playbook.yml --tags ixp_portal --vault-password-file ~/.sfmix_ansible_vault

The ixp_portal role (ansible/roles/ixp_portal/) performs these steps:
- Installs Docker and docker-compose
- Rsyncs portal/ source to /opt/ixp_portal/ (excludes .venv, __pycache__, db.sqlite3, staticfiles, .env)
- Templates .env from vault-encrypted secrets (ansible/roles/ixp_portal/templates/dotenv.j2)
- Builds and starts the Docker container (docker-compose up -d --build)
- Runs Django migrations
- Ensures Let's Encrypt cert for portal.sfmix.org via certbot
- Deploys and enables Nginx reverse proxy vhost
Stored vault-encrypted in ansible/inventory/host_vars/web.sfmix.org.yml:
| Variable | Description |
|---|---|
| ixp_portal_django_secret_key | Django secret key |
| ixp_portal_oidc_client_secret | Authentik OIDC client secret |
| ixp_portal_netbox_token | NetBox API read-only token |
After deploying, check the container logs for a successful NetBox cache refresh:
ssh web.sfmix.org "sudo docker-compose -f /opt/ixp_portal/docker-compose.yml logs --tail 20 2>&1 | grep -iE 'netbox|error'"

Expected output, once from each Gunicorn worker: NetBox cache refreshed in N.Ns: NN tenants, NN IPs, NN ports
IX Administrators can also check health in the browser at https://portal.sfmix.org/admin/netbox-status/ after logging in.
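To check the log output programmatically rather than by eye, a line of the form shown above can be parsed with a regular expression. This is a sketch under the assumption that the log message matches the documented pattern exactly; the function names and the sample line (including its numbers and the container prefix) are illustrative.

```python
import re
from typing import Optional

# Pattern taken from the documented message:
# "NetBox cache refreshed in N.Ns: NN tenants, NN IPs, NN ports"
REFRESH_LINE = re.compile(
    r"NetBox cache refreshed in (?P<seconds>\d+(?:\.\d+)?)s: "
    r"(?P<tenants>\d+) tenants, (?P<ips>\d+) IPs, (?P<ports>\d+) ports"
)


def parse_refresh_line(line: str) -> Optional[dict]:
    """Extract refresh statistics from one log line, or None if it does not match."""
    match = REFRESH_LINE.search(line)
    if match is None:
        return None
    return {
        "seconds": float(match["seconds"]),
        "tenants": int(match["tenants"]),
        "ips": int(match["ips"]),
        "ports": int(match["ports"]),
    }


def cache_looks_empty(stats: dict) -> bool:
    """True for the '0 tenants, 0 IPs' symptom of a bad NetBox token."""
    return stats["tenants"] == 0 and stats["ips"] == 0


# Made-up sample line for illustration.
sample = "portal-1  | NetBox cache refreshed in 3.2s: 85 tenants, 412 IPs, 130 ports"
stats = parse_refresh_line(sample)
print(stats, cache_looks_empty(stats))
```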
Smoke test:
ssh web.sfmix.org "curl -s -o /dev/null -w '%{http_code}' -H 'Host: portal.sfmix.org' http://localhost:8000/login/"
# Expected: 200

SSH rate limiting / connection resets during deploy:
web.sfmix.org sets PerSourceMaxStartups 10 in sshd. Ansible's rsync step opens extra SSH connections, which can trip this limit. Admin IPs should be in PerSourcePenaltyExemptList (managed by ansible/roles/sfmix_server/tasks/sshd.yml).
NetBox cache empty (0 tenants, 0 IPs):
Check that IXP_NETBOX_TOKEN in /opt/ixp_portal/.env is a valid NetBox API token. Look for HTTP 403 errors in the container logs. The background cache thread starts when the first HTTP request hits each Gunicorn worker (via dashboard.middleware.NetBoxCacheMiddleware). On failure, the refresh loop retries with exponential backoff (30s → 60s → ... → 1h cap). IX Administrators can view the current error and clear/force-refresh from /admin/netbox-status/.
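The "thread starts on the first HTTP request" behavior can be sketched as a start-once guard called from a middleware. This is not the code in dashboard/middleware.py: the names `start_once`, `refresh_loop`, and `StartCacheOnFirstRequest` are hypothetical, and the loop body is a placeholder. One plausible reason for this design is that threads do not survive a process fork, so each worker has to start its own.

```python
import threading
from typing import Callable

_start_lock = threading.Lock()
_started = False


def start_once(target: Callable[[], None]) -> bool:
    """Start target in a daemon thread the first time this runs in a process."""
    global _started
    with _start_lock:
        if _started:
            return False
        threading.Thread(
            target=target, name="netbox-cache-refresh", daemon=True
        ).start()
        _started = True
        return True


def refresh_loop() -> None:
    """Placeholder for the real fetch-and-sleep loop."""


class StartCacheOnFirstRequest:
    """Django-style middleware that lazily starts the refresh thread."""

    def __init__(self, get_response, loop: Callable[[], None] = refresh_loop):
        self.get_response = get_response
        self.loop = loop

    def __call__(self, request):
        start_once(self.loop)  # cheap no-op after the first request
        return self.get_response(request)
```

Because the guard is per process, a worker that has served no requests yet has an empty cache, which matches the "0 tenants" symptom immediately after a restart.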
500 errors on network detail pages:
Check container logs for tracebacks. Common cause: NetBox objects with null nested fields (e.g. custom_fields.participant is null on some peering ports).
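One defensive pattern for the null-nested-field case is a lookup helper that stops at the first missing or null level. This is a sketch, not the portal's code: `dig` is a hypothetical helper, and the `name` key under `participant` is an assumed field used only for illustration.

```python
from typing import Any


def dig(obj: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts, returning default if any level is missing or None."""
    current = obj
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
    return default if current is None else current


# A peering port whose participant custom field is null in NetBox.
port = {"custom_fields": {"participant": None}}
print(dig(port, "custom_fields", "participant", "name", default="(unassigned)"))
```

Using this in views or templates turns a traceback into a visible placeholder value.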
Health check DisallowedHost warnings:
The docker-compose health check hits localhost:8000, which isn't in ALLOWED_HOSTS. The resulting log warnings are harmless.
| File | Purpose |
|---|---|
| ansible/deploy_portal.playbook.yml | Deployment playbook |
| ansible/roles/ixp_portal/tasks/main.yml | Ansible tasks (rsync, docker, nginx, certbot) |
| ansible/roles/ixp_portal/defaults/main.yml | Default variables (domain, ports, OIDC, NetBox URL) |
| ansible/roles/ixp_portal/templates/dotenv.j2 | Environment variable template |
| ansible/roles/ixp_portal/templates/nginx-portal.conf.j2 | Nginx vhost template |
| ansible/inventory/host_vars/web.sfmix.org.yml | Host secrets (vault-encrypted) |
| portal/dashboard/services.py | NetBox data fetching, proactive cache, health tracking, Prometheus metrics |
| portal/dashboard/middleware.py | Starts background refresh thread per Gunicorn worker |
| portal/ixp_portal/settings.py | Django settings (OIDC, logging, middleware) |
| portal/Dockerfile | Container image (Python 3.12 + Gunicorn) |
| portal/docker-compose.yml | Docker Compose config |
| Variable | Default | Description |
|---|---|---|
| DJANGO_SECRET_KEY | insecure-dev-key-change-me | Django secret key |
| DJANGO_DEBUG | true | Debug mode |
| DJANGO_ALLOWED_HOSTS | * | Comma-separated allowed hosts |
| OIDC_RP_CLIENT_ID | portal | Authentik OIDC client ID |
| OIDC_RP_CLIENT_SECRET | (empty) | Authentik OIDC client secret |
| OIDC_PROVIDER_URL | https://login.sfmix.org/application/o/portal | OIDC provider base URL |
| IXP_NETBOX_URL | https://netbox.sfmix.org | NetBox API base URL |
| IXP_NETBOX_TOKEN | (empty) | NetBox API read-only token |
| PROMETHEUS_TRUSTED_NETWORKS | 127.0.0.0/8,::1/128 | Comma-separated CIDRs allowed to scrape /metrics/ |
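As an illustration of how a comma-separated CIDR list such as PROMETHEUS_TRUSTED_NETWORKS can gate access, the sketch below uses the standard `ipaddress` module. The function names are hypothetical, and which address counts as the client (for example behind the Nginx proxy) is outside this sketch.

```python
import ipaddress
from typing import Sequence, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_trusted_networks(value: str) -> list[Network]:
    """Parse 'cidr,cidr,...' into network objects, skipping empty entries."""
    return [
        ipaddress.ip_network(part.strip())
        for part in value.split(",")
        if part.strip()
    ]


def is_trusted(client_ip: str, networks: Sequence[Network]) -> bool:
    """True if client_ip falls inside any trusted network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # unparsable addresses are never trusted
    # An IPv4 address is never "in" an IPv6 network, and vice versa.
    return any(address in network for network in networks)


trusted = parse_trusted_networks("127.0.0.0/8,::1/128")
print(is_trusted("127.0.0.1", trusted), is_trusted("192.0.2.10", trusted))
```

With the default value, only loopback scrapers pass; a remote Prometheus server needs its source network added to the variable.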
The portal maintains an in-process cache of participant data from NetBox, refreshed by a background thread in each Gunicorn worker.
- Refresh interval: 4 hours after a successful fetch
- On failure: exponential backoff starting at 30s, doubling each attempt, capped at 1 hour. Once backoff reaches the cap it waits until the next normal 4-hour cycle. Stale data continues to be served.
- Startup: the cache starts empty; the first refresh fires immediately when the background thread starts (triggered by the first HTTP request via NetBoxCacheMiddleware)
- Per-worker isolation: each Gunicorn worker has its own independent cache and refresh thread (no shared Redis/memcached)
- Admin dashboard: IX Administrators see a "NetBox Status" link in the nav bar, leading to /admin/netbox-status/. Shows health badge (Healthy / Degraded / No data), cache age, item counts, refresh duration, success/failure counts, and last error message. Includes a "Clear Cache & Refresh Now" button.
- Prometheus metrics at /metrics/, restricted to PROMETHEUS_TRUSTED_NETWORKS:
| Metric | Type | Description |
|---|---|---|
| netbox_refresh_duration_seconds | Histogram | Time spent fetching data from NetBox |
| netbox_refresh_success_total | Counter | Successful cache refreshes |
| netbox_refresh_failure_total | Counter | Failed cache refreshes |
| netbox_cache_age_seconds | Gauge | Seconds since last successful refresh |
| netbox_cache_items{type} | Gauge | Cached items by type (tenants, ip_addresses, peering_ports) |
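The refresh timing described in this section can be written down as a single delay function. This is one reading of the policy, not the code in portal/dashboard/services.py: the constants come from the text (4 hours, 30s start, 1 hour cap), while the name `next_delay` and the interpretation that the attempt after the capped wait falls back to the 4-hour cycle are assumptions.

```python
REFRESH_INTERVAL = 4 * 60 * 60  # seconds between refreshes after a success
BACKOFF_START = 30              # first retry delay after a failure
BACKOFF_CAP = 60 * 60           # longest retry delay


def next_delay(consecutive_failures: int) -> int:
    """Seconds to wait before the next refresh attempt."""
    if consecutive_failures <= 0:
        return REFRESH_INTERVAL
    if consecutive_failures > 1:
        # Uncapped delay that preceded this failure.
        previous = BACKOFF_START * 2 ** (consecutive_failures - 2)
        if previous >= BACKOFF_CAP:
            # Assumption: the cap was already used, so rejoin the normal cycle.
            return REFRESH_INTERVAL
    return min(BACKOFF_START * 2 ** (consecutive_failures - 1), BACKOFF_CAP)


print([next_delay(n) for n in range(10)])
# [14400, 30, 60, 120, 240, 480, 960, 1920, 3600, 14400]
```

Under these assumptions, a NetBox outage is retried eight times over roughly two hours before the worker settles back into the 4-hour cycle, serving stale data throughout.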