Skip to content

Add byok migration script for newly hosted keys#3462

Open
TheodoreSpeaks wants to merge 10 commits intofeat/mothership-copilotfrom
feat/migrate-byok
Open

Add byok migration script for newly hosted keys#3462
TheodoreSpeaks wants to merge 10 commits intofeat/mothership-copilotfrom
feat/migrate-byok

Conversation

@TheodoreSpeaks
Copy link
Collaborator

@TheodoreSpeaks TheodoreSpeaks commented Mar 7, 2026

Summary

Currently, hosted keys will override users' currently provided api keys. Instead, we need to create a script to migrate their provided api keys to the BYOK section so that users continue to use their existing keys.

This handles:

  • Agent tool sub-blocks.
  • Raw inputted keys
  • Keys set via secrets

Note: BYOK only allows for a single api key. This could cause different behavior if a user has multiple api keys provided for different blocks for the same service. The dry run flag will detect these cases and raise them so we can look on a case-by-case basis.

Example usage:

# Dry run - detect duplicate api keys, validates 
bun run packages/db/scripts/migrate-block-api-keys-to-byok.ts \
  --map jina=jina --map perplexity=perplexity --map linkup=linkup \
  --map huggingface=huggingface --map firecrawl=firecrawl --map serper=serper \
  --map elevenlabs=elevenlabs --map exa=exa --map browser_use=browser_use \
  --map google_books=google_cloud --map google_maps=google_cloud \
  --map google_pagespeed=google_cloud --map google_translate=google_cloud --dry-run

# Output

Found 2 candidate blocks

Loaded env vars: 1 users, 1 workspaces

Literal API keys: 1
Env var references: 1
Nested tool keys (agent/HITL): 1
Env var resolution failures: 0
Skipped (no workspace): 0
Skipped (empty key): 0
Total resolved key entries: 2

Unique (workspace, provider) pairs to insert: 2

No conflicts detected.

  [DRY RUN] Would insert BYOK key for workspace 438e6738-c3aa-4fe6-907e-e996e87d5681, provider "firecrawl": ••••••••
  [DRY RUN] Would insert BYOK key for workspace 438e6738-c3aa-4fe6-907e-e996e87d5681, provider "exa": ••••••••

[DRY RUN] No changes were made to the database.
Run without --dry-run to apply changes.

# Real migration
bun run packages/db/scripts/migrate-block-api-keys-to-byok.ts \
  --map jina=jina --map perplexity=perplexity --map linkup=linkup \
  --map huggingface=huggingface --map firecrawl=firecrawl --map serper=serper \
  --map elevenlabs=elevenlabs --map exa=exa --map browser_use=browser_use \
  --map google_books=google_cloud --map google_maps=google_cloud \
  --map google_pagespeed=google_cloud --map google_translate=google_cloud

Found 2 candidate blocks

Loaded env vars: 1 users, 1 workspaces

Literal API keys: 1
Env var references: 1
Nested tool keys (agent/HITL): 1
Env var resolution failures: 0
Skipped (no workspace): 0
Skipped (empty key): 0
Total resolved key entries: 2

Unique (workspace, provider) pairs to insert: 2

Insert batch 1 (1-2 of 2)
  [INSERT] BYOK key for workspace 438e6738-c3aa-4fe6-907e-e996e87d5681, provider "firecrawl": ••••••••
  [INSERT] BYOK key for workspace 438e6738-c3aa-4fe6-907e-e996e87d5681, provider "exa": ••••••••

---
Migration Summary:
  BYOK keys inserted: 2
  BYOK keys skipped (already existed): 0
  BYOK insert errors: 0
  Env var resolution failures: 0

Migration completed successfully!

Done!

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other: ___________

Testing

Tested with existing api key, migration moves it to byok uses key
Tested with existing env key, migration moves to byok and uses key
Tested with agent subblock, migration moves to byok and uses key
Tested with multiple same keys, migration moves to byok
Tested with different keys, dry run flags as potential issue.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • [ x I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Screenshots/Videos

@vercel
Copy link

vercel bot commented Mar 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
docs Ready Ready Preview, Comment Mar 10, 2026 7:21am

Request Review

console.log(`Found ${rows.length} candidate blocks\n`)

// 2. Pre-load env vars for resolving {{VAR}} references
const personalEnvRows = await db.select().from(environment)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't load all env vars into memory. Can we first find the blocks that reference these env vars, then map the env rows needed from that?

@TheodoreSpeaks TheodoreSpeaks marked this pull request as ready for review March 8, 2026 00:33
@cursor
Copy link

cursor bot commented Mar 8, 2026

PR Summary

Medium Risk
Medium risk because it reads/decrypts stored secrets and writes new encrypted BYOK rows; incorrect mappings or conflict resolution could migrate the wrong key for a provider despite dry-run safeguards.

Overview
Adds a new self-contained Bun script, migrate-block-api-keys-to-byok.ts, to migrate block-level apiKey values into workspace_byok_keys without modifying the original blocks.

The script supports --dry-run auditing (detects multiple distinct keys per workspace/provider, logs chosen key, and outputs a workspace-id list) and a gated live mode (--from-file) that encrypts and inserts BYOK keys with onConflictDoNothing. It resolves both literal keys and {{ENV_VAR}} references by decrypting values from workspace_environment and user environment, and also scans nested tool API keys inside agent/human_in_the_loop blocks.

Written by Cursor Bugbot for commit efb23da. This will update automatically on new commits. Configure here.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 8, 2026

Greptile Summary

This PR adds a one-shot migration script that reads API keys stored directly on workflow blocks and inserts them into the workspace_byok_keys table, so users continue using their own keys after hosted keys are introduced. The script supports literal key values, environment-variable references ({{VAR_NAME}}), and nested tool API keys inside agent/HITL sub-blocks, and includes a --dry-run mode for auditing conflicts before committing changes.

Key points:

  • Statistics bug: (result as any).rowCount === 0 will always be false for the postgres-js driver (which exposes count, not rowCount). Every conflict-silenced insert is miscounted as a successful insert, making the summary numbers unreliable. Switching to .returning({ id: ... }) and checking array length (as done in production route.ts) would fix this.
  • Silent key misses: The sub-block field name apiKey is hardcoded. Block types that store their credential under a different name will be silently skipped with no counter incremented.
  • Live-run conflict handling: Conflicts are logged but not blocked in the live run. Operators should be aware that running without --dry-run will automatically choose the first resolved key for any provider with multiple distinct values.
  • Workspace query over-inclusion: Adding agent/human_in_the_loop block types to the initial workspace query may pull in workspaces that only have those block types but no mapped tool keys, resulting in unnecessary DB round-trips and noisy log output.

Confidence Score: 3/5

  • Safe to merge as a script-only change, but the statistics reported at the end of a live run will be incorrect due to the rowCount bug.
  • The actual database writes (encrypt + insert with onConflictDoNothing) are correct and idempotent, so re-running the script is safe. However, the rowCount logic bug means operators cannot trust the summary counts (inserted vs. skipped) after a live run, which reduces confidence in the migration's reported outcome. The hardcoded apiKey sub-block field is a secondary concern that could cause silent data gaps.
  • packages/db/scripts/migrate-block-api-keys-to-byok.ts — specifically the insert result handling around line 451 and the sub-block field name assumption around line 290.

Important Files Changed

Filename Overview
packages/db/scripts/migrate-block-api-keys-to-byok.ts New one-shot migration script that reads block-level API keys from workflow sub-blocks and inserts them as workspace BYOK keys. Contains a logic bug where (result as any).rowCount is always undefined for the postgres-js driver, causing the skip/insert statistics to be miscounted. Also hardcodes apiKey as the sub-block field name which could silently miss keys stored under different names.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Parse CLI args\n--map blockType=providerId\n--dry-run] --> B[Query distinct workspaceIds\nwith matching block types]
    B --> C{For each workspace}
    C --> D[Query all candidate blocks\nfor this workspace]
    D --> E[Extract raw API key refs\nper provider]
    E --> F{Has env var\nreferences?}
    F -- Yes --> G[Load workspace env vars\n+ personal env vars]
    F -- No --> H
    G --> H[For each provider:\nresolve all keys]
    H --> I{Multiple distinct\nresolved keys?}
    I -- Yes --> J[Log CONFLICT\nUse first resolved key]
    I -- No --> K[Use single resolved key]
    J --> L{DRY_RUN?}
    K --> L
    L -- Yes --> M[Print preview line\nNo DB write]
    L -- No --> N[encryptSecret key\nInsert into workspace_byok_keys\nonConflictDoNothing]
    N --> O{rowCount check\n⚠️ always undefined}
    O -- undefined === 0 false --> P[Count as inserted\n even if skipped]
    O -- never reached --> Q[Count as skipped]
    M --> R[Print summary]
    P --> R
    Q --> R
Loading

Last reviewed commit: 353ef31

Comment on lines +290 to +296
const blocks = await db
.select({
blockId: workflowBlocks.id,
blockName: workflowBlocks.name,
blockType: workflowBlocks.type,
subBlocks: workflowBlocks.subBlocks,
workflowId: workflow.id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded apiKey sub-block ID may silently miss keys for some providers

The script unconditionally reads API keys via subBlocks?.apiKey?.value. If any of the target block types store their credential under a differently-named sub-block key (e.g., api_key, token, secret), those keys will be silently skipped with no warning — they won't appear in Skipped (empty key) stats either, because the value simply won't be found.

Before running a live migration it is worth verifying (or documenting) that every mapped block type in BLOCK_TYPE_TO_PROVIDER uses exactly apiKey as the sub-block identifier. A quick sanity-check log line listing how many non-empty apiKey values were found per block type would make this more auditable.

Comment on lines +410 to +433

// Detect conflicting values
const distinctKeys = new Set(resolved.map((r) => r.key))
if (distinctKeys.size > 1) {
stats.conflicts++
console.log(` [CONFLICT] provider "${providerId}": ${distinctKeys.size} distinct keys`)
for (const { ref, key } of resolved) {
const display = isEnvVarReference(ref.rawValue)
? `${ref.rawValue} -> ${maskKey(key)}`
: maskKey(ref.rawValue)
console.log(` "${ref.blockName}" in "${ref.workflowName}": ${display}`)
}
console.log(' Using first resolved key')
}

// Use the first resolved key
const chosen = resolved[0]

if (DRY_RUN) {
console.log(
` [DRY RUN] Would insert BYOK for provider "${providerId}": ${maskKey(chosen.key)}`
)
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conflicts are silently resolved in live run; no abort option

The PR description states: "The dry run flag will detect these cases and raise them so we can look on a case-by-case basis." However, in the live run the conflict is logged as [CONFLICT] but the migration immediately proceeds to insert the first resolved key without any pause or opt-in confirmation.

For a one-shot production migration, consider adding an --abort-on-conflict flag (or making the live run abort by default when conflicts are found), so operators are not surprised when their conflict review leads to no action. At minimum, a note in the script header documenting this behavior would be helpful.

Comment on lines +254 to +270

const stats = {
workspacesProcessed: 0,
workspacesSkipped: 0,
conflicts: 0,
inserted: 0,
skippedExisting: 0,
errors: 0,
envVarFailures: 0,
}

try {
// 1. Get distinct workspace IDs that have matching blocks
const mappedBlockTypes = Object.keys(BLOCK_TYPE_TO_PROVIDER)
const agentTypes = Object.keys(TOOL_INPUT_SUBBLOCK_IDS)
const allBlockTypes = [...new Set([...mappedBlockTypes, ...agentTypes])]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial workspace discovery query may include false-positive workspaces

allBlockTypes is the union of mappedBlockTypes (the providers we care about) and agentTypes (agent, human_in_the_loop). This means any workspace that has any agent or HITL block — even one with no tool configured — will be included in the outer loop and generate a per-workspace log line. For large deployments this is noisy and wastes DB round-trips.

Consider restricting the initial query to only mappedBlockTypes, and querying the agent/HITL blocks only inside the workspace loop (or alongside the mapped blocks in a single per-workspace query). That way a workspace that has agents but no matching tool API keys is excluded upfront rather than found and then immediately skipped as "No API keys found".

Comment on lines +451 to +456
if ((result as any).rowCount === 0) {
console.log(` [SKIP] BYOK already exists for provider "${providerId}"`)
stats.skippedExisting++
} else {
console.log(` [INSERT] BYOK for provider "${providerId}": ${maskKey(chosen.key)}`)
stats.inserted++
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rowCount does not exist on postgres-js insert result

With drizzle-orm + postgres-js (the driver used in this project), calling db.insert(...).onConflictDoNothing() without .returning() returns a postgres.js command-result object whose relevant property is count (a BigInt), not rowCount. Because (result as any).rowCount will always be undefined, the expression undefined === 0 is always false — the skip branch is never entered. Every conflict-silenced insert will be incorrectly counted as stats.inserted++ instead of stats.skippedExisting++.

The safest fix is to add .returning({ id: workspaceBYOKKeys.id }) and check whether the returned array is empty — which is how the production route.ts already handles insert results. This makes the statistics trustworthy, which is especially important for a one-shot data migration.

for (const row of personalRows) {
personalEnvCache.set(row.userId, (row.variables as Record<string, string>) || {})
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All env vars loaded into memory per workspace

Low Severity

The PR reviewer specifically flagged that env vars should not all be loaded into memory. While the code now loads per-workspace rather than globally, it still does select() (all columns) from workspaceEnvironment and environment, pulling the entire variables JSON blob for each workspace and user. This includes every env var, not just the ones referenced by blocks. The referenced env var names are known from the providerKeys refs — the script could first collect the needed var names, then only decrypt those specific entries rather than loading all variables into wsEnvVars and personalEnvCache.

Fix in Cursor Fix in Web

@@ -0,0 +1,598 @@
#!/usr/bin/env bun
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd have to include this in pure SQL in an actual migration at the top. We can discuss this on Monday morning. It's hard to tell our self-hosted users to run the script is what we realized.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, this one only applies for hosted -- so actually a script is fine here

@gitguardian
Copy link

gitguardian bot commented Mar 10, 2026

️✅ There are no secrets present in this pull request anymore.

If these secrets were true positive and are still valid, we highly recommend you to revoke them.
While these secrets were previously flagged, we no longer have a reference to the
specific commits where they were detected. Once a secret has been leaked into a git
repository, you should consider it compromised, even if it was deleted immediately.
Find here more information about risks.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@TheodoreSpeaks TheodoreSpeaks changed the base branch from staging to feat/mothership-copilot March 10, 2026 04:34
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

}

console.log(` [${index}/${total}] Done with workspace ${workspaceId}\n`)
return { stats, shouldWriteWorkspaceId: DRY_RUN }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dry run writes workspace IDs even without insertable keys

Low Severity

shouldWriteWorkspaceId: DRY_RUN is unconditionally true for every processed workspace during a dry run, even when no keys actually resolved successfully (e.g., all env var lookups failed or all were skipped as non-owner personal keys). This causes the output file to include workspace IDs that have no valid keys to migrate. The live run then wastes time re-processing those workspaces with no result, and the dry-run summary inaccurately reports the count of "workspace IDs (with keys)."

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants