Skip to content

Fix parallel storage chunk upload state#12209

Open
TorstenDittmann wants to merge 7 commits into1.9.xfrom
fix/parallel-storage-chunk-upload
Open

Fix parallel storage chunk upload state#12209
TorstenDittmann wants to merge 7 commits into1.9.xfrom
fix/parallel-storage-chunk-upload

Conversation

@TorstenDittmann
Copy link
Copy Markdown
Contributor

Summary

  • Add Redis-backed upload state locks for storage file chunk metadata and finalization.
  • Keep chunk writes outside the lock so same-file chunks can upload concurrently.
  • Add an e2e test for parallel large-file chunk uploads.

Tests

  • docker compose exec appwrite test tests/e2e/Services/Storage --filter='testCreateBucketFile(OutOfOrder|ParallelChunksLargeFile)'
  • docker compose exec appwrite test tests/e2e/Services/Storage

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 4, 2026

Greptile Summary

This PR introduces Redis-backed distributed locks to serialize chunk-upload state reads and finalizations across concurrent requests for storage files, function deployments, and site deployments. Chunk writes themselves remain outside the lock, preserving parallel throughput. End-to-end tests using Swoole coroutines verify the parallel-upload path and include retry logic for 429 responses.

Confidence Score: 4/5

Safe to merge with minor follow-up; the lock design is sound but the checkLock TTL configuration in Functions/Sites is slightly inconsistent.

All findings are P2. The core double-finalization race is correctly prevented by the lock + in-lock DB re-read. LockContention now correctly returns 429. Previous P1 concerns from the thread appear addressed in the current code. The only new finding is that checkLock in Functions/Sites uses ttl:120 matching its timeout:120, which is an edge-case concern with no practical impact for a fast DB-read critical section.

Functions/Http/Deployments/Create.php and Sites/Http/Deployments/Create.php — checkLock TTL configuration.

Important Files Changed

Filename Overview
src/Appwrite/Platform/Modules/Storage/Http/Buckets/Files/Create.php Core change: single Distributed lock (ttl:600, timeout:120) serializes the pre-check and finalization phases; chunk writes remain outside the lock. Lock re-reads DB state inside the second critical section to guard against double-finalization. Event dispatch and early-return paths look correct.
src/Appwrite/Platform/Modules/Functions/Http/Deployments/Create.php Uses two Distributed locks sharing the same Redis key (checkLock ttl:120, stateLock ttl:600). The checkLock's TTL equals its withLock timeout (both 120), creating a narrow edge case if the holder crashes, unlike Storage's lock which uses ttl:600.
src/Appwrite/Platform/Modules/Sites/Http/Deployments/Create.php Mirrors the Functions deployment pattern with the same checkLock(ttl:120)/stateLock(ttl:600) design on a shared key; carries the same checkLock TTL-equals-timeout concern.
tests/e2e/Services/Storage/StorageBase.php Adds testCreateBucketFileParallelChunksLargeFile: fires all 4 chunks concurrently via Swoole coroutines with up to 3 retries on 429 using the Retry-After header, then verifies integrity via download+hash comparison. Cleanup in finally block is solid.
composer.json Adds utopia-php/lock as a dev-main VCS dependency with a custom repositories entry; composer.lock pins to commit 22eda789. The unpinned dev-main ref is a reproducibility concern already flagged in prior review.

Reviews (6): Last reviewed commit: "Fix deployment upload analyze warning" | Re-trigger Greptile

Comment thread src/Appwrite/Platform/Modules/Storage/Http/Buckets/Files/Create.php
Comment thread src/Appwrite/Platform/Modules/Storage/Http/Buckets/Files/Create.php
Comment thread src/Appwrite/Platform/Modules/Storage/Http/Buckets/Files/Create.php
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

✨ Benchmark results

Comparing 1.9.x (before) to fix/parallel-storage-chunk-upload (after).

Before

Scenario P50 (ms) P95 (ms) Requests RPS
API total 14.94 149.82 185 33.02
Account 25.23 184.51 35 6.94
TablesDB 14.19 21.63 35 8.35
Storage 12.67 51.76 75 17.52
Functions 20.13 34.04 40 9.37

After

Scenario P50 (ms) P95 (ms) Requests RPS
API total 15.02 143.83 185 33.49
Account 26.92 213.21 35 6.92
TablesDB 14.74 24.44 35 8.19
Storage 12.66 46.87 75 17.26
Functions 19.94 34.52 40 9.34

Delta

Scenario P95 delta (ms)
API total -5.99
Account +28.69
TablesDB +2.81
Storage -4.89
Functions +0.47
Top API waits
API request Max wait (ms)
account.sessions.email.create 614.43
account.create 213.06
account.password.update 164.58

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

🔄 PHP-Retry Summary

Flaky tests detected across commits:

Commit 9927e4b - 2 flaky tests
Test Retries Total Time Details
UsageTest::testPrepareDatabaseStatsTablesAPI 1 32.85s Logs
TablesDBTransactionsConsoleClientTest::testBulkCreate 1 240.83s Logs
Commit d58929b - 28 flaky tests
Test Retries Total Time Details
UsageTest::testFunctionsStats 1 10.20s Logs
UsageTest::testPrepareSitesStats 1 7ms Logs
UsageTest::testEmbeddingsTextUsageDoesNotBreakProjectUsage 1 5ms Logs
WebhooksCustomServerTest::testDeleteDeployment 1 15ms Logs
WebhooksCustomServerTest::testDeleteFunction 1 10ms Logs
WebhooksCustomServerTest::testCreateCollection 1 9ms Logs
WebhooksCustomServerTest::testCreateAttributes 1 12ms Logs
WebhooksCustomServerTest::testCreateDocument 1 17ms Logs
WebhooksCustomServerTest::testUpdateDocument 1 80ms Logs
WebhooksCustomServerTest::testDeleteDocument 1 13ms Logs
WebhooksCustomServerTest::testCreateTable 1 14ms Logs
WebhooksCustomServerTest::testCreateColumns 1 36ms Logs
WebhooksCustomServerTest::testCreateRow 1 15ms Logs
WebhooksCustomServerTest::testUpdateRow 1 10ms Logs
WebhooksCustomServerTest::testDeleteRow 1 20ms Logs
WebhooksCustomServerTest::testCreateStorageBucket 1 9ms Logs
WebhooksCustomServerTest::testUpdateStorageBucket 1 11ms Logs
WebhooksCustomServerTest::testCreateBucketFile 1 10ms Logs
WebhooksCustomServerTest::testUpdateBucketFile 1 7ms Logs
WebhooksCustomServerTest::testDeleteBucketFile 1 6ms Logs
WebhooksCustomServerTest::testDeleteStorageBucket 1 17ms Logs
WebhooksCustomServerTest::testCreateTeam 1 8ms Logs
WebhooksCustomServerTest::testUpdateTeam 1 13ms Logs
WebhooksCustomServerTest::testUpdateTeamPrefs 1 15ms Logs
WebhooksCustomServerTest::testDeleteTeam 1 6ms Logs
WebhooksCustomServerTest::testCreateTeamMembership 1 16ms Logs
WebhooksCustomServerTest::testDeleteTeamMembership 1 10ms Logs
WebhooksCustomServerTest::testWebhookAutoDisable 1 56ms Logs
Commit 6e19db1 - 28 flaky tests
Test Retries Total Time Details
WebhooksCustomServerTest::testDeleteDeployment 1 204ms Logs
WebhooksCustomServerTest::testDeleteFunction 1 10ms Logs
WebhooksCustomServerTest::testCreateCollection 1 8ms Logs
WebhooksCustomServerTest::testCreateAttributes 1 8ms Logs
WebhooksCustomServerTest::testCreateDocument 1 63ms Logs
WebhooksCustomServerTest::testUpdateDocument 1 12ms Logs
WebhooksCustomServerTest::testDeleteDocument 1 26ms Logs
WebhooksCustomServerTest::testCreateTable 1 15ms Logs
WebhooksCustomServerTest::testCreateColumns 1 17ms Logs
WebhooksCustomServerTest::testCreateRow 1 18ms Logs
WebhooksCustomServerTest::testUpdateRow 1 18ms Logs
WebhooksCustomServerTest::testDeleteRow 1 12ms Logs
WebhooksCustomServerTest::testCreateStorageBucket 1 7ms Logs
WebhooksCustomServerTest::testUpdateStorageBucket 1 8ms Logs
WebhooksCustomServerTest::testCreateBucketFile 1 9ms Logs
WebhooksCustomServerTest::testUpdateBucketFile 1 5ms Logs
WebhooksCustomServerTest::testDeleteBucketFile 1 5ms Logs
WebhooksCustomServerTest::testDeleteStorageBucket 1 12ms Logs
WebhooksCustomServerTest::testCreateTeam 1 6ms Logs
WebhooksCustomServerTest::testUpdateTeam 1 8ms Logs
WebhooksCustomServerTest::testUpdateTeamPrefs 1 10ms Logs
WebhooksCustomServerTest::testDeleteTeam 1 5ms Logs
WebhooksCustomServerTest::testCreateTeamMembership 1 11ms Logs
WebhooksCustomServerTest::testDeleteTeamMembership 1 8ms Logs
WebhooksCustomServerTest::testWebhookAutoDisable 1 21ms Logs
UsageTest::testFunctionsStats 1 10.14s Logs
UsageTest::testPrepareSitesStats 1 7ms Logs
UsageTest::testEmbeddingsTextUsageDoesNotBreakProjectUsage 1 5ms Logs
Commit fd83090 - 4 flaky tests
Test Retries Total Time Details
UsageTest::testFunctionsStats 1 10.12s Logs
UsageTest::testPrepareSitesStats 1 7ms Logs
UsageTest::testEmbeddingsTextUsageDoesNotBreakProjectUsage 1 5ms Logs
FunctionsScheduleTest::testCreateScheduledAtExecution 1 131.11s Logs
Commit 5c7e08f - 4 flaky tests
Test Retries Total Time Details
FunctionsScheduleTest::testCreateScheduledAtExecution 1 127.07s Logs
UsageTest::testFunctionsStats 1 10.19s Logs
UsageTest::testPrepareSitesStats 1 7ms Logs
UsageTest::testEmbeddingsTextUsageDoesNotBreakProjectUsage 1 5ms Logs

Note: Flaky test results are tracked for the last 5 commits

@TorstenDittmann
Copy link
Copy Markdown
Contributor Author

Addressed the Greptile review feedback:

  • Changed lock contention from a 500 to 429 (GENERAL_RATE_LIMIT_EXCEEDED) and added Retry-After: 5.
  • Reduced the distributed lock TTL to 120 to match the lock acquisition timeout/commented behavior.
  • Fixed the double-finalization race by basing finalization on freshly-read DB progress inside the second lock and only finalizing when the stored upload progress is still incomplete.
  • Kept uploaded chunk progress monotonic with max($uploaded, $chunksUploaded).

Verified with:

  • php -l src/Appwrite/Platform/Modules/Storage/Http/Buckets/Files/Create.php
  • vendor/bin/pint --test --config pint.json src/Appwrite/Platform/Modules/Storage/Http/Buckets/Files/Create.php tests/e2e/Services/Storage/StorageBase.php
  • docker compose exec appwrite test tests/e2e/Services/Storage --filter='testCreateBucketFile(OutOfOrder|ParallelChunksLargeFile)'
  • docker compose exec appwrite test tests/e2e/Services/Storage

Comment thread src/Appwrite/Platform/Modules/Storage/Http/Buckets/Files/Create.php
@TorstenDittmann
Copy link
Copy Markdown
Contributor Author

Addressed the latest Greptile TTL finding:

  • Restored the distributed lock hold TTL to 600 seconds so the lock can outlive long finalization work such as AV scan, compression, encryption, and the DB write.
  • Kept the withLock(..., timeout: 120.0) acquisition timeout unchanged so waiting callers still stop after 120 seconds and receive the existing retryable 429 response.
  • Also removed the unused mode injection/action parameter from createFile.

Verified with:

  • php -l src/Appwrite/Platform/Modules/Storage/Http/Buckets/Files/Create.php
  • vendor/bin/pint --test --config pint.json src/Appwrite/Platform/Modules/Storage/Http/Buckets/Files/Create.php
  • docker compose exec appwrite test tests/e2e/Services/Storage --filter='testCreateBucketFile(OutOfOrder|ParallelChunksLargeFile)'
  • docker compose exec appwrite test tests/e2e/Services/Storage

Comment thread src/Appwrite/Platform/Modules/Storage/Http/Buckets/Files/Create.php Outdated
@TorstenDittmann
Copy link
Copy Markdown
Contributor Author

Addressed the latest Greptile review feedback:

  • Guarded queueForEvents so the files.create event is only queued when chunksUploaded === chunks, preventing duplicate create events for intermediate chunks.
  • Added bounded retry handling for 429 responses in testCreateBucketFileParallelChunksLargeFile, retrying only lock-contention responses and respecting Retry-After when present.

Verified locally with:

  • php -l src/Appwrite/Platform/Modules/Storage/Http/Buckets/Files/Create.php
  • php -l tests/e2e/Services/Storage/StorageBase.php
  • vendor/bin/pint --test --config pint.json src/Appwrite/Platform/Modules/Storage/Http/Buckets/Files/Create.php tests/e2e/Services/Storage/StorageBase.php
  • docker compose exec appwrite test tests/e2e/Services/Storage --filter='testCreateBucketFile(OutOfOrder|ParallelChunksLargeFile)'
  • docker compose exec appwrite test tests/e2e/Services/Storage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant