Add CTE strategy for eager loading HasMany/BelongsToMany associations#19318
Add CTE strategy for eager loading HasMany/BelongsToMany associations#19318dereuromark wants to merge 3 commits into5.nextfrom
Conversation
This adds a new 'cte' strategy for eager loading that uses Common Table
Expressions instead of WHERE IN clauses. This is beneficial for large
result sets as it:
- Avoids packet size limits from large WHERE IN clauses
- Reduces PHP memory usage by keeping IDs in the database
- Allows the database to optimize the join more effectively
Usage:
```php
$this->hasMany('Articles', ['strategy' => 'cte']);
// Or at query time
$query->contain(['Articles' => ['strategy' => 'cte']]);
```
Refs #19107
cffd230 to
b873969
Compare
|
This should target |
There was a problem hiding this comment.
Pull request overview
Adds a new eager-loading strategy ('cte') for HasMany and BelongsToMany associations that uses a Common Table Expression + join instead of WHERE IN (...), aiming to improve performance/memory usage for large parent result sets.
Changes:
- Introduces
Association::STRATEGY_CTEand enables it forHasMany/BelongsToMany. - Implements CTE-based filtering in
SelectLoaderviawith()+innerJoin()against the CTE. - Adds
HasManytest coverage for the new strategy (guarded by driver CTE support).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/ORM/Association.php |
Adds the public STRATEGY_CTE constant. |
src/ORM/Association/HasMany.php |
Allows the cte strategy for hasMany eager loading. |
src/ORM/Association/BelongsToMany.php |
Allows the cte strategy for belongsToMany eager loading. |
src/ORM/Association/Loader/SelectLoader.php |
Implements CTE-based filtering/join for eager loading. |
tests/TestCase/ORM/Association/HasManyTest.php |
Adds tests covering HasMany eager loading using cte. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if ($useSubquery) { | ||
| $filter = $this->_buildSubquery($selectQuery); | ||
| $fetchQuery = $this->_addFilteringJoin($fetchQuery, $key, $filter); | ||
| } elseif ($options['strategy'] === Association::STRATEGY_CTE) { | ||
| $cteQuery = $this->_buildSubquery($selectQuery); | ||
| $fetchQuery = $this->_addFilteringCTE($fetchQuery, $key, $cteQuery); | ||
| } else { |
There was a problem hiding this comment.
The new cte eager-loading branch doesn't validate that the current driver supports CTEs. If users set strategy => 'cte' on an unsupported DB/version, the generated SQL will include a WITH clause and fail at runtime with a database error. Consider failing fast (eg. throw DatabaseException with a clear message) or transparently falling back to subquery/select when !$fetchQuery->getDriver()->supports(DriverFeatureEnum::CTE).
There was a problem hiding this comment.
This is an opt-in configuration - developers explicitly choose this strategy. A reasonable counterargument is that developers should know their database capabilities.
|
@dereuromark Could add performance comparison when using the |
Addresses review feedback to add test coverage for the CTE eager loading strategy with BelongsToMany associations, including: - Basic CTE strategy test with junction table - Multiple parent records test - Limit on parent query test - Conditions on parent query test - Junction table conditions test (using SpecialTags) Also updates testRequiresKeys() to verify CTE strategy does not require keys.
|
Also makes me wonder why we have |
|
@ADmad Updated the PR description with a full benchmark comparing all three strategies. Regarding why Historical reasons:
Modern reality:
The benchmark shows |
In that case is adding the
I would vote to change the default to |
|
Yes, still needed, see above |
Not sure how subquery strategy would be worse in comparison to the cte strategy in that regard.
|
|
Fair.
Either way we should switch the default then to subquery in 5.next as follow-up. |
|
I took a look at this and there are a couple issues, first 1 is pretty trivial.
in the test WITH _cte_Articles AS (
SELECT (Articles.id) FROM articles Articles WHERE Articles.id <> :c0 GROUP BY Articles.id
)
SELECT Tags.id AS Tags__id, Tags.name AS Tags__name, Tags.description AS Tags__description, Tags.created AS Tags__created FROM tags Tags INNER JOIN articles_tags ArticlesTags ON 1 = 1
_cte_Articles is never usedthe cte query here, has no affect since it's not being joined or used for filtering. the cte itself, could be omitted in favor of EXISTS(...) Where Tags are pulled by SELECT ArticlTags.,Tags. FROM tags Tags RIGHT JOIN article_tags ArticleTags ON(ArticleTags.tag_id = Tags.id) WHERE EXISTS(SELECT FROM articles JOIN article_tags ON(tag_id = Tags.id AND ( This would allow reuse the initial WHERE conditions instead of parsing the related ids and using them in IN. With BelongsToMany, we already have a good map of what belongs to what, using Here we can still use EXISTS(...) SELECT * FROM comments Comments WHERE EXISTS(SELECT FROM articles WHERE id = Comments.article_id AND ( To be fair, I haven't yet looked at the |
|
@justindenick Thanks for the detailed analysis, you're right. Looking at the SQL output you captured, the CTE is defined but never actually joined to filter the results - the The tests passing is indeed coincidental - the ORM's result marshalling uses Good catch on the Given that:
I'll probably close this PR. A separate PR for changing the default strategy to @celsowm What do you think? |
|
I agree |
Summary
This PR adds a new
'cte'strategy for eager loading HasMany and BelongsToMany associations that uses Common Table Expressions instead ofWHERE INclauses.Refs #19107
Problem
When eager loading with large result sets, the current
selectstrategy generates queries like:This has several issues:
max_allowed_packet(default 64MB) can be exceededSolution
The new CTE strategy generates:
Performance Comparison (MySQL 8.0)
Benchmark comparing all three eager loading strategies with varying parent record counts:
Key findings:
subqueryandcteare faster thanselectfor large datasetsselectuses significantly more PHP memory (holds all IDs in array)subqueryandcteperform similarlyselectandsubquerywith very large ID setsUsage
When to Use
max_allowed_packetlimits are a concernCompatibility
CTEs are supported in MySQL 8.0+, PostgreSQL, SQLite 3.8.3+, and SQL Server. CakePHP 5.x already requires modern DB versions with CTE support.