Skip to content

refact: Reuse RowGroupPageIndexReader across columns to improve page-level predicate pushdown performance regression on wide tables#316

Merged
lxy-9602 merged 4 commits into
alibaba:mainfrom
zhf999:page-fix
Jun 1, 2026
Merged

refact: Reuse RowGroupPageIndexReader across columns to improve page-level predicate pushdown performance regression on wide tables#316
lxy-9602 merged 4 commits into
alibaba:mainfrom
zhf999:page-fix

Conversation

@zhf999
Copy link
Copy Markdown
Contributor

@zhf999 zhf999 commented May 29, 2026

Purpose

Linked issue: close #137

In PageFilteredRowGroupReader::ReadFilteredRowGroup, the original implementation passed PageIndexReader into ReadFilteredColumn, where each column called page_index_reader->RowGroup(row_group_index) to reconstruct a RowGroupPageIndexReader. The RowGroup() call reads and parses all ColumnIndex and OffsetIndex metadata for every column in that row group. For wide tables (hundreds or even thousands of columns), the same metadata was redundantly read N times (N = number of columns being read), causing page index read time to scale linearly and resulting in a significant performance drop for point-query scenarios where page-level predicate pushdown was supposed to help.

Fix: Simply Hoist the RowGroupPageIndexReader creation out of the per-column loop in ReadFilteredRowGroup — construct it once and reuse it across all columns. Change the ReadFilteredColumn parameter type from PageIndexReader to RowGroupPageIndexReader, eliminating redundant metadata reads.

Tests

On a test case with a wide table containing over 1,000 columns, the original code took significantly longer to execute queries. The flamegraph clearly shows that RowGroup() consumed a substantial portion of the total execution time:
image
image

After applying the changes in this PR, range query performance on the wide table shows a significant improvement:
image

The redundant RowGroup() calls are no longer visible in the flamegraph:
image

API and Format

Not API and format changes.

Documentation

No new features were introduced.

Generative AI tooling

Claude code (Opus 4.6)

zhf999 added 2 commits May 28, 2026 17:42
…edRowGroupReader::ReadFilteredRowGroup to avoid performance drop on wide tables
Copilot AI review requested due to automatic review settings May 29, 2026 02:26
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented May 29, 2026

CLA assistant check
All committers have signed the CLA.

@zhf999
Copy link
Copy Markdown
Contributor Author

zhf999 commented May 29, 2026

cc @liangjie3138 . This PR optimizes the RowGroupPageIndexReader usage introduced in #232. Please take a look when you have a chance.

I'm wondering if you have any plans to push bitmap filters down to the page level. If not, I'd like to pick this up — it would be a natural next step and could further improve point-query performance on wide tables.

@zhf999 zhf999 changed the title Reuse RowGroupPageIndexReader across columns to fix page-level predicate pushdown performance regression on wide tables fix: Reuse RowGroupPageIndexReader across columns to fix page-level predicate pushdown performance regression on wide tables May 29, 2026
@lxy-9602 lxy-9602 changed the title fix: Reuse RowGroupPageIndexReader across columns to fix page-level predicate pushdown performance regression on wide tables refact: Reuse RowGroupPageIndexReader across columns to improve page-level predicate pushdown performance regression on wide tables Jun 1, 2026
Copy link
Copy Markdown
Collaborator

@lxy-9602 lxy-9602 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@liangjie3138
Copy link
Copy Markdown
Contributor

cc @liangjie3138 . This PR optimizes the RowGroupPageIndexReader usage introduced in #232. Please take a look when you have a chance.

I'm wondering if you have any plans to push bitmap filters down to the page level. If not, I'd like to pick this up — it would be a natural next step and could further improve point-query performance on wide tables.

LGTM 👍 Go ahead with the page-level bitmap pushdown.

@lxy-9602 lxy-9602 merged commit 96d21b3 into alibaba:main Jun 1, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Support page-level bitmap pushdown for parquet format

4 participants