feat: Add metadata system tables #285
Merged
lxy-9602 merged 26 commits on May 25, 2026
Collaborator
@suxiaogang223 Hi, due to some temporary issues with GitHub Copilot, the Copilot review results could not be displayed directly. I extracted the review results from the Copilot logs for your reference.
lxy-9602
reviewed
May 18, 2026
Collaborator
Thanks for the PR! The system table implementation is shaping up well. Great work!
lxy-9602
reviewed
May 19, 2026
lxy-9602
reviewed
May 21, 2026
lszskye
reviewed
May 21, 2026
lszskye
reviewed
May 21, 2026



Background
This PR is part of #141 and continues the system table work after the previously merged `options`, `audit_log`, and `binlog` support.

The scope of this PR is table-level metadata system tables. It adds read-only support for metadata that can be served from existing table metadata files, so users can inspect snapshots, schemas, tags, branches, and consumers through the existing system table query path.

Architecture
- Introduce `InMemorySystemTable` for system tables whose output can be materialized as a single in-memory Arrow `RecordBatch`.
  - `options`, `snapshots`, `schemas`, `tags`, `branches`, and `consumers` all use this execution model.
- Keep metadata table classes Java-aligned and avoid an extra metadata base class.
  - `OptionsSystemTable`, `SnapshotsSystemTable`, `SchemasSystemTable`, `TagsSystemTable`, `BranchesSystemTable`, and `ConsumersSystemTable` directly inherit `InMemorySystemTable`.
  - Use `MetadataSystemTableContext` instead of a thin inheritance layer.
  - `audit_log` and `binlog` intentionally remain outside this hierarchy because they are data/changelog-backed system tables, not in-memory metadata tables.
- Group pure metadata system tables in `metadata_system_tables`.
  - `options` is now colocated with the other in-memory metadata system tables because it has the same scan/read shape and row-to-Arrow conversion path.
  - `options` differs only in data source: it reads the latest schema options, while the new tables read metadata through managers.
- Reuse row-to-Arrow conversion for in-memory system table output.
  - Build `GenericRowToArrowArrayConverter` on top of the existing `RowToArrowArrayConverter` infrastructure.
  - Produce `GenericRow` values and convert them through the shared converter instead of maintaining table-local Arrow builders.
- Refactor `SystemTableLoader` to use a registry/factory table.
  - `IsSupported` and `Load` now use the same registry entry list.
- Add metadata managers/helpers where needed.
  - `BranchManager::ListBranches` is the shared branch listing entry point.
  - `ConsumerManager` owns consumer path/list/read logic for `consumer/consumer-*` files.

System Tables Added
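As a rough illustration of how a table-scoped name such as `orders$snapshots` could be resolved through the registry/factory table described under Architecture, here is a minimal sketch. Everything in it is an assumption made for illustration: the `SystemTable` struct, `RegistryEntry`, `Registry()`, `FindEntry`, and `SplitSystemTableName` are hypothetical names, the free functions `IsSupported` and `Load` only mirror the method names mentioned above (their real signatures are not shown in this PR description), and the registered names are just the six in-memory tables.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loaded system table; not the real interface.
struct SystemTable {
    explicit SystemTable(std::string table_name) : name(std::move(table_name)) {}
    std::string name;
};

// One registry entry: a system table name plus the factory that builds it.
struct RegistryEntry {
    std::string name;
    std::function<std::unique_ptr<SystemTable>()> factory;
};

// Single entry list shared by IsSupported and Load (illustrative contents).
const std::vector<RegistryEntry>& Registry() {
    static const std::vector<RegistryEntry> entries = [] {
        std::vector<RegistryEntry> list;
        for (const char* raw :
             {"options", "snapshots", "schemas", "tags", "branches", "consumers"}) {
            std::string name = raw;
            list.push_back({name, [name] { return std::make_unique<SystemTable>(name); }});
        }
        return list;
    }();
    return entries;
}

// Linear lookup; the list is small, so no map is needed in this sketch.
const RegistryEntry* FindEntry(const std::string& name) {
    for (const auto& entry : Registry()) {
        if (entry.name == name) return &entry;
    }
    return nullptr;
}

bool IsSupported(const std::string& name) { return FindEntry(name) != nullptr; }

// Returns nullptr for unknown names; a real loader would likely return a status.
std::unique_ptr<SystemTable> Load(const std::string& name) {
    const RegistryEntry* entry = FindEntry(name);
    if (entry == nullptr) return nullptr;
    assert(entry->factory != nullptr);
    return entry->factory();
}

// Splits "orders$snapshots" into {"orders", "snapshots"} at the last '$'.
std::optional<std::pair<std::string, std::string>> SplitSystemTableName(
    const std::string& full_name) {
    const auto pos = full_name.rfind('$');
    if (pos == std::string::npos || pos == 0 || pos + 1 == full_name.size()) {
        return std::nullopt;
    }
    return std::make_pair(full_name.substr(0, pos), full_name.substr(pos + 1));
}
```

Because both functions go through `FindEntry`, a name can never be reported as supported without also being loadable, which is the property the shared registry entry list is meant to provide.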
This PR follows the Apache Paimon table-scoped system table model, where table metadata is queried with names like `table$snapshots`. Reference semantics: https://paimon.apache.org/docs/master/concepts/system-tables/

- `table$snapshots`
  - Source: `SnapshotManager`.
  - Columns: `snapshot_id`, `schema_id`, `commit_user`, `commit_identifier`, `commit_kind`, `commit_time`, `base_manifest_list`, `delta_manifest_list`, `changelog_manifest_list`, `total_record_count`, `delta_record_count`, `changelog_record_count`, `watermark`, `next_row_id`.
  - Order: `snapshot_id` ascending.
- `table$schemas`
  - Source: `SchemaManager`.
  - Columns: `schema_id`, `fields`, `partition_keys`, `primary_keys`, `options`, `comment`, `update_time`.
  - `fields`, `partition_keys`, `primary_keys`, and `options` are returned as JSON strings in this first C++ version.
  - Order: `schema_id` ascending.
- `table$tags`
  - Source: `TagManager`.
  - Columns: `tag_name`, `snapshot_id`, `schema_id`, `commit_time`, `record_count`, `create_time`, `time_retained`.
- `table$branches`
  - … `main`.
  - Source: `BranchManager`.
  - Columns: `branch_name`, `create_time`.
- `table$consumers`
  - Source: `ConsumerManager`.
  - Columns: `consumer_id`, `next_snapshot_id`.
  - … `nextSnapshot`.

Scope Notes
- This PR does not include `files` and `manifests`; those are planned for a follow-up PR.
- `options`, `audit_log`, and `binlog` behavior is preserved.

Tests
- `cmake --build build-codex-metadata-pr3 -j$(nproc)`
- `cmake --build build-codex-metadata-pr3 --target paimon-core-test paimon-read-inte-test -j2`
- `./build-codex-metadata-pr3/debug/paimon-core-test --gtest_filter=FileSystemCatalogTest.TestMetadataSystemTableCatalog`
- `./build-codex-metadata-pr3/debug/paimon-read-inte-test --gtest_filter=SystemTableReadInteTest.TestReadMetadataSystemTables:SystemTableReadInteTest.TestReadTagBranchAndConsumerSystemTables`
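
For readers who want a feel for the in-memory execution model described under Architecture, the sketch below collects metadata rows, orders them by `snapshot_id` ascending, and transposes them into one column-major batch. It is not the PR's implementation: plain `std::vector` columns stand in for Arrow arrays, and `Value`, `GenericRow`, `ColumnarBatch`, and `ToSingleBatch` are hypothetical names chosen for this example. It also assumes the first column holds the `int64_t` sort key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical cell type: null, 64-bit integer, or string.
using Value = std::variant<std::monostate, int64_t, std::string>;
// Hypothetical row type: one Value per column.
using GenericRow = std::vector<Value>;

// Stand-in for a single record batch: one vector per column.
struct ColumnarBatch {
    std::vector<std::string> column_names;
    std::vector<std::vector<Value>> columns;
    std::size_t num_rows = 0;
};

// Sorts rows by the int64 key in column 0, then transposes rows into columns.
ColumnarBatch ToSingleBatch(const std::vector<std::string>& column_names,
                            std::vector<GenericRow> rows) {
    std::sort(rows.begin(), rows.end(), [](const GenericRow& a, const GenericRow& b) {
        return std::get<int64_t>(a.at(0)) < std::get<int64_t>(b.at(0));
    });

    ColumnarBatch batch;
    batch.column_names = column_names;
    batch.columns.resize(column_names.size());
    for (const GenericRow& row : rows) {
        // Every row must supply exactly one value per declared column.
        assert(row.size() == column_names.size());
        for (std::size_t c = 0; c < row.size(); ++c) {
            batch.columns[c].push_back(row[c]);
        }
    }
    batch.num_rows = rows.size();
    return batch;
}
```

A caller would pass a subset of the `table$snapshots` columns, for example `{"snapshot_id", "commit_kind"}`, together with rows in any order, and read the result back column by column. Keeping the transposition in one shared function mirrors the stated goal of avoiding table-local builders.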