
feat: Add metadata system tables #285

Merged
lxy-9602 merged 26 commits into alibaba:main from suxiaogang223:codex/system-table-metadata-pr3 on May 25, 2026

@suxiaogang223 (Contributor) commented May 16, 2026

Background

This PR is part of #141 and continues the system table work after the previously merged options, audit_log, and binlog support.

The scope of this PR is table-level metadata system tables. It adds read-only support for metadata that can be served from existing table metadata files, so users can inspect snapshots, schemas, tags, branches, and consumers through the existing system table query path.

Architecture

  • Introduce InMemorySystemTable for system tables whose output can be materialized as a single in-memory Arrow RecordBatch.

    • It centralizes the single-split scan, the one-shot table read, and export through the Arrow C data interface.
    • options, snapshots, schemas, tags, branches, and consumers all use this execution model.
  • Keep metadata table classes aligned with the Java implementation and avoid an extra metadata base class.

    • OptionsSystemTable, SnapshotsSystemTable, SchemasSystemTable, TagsSystemTable, BranchesSystemTable, and ConsumersSystemTable directly inherit InMemorySystemTable.
    • Shared metadata location state is held by a small MetadataSystemTableContext instead of a thin inheritance layer.
    • audit_log and binlog intentionally remain outside this hierarchy because they are data/changelog-backed system tables, not in-memory metadata tables.
  • Group pure metadata system tables in metadata_system_tables.

    • options is now colocated with the other in-memory metadata system tables because it has the same scan/read shape and row-to-Arrow conversion path.
    • options differs only in its data source: it reads options from the latest schema, while the new tables read metadata through the corresponding managers.
  • Reuse row-to-Arrow conversion for in-memory system table output.

    • Add GenericRowToArrowArrayConverter on top of the existing RowToArrowArrayConverter infrastructure.
    • In-memory metadata system tables build GenericRow values and convert them through the shared converter instead of maintaining table-local Arrow builders.
  • Refactor SystemTableLoader to use a registry/factory table.

    • IsSupported and Load now use the same registry entry list.
    • This avoids adding each new system table name in multiple places.
  • Add metadata managers/helpers where needed.

    • BranchManager::ListBranches is the shared branch listing entry point.
    • ConsumerManager owns consumer path/list/read logic for consumer/consumer-* files.
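The registry-based SystemTableLoader idea can be sketched in standard C++ to show why IsSupported and Load stay consistent: both walk the same entry list, so a new system table is registered in exactly one place. This is a hedged illustration, not the code from this PR. `SystemTable`, `NamedSystemTable`, `SystemTableEntry`, `MakeFactory`, and `FindEntry` are hypothetical names, and the real system table classes expose Arrow-based scan and read APIs rather than just a name.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real system table classes.
class SystemTable {
 public:
    virtual ~SystemTable() = default;
    virtual std::string Name() const = 0;
};

// Hypothetical concrete table; a real registry would build distinct classes
// such as SnapshotsSystemTable or ConsumersSystemTable instead.
class NamedSystemTable : public SystemTable {
 public:
    explicit NamedSystemTable(std::string name) : name_(std::move(name)) {}
    std::string Name() const override { return name_; }

 private:
    std::string name_;
};

using SystemTableFactory = std::function<std::unique_ptr<SystemTable>()>;

// One registry entry per supported system table name.
struct SystemTableEntry {
    std::string name;
    SystemTableFactory factory;
};

// Hypothetical helper that returns a factory for a NamedSystemTable.
SystemTableFactory MakeFactory(const std::string& name) {
    return [name]() -> std::unique_ptr<SystemTable> {
        return std::make_unique<NamedSystemTable>(name);
    };
}

// Single source of truth: adding a system table means adding one entry here.
const std::vector<SystemTableEntry>& SystemTableRegistry() {
    static const std::vector<SystemTableEntry> kEntries = {
        {"options", MakeFactory("options")},
        {"audit_log", MakeFactory("audit_log")},
        {"binlog", MakeFactory("binlog")},
        {"snapshots", MakeFactory("snapshots")},
        {"schemas", MakeFactory("schemas")},
        {"tags", MakeFactory("tags")},
        {"branches", MakeFactory("branches")},
        {"consumers", MakeFactory("consumers")},
    };
    return kEntries;
}

// Linear lookup is enough for a handful of entries.
const SystemTableEntry* FindEntry(const std::string& name) {
    for (const auto& entry : SystemTableRegistry()) {
        if (entry.name == name) {
            return &entry;
        }
    }
    return nullptr;
}

bool IsSupported(const std::string& name) { return FindEntry(name) != nullptr; }

// Returns nullptr for unknown names; the real loader would return an error status.
std::unique_ptr<SystemTable> Load(const std::string& name) {
    const SystemTableEntry* entry = FindEntry(name);
    if (entry == nullptr) {
        return nullptr;
    }
    return entry->factory();
}
```

With this shape, `IsSupported("files")` is false until a files entry is added, and at that point `Load("files")` starts working too, with no second list to update.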

System Tables Added

This PR follows the Apache Paimon table-scoped system table model, where table metadata is queried with names like table$snapshots. Reference semantics: https://paimon.apache.org/docs/master/concepts/system-tables/
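The table$snapshots naming can be illustrated with a helper that splits a name into the data table and the system table. This is a minimal sketch under stated assumptions: `SystemTableIdentifier` and `ParseSystemTableName` are hypothetical names, the sketch splits on a single `$`, and it simply rejects names containing more than one `$` instead of guessing at other naming forms the real loader might accept.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical result type: "orders$snapshots" -> {"orders", "snapshots"}.
struct SystemTableIdentifier {
    std::string table;
    std::string system_table;
};

// Hypothetical helper. Returns std::nullopt for plain data table names
// (no '$'), for empty halves, and for names with more than one '$'.
std::optional<SystemTableIdentifier> ParseSystemTableName(const std::string& name) {
    const auto pos = name.find('$');
    if (pos == std::string::npos) {
        return std::nullopt;  // Not a system table name.
    }
    if (pos == 0 || pos + 1 == name.size()) {
        return std::nullopt;  // Missing table or system table part.
    }
    if (name.find('$', pos + 1) != std::string::npos) {
        return std::nullopt;  // Multi-'$' names are out of scope for this sketch.
    }
    return SystemTableIdentifier{name.substr(0, pos), name.substr(pos + 1)};
}
```

The system table part (for example `snapshots`) is what a registry lookup would then use.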

  • table$snapshots

    • Purpose: exposes snapshot commit history for the data table.
    • Source: snapshot metadata files through SnapshotManager.
    • Columns: snapshot_id, schema_id, commit_user, commit_identifier, commit_kind, commit_time, base_manifest_list, delta_manifest_list, changelog_manifest_list, total_record_count, delta_record_count, changelog_record_count, watermark, next_row_id.
    • Ordering: sorted by snapshot_id ascending.
  • table$schemas

    • Purpose: exposes schema evolution history and allows snapshot schema ids to be resolved to concrete schema versions.
    • Source: schema metadata files through SchemaManager.
    • Columns: schema_id, fields, partition_keys, primary_keys, options, comment, update_time.
    • Note: fields, partition_keys, primary_keys, and options are returned as JSON strings in this first C++ version.
    • Ordering: sorted by schema_id ascending.
  • table$tags

    • Purpose: exposes table tags and the snapshots they reference.
    • Source: tag metadata through TagManager.
    • Columns: tag_name, snapshot_id, schema_id, commit_time, record_count, create_time, time_retained.
  • table$branches

    • Purpose: exposes available table branches, including main.
    • Source: branch metadata paths through BranchManager.
    • Columns: branch_name, create_time.
  • table$consumers

    • Purpose: exposes persisted streaming consumer progress by consumer id.
    • Source: consumer metadata files through ConsumerManager.
    • Columns: consumer_id, next_snapshot_id.
    • Note: supports both plain numeric consumer files and Java-style JSON files containing a nextSnapshot field.
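The two consumer file formats accepted by table$consumers can be shown with a small parser. This is a hypothetical sketch, not ConsumerManager's implementation: `ParseNextSnapshotId`, `ConsumerIdFromFileName`, `Trim`, and `AllDigits` are invented names, the JSON branch only scans for a `"nextSnapshot"` key followed by an integer instead of using a real JSON parser, and it assumes consumer files are named `consumer-<id>` as the consumer/consumer-* pattern in the description suggests.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <optional>
#include <string>

// Strips leading and trailing ASCII whitespace.
std::string Trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r\n");
    if (first == std::string::npos) {
        return "";
    }
    const auto last = s.find_last_not_of(" \t\r\n");
    return s.substr(first, last - first + 1);
}

bool AllDigits(const std::string& s) {
    if (s.empty()) {
        return false;
    }
    for (unsigned char c : s) {
        if (!std::isdigit(c)) {
            return false;
        }
    }
    return true;
}

// Accepts either "42" or a JSON object such as {"nextSnapshot": 42}.
// Note: std::stoll throws std::out_of_range for values that do not fit.
std::optional<int64_t> ParseNextSnapshotId(const std::string& content) {
    const std::string text = Trim(content);
    if (AllDigits(text)) {
        return std::stoll(text);  // Plain numeric consumer file.
    }
    const std::string key = "\"nextSnapshot\"";
    auto pos = text.find(key);
    if (pos == std::string::npos) {
        return std::nullopt;
    }
    pos = text.find_first_not_of(" \t\r\n", pos + key.size());
    if (pos == std::string::npos || text[pos] != ':') {
        return std::nullopt;
    }
    pos = text.find_first_not_of(" \t\r\n", pos + 1);
    if (pos == std::string::npos) {
        return std::nullopt;
    }
    const auto end = text.find_first_not_of("0123456789", pos);
    const std::string digits =
        text.substr(pos, end == std::string::npos ? std::string::npos : end - pos);
    if (digits.empty()) {
        return std::nullopt;
    }
    return std::stoll(digits);
}

// "consumer-my_job" -> "my_job"; other file names are ignored.
std::optional<std::string> ConsumerIdFromFileName(const std::string& file_name) {
    const std::string prefix = "consumer-";
    if (file_name.rfind(prefix, 0) != 0 || file_name.size() == prefix.size()) {
        return std::nullopt;
    }
    return file_name.substr(prefix.size());
}
```

Each row of table$consumers would then pair the id from the file name with the parsed next snapshot id.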

Scope Notes

  • This PR does not implement heavier file/manifest-backed metadata tables such as files and manifests; those are planned for a follow-up PR.
  • Existing options, audit_log, and binlog behavior is preserved.

Tests

  • Fedora: cmake --build build-codex-metadata-pr3 -j$(nproc)
  • Fedora: cmake --build build-codex-metadata-pr3 --target paimon-core-test paimon-read-inte-test -j2
  • Fedora: ./build-codex-metadata-pr3/debug/paimon-core-test --gtest_filter=FileSystemCatalogTest.TestMetadataSystemTableCatalog
  • Fedora: ./build-codex-metadata-pr3/debug/paimon-read-inte-test --gtest_filter=SystemTableReadInteTest.TestReadMetadataSystemTables:SystemTableReadInteTest.TestReadTagBranchAndConsumerSystemTables
  • pre-commit checks passed

suxiaogang223 changed the title from "Add metadata system tables" to "feat: Add metadata system tables" on May 16, 2026
suxiaogang223 marked this pull request as ready for review on May 16, 2026 07:32
zjw1111 requested a review from Copilot on May 18, 2026 03:46
zjw1111 requested a review from Copilot and removed the request for Copilot on May 18, 2026 06:08
@zjw1111 (Collaborator) commented May 18, 2026

@suxiaogang223 Hi, due to some temporary issues with GitHub Copilot, the Copilot review results could not be displayed directly. I extracted the review results from the Copilot logs for your reference.
[Three screenshots of the extracted Copilot review results]

Comment thread: src/paimon/core/table/system/in_memory_system_table.cpp (outdated)
Comment thread: src/paimon/core/table/system/in_memory_system_table.cpp
Comment thread: src/paimon/core/table/system/metadata_system_table.h (outdated)
Comment thread: src/paimon/core/table/system/metadata_system_tables.cpp
Comment thread: src/paimon/core/table/system/metadata_system_tables.cpp (outdated)
Comment thread: src/paimon/core/table/system/audit_log_system_table.h
Comment thread: src/paimon/core/utils/branch_manager.cpp
Comment thread: src/paimon/core/utils/consumer_manager.h
Comment thread: src/paimon/core/utils/consumer_manager.cpp
Comment thread: src/paimon/core/utils/tag_manager.cpp
@lxy-9602 (Collaborator) commented:

Thanks for the PR! The system table implementation is shaping up well. Great work!

Comment thread: src/paimon/core/table/system/metadata_system_tables.cpp
lxy-9602 requested a review from lszskye on May 19, 2026 02:34
Comment thread: src/paimon/core/table/system/metadata_system_tables.cpp
Comment thread: src/paimon/core/core_options.cpp
Comment thread: src/paimon/core/utils/tag_manager.cpp
@lxy-9602 (Collaborator) left a comment:

+1

lxy-9602 merged commit a804a3f into alibaba:main on May 25, 2026
9 checks passed

4 participants