feat: add notebook metadata export with sources list (issue #171) #174

Open
furkankoykiran wants to merge 4 commits into teng-lin:main from furkankoykiran:feat/export-notebook-metadata

@furkankoykiran

Summary

Implements the export notebook metadata feature requested in issue #171.

Changes

1. New NotebookMetadata dataclass (types.py)

  • Combines notebook details (id, title, created_at, is_owner) with a simplified sources list
  • Includes a to_dict() method for JSON serialization
  • Exported in the public API via __init__.py

2. New NotebooksAPI.get_metadata() method (_notebooks.py)

  • Returns NotebookMetadata with notebook details and simplified sources
  • Reuses existing get() and SourcesAPI.list() methods
  • No new RPC methods required

3. New metadata CLI command (cli/notebook.py)

  • Usage: notebooklm metadata [-n NOTEBOOK_ID] [--no-json]
  • JSON output by default for easy parsing/export
  • --no-json flag for human-readable format
  • Supports partial ID resolution

Example Usage

Python API:

async with await NotebookLMClient.from_storage() as client:
    metadata = await client.notebooks.get_metadata(notebook_id)
    print(json.dumps(metadata.to_dict(), indent=2))

CLI:

# JSON output (default)
notebooklm metadata

# Human-readable format
notebooklm metadata --no-json

# Specific notebook
notebooklm metadata -n abc123

Output:

{
  "id": "abc123",
  "title": "AI Research Notes",
  "created_at": "2026-03-09T16:30:00",
  "is_owner": true,
  "sources": [
    {"type": "pdf", "title": "paper.pdf"},
    {"type": "url", "url": "https://example.com/article"}
  ]
}
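
For orientation, here is a minimal sketch of a dataclass whose to_dict() would produce output of this shape. The field names follow the PR description; the field types and the ISO 8601 formatting of created_at are assumptions, and the actual class in types.py may differ.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class NotebookMetadata:
    """Illustrative stand-in; not the class added by this PR."""

    id: str
    title: str
    created_at: datetime | None = None  # assumed type
    is_owner: bool = True
    sources: list[dict] = field(default_factory=list)

    def to_dict(self) -> dict:
        # Assumption: timestamps are serialized with datetime.isoformat().
        return {
            "id": self.id,
            "title": self.title,
            "created_at": self.created_at.isoformat() if self.created_at else None,
            "is_owner": self.is_owner,
            "sources": self.sources,
        }


metadata = NotebookMetadata(
    id="abc123",
    title="AI Research Notes",
    created_at=datetime(2026, 3, 9, 16, 30),
    sources=[
        {"type": "pdf", "title": "paper.pdf"},
        {"type": "url", "url": "https://example.com/article"},
    ],
)
exported = json.dumps(metadata.to_dict(), indent=2)
```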

Testing

  • ✅ Format: ruff format
  • ✅ Lint: ruff check
  • ✅ Type check: mypy
  • ✅ Tests: pytest (1812 passed, 9 skipped)

Note

The updated_at field requested in the issue is not included because it's not currently available in the Notebook API response parsing. This can be added in a future update if the timestamp location is identified in the raw API response.

Add a new dataclass that combines notebook details with a simplified
sources list for export/overview purposes. Includes a to_dict() method
for JSON serialization matching the output format requested in issue teng-lin#171.

Add a new method to NotebooksAPI that combines notebook details with
a simplified sources list. Reuses existing get() and SourcesAPI.list()
methods, avoiding new RPC calls.

Add a new CLI command for exporting notebook metadata with sources.
Outputs JSON by default for easy parsing, with --no-json flag for
human-readable format. Supports partial ID resolution.

Export NotebookMetadata dataclass in __init__.py to make it
available via the public API for users.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new feature that allows users to export comprehensive metadata for a given notebook, including its core details and a simplified list of associated sources. This enhancement provides a structured and easily consumable format for programmatic access or command-line inspection of notebook information, improving data accessibility and integration capabilities.

Highlights

  • New NotebookMetadata Dataclass: A new NotebookMetadata dataclass was added to src/notebooklm/types.py. This class combines notebook details (ID, title, creation timestamp, owner status) with a simplified list of sources and includes a to_dict() method for JSON serialization. It is also exported in the public API via __init__.py.
  • New NotebooksAPI.get_metadata() Method: A new asynchronous method get_metadata() was added to src/notebooklm/_notebooks.py. This method retrieves notebook details and its associated sources, returning them as a NotebookMetadata object. It reuses existing get() and SourcesAPI.list() methods, requiring no new RPC calls.
  • New metadata CLI Command: A new metadata command was added to the CLI in src/notebooklm/cli/notebook.py. This command allows users to export notebook metadata, including sources, from the command line. It defaults to JSON output for easy parsing but also supports a human-readable format via the --no-json flag and partial notebook ID resolution.

Changelog

  • src/notebooklm/__init__.py
    • Exported the new NotebookMetadata class.
  • src/notebooklm/_notebooks.py
    • Added the get_metadata asynchronous method to retrieve combined notebook and source metadata.
  • src/notebooklm/cli/notebook.py
    • Added a new metadata command to the CLI for exporting notebook metadata with options for JSON or human-readable output.
  • src/notebooklm/types.py
    • Defined the NotebookMetadata dataclass to encapsulate notebook details and a simplified list of sources, including a to_dict method for serialization.

Activity

  • The author performed local testing, including ruff format, ruff check, mypy, and pytest (1812 passed, 9 skipped).


Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a feature to export notebook metadata, including a list of sources. The implementation is well-structured, adding a new NotebookMetadata type, a get_metadata API method, and a corresponding metadata CLI command. My review includes suggestions to improve performance by running asynchronous operations concurrently, and to simplify the CLI implementation for better maintainability and user experience.

Comment on lines +224 to +231
# Get notebook details
notebook = await self.get(notebook_id)

# Get sources list
from ._sources import SourcesAPI

sources_api = SourcesAPI(self._core)
sources = await sources_api.list(notebook_id)

medium

The two await calls to get notebook details and the list of sources are independent and can be run concurrently using asyncio.gather. This will improve the performance of the method by reducing the total execution time.

Please also add import asyncio to the top of the file.

Suggested change
-        # Get notebook details
-        notebook = await self.get(notebook_id)
-
-        # Get sources list
-        from ._sources import SourcesAPI
-
-        sources_api = SourcesAPI(self._core)
-        sources = await sources_api.list(notebook_id)
+        # Get notebook details and sources list concurrently
+        from ._sources import SourcesAPI
+        import asyncio
+
+        sources_api = SourcesAPI(self._core)
+        notebook, sources = await asyncio.gather(
+            self.get(notebook_id),
+            sources_api.list(notebook_id),
+        )

References
  1. When downloading multiple files asynchronously, use asyncio.gather to execute downloads concurrently, improving performance over sequential downloads.
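
To make the effect of the suggestion concrete, here is a self-contained sketch with two stub coroutines. get_notebook and list_sources are hypothetical stand-ins for NotebooksAPI.get() and SourcesAPI.list(), and the sleep only mimics network latency.

```python
import asyncio
import time


async def get_notebook(notebook_id: str) -> dict:
    # Hypothetical stand-in for NotebooksAPI.get(); the sleep mimics an RPC.
    await asyncio.sleep(0.1)
    return {"id": notebook_id, "title": "AI Research Notes"}


async def list_sources(notebook_id: str) -> list:
    # Hypothetical stand-in for SourcesAPI.list().
    await asyncio.sleep(0.1)
    return [{"type": "pdf", "title": "paper.pdf"}]


async def fetch_concurrently(notebook_id: str) -> list:
    # gather() returns results in argument order, so unpacking is safe.
    return await asyncio.gather(
        get_notebook(notebook_id),
        list_sources(notebook_id),
    )


start = time.perf_counter()
notebook, sources = asyncio.run(fetch_concurrently("abc123"))
elapsed = time.perf_counter() - start  # roughly one sleep, not two
```

One design point to keep in mind: with the default return_exceptions=False, the first exception raised by either call propagates out of gather(), so error handling around get_metadata() behaves much like the sequential version.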

Comment on lines +211 to +221
@click.option(
    "--json", "json_output", is_flag=True, default=True, help="Output as JSON (default: True)"
)
@click.option(
    "--no-json",
    "human_output",
    is_flag=True,
    help="Output in human-readable format instead of JSON",
)
@with_client
def metadata_cmd(ctx, notebook_id, json_output, human_output, client_auth):

medium

The --json option is redundant and potentially confusing. The command defaults to JSON output, and the --no-json flag correctly switches to human-readable format. The json_output parameter is not used in the function body. Removing this option will simplify the command's interface and implementation.

    @click.option(
        "--no-json",
        "human_output",
        is_flag=True,
        help="Output in human-readable format instead of JSON",
    )
    @with_client
    def metadata_cmd(ctx, notebook_id, human_output, client_auth):

Comment on lines +274 to +281
url = source.get("url", "")

# Format source line
if url:
    console.print(f"  {i}. [{source_type}] {title}")
    console.print(f"     {url}")
else:
    console.print(f"  {i}. [{source_type}] {title}")

medium

There is some code duplication in the if/else block for printing source information. You can simplify this by always printing the main source line and then conditionally printing the URL if it exists. This avoids repeating the console.print(f" {i}. [{source_type}] {title}") line.

Using source.get("url") instead of source.get("url", "") is also slightly cleaner as it will return None if the key is not present, which works well with the conditional check.

                            url = source.get("url")

                            # Format source line
                            console.print(f"  {i}. [{source_type}] {title}")
                            if url:
                                console.print(f"     {url}")

Owner

@teng-lin teng-lin left a comment

Thanks for this contribution! The feature is well-scoped and the CLI placement as a top-level command is consistent with existing conventions. A few suggestions to bring it in line with the project's patterns before merge:


Architecture: SourcesAPI instantiation inside NotebooksAPI

get_metadata() does a local import and creates a standalone SourcesAPI(self._core), which bypasses the client's dependency injection pattern. The project uses constructor injection for cross-API references (e.g., ArtifactsAPI receives notes_api via its constructor).

Suggestion: Either move get_metadata() to NotebookLMClient itself (since it composes two sub-APIs), or inject SourcesAPI into NotebooksAPI via the constructor following the ArtifactsAPI/notes_api pattern.
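
A rough sketch of the second option (constructor injection). All classes here are simplified stand-ins, not the project's real ones: the sources_api parameter name, the Client class, and the stubbed return values are assumptions for illustration only.

```python
import asyncio


class SourcesAPI:
    """Simplified stand-in for the project's SourcesAPI."""

    def __init__(self, core: object) -> None:
        self._core = core

    async def list(self, notebook_id: str) -> list:
        return [{"type": "pdf", "title": "paper.pdf"}]


class NotebooksAPI:
    """Simplified stand-in; receives SourcesAPI instead of building one."""

    def __init__(self, core: object, sources_api: SourcesAPI) -> None:
        self._core = core
        self._sources = sources_api  # injected, no local import needed

    async def get(self, notebook_id: str) -> dict:
        return {"id": notebook_id, "title": "AI Research Notes"}

    async def get_metadata(self, notebook_id: str) -> dict:
        notebook = await self.get(notebook_id)
        sources = await self._sources.list(notebook_id)
        return {**notebook, "sources": sources}


class Client:
    """Hypothetical composition root, standing in for NotebookLMClient."""

    def __init__(self) -> None:
        core = object()
        self.sources = SourcesAPI(core)
        self.notebooks = NotebooksAPI(core, sources_api=self.sources)


client = Client()
metadata = asyncio.run(client.notebooks.get_metadata("abc123"))
```

Because the client wires both sub-APIs together, tests can pass a fake SourcesAPI into NotebooksAPI without patching imports.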


Type design: sources: list[dict] should be a typed structure

This is the only list[dict] in the entire types.py module — every other collection uses typed elements (list[SuggestedTopic], list[SharedUser], etc.). Using untyped dicts means consumers lose type safety and IDE support.

Suggestion: Define a small dataclass for the source summary:

@dataclass
class SourceSummary:
    kind: SourceType
    title: str | None = None
    url: str | None = None

    def to_dict(self) -> dict[str, str | None]:
        d: dict[str, str | None] = {"type": str(self.kind)}
        if self.title:
            d["title"] = self.title
        if self.url:
            d["url"] = self.url
        return d

Then use sources: list[SourceSummary] in NotebookMetadata.

Additionally, consider composing with Notebook rather than duplicating its fields (id, title, created_at, is_owner). This avoids silent divergence if Notebook gains new fields:

@dataclass
class NotebookMetadata:
    notebook: Notebook
    sources: list[SourceSummary] = field(default_factory=list)

With to_dict() flattening the notebook fields for JSON output.
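
One way the flattening could look, as a sketch only. Notebook is reduced here to the four fields named in the PR description, kind is a plain str rather than SourceType to keep the example self-contained, and the isoformat() serialization is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Notebook:
    """Reduced stand-in for the project's Notebook type (assumed fields)."""

    id: str
    title: str
    created_at: datetime | None = None
    is_owner: bool = True


@dataclass
class SourceSummary:
    kind: str  # the review uses SourceType; str keeps this runnable alone
    title: str | None = None
    url: str | None = None

    def to_dict(self) -> dict[str, str | None]:
        d: dict[str, str | None] = {"type": self.kind}
        if self.title:
            d["title"] = self.title
        if self.url:
            d["url"] = self.url
        return d


@dataclass
class NotebookMetadata:
    notebook: Notebook
    sources: list[SourceSummary] = field(default_factory=list)

    def to_dict(self) -> dict:
        nb = self.notebook
        # Notebook fields are lifted to the top level, so the JSON shape
        # stays the same as in the PR description.
        return {
            "id": nb.id,
            "title": nb.title,
            "created_at": nb.created_at.isoformat() if nb.created_at else None,
            "is_owner": nb.is_owner,
            "sources": [s.to_dict() for s in self.sources],
        }


meta = NotebookMetadata(
    notebook=Notebook("abc123", "AI Research Notes", datetime(2026, 3, 9, 16, 30)),
    sources=[SourceSummary("pdf", title="paper.pdf")],
)
flat = meta.to_dict()
```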


Source dict schema consistency

Currently, title and url keys are conditionally included — some source dicts will have them and some won't. This makes the JSON output harder to consume programmatically.

Suggestion: Always include all keys, using null for absent values, so the schema is consistent across all source entries.
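
A sketch of what a fixed schema could look like, assuming a SourceSummary-style dataclass with kind as a plain str. None values become null once passed through json.dumps.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class SourceSummary:
    kind: str  # plain str here; the project would likely use SourceType
    title: str | None = None
    url: str | None = None

    def to_dict(self) -> dict[str, str | None]:
        # Every key is always present; absent values stay None (JSON null).
        return {"type": self.kind, "title": self.title, "url": self.url}


rows = [
    SourceSummary("pdf", title="paper.pdf").to_dict(),
    SourceSummary("url", url="https://example.com/article").to_dict(),
]
encoded = json.dumps(rows)
```

Consumers can then index row["url"] directly instead of guarding every access with .get().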


CLI flags: --json / --no-json convention

Every other CLI command uses --json as an opt-in flag (default False) with human-readable output as the default. This command has --json defaulting to True plus a separate --no-json flag, which is inconsistent.

Suggestion: Follow the existing pattern — --json as opt-in, human-readable as default. Remove the --no-json flag. This also eliminates the unused json_output parameter in the function body.
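
The opt-in convention, sketched with the standard library's argparse as a stand-in for the project's click decorators so the example runs without dependencies. The render() helper and its human-readable layout are hypothetical, not the command's actual output.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="notebooklm metadata")
    parser.add_argument("-n", dest="notebook_id", default=None)
    # Opt-in flag: False unless --json is passed, so no --no-json is needed.
    parser.add_argument(
        "--json", dest="json_output", action="store_true", help="Output as JSON"
    )
    return parser


def render(metadata: dict, json_output: bool) -> str:
    if json_output:
        return json.dumps(metadata, indent=2)
    # Hypothetical human-readable layout (the default).
    lines = [f"{metadata['title']} ({metadata['id']})"]
    for i, source in enumerate(metadata["sources"], 1):
        label = source.get("title") or source.get("url") or ""
        lines.append(f"  {i}. [{source['type']}] {label}")
    return "\n".join(lines)


sample = {
    "id": "abc123",
    "title": "AI Research Notes",
    "sources": [{"type": "pdf", "title": "paper.pdf"}],
}
args = build_parser().parse_args(["-n", "abc123"])
text = render(sample, args.json_output)
```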


Silent data loss when source listing fails

SourcesAPI.list() returns [] on various failure conditions (malformed response, API changes) with only a logger.warning. Since get_metadata() passes this through, users could see "sources": [] when the notebook actually has sources.

Suggestion: Cross-check notebook.sources_count against len(sources) and log a warning on mismatch:

if notebook.sources_count > 0 and len(sources) == 0:
    logger.warning(
        "Notebook %s reports %d sources but listing returned empty",
        notebook_id, notebook.sources_count,
    )

str(source.kind) behavior

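SourceType is a str enum. If it is declared with the (str, Enum) mixin rather than enum.StrEnum (which only exists from Python 3.11), str() on a member produces the qualified name ("SourceType.<MEMBER>") instead of the value ("web_page"). Consider using source.kind.value explicitly for safety.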
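
A quick check of the difference, assuming SourceType uses the (str, Enum) mixin. The member name WEB_PAGE is hypothetical; the project's enum may name its members differently.

```python
from enum import Enum


class SourceType(str, Enum):
    # Hypothetical member for illustration.
    WEB_PAGE = "web_page"


as_str = str(SourceType.WEB_PAGE)      # "SourceType.WEB_PAGE"
as_value = SourceType.WEB_PAGE.value   # "web_page"
```

With this declaration, str() yields the qualified member name on every supported Python version, so .value is the portable way to get the raw string into the JSON output.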


Missing tests

The project has thorough test coverage at unit/integration/CLI levels for every comparable feature. Recommended additions:

| Priority | Test | Location |
|----------|------|----------|
| 1 | get_metadata() happy path + empty sources | tests/integration/test_notebooks.py |
| 2 | NotebookMetadata.to_dict() serialization (with and without created_at) | tests/unit/test_types.py |
| 3 | CLI metadata JSON output + human-readable output | tests/unit/cli/test_notebook.py |

Overall this is a nice, focused feature — these suggestions are mostly about aligning with existing project conventions. Happy to help if any of these need further clarification!
