feat: add notebook metadata export with sources list (issue #171) by furkankoykiran · Pull Request #174 · teng-lin/notebooklm-py

furkankoykiran · 2026-03-09T21:43:03Z

Summary

Implements the export notebook metadata feature requested in issue #171.

Changes

1. New `NotebookMetadata` dataclass (types.py)

Combines notebook details (id, title, created_at, is_owner) with a simplified sources list
Includes a to_dict() method for JSON serialization
Exported in the public API via __init__.py

2. New `NotebooksAPI.get_metadata()` method (_notebooks.py)

Returns NotebookMetadata with notebook details and simplified sources
Reuses existing get() and SourcesAPI.list() methods
No new RPC methods required

3. New `metadata` CLI command (cli/notebook.py)

Usage: notebooklm metadata [-n NOTEBOOK_ID] [--no-json]
JSON output by default for easy parsing/export
--no-json flag for human-readable format
Supports partial ID resolution

Example Usage

Python API:

async with await NotebookLMClient.from_storage() as client:
    metadata = await client.notebooks.get_metadata(notebook_id)
    print(json.dumps(metadata.to_dict(), indent=2))

CLI:

# JSON output (default)
notebooklm metadata

# Human-readable format
notebooklm metadata --no-json

# Specific notebook
notebooklm metadata -n abc123

Output:

{
  "id": "abc123",
  "title": "AI Research Notes",
  "created_at": "2026-03-09T16:30:00",
  "is_owner": true,
  "sources": [
    {"type": "pdf", "title": "paper.pdf"},
    {"type": "url", "url": "https://example.com/article"}
  ]
}
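A consumer of this export might read it as follows. This is a minimal sketch that assumes the JSON shape shown above, where a source entry may omit `title` or `url`; the helper name `summarize_sources` is hypothetical and not part of notebooklm-py.

```python
import json

# Sample payload copied from the example output above.
EXPORT = """
{
  "id": "abc123",
  "title": "AI Research Notes",
  "created_at": "2026-03-09T16:30:00",
  "is_owner": true,
  "sources": [
    {"type": "pdf", "title": "paper.pdf"},
    {"type": "url", "url": "https://example.com/article"}
  ]
}
"""


def summarize_sources(payload: str) -> list[str]:
    """Return one label per source, tolerating missing title/url keys."""
    data = json.loads(payload)
    labels = []
    for source in data.get("sources", []):
        # Prefer the title, fall back to the URL, then to a placeholder.
        name = source.get("title") or source.get("url") or "(untitled)"
        labels.append(f"[{source['type']}] {name}")
    return labels


if __name__ == "__main__":
    for label in summarize_sources(EXPORT):
        print(label)
```

Because keys can be absent, the sketch reads them with `.get()` rather than indexing directly.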

Testing

✅ Format: ruff format
✅ Lint: ruff check
✅ Type check: mypy
✅ Tests: pytest (1812 passed, 9 skipped)

Note

The updated_at field requested in the issue is not included because it's not currently available in the Notebook API response parsing. This can be added in a future update if the timestamp location is identified in the raw API response.

gemini-code-assist · 2026-03-09T21:45:10Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new feature that allows users to export comprehensive metadata for a given notebook, including its core details and a simplified list of associated sources. This enhancement provides a structured and easily consumable format for programmatic access or command-line inspection of notebook information, improving data accessibility and integration capabilities.

Highlights

New NotebookMetadata Dataclass: A new NotebookMetadata dataclass was added to src/notebooklm/types.py. This class combines notebook details (ID, title, creation timestamp, owner status) with a simplified list of sources and includes a to_dict() method for JSON serialization. It is also exported in the public API via __init__.py.
New NotebooksAPI.get_metadata() Method: A new asynchronous method get_metadata() was added to src/notebooklm/_notebooks.py. This method retrieves notebook details and its associated sources, returning them as a NotebookMetadata object. It reuses existing get() and SourcesAPI.list() methods, requiring no new RPC calls.
New metadata CLI Command: A new metadata command was added to the CLI in src/notebooklm/cli/notebook.py. This command allows users to export notebook metadata, including sources, from the command line. It defaults to JSON output for easy parsing but also supports a human-readable format via the --no-json flag and partial notebook ID resolution.

Changelog

src/notebooklm/__init__.py
- Exported the new NotebookMetadata class.
src/notebooklm/_notebooks.py
- Added the get_metadata asynchronous method to retrieve combined notebook and source metadata.
src/notebooklm/cli/notebook.py
- Added a new metadata command to the CLI for exporting notebook metadata with options for JSON or human-readable output.
src/notebooklm/types.py
- Defined the NotebookMetadata dataclass to encapsulate notebook details and a simplified list of sources, including a to_dict method for serialization.

Activity

The author performed local testing, including ruff format, ruff check, mypy, and pytest (1812 passed, 9 skipped).

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a feature to export notebook metadata, including a list of sources. The implementation is well-structured, adding a new NotebookMetadata type, a get_metadata API method, and a corresponding metadata CLI command. My review includes suggestions to improve performance by running asynchronous operations concurrently, and to simplify the CLI implementation for better maintainability and user experience.

gemini-code-assist · 2026-03-09T21:47:41Z

src/notebooklm/_notebooks.py

+        # Get notebook details
+        notebook = await self.get(notebook_id)
+
+        # Get sources list
+        from ._sources import SourcesAPI
+
+        sources_api = SourcesAPI(self._core)
+        sources = await sources_api.list(notebook_id)


The two await calls to get notebook details and the list of sources are independent and can be run concurrently using asyncio.gather. This will improve the performance of the method by reducing the total execution time.

Please also add import asyncio to the top of the file.

Suggested change

-        # Get notebook details
-        notebook = await self.get(notebook_id)
-
-        # Get sources list
-        from ._sources import SourcesAPI
-
-        sources_api = SourcesAPI(self._core)
-        sources = await sources_api.list(notebook_id)
+        # Get notebook details and sources list concurrently
+        from ._sources import SourcesAPI
+        import asyncio
+
+        sources_api = SourcesAPI(self._core)
+        notebook, sources = await asyncio.gather(
+            self.get(notebook_id),
+            sources_api.list(notebook_id),
+        )

References

When downloading multiple files asynchronously, use asyncio.gather to execute downloads concurrently, improving performance over sequential downloads.
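The concurrency pattern suggested in this comment can be tried in isolation. The sketch below is only an illustration: `fetch_notebook` and `fetch_sources` are hypothetical stand-ins for `NotebooksAPI.get()` and `SourcesAPI.list()`, and the return values are made up.

```python
import asyncio


async def fetch_notebook(notebook_id: str) -> dict:
    # Stand-in for NotebooksAPI.get(); the sleep simulates network latency.
    await asyncio.sleep(0.01)
    return {"id": notebook_id, "title": "AI Research Notes"}


async def fetch_sources(notebook_id: str) -> list[dict]:
    # Stand-in for SourcesAPI.list().
    await asyncio.sleep(0.01)
    return [{"type": "pdf", "title": "paper.pdf"}]


async def get_metadata(notebook_id: str) -> dict:
    # gather() schedules both coroutines at once and returns their results
    # in argument order, regardless of which one finishes first.
    notebook, sources = await asyncio.gather(
        fetch_notebook(notebook_id),
        fetch_sources(notebook_id),
    )
    return {**notebook, "sources": sources}


if __name__ == "__main__":
    print(asyncio.run(get_metadata("abc123")))
```

With the default `return_exceptions=False`, the first exception raised by either call propagates to the caller, so a failed request is not silently dropped.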

gemini-code-assist · 2026-03-09T21:47:42Z

src/notebooklm/cli/notebook.py

+    @click.option(
+        "--json", "json_output", is_flag=True, default=True, help="Output as JSON (default: True)"
+    )
+    @click.option(
+        "--no-json",
+        "human_output",
+        is_flag=True,
+        help="Output in human-readable format instead of JSON",
+    )
+    @with_client
+    def metadata_cmd(ctx, notebook_id, json_output, human_output, client_auth):


The --json option is redundant and potentially confusing. The command defaults to JSON output, and the --no-json flag correctly switches to human-readable format. The json_output parameter is not used in the function body. Removing this option will simplify the command's interface and implementation.

    @click.option(
        "--no-json",
        "human_output",
        is_flag=True,
        help="Output in human-readable format instead of JSON",
    )
    @with_client
    def metadata_cmd(ctx, notebook_id, human_output, client_auth):

gemini-code-assist · 2026-03-09T21:47:42Z

src/notebooklm/cli/notebook.py

+                            url = source.get("url", "")
+
+                            # Format source line
+                            if url:
+                                console.print(f"  {i}. [{source_type}] {title}")
+                                console.print(f"     {url}")
+                            else:
+                                console.print(f"  {i}. [{source_type}] {title}")


There is some code duplication in the if/else block for printing source information. You can simplify this by always printing the main source line and then conditionally printing the URL if it exists. This avoids repeating the console.print(f" {i}. [{source_type}] {title}") line.

Using source.get("url") instead of source.get("url", "") is also slightly cleaner as it will return None if the key is not present, which works well with the conditional check.

                            url = source.get("url")

                            # Format source line
                            console.print(f"  {i}. [{source_type}] {title}")
                            if url:
                                console.print(f"     {url}")

teng-lin

Thanks for this contribution! The feature is well-scoped and the CLI placement as a top-level command is consistent with existing conventions. A few suggestions to bring it in line with the project's patterns before merge:

Architecture: `SourcesAPI` instantiation inside `NotebooksAPI`

get_metadata() does a local import and creates a standalone SourcesAPI(self._core), which bypasses the client's dependency injection pattern. The project uses constructor injection for cross-API references (e.g., ArtifactsAPI receives notes_api via its constructor).

Suggestion: Either move get_metadata() to NotebookLMClient itself (since it composes two sub-APIs), or inject SourcesAPI into NotebooksAPI via the constructor following the ArtifactsAPI/notes_api pattern.
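For illustration, the constructor-injection option could look roughly like this. The classes below are simplified stubs, not the project's real implementations, and `Client` is a hypothetical stand-in for however `NotebookLMClient` wires its sub-APIs.

```python
import asyncio


class SourcesAPI:
    """Stub standing in for the real SourcesAPI."""

    async def list(self, notebook_id: str) -> list[dict]:
        return [{"type": "pdf", "title": "paper.pdf"}]


class NotebooksAPI:
    """Stub showing SourcesAPI supplied via the constructor."""

    def __init__(self, sources_api: SourcesAPI) -> None:
        # The dependency is injected once, not built inside each method.
        self._sources = sources_api

    async def get(self, notebook_id: str) -> dict:
        return {"id": notebook_id, "title": "AI Research Notes"}

    async def get_metadata(self, notebook_id: str) -> dict:
        notebook = await self.get(notebook_id)
        sources = await self._sources.list(notebook_id)
        return {**notebook, "sources": sources}


class Client:
    """Hypothetical client wiring the sub-APIs together."""

    def __init__(self) -> None:
        self.sources = SourcesAPI()
        self.notebooks = NotebooksAPI(sources_api=self.sources)


if __name__ == "__main__":
    print(asyncio.run(Client().notebooks.get_metadata("abc123")))
```

Sharing one `SourcesAPI` instance also lets tests swap in a fake without patching imports.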

Type design: `sources: list[dict]` should be a typed structure

This is the only list[dict] in the entire types.py module — every other collection uses typed elements (list[SuggestedTopic], list[SharedUser], etc.). Using untyped dicts means consumers lose type safety and IDE support.

Suggestion: Define a small dataclass for the source summary:

@dataclass
class SourceSummary:
    kind: SourceType
    title: str | None = None
    url: str | None = None

    def to_dict(self) -> dict[str, str | None]:
        d: dict[str, str | None] = {"type": str(self.kind)}
        if self.title:
            d["title"] = self.title
        if self.url:
            d["url"] = self.url
        return d

Then use sources: list[SourceSummary] in NotebookMetadata.

Additionally, consider composing with Notebook rather than duplicating its fields (id, title, created_at, is_owner). This avoids silent divergence if Notebook gains new fields:

@dataclass
class NotebookMetadata:
    notebook: Notebook
    sources: list[SourceSummary] = field(default_factory=list)

With to_dict() flattening the notebook fields for JSON output.
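One possible shape for that flattening is sketched below. The `Notebook` class here is a reduced stand-in carrying only the four fields named in this PR, and sources are left as plain dicts to keep the example short; both are assumptions, not the project's actual definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Notebook:
    """Reduced stand-in for the project's Notebook type."""

    id: str
    title: str
    created_at: datetime | None = None
    is_owner: bool = True


@dataclass
class NotebookMetadata:
    notebook: Notebook
    sources: list[dict] = field(default_factory=list)

    def to_dict(self) -> dict:
        nb = self.notebook
        # Flatten the wrapped notebook so the JSON keeps its top-level keys.
        return {
            "id": nb.id,
            "title": nb.title,
            "created_at": nb.created_at.isoformat() if nb.created_at else None,
            "is_owner": nb.is_owner,
            "sources": list(self.sources),
        }


if __name__ == "__main__":
    nb = Notebook("abc123", "AI Research Notes", datetime(2026, 3, 9, 16, 30))
    print(NotebookMetadata(nb, [{"type": "pdf", "title": "paper.pdf"}]).to_dict())
```

The JSON output stays flat for consumers while the Python object keeps a single source of truth for notebook fields.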

Source dict schema consistency

Currently, title and url keys are conditionally included — some source dicts will have them and some won't. This makes the JSON output harder to consume programmatically.

Suggestion: Always include all keys, using null for absent values, so the schema is consistent across all source entries.
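A fixed-schema variant might look like the sketch below. It uses a plain string for `kind` instead of the project's `SourceType` so that it runs on its own; that substitution is an assumption made for the example.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class SourceSummary:
    kind: str  # plain string here; the project's version would use SourceType
    title: str | None = None
    url: str | None = None

    def to_dict(self) -> dict[str, str | None]:
        # Every key is always present; absent values serialize to JSON null.
        return {"type": self.kind, "title": self.title, "url": self.url}


if __name__ == "__main__":
    print(json.dumps(SourceSummary("pdf", title="paper.pdf").to_dict()))
    # {"type": "pdf", "title": "paper.pdf", "url": null}
```

Consumers can then index `source["url"]` directly and check for `None`, without guarding against a missing key.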

CLI flags: `--json` / `--no-json` convention

Every other CLI command uses --json as an opt-in flag (default False) with human-readable output as the default. This command has --json defaulting to True plus a separate --no-json flag, which is inconsistent.

Suggestion: Follow the existing pattern — --json as opt-in, human-readable as default. Remove the --no-json flag. This also eliminates the unused json_output parameter in the function body.
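The opt-in behaviour can be sketched independently of the CLI framework. The `render` function and its human-readable layout below are hypothetical; in the actual command, `as_json` would presumably come from a single `--json` option declared with `is_flag=True`.

```python
import json


def render(metadata: dict, as_json: bool = False) -> str:
    """Format metadata for the terminal; JSON only when explicitly requested."""
    if as_json:
        return json.dumps(metadata, indent=2)
    lines = [f"{metadata['title']} ({metadata['id']})"]
    for i, source in enumerate(metadata.get("sources", []), start=1):
        # Always print the main source line; the URL goes on its own line.
        lines.append(f"  {i}. [{source['type']}] {source.get('title') or ''}".rstrip())
        if source.get("url"):
            lines.append(f"     {source['url']}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {
        "id": "abc123",
        "title": "AI Research Notes",
        "sources": [{"type": "pdf", "title": "paper.pdf"}],
    }
    print(render(sample))
    print(render(sample, as_json=True))
```

A single boolean with a `False` default leaves no unused parameter and no second flag to keep in sync.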

Silent data loss when source listing fails

SourcesAPI.list() returns [] on various failure conditions (malformed response, API changes) with only a logger.warning. Since get_metadata() passes this through, users could see "sources": [] when the notebook actually has sources.

Suggestion: Cross-check notebook.sources_count against len(sources) and log a warning on mismatch:

if notebook.sources_count > 0 and len(sources) == 0:
    logger.warning(
        "Notebook %s reports %d sources but listing returned empty",
        notebook_id, notebook.sources_count,
    )

`str(source.kind)` behavior

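SourceType is a str enum. If it is declared as a (str, Enum) mixin, str() on a member returns the qualified member name (of the form "SourceType.<MEMBER>") on every Python version, not the value "web_page"; only enum.StrEnum, which is available from Python 3.11, returns the value from str(). Consider using source.kind.value explicitly for safety.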
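The difference is easy to check with a small stand-in enum. The member names below are hypothetical, since the real `SourceType` definition is not shown in this thread; the sketch assumes a `(str, Enum)` mixin rather than `enum.StrEnum`, which was only added in Python 3.11.

```python
from enum import Enum


class SourceType(str, Enum):
    # Hypothetical members; the real enum's member list is not shown here.
    WEB_PAGE = "web_page"
    PDF = "pdf"


kind = SourceType.WEB_PAGE

# str() on a (str, Enum) mixin returns the qualified member name.
print(str(kind))  # SourceType.WEB_PAGE

# .value always returns the underlying string.
print(kind.value)  # web_page

# The member still compares equal to its value because it subclasses str.
print(kind == "web_page")  # True
```

Using `.value` in `to_dict()` keeps the serialized output independent of how the enum is declared.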

Missing tests

The project has thorough test coverage at unit/integration/CLI levels for every comparable feature. Recommended additions:

Priority	Test	Location
1	`get_metadata()` happy path + empty sources	`tests/integration/test_notebooks.py`
2	`NotebookMetadata.to_dict()` serialization (with and without `created_at`)	`tests/unit/test_types.py`
3	CLI `metadata` JSON output + human-readable output	`tests/unit/cli/test_notebook.py`
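As a starting point for the first row, an empty-sources test could be shaped like this. It is a sketch only: the free function `get_metadata` is a hypothetical stand-in for the method under test, and the project's real fixtures and return types will differ.

```python
import asyncio
from unittest.mock import AsyncMock


async def get_metadata(notebooks_api, sources_api, notebook_id: str) -> dict:
    """Hypothetical stand-in for the method under test."""
    notebook = await notebooks_api.get(notebook_id)
    sources = await sources_api.list(notebook_id)
    return {**notebook, "sources": sources}


def test_get_metadata_empty_sources() -> None:
    # Attributes of an AsyncMock are themselves awaitable mocks.
    notebooks_api = AsyncMock()
    notebooks_api.get.return_value = {"id": "abc123", "title": "Empty"}
    sources_api = AsyncMock()
    sources_api.list.return_value = []

    result = asyncio.run(get_metadata(notebooks_api, sources_api, "abc123"))

    assert result == {"id": "abc123", "title": "Empty", "sources": []}
    sources_api.list.assert_awaited_once_with("abc123")


if __name__ == "__main__":
    test_get_metadata_empty_sources()
    print("ok")
```

`AsyncMock` is in the standard library from Python 3.8, so no extra test dependency is needed for this style of test.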

Overall this is a nice, focused feature — these suggestions are mostly about aligning with existing project conventions. Happy to help if any of these need further clarification!

furkankoykiran added 4 commits March 9, 2026 21:42

feat(types): add NotebookMetadata dataclass

79b7d5f

Add a new dataclass that combines notebook details with a simplified sources list for export/overview purposes. Includes a to_dict() method for JSON serialization matching the output format requested in issue teng-lin#171.

feat(notebooks): add get_metadata() method

52cf852

Add a new method to NotebooksAPI that combines notebook details with a simplified sources list. Reuses existing get() and SourcesAPI.list() methods, avoiding new RPC calls.

feat(cli): add metadata command

83dd482

Add a new CLI command for exporting notebook metadata with sources. Outputs JSON by default for easy parsing, with --no-json flag for human-readable format. Supports partial ID resolution.

feat(exports): add NotebookMetadata to public API

4cdcefe

Export NotebookMetadata dataclass in __init__.py to make it available via the public API for users.

gemini-code-assist bot reviewed Mar 9, 2026

furkankoykiran mentioned this pull request Mar 9, 2026

Feature Request: Export Notebook Metadata and Sources via Python API #171

Open

teng-lin reviewed Mar 10, 2026

furkankoykiran wants to merge 4 commits into teng-lin:main from furkankoykiran:feat/export-notebook-metadata
