feat: add notebook metadata export with sources list (issue #171) #174
furkankoykiran wants to merge 4 commits into teng-lin:main
Conversation
Add a new dataclass that combines notebook details with a simplified sources list for export/overview purposes. Includes a to_dict() method for JSON serialization matching the output format requested in issue teng-lin#171.
Add a new method to NotebooksAPI that combines notebook details with a simplified sources list. Reuses existing get() and SourcesAPI.list() methods, avoiding new RPC calls.
Add a new CLI command for exporting notebook metadata with sources. Outputs JSON by default for easy parsing, with --no-json flag for human-readable format. Supports partial ID resolution.
Export NotebookMetadata dataclass in __init__.py to make it available via the public API for users.
Summary of Changes (Gemini Code Assist)
This pull request introduces a new feature that allows users to export comprehensive metadata for a given notebook, including its core details and a simplified list of associated sources. This provides a structured, easily consumable format for programmatic access or command-line inspection of notebook information, improving data accessibility and integration.
Code Review
This pull request introduces a feature to export notebook metadata, including a list of sources. The implementation is well-structured, adding a new NotebookMetadata type, a get_metadata API method, and a corresponding metadata CLI command. My review includes suggestions to improve performance by running asynchronous operations concurrently, and to simplify the CLI implementation for better maintainability and user experience.
```python
# Get notebook details
notebook = await self.get(notebook_id)

# Get sources list
from ._sources import SourcesAPI

sources_api = SourcesAPI(self._core)
sources = await sources_api.list(notebook_id)
```
The two await calls to get notebook details and the list of sources are independent and can be run concurrently using asyncio.gather. This will improve the performance of the method by reducing the total execution time.
Please also add import asyncio to the top of the file.
Suggested change:

```python
# Get notebook details and sources list concurrently
from ._sources import SourcesAPI
import asyncio

sources_api = SourcesAPI(self._core)
notebook, sources = await asyncio.gather(
    self.get(notebook_id),
    sources_api.list(notebook_id),
)
```
References
- When downloading multiple files asynchronously, use asyncio.gather to execute downloads concurrently, improving performance over sequential downloads.
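As a self-contained illustration of the pattern, with placeholder coroutines standing in for the real NotebooksAPI.get() and SourcesAPI.list() calls:

```python
# Minimal, runnable sketch of the concurrent-fetch suggestion above.
# fetch_notebook / fetch_sources are hypothetical stand-ins for the
# project's API methods.
import asyncio


async def fetch_notebook(notebook_id: str) -> dict:
    await asyncio.sleep(0.01)  # simulate an RPC round-trip
    return {"id": notebook_id, "title": "AI Research Notes"}


async def fetch_sources(notebook_id: str) -> list[dict]:
    await asyncio.sleep(0.01)
    return [{"type": "pdf", "title": "paper.pdf"}]


async def get_metadata(notebook_id: str) -> tuple[dict, list[dict]]:
    # The two calls are independent, so run them concurrently; total time
    # is the slower of the two instead of their sum.
    return await asyncio.gather(
        fetch_notebook(notebook_id),
        fetch_sources(notebook_id),
    )


notebook, sources = asyncio.run(get_metadata("abc123"))
```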
```python
@click.option(
    "--json", "json_output", is_flag=True, default=True, help="Output as JSON (default: True)"
)
@click.option(
    "--no-json",
    "human_output",
    is_flag=True,
    help="Output in human-readable format instead of JSON",
)
@with_client
def metadata_cmd(ctx, notebook_id, json_output, human_output, client_auth):
```
The --json option is redundant and potentially confusing. The command defaults to JSON output, and the --no-json flag correctly switches to human-readable format. The json_output parameter is not used in the function body. Removing this option will simplify the command's interface and implementation.
Suggested change:

```python
@click.option(
    "--no-json",
    "human_output",
    is_flag=True,
    help="Output in human-readable format instead of JSON",
)
@with_client
def metadata_cmd(ctx, notebook_id, human_output, client_auth):
```

On the source-formatting code:

```python
url = source.get("url", "")

# Format source line
if url:
    console.print(f" {i}. [{source_type}] {title}")
    console.print(f" {url}")
else:
    console.print(f" {i}. [{source_type}] {title}")
```
There is some code duplication in the if/else block for printing source information. You can simplify this by always printing the main source line and then conditionally printing the URL if it exists. This avoids repeating the console.print(f" {i}. [{source_type}] {title}") line.
Using source.get("url") instead of source.get("url", "") is also slightly cleaner as it will return None if the key is not present, which works well with the conditional check.
```python
url = source.get("url")

# Format source line
console.print(f" {i}. [{source_type}] {title}")
if url:
    console.print(f" {url}")
```
teng-lin left a comment
Thanks for this contribution! The feature is well-scoped and the CLI placement as a top-level command is consistent with existing conventions. A few suggestions to bring it in line with the project's patterns before merge:
Architecture: SourcesAPI instantiation inside NotebooksAPI
get_metadata() does a local import and creates a standalone SourcesAPI(self._core), which bypasses the client's dependency injection pattern. The project uses constructor injection for cross-API references (e.g., ArtifactsAPI receives notes_api via its constructor).
Suggestion: Either move get_metadata() to NotebookLMClient itself (since it composes two sub-APIs), or inject SourcesAPI into NotebooksAPI via the constructor following the ArtifactsAPI/notes_api pattern.
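A minimal sketch of the constructor-injection alternative, with stubbed classes standing in for the project's real APIs (class internals, names, and signatures here are assumptions, not the project's actual code):

```python
# Hypothetical illustration of the injection pattern described above,
# mirroring the ArtifactsAPI/notes_api convention.
import asyncio


class SourcesAPI:
    def __init__(self, core):
        self._core = core

    async def list(self, notebook_id: str) -> list[dict]:
        return [{"type": "pdf", "title": "paper.pdf"}]  # stubbed RPC response


class NotebooksAPI:
    # SourcesAPI is injected by the client at construction time instead of
    # being imported and instantiated inside get_metadata().
    def __init__(self, core, sources_api: SourcesAPI):
        self._core = core
        self._sources = sources_api

    async def get_metadata(self, notebook_id: str) -> dict:
        sources = await self._sources.list(notebook_id)
        return {"id": notebook_id, "sources": sources}


core = object()
api = NotebooksAPI(core, SourcesAPI(core))
meta = asyncio.run(api.get_metadata("abc123"))
```

The client owns the wiring, so cross-API dependencies stay visible in one place and tests can inject fakes.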
Type design: sources: list[dict] should be a typed structure
This is the only list[dict] in the entire types.py module — every other collection uses typed elements (list[SuggestedTopic], list[SharedUser], etc.). Using untyped dicts means consumers lose type safety and IDE support.
Suggestion: Define a small dataclass for the source summary:
```python
@dataclass
class SourceSummary:
    kind: SourceType
    title: str | None = None
    url: str | None = None

    def to_dict(self) -> dict[str, str | None]:
        d: dict[str, str | None] = {"type": str(self.kind)}
        if self.title:
            d["title"] = self.title
        if self.url:
            d["url"] = self.url
        return d
```

Then use sources: list[SourceSummary] in NotebookMetadata.
Additionally, consider composing with Notebook rather than duplicating its fields (id, title, created_at, is_owner). This avoids silent divergence if Notebook gains new fields:
```python
@dataclass
class NotebookMetadata:
    notebook: Notebook
    sources: list[SourceSummary] = field(default_factory=list)
```

With to_dict() flattening the notebook fields for JSON output.
Source dict schema consistency
Currently, title and url keys are conditionally included — some source dicts will have them and some won't. This makes the JSON output harder to consume programmatically.
Suggestion: Always include all keys, using null for absent values, so the schema is consistent across all source entries.
CLI flags: --json / --no-json convention
Every other CLI command uses --json as an opt-in flag (default False) with human-readable output as the default. This command has --json defaulting to True plus a separate --no-json flag, which is inconsistent.
Suggestion: Follow the existing pattern — --json as opt-in, human-readable as default. Remove the --no-json flag. This also eliminates the unused json_output parameter in the function body.
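A sketch of the suggested interface, assuming a plain click command (the project's with_client decorator and client wiring are omitted, and the metadata dict is a stand-in for the real get_metadata() result):

```python
# --json as opt-in, human-readable output by default.
import json

import click


@click.command("metadata")
@click.argument("notebook_id")
@click.option("--json", "json_output", is_flag=True, default=False,
              help="Output as JSON")
def metadata_cmd(notebook_id: str, json_output: bool) -> None:
    data = {"id": notebook_id, "sources": []}  # placeholder metadata
    if json_output:
        click.echo(json.dumps(data, indent=2))
    else:
        click.echo(f"Notebook {data['id']} ({len(data['sources'])} sources)")
```

With a single flag there is no unused parameter and no way to pass contradictory options.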
Silent data loss when source listing fails
SourcesAPI.list() returns [] on various failure conditions (malformed response, API changes) with only a logger.warning. Since get_metadata() passes this through, users could see "sources": [] when the notebook actually has sources.
Suggestion: Cross-check notebook.sources_count against len(sources) and log a warning on mismatch:
```python
if notebook.sources_count > 0 and len(sources) == 0:
    logger.warning(
        "Notebook %s reports %d sources but listing returned empty",
        notebook_id, notebook.sources_count,
    )
```

str(source.kind) behavior
SourceType is a str-based enum. Unless it subclasses enum.StrEnum (available only on Python 3.11+), str() on a member produces "SourceType.web_page" rather than "web_page". Consider using source.kind.value explicitly for safety.
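The behavior can be checked with a small stand-in enum, a plain (str, Enum) subclass of the kind projects supporting pre-3.11 Pythons use (member names assumed from the review above):

```python
from enum import Enum


class SourceType(str, Enum):
    web_page = "web_page"
    pdf = "pdf"


# On a plain (str, Enum) subclass, str() includes the class name, while
# .value is always the bare string.
shown = str(SourceType.web_page)
value = SourceType.web_page.value
```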
Missing tests
The project has thorough test coverage at unit/integration/CLI levels for every comparable feature. Recommended additions:

| Priority | Test | Location |
|---|---|---|
| 1 | get_metadata() happy path + empty sources | tests/integration/test_notebooks.py |
| 2 | NotebookMetadata.to_dict() serialization (with and without created_at) | tests/unit/test_types.py |
| 3 | CLI metadata JSON output + human-readable output | tests/unit/cli/test_notebook.py |
Overall this is a nice, focused feature — these suggestions are mostly about aligning with existing project conventions. Happy to help if any of these need further clarification!
Summary
Implements the export notebook metadata feature requested in issue #171.
Changes
1. New NotebookMetadata dataclass (types.py)
   - to_dict() method for JSON serialization
   - Exported in __init__.py
2. New NotebooksAPI.get_metadata() method (_notebooks.py)
   - Returns NotebookMetadata with notebook details and simplified sources
   - Reuses existing get() and SourcesAPI.list() methods
3. New metadata CLI command (cli/notebook.py)
   - notebooklm metadata [-n NOTEBOOK_ID] [--no-json]
   - --no-json flag for human-readable format

Example Usage
Python API:
CLI:
Output:
```json
{
  "id": "abc123",
  "title": "AI Research Notes",
  "created_at": "2026-03-09T16:30:00",
  "is_owner": true,
  "sources": [
    {"type": "pdf", "title": "paper.pdf"},
    {"type": "url", "url": "https://example.com/article"}
  ]
}
```

Testing
- ruff format
- ruff check
- mypy
- pytest (1812 passed, 9 skipped)

Note
The updated_at field requested in the issue is not included because it is not currently available in the Notebook API response parsing. This can be added in a future update if the timestamp location is identified in the raw API response.