Contributing Guide

Status: Active
Last Updated: 2026-01-15

This guide covers everything you need to contribute to notebooklm-py: architecture overview, testing, and CI/CD (including the publishing workflows).


Architecture

Package Structure

src/notebooklm/
├── __init__.py          # Public exports
├── client.py            # NotebookLMClient main class
├── auth.py              # Authentication handling
├── types.py             # Dataclasses and type definitions
├── _core.py             # Core HTTP/RPC infrastructure
├── _notebooks.py        # NotebooksAPI implementation
├── _sources.py          # SourcesAPI implementation
├── _artifacts.py        # ArtifactsAPI implementation
├── _chat.py             # ChatAPI implementation
├── _research.py         # ResearchAPI implementation
├── _notes.py            # NotesAPI implementation
├── rpc/                 # RPC protocol layer
│   ├── __init__.py
│   ├── types.py         # RPCMethod enum and constants
│   ├── encoder.py       # Request encoding
│   └── decoder.py       # Response parsing
└── cli/                 # CLI implementation
    ├── __init__.py      # CLI package exports
    ├── helpers.py       # Shared utilities
    ├── session.py       # login, use, status, clear
    ├── notebook.py      # list, create, delete, rename
    ├── source.py        # source add, list, delete
    ├── artifact.py      # artifact list, get, delete
    ├── generate.py      # generate audio, video, etc.
    ├── download.py      # download audio, video, etc.
    ├── chat.py          # ask, configure, history
    └── ...

Layered Architecture

┌─────────────────────────────────────────────────────────────┐
│                         CLI Layer                           │
│   cli/session.py, cli/notebook.py, cli/generate.py, etc.    │
└───────────────────────────┬─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│                      Client Layer                           │
│  NotebookLMClient → NotebooksAPI, SourcesAPI, ArtifactsAPI  │
└───────────────────────────┬─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│                       Core Layer                            │
│              ClientCore → _rpc_call(), HTTP client          │
└───────────────────────────┬─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│                        RPC Layer                            │
│        encoder.py, decoder.py, types.py (RPCMethod)         │
└─────────────────────────────────────────────────────────────┘

Layer Responsibilities

| Layer | Files | Responsibility |
| --- | --- | --- |
| CLI | cli/*.py | User commands, input validation, Rich output |
| Client | client.py, _*.py | High-level Python API, returns typed dataclasses |
| Core | _core.py | HTTP client, request counter, RPC abstraction |
| RPC | rpc/*.py | Protocol encoding/decoding, method IDs |

Key Design Decisions

Why underscore prefixes? Files like _notebooks.py are internal implementation details. The public API stays clean (from notebooklm import NotebookLMClient).

Why namespaced APIs? client.notebooks.list() instead of client.list_notebooks(): better organization, scales well as methods are added, and tab-completion friendly.

Why async? Google's API can be slow. Async enables concurrent operations and non-blocking downloads.
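The namespacing and async decisions can be illustrated with a small, self-contained sketch. DemoClient, the stub API classes, the Notebook fields, and the sources.list(notebook_id) signature are hypothetical stand-ins rather than the real notebooklm classes; only the client.notebooks.list() call shape comes from this guide.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Notebook:
    """Hypothetical stand-in for a typed result defined in types.py."""
    id: str
    title: str


class NotebooksAPI:
    """Stub namespace; the real class lives in _notebooks.py."""

    async def list(self) -> list[Notebook]:
        await asyncio.sleep(0.01)  # simulate a slow network round trip
        return [Notebook(id="nb-1", title="Demo")]


class SourcesAPI:
    """Stub namespace; the real class lives in _sources.py."""

    async def list(self, notebook_id: str) -> list[str]:
        await asyncio.sleep(0.01)
        return [f"{notebook_id}/source-1"]


class DemoClient:
    """Hypothetical client that groups operations by namespace."""

    def __init__(self) -> None:
        self.notebooks = NotebooksAPI()
        self.sources = SourcesAPI()


async def main() -> tuple[list[Notebook], list[str]]:
    client = DemoClient()
    # Both requests are in flight at the same time instead of back to back.
    notebooks, sources = await asyncio.gather(
        client.notebooks.list(),
        client.sources.list("nb-1"),
    )
    return notebooks, sources


result = asyncio.run(main())
```

Because each namespace is its own object, adding a new group of operations does not grow the client class itself, and asyncio.gather shows the kind of concurrency the async design makes possible.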

Adding New Features

New RPC Method:

  1. Capture traffic (see RPC Development Guide)
  2. Add to rpc/types.py: NEW_METHOD = "AbCdEf"
  3. Implement in appropriate _*.py API class
  4. Add dataclass to types.py if needed
  5. Add CLI command if user-facing
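The steps for a new RPC method can be sketched as follows. This is a minimal illustration under stated assumptions: the enum base class, the _rpc_call(method, params) signature, the FakeCore stand-in, the ExampleAPI and NewThing names, and the response shape are all invented for the example. Only the NEW_METHOD = "AbCdEf" placeholder and the _rpc_call name come from this guide.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Any


class RPCMethod(str, Enum):
    """Sketch of the enum in rpc/types.py (step 2); the ID is a placeholder."""
    NEW_METHOD = "AbCdEf"


@dataclass
class NewThing:
    """Hypothetical dataclass that would be added to types.py (step 4)."""
    id: str
    title: str


class FakeCore:
    """Stand-in for the core layer; records calls instead of doing HTTP."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, list[Any]]] = []

    async def _rpc_call(self, method: RPCMethod, params: list[Any]) -> list[Any]:
        self.calls.append((method.value, params))
        return [params[0], "Example"]  # invented response shape


class ExampleAPI:
    """Hypothetical API class showing where step 3 would go."""

    def __init__(self, core: FakeCore) -> None:
        self._core = core

    async def get(self, thing_id: str) -> NewThing:
        raw = await self._core._rpc_call(RPCMethod.NEW_METHOD, [thing_id])
        # Convert the positional response into a typed dataclass.
        return NewThing(id=raw[0], title=raw[1])


core = FakeCore()
thing = asyncio.run(ExampleAPI(core).get("thing-1"))
```

Swapping the real core for a recording fake like this is also a convenient way to unit test a new method without network access.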

New API Class:

  1. Create _newfeature.py with NewFeatureAPI class
  2. Add to client.py: self.newfeature = NewFeatureAPI(self._core)
  3. Export types from __init__.py

Testing

Prerequisites

  1. Install dependencies:

    uv pip install -e ".[dev]"
  2. Authenticate:

    notebooklm login
  3. Create a read-only test notebook (required for E2E tests):

    • Create a notebook at NotebookLM
    • Add multiple sources (text, URL, etc.)
    • Generate artifacts (audio, quiz, etc.)
    • Set env var: export NOTEBOOKLM_READ_ONLY_NOTEBOOK_ID="your-id"

Quick Reference

# Unit + integration tests (no auth needed)
pytest

# E2E tests (requires auth + test notebook)
pytest tests/e2e -m readonly        # Read-only tests only
pytest tests/e2e -m "not variants"  # Skip parameter variants
pytest tests/e2e --include-variants # All tests including variants

Test Structure

tests/
├── unit/           # No network, fast, mock everything
├── integration/    # Mocked HTTP responses + VCR cassettes
└── e2e/            # Real API calls (requires auth)

E2E Fixtures

| Fixture | Use Case |
| --- | --- |
| read_only_notebook_id | List/download existing artifacts |
| temp_notebook | Add/delete sources (auto-cleanup) |
| generation_notebook_id | Generate artifacts (CI-aware cleanup) |
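The auto-cleanup behaviour of a fixture like temp_notebook can be sketched with a plain async context manager. This is not the project's fixture code: FakeNotebooks and its create/delete methods are assumed names, and the real fixture is defined with pytest rather than contextlib.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeNotebooks:
    """Stand-in for client.notebooks; create/delete are assumed method names."""

    def __init__(self) -> None:
        self.live: set[str] = set()

    async def create(self, title: str) -> str:
        notebook_id = f"nb-{len(self.live) + 1}"
        self.live.add(notebook_id)
        return notebook_id

    async def delete(self, notebook_id: str) -> None:
        self.live.discard(notebook_id)


@asynccontextmanager
async def temp_notebook(notebooks: FakeNotebooks):
    notebook_id = await notebooks.create("temp-e2e")
    try:
        yield notebook_id
    finally:
        # Cleanup runs even if the test body raises.
        await notebooks.delete(notebook_id)


async def demo() -> tuple[bool, bool]:
    notebooks = FakeNotebooks()
    async with temp_notebook(notebooks) as notebook_id:
        existed_during = notebook_id in notebooks.live
    return existed_during, notebook_id in notebooks.live


during, after = asyncio.run(demo())
```

The try/finally around the yield is the important part: the notebook is removed whether the test passes or fails, so CRUD tests do not leave stray notebooks behind.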

Rate Limiting

NotebookLM has undocumented rate limits. Generation tests may be skipped when rate limited:

  • Use pytest tests/e2e -m readonly for quick validation
  • Wait a few minutes between full test runs
  • SKIPPED (Rate limited by API) is expected behavior, not failure
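One way to turn a rate-limit error into a skip rather than a failure is sketched below. RateLimitError and skip_if_rate_limited are hypothetical names, the decorator handles only synchronous test functions, and the project's own mechanism may differ. The sketch raises unittest.SkipTest from the standard library, which pytest generally reports as a skipped test.

```python
import functools
import unittest


class RateLimitError(Exception):
    """Hypothetical exception raised when the API reports a rate limit."""


def skip_if_rate_limited(test_func):
    """Convert a rate-limit error into a skip instead of a failure."""

    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        try:
            return test_func(*args, **kwargs)
        except RateLimitError as exc:
            # Same reason string that appears in the test report.
            raise unittest.SkipTest("Rate limited by API") from exc

    return wrapper
```

Any other exception still propagates unchanged, so genuine failures are not hidden behind the skip.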

VCR Testing (Recorded HTTP)

Record HTTP interactions for offline/deterministic replay:

# Record new cassettes (committed to repo with sensitive data scrubbed)
NOTEBOOKLM_VCR_RECORD=1 pytest tests/integration/test_vcr_*.py -v

# Run with recorded responses
pytest tests/integration/test_vcr_*.py

Sensitive data (cookies, tokens, emails) is automatically scrubbed.
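To give a feel for what scrubbing involves, here is a minimal sketch that redacts sensitive headers and email addresses before a recording is saved. The header list, the regular expression, and the function names are illustrative assumptions, not the project's actual scrubbing rules.

```python
import re

# Illustrative patterns only; the project's real scrubbing rules may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SENSITIVE_HEADERS = {"cookie", "set-cookie", "authorization"}


def scrub_headers(headers: dict[str, str]) -> dict[str, str]:
    """Replace the values of sensitive headers, matching names case-insensitively."""
    return {
        name: "REDACTED" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def scrub_body(body: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("user@example.com", body)
```

Even with automatic scrubbing in place, it is worth reviewing a new cassette by hand before committing it.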

Writing New Tests

Need network?
├── No → tests/unit/
├── Mocked → tests/integration/
└── Real API → tests/e2e/
    └── What notebook?
        ├── Read-only → read_only_notebook_id + @pytest.mark.readonly
        ├── CRUD → temp_notebook
        └── Generation → generation_notebook_id
            └── Parameter variant? → add @pytest.mark.variants

CI/CD

Workflows

| Workflow | Trigger | Purpose |
| --- | --- | --- |
| test.yml | Push/PR | Unit tests, linting, type checking |
| nightly.yml | Daily 6 AM UTC | E2E tests with real API |
| rpc-health.yml | Daily 7 AM UTC | RPC method ID monitoring (see stability.md) |
| testpypi-publish.yml | Manual dispatch | Publish to TestPyPI |
| verify-package.yml | Manual dispatch | Verify TestPyPI or PyPI install + E2E |
| publish.yml | Tag push | Publish to PyPI |

Setting Up Nightly E2E Tests

  1. Get storage state: cat ~/.notebooklm/storage_state.json
  2. Add GitHub secrets:
    • NOTEBOOKLM_AUTH_JSON: Storage state JSON
    • NOTEBOOKLM_READ_ONLY_NOTEBOOK_ID: Your test notebook ID

Maintaining Secrets

| Task | Frequency |
| --- | --- |
| Refresh credentials | Every 1-2 weeks |
| Check nightly results | Daily |

Troubleshooting CI/CD Auth

First step: Run notebooklm auth check --json in your workflow to diagnose issues.

"NOTEBOOKLM_AUTH_JSON environment variable is set but empty"

Cause: The NOTEBOOKLM_AUTH_JSON env var is set to an empty string.

Solution:

  • Ensure the GitHub secret is properly configured
  • Check the secret isn't empty or whitespace-only
  • Verify the workflow syntax: ${{ secrets.NOTEBOOKLM_AUTH_JSON }}

"must contain valid Playwright storage state with a 'cookies' key"

Cause: The JSON in NOTEBOOKLM_AUTH_JSON is missing the required structure.

Solution: Ensure your secret contains valid Playwright storage state JSON:

{
  "cookies": [
    {"name": "SID", "value": "...", "domain": ".google.com", ...},
    ...
  ],
  "origins": []
}
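A quick local check of the secret's contents can catch both this error and the empty-value error before a workflow run. The sketch below is a hypothetical helper, not the library's validation code, and its error messages are paraphrased.

```python
import json
from typing import Any, Optional


def validate_auth_json(raw: Optional[str]) -> dict[str, Any]:
    """Check that raw looks like Playwright storage state with a 'cookies' list."""
    if raw is None:
        raise ValueError("NOTEBOOKLM_AUTH_JSON is not set")
    if not raw.strip():
        raise ValueError("NOTEBOOKLM_AUTH_JSON is set but empty")
    try:
        state = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(state, dict) or not isinstance(state.get("cookies"), list):
        raise ValueError("storage state must contain a 'cookies' list")
    return state
```

For example, calling validate_auth_json(os.environ.get("NOTEBOOKLM_AUTH_JSON")) after importing os either returns the parsed state or raises a ValueError describing what is wrong.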

"Cannot run 'login' when NOTEBOOKLM_AUTH_JSON is set"

Cause: You're trying to run notebooklm login in CI/CD where NOTEBOOKLM_AUTH_JSON is set.

Why: The login command saves to a file, which conflicts with environment-based auth.

Solution:

  • Don't run login in CI/CD - use the env var for auth instead
  • If you need to refresh auth, do it locally and update the secret

Session expired in CI/CD

Cause: Google sessions expire periodically (typically every 1-2 weeks).

Solution:

  1. Re-run notebooklm login locally
  2. Copy the contents of ~/.notebooklm/storage_state.json
  3. Update your GitHub secret

Multiple accounts in CI/CD

Use separate secrets and set NOTEBOOKLM_AUTH_JSON per job:

jobs:
  account-1:
    env:
      NOTEBOOKLM_AUTH_JSON: ${{ secrets.NOTEBOOKLM_AUTH_ACCOUNT1 }}
    steps:
      - run: notebooklm list

  account-2:
    env:
      NOTEBOOKLM_AUTH_JSON: ${{ secrets.NOTEBOOKLM_AUTH_ACCOUNT2 }}
    steps:
      - run: notebooklm list

Debugging CI/CD auth issues

Add diagnostic steps to your workflow:

- name: Debug auth
  run: |
    # Comprehensive auth check (preferred)
    notebooklm auth check --json

    # Check if env var is set (without revealing content)
    if [ -n "$NOTEBOOKLM_AUTH_JSON" ]; then
      echo "NOTEBOOKLM_AUTH_JSON is set (length: ${#NOTEBOOKLM_AUTH_JSON})"
    else
      echo "NOTEBOOKLM_AUTH_JSON is not set"
    fi

The auth check --json output shows:

  • Whether the storage file or the env var is being used
  • Which cookies are present
  • Cookie domains (important for regional users)
  • Any validation errors

Getting Help

  • Check existing implementations in _*.py files
  • Look at test files for expected structures
  • See RPC Development Guide for protocol details
  • Open an issue with captured request/response (sanitized)