The Cetamura Batch Ingest Tool packages ingest-ready ZIP files for two workflows:
- Photo: image asset + XML metadata + manifest.ini
- Patent: PDF + XML metadata + shared manifest.ini
The current application is workflow-aware, non-mutating, and designed so staging and production runs never modify source folders. Generated ZIPs, reports, and temporary scratch files are written only under the selected folder's output/ or staging_output/ directory.
- GUI workflow selector for Photo and Patent
- Dry Run, Staging, and Production run modes
- Non-mutating processing in both staging and production
- Output-side scratch workspace for photo TIFF conversion
- Patent batch packaging with shared manifest validation
- Optional fallback patent PDF lookup via CETAMURA_PATENT_SEARCH_ROOTS
- CSV reporting, technical logs, and user-facing summary logs
- Pre-flight checks, ZIP validation, and reconciliation reporting
- GitHub Actions CI for tests and regression coverage
Photo workflow expected input:
- image file such as .jpg, .jpeg, .png, .tif, .tiff, or .pdf
- matching XML metadata containing an IID
- manifest.ini
Behavior:
- source images are read only
- TIFF conversion happens in an output-side scratch workspace
- ZIP contents are TIFF + XML + manifest.ini
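The photo packaging step can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function name `build_photo_zip`, its parameters, and the choice to name the ZIP and its entries after the IID are all assumptions. The sketch also assumes the TIFF has already been produced in the scratch workspace, since the conversion itself needs an imaging library.

```python
import zipfile
from pathlib import Path


def build_photo_zip(iid, tiff_path, xml_path, manifest_path, output_dir):
    """Write <iid>.zip under output_dir; the source files are only read.

    Hypothetical helper: naming the archive and its entries after the IID
    is an assumption made for this example.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    zip_path = output_dir / f"{iid}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # ZipFile.write copies file contents into the archive and leaves
        # the files on disk unchanged.
        zf.write(tiff_path, arcname=f"{iid}.tif")
        zf.write(xml_path, arcname=f"{iid}.xml")
        zf.write(manifest_path, arcname="manifest.ini")
    return zip_path
```

Because the archive is created under the output directory, nothing is written next to the source image, XML, or manifest.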
Patent workflow expected input:
- one or more patent XML files in a batch directory
- one shared manifest.ini in that same batch directory
- matching patent PDFs in the batch directory, or in configured fallback search roots
Behavior:
- canonical package name comes from XML <identifier type="IID">
- XML filename stem must match the IID
- normalized XML document ID must match the IID
- ZIP contents are PDF + XML + manifest.ini
- the shared manifest must include a [package] section and the required package fields
- the application does not enforce specific patent manifest values
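As an illustration of the patent checks above, the sketch below reads the IID from an XML file, compares it with the filename stem, and confirms that a manifest has a [package] section. The function names are hypothetical and this is not the tool's own validation code. The document ID normalization and the list of required package fields are not described here, so the sketch leaves them out.

```python
import configparser
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_iid(xml_path):
    """Return the text of the first <identifier type="IID">, or None."""
    root = ET.parse(xml_path).getroot()
    for elem in root.iter():
        # Compare local names so namespaced and plain XML both match.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "identifier" and elem.get("type") == "IID":
            return (elem.text or "").strip() or None
    return None


def check_patent_xml(xml_path):
    """Return a list of problems; an empty list means the checks passed."""
    xml_path = Path(xml_path)
    iid = extract_iid(xml_path)
    if iid is None:
        return ["no IID identifier found"]
    problems = []
    if xml_path.stem != iid:
        problems.append(
            f"filename stem {xml_path.stem!r} does not match IID {iid!r}"
        )
    return problems


def manifest_has_package_section(manifest_path):
    """Return True if the manifest parses and contains a [package] section."""
    parser = configparser.ConfigParser()
    # read() skips files that do not exist, so a missing manifest
    # simply yields False here.
    parser.read(manifest_path, encoding="utf-8")
    return parser.has_section("package")
```

Returning a list of problems rather than raising on the first one fits a reporting workflow, where every issue for a file ends up in the same CSV row.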
Dry Run:
- performs discovery, validation, and CSV reporting
- creates no ZIP files
- leaves source and output packages untouched
Staging:
- writes ZIP files and reports to staging_output/
- intended for review before production
Production:
- writes ZIP files and reports to output/
- intended for final ingest-ready packaging
- Windows 10 or 11
- Python 3.9+
git clone https://github.com/FSUDRS/cetamura_python_script.git
cd cetamura_python_script
python -m pip install --upgrade pip
python -m pip install -r requirements/requirements.txt
Run the application:
python src/main.py
Install development dependencies:
python -m pip install -r requirements/requirements-dev.txt
If patent PDFs may live outside the selected batch folder, configure fallback search roots with the CETAMURA_PATENT_SEARCH_ROOTS environment variable.
Windows example:
$env:CETAMURA_PATENT_SEARCH_ROOTS = "C:\patents\primary;D:\cetamura\archive"
python src\main.py
Use the platform path separator (; on Windows, : on macOS and Linux) when providing multiple roots.
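One way to read such a variable is to split it on `os.pathsep`, which is `;` on Windows and `:` on POSIX systems. The helper below is a hypothetical sketch of that idea and may differ from how the application parses the value.

```python
import os
from pathlib import Path


def parse_search_roots(raw=None, sep=os.pathsep):
    """Split a search-roots string into a list of Path objects.

    Hypothetical helper: when raw is None the value is read from the
    CETAMURA_PATENT_SEARCH_ROOTS environment variable.
    """
    if raw is None:
        raw = os.environ.get("CETAMURA_PATENT_SEARCH_ROOTS", "")
    roots = []
    for part in raw.split(sep):
        part = part.strip()
        if part:  # skip empty entries such as a trailing separator
            roots.append(Path(part))
    return roots
```

An unset or empty variable yields an empty list, so callers can treat "no fallback roots" and "variable not configured" the same way.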
For a selected source folder:
- staging ZIPs are written to staging_output/
- production ZIPs are written to output/
- CSV reports are written to the active output folder, or to the selected folder during dry run
- technical logs are written to batch_tool.log
- user-facing summary logs are written to batch_process_summary.log
The application may create a temporary .work/ directory under the active output root during processing. It is cleaned up after successful runs.
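The output layout described above could be modeled along these lines. The mode strings (`"dry_run"`, `"staging"`, `"production"`) and both helper names are assumptions made for this sketch. The context manager removes .work/ only when its body finishes without an exception, which is one possible reading of "cleaned up after successful runs".

```python
import shutil
from contextlib import contextmanager
from pathlib import Path

# Hypothetical mode names mapped to the folder names used in this README.
OUTPUT_DIR_BY_MODE = {"staging": "staging_output", "production": "output"}


def resolve_output_root(selected_folder, mode):
    """Return the output root for a mode, or None for a dry run."""
    if mode == "dry_run":
        return None  # a dry run creates no ZIP files
    return Path(selected_folder) / OUTPUT_DIR_BY_MODE[mode]


@contextmanager
def scratch_workspace(output_root):
    """Yield a .work/ directory under the output root."""
    work_dir = Path(output_root) / ".work"
    work_dir.mkdir(parents=True, exist_ok=True)
    yield work_dir
    # Reached only when the with-body did not raise; after a failure the
    # directory is left in place for inspection.
    shutil.rmtree(work_dir, ignore_errors=True)
```

A caller would wrap processing in `with scratch_workspace(root) as work:` and write intermediate files such as converted TIFFs under `work`.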
Before processing:
- disk space is checked
- output write access is checked
- configured patent fallback roots are validated when patent mode is active
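The pre-flight checks listed above might look something like the following sketch. The function name, the 100 MiB default threshold, and the message wording are all assumptions rather than values taken from the application.

```python
import os
import shutil
from pathlib import Path


def preflight_problems(output_root, search_roots=(),
                       min_free_bytes=100 * 1024 * 1024):
    """Return a list of pre-flight problems; empty means all checks passed.

    Hypothetical helper; the default threshold is an arbitrary example.
    """
    problems = []
    # The output folder may not exist yet, so probe its nearest
    # existing ancestor instead.
    probe = Path(output_root)
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent

    free = shutil.disk_usage(probe).free
    if free < min_free_bytes:
        problems.append(f"low disk space: {free} bytes free at {probe}")

    if not os.access(probe, os.W_OK):
        problems.append(f"output location is not writable: {probe}")

    # Fallback roots only matter in patent mode; pass an empty
    # sequence for photo runs.
    for root in search_roots:
        if not Path(root).is_dir():
            problems.append(f"fallback root not found: {root}")
    return problems
```

On Windows, `os.access` is only an approximation of real write permission; creating and deleting a small temporary file in the target folder is a stricter test.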
After processing:
- ZIP counts are validated against successful CSV rows
- ZIP contents are validated by workflow
- reconciliation compares XML count, CSV success count, actual ZIP count, and valid ZIP count
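The post-processing checks can be approximated as below. This sketch assumes that a valid ZIP contains manifest.ini plus one file with each expected extension, and that the converted image uses the .tif extension. The names `zip_is_valid` and `reconcile`, and the rule used for the `consistent` flag, are illustrative only.

```python
import zipfile
from pathlib import Path

# Assumed payload extensions per workflow, besides manifest.ini.
PAYLOAD_SUFFIXES = {"photo": {".tif", ".xml"}, "patent": {".pdf", ".xml"}}


def zip_is_valid(zip_path, workflow):
    """Return True if the ZIP opens and holds the expected file types."""
    try:
        with zipfile.ZipFile(zip_path) as zf:
            names = [Path(name).name.lower() for name in zf.namelist()]
    except (zipfile.BadZipFile, OSError):
        return False
    suffixes = {Path(name).suffix for name in names}
    return "manifest.ini" in names and PAYLOAD_SUFFIXES[workflow] <= suffixes


def reconcile(xml_count, csv_success_count, zip_paths, workflow):
    """Compare the four counts and flag whether the ZIP-side counts agree."""
    zip_paths = list(zip_paths)
    valid = sum(1 for path in zip_paths if zip_is_valid(path, workflow))
    report = {
        "xml_count": xml_count,
        "csv_success_count": csv_success_count,
        "actual_zip_count": len(zip_paths),
        "valid_zip_count": valid,
    }
    # xml_count is reported but not compared: an XML file that failed
    # validation legitimately produces no ZIP.
    report["consistent"] = csv_success_count == len(zip_paths) == valid
    return report
```

A run where `consistent` is false would correspond to the "Post-processing validation failed" case, with the CSV report and batch_tool.log as the places to look next.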
Run the local test suite:
python -m pytest
Run the project test runner:
python tests/run_tests.py
The GitHub Actions workflow runs on pushes to main, master, and ci-cd-development.
Release notes are maintained in CHANGELOG.md. Use docs/RELEASE_CHECKLIST.md for verification and packaging.
cetamura_python_script/
    .github/
        workflows/
            ci.yml
    docs/
        readme.md
        RELEASE_CHECKLIST.md
    requirements/
        requirements.txt
        requirements-dev.txt
    scripts/
        build/
            build_exe.ps1
            build_exe_macos.sh
            build_cross_platform.sh
            create_dist_package.ps1
    src/
        main.py
        validation.py
    tests/
        run_tests.py
        test_global_recovery.py
        test_main.py
        test_validation.py
    README.md
    pytest.ini
- No valid photo sets detected: verify image files, XML files, and manifest.ini placement.
- No valid patent batch directories detected: verify patent XML files and exactly one shared manifest.ini in the batch directory.
- No PDF found for ...: either place the PDF in the selected patent batch folder or configure CETAMURA_PATENT_SEARCH_ROOTS.
- Post-processing validation failed: review the CSV report and batch_tool.log for ZIP or count mismatches.
The current release includes:
- non-mutating staging and production behavior
- patent batch packaging
- workflow-aware GUI refresh
- garnet-and-gold UI theme
- expanded regression coverage and CI compatibility updates