Simplicity and conciseness of R,
with the blazing speed of Rust.
Chainable • Composable • Declarative • Semantic • Agent-Ready
pydplyr is a modern, expressive, and chainable data manipulation library built for humans and machines. Think:
- Like dplyr in R
- With Python's ecosystem
- On a path to a Rust-powered backend
- Agent-ready for autonomous systems
Whether you're a data scientist, a developer building intelligent systems, or a machine looking for clarity, this library speaks your language.
- Composable Verbs: clear, expressive syntax for fast prototyping & serious work
- Chainable API: minimal boilerplate, max readability
- Grammar of Graphics: familiar design for layered visualizations (Coming Soon!)
- Simplified Regex: harness RegEx without the brain melt
- Agent-First Thinking: semantically rich, logic-oriented operations
- Rust Ambition: future versions will compile to Rust for blazing performance
Each verb is purpose-built and plays well with others. Start small, scale infinitely.
| Verb | Description |
|---|---|
| `arrange()` | Sort your data |
| `select()` | Pick columns |
| `filter()` | Subset rows |
| `mutate()` | Create or modify columns |
| `summarize()` | Aggregate and reduce |
| `group_by()` | Enable grouped operations |
| `distinct()` | Drop duplicate rows |
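```python
from pydplyr import *

# df is an existing DataFrame with "name", "score", and "age" columns
result = (
    Panel(df)
    .arrange(desc("score"))                # sort by score, highest first
    .filter("score > 80")                  # keep rows with score above 80
    .mutate(score_plus_age="score + age")  # add a derived column
    .select("name", "score_plus_age")      # keep only these columns
    .collect()                             # return the final result
)
```

Intuitive, readable, chainable. One thought per line.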
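
To make the chaining idea concrete, here is a toy sketch in plain Python. `MiniPanel` is a hypothetical stand-in written for this illustration, not pydplyr's `Panel`, and it takes callables where the example above takes expression strings. It only shows the general pattern: each verb returns a new object, so calls can be stacked one per line.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class MiniPanel:
    """Toy chainable table over a list of dicts (illustration only, not pydplyr)."""

    def __init__(self, rows: List[Row]) -> None:
        # Copy each row so the caller's data is never mutated.
        self._rows = [dict(r) for r in rows]

    def arrange(self, key: str, descending: bool = False) -> "MiniPanel":
        return MiniPanel(sorted(self._rows, key=lambda r: r[key], reverse=descending))

    def filter(self, predicate: Callable[[Row], bool]) -> "MiniPanel":
        return MiniPanel([r for r in self._rows if predicate(r)])

    def mutate(self, **new_cols: Callable[[Row], Any]) -> "MiniPanel":
        # Each keyword names a new column; its value computes that column from a row.
        return MiniPanel(
            [{**r, **{name: fn(r) for name, fn in new_cols.items()}} for r in self._rows]
        )

    def select(self, *cols: str) -> "MiniPanel":
        return MiniPanel([{c: r[c] for c in cols} for r in self._rows])

    def collect(self) -> List[Row]:
        return [dict(r) for r in self._rows]


people = [
    {"name": "Ada", "score": 91, "age": 36},
    {"name": "Bob", "score": 72, "age": 41},
    {"name": "Cy", "score": 85, "age": 29},
]

result = (
    MiniPanel(people)
    .arrange("score", descending=True)
    .filter(lambda r: r["score"] > 80)
    .mutate(score_plus_age=lambda r: r["score"] + r["age"])
    .select("name", "score_plus_age")
    .collect()
)
print(result)
# [{'name': 'Ada', 'score_plus_age': 127}, {'name': 'Cy', 'score_plus_age': 114}]
```

Returning a fresh object from every verb is what keeps the chain safe to reorder and extend: no step can change the data an earlier step saw.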
Just like in ggplot2, our graphics philosophy follows this layered system:
- Data: the DataFrame
- Aesthetics: x/y mappings, color, shape
- Geoms: bar, point, line, etc.
- Stats: transformations like count or smooth
- Facets: split plots by category
- Coords: coordinate systems (polar, cartesian, etc.)
- Themes: polish for publication or dashboard

Visuals should tell, not yell.
We believe RegEx shouldn't be a dark art.
```python
# The dot is escaped so it matches a literal "." rather than any character
Panel(df).filter_col("email", pattern=r".*@example\.com")
```

You get the full power of `re`, simplified into expressive helpers for real-world usage.
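
Two details decide which values a pattern like this keeps: whether the dot is escaped, and whether the pattern must match the whole string. How `filter_col` handles anchoring is not stated here, so the sketch below uses the standard-library `re` module directly. The sample addresses are made up for illustration.

```python
import re

emails = [
    "ann@example.com",
    "bob@examplexcom",          # no dot before "com"
    "cy@example.com.evil.org",  # contains the target, but continues past it
    "dee@other.org",
]

loose = re.compile(r".*@example.com")    # "." matches any single character
strict = re.compile(r".*@example\.com")  # "\." matches only a literal dot

# fullmatch: the whole string must fit the pattern
loose_hits = [e for e in emails if loose.fullmatch(e)]
strict_hits = [e for e in emails if strict.fullmatch(e)]

# search: the pattern may match anywhere inside the string
search_hits = [e for e in emails if strict.search(e)]

print(loose_hits)   # ['ann@example.com', 'bob@examplexcom']
print(strict_hits)  # ['ann@example.com']
print(search_hits)  # ['ann@example.com', 'cy@example.com.evil.org']
```

If you need exact domain matching, the escaped pattern with whole-string matching is the safest of the three.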
The future of data is semantic, composable, and intelligent.
pydplyr is being designed with agentic AI frameworks in mind, where code can be read and written by both humans and agents.
Whether it's embedded in LLM-based agents or running as the logic core of autonomous data pipelines, pydplyr is made to be interpretable, traceable, and chainable.
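
One way to picture "interpretable and traceable" is a pipeline expressed as plain data that an agent can emit, inspect, and replay. The sketch below is an assumption made for illustration, not a pydplyr interface: the step format, the `run` helper, and the trace strings are all hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical step format: a verb name plus verb-specific arguments.
pipeline = [
    {"verb": "filter", "column": "score", "op": ">", "value": 80},
    {"verb": "arrange", "column": "score", "descending": True},
    {"verb": "select", "columns": ["name", "score"]},
]

OPS = {
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "==": lambda a, b: a == b,
}


def run(rows: List[Row], steps: List[Dict[str, Any]]) -> Tuple[List[Row], List[str]]:
    """Apply each step in order and record what it did."""
    trace = []
    for step in steps:
        verb = step["verb"]
        if verb == "filter":
            test = OPS[step["op"]]
            rows = [r for r in rows if test(r[step["column"]], step["value"])]
        elif verb == "arrange":
            rows = sorted(
                rows,
                key=lambda r: r[step["column"]],
                reverse=step.get("descending", False),
            )
        elif verb == "select":
            rows = [{c: r[c] for c in step["columns"]} for r in rows]
        else:
            raise ValueError(f"unknown verb: {verb!r}")
        trace.append(f"{verb}: {len(rows)} rows")
    return rows, trace


people = [
    {"name": "Ada", "score": 91, "age": 36},
    {"name": "Bob", "score": 72, "age": 41},
    {"name": "Cy", "score": 85, "age": 29},
]

final_rows, trace = run(people, pipeline)
print(final_rows)  # [{'name': 'Ada', 'score': 91}, {'name': 'Cy', 'score': 85}]
print(trace)       # ['filter: 2 rows', 'arrange: 2 rows', 'select: 2 rows']

# The pipeline is plain data, so it survives a JSON round trip unchanged.
assert json.loads(json.dumps(pipeline)) == pipeline
```

Because every step is a small dictionary, a person or a model can review the plan before it runs and read the trace afterwards.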
- Core verbs (arrange, select, filter, mutate, summarize, distinct)
- Grouped operations
- Grammar of Graphics module
- Rust backend (via `pyo3` or `polars`)
- Natural-language Regex builder
- LLM prompt-to-code interface
- Optional `async` API for distributed computing
- Plugin system for custom verbs and visual geoms
```bash
pip install pydplyr
```

We welcome contributors who care about:
- elegant APIs
- expressive code
- performance
- semantic richness
- and dreaming big
To get started, clone the repo and check the CONTRIBUTING.md guidelines.
Got ideas? Found bugs? Want to build the future of data science?
- Open an issue
- Start a discussion
- Or just drop by with a star

"The art of data science is not in the numbers; it's in the story they tell, and the tools that let them speak."