Commit 177d70d

Merge branch 'master' into ray_azure_integration
2 parents 84a018e + df70d8d commit 177d70d

55 files changed

Lines changed: 4756 additions & 866 deletions

docs/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -163,6 +163,7 @@
* [\[Alpha\] Vector Database](reference/alpha-vector-database.md)
* [\[Alpha\] Data quality monitoring](reference/dqm.md)
* [\[Alpha\] Streaming feature computation with Denormalized](reference/denormalized.md)
* [OpenLineage Integration](reference/openlineage.md)
* [Feast CLI reference](reference/feast-cli-commands.md)
* [Python API reference](http://rtd.feast.dev)
* [Usage](reference/usage.md)

docs/reference/openlineage.md

Lines changed: 218 additions & 0 deletions
@@ -0,0 +1,218 @@
# OpenLineage Integration

This module provides **native integration** between Feast and [OpenLineage](https://openlineage.io/), enabling automatic data lineage tracking for ML feature engineering workflows.

## Overview

When enabled, the integration **automatically** emits OpenLineage events for:

- **Registry changes**: events when feature views, feature services, and entities are applied
- **Feature materialization**: START, COMPLETE, and FAIL events when features are materialized

**No code changes are required**: just enable OpenLineage in your `feature_store.yaml`.

## Installation

OpenLineage is an optional dependency. Install it with:

```bash
pip install openlineage-python
```

Or install Feast with the OpenLineage extra (the quotes keep shells such as zsh from expanding the brackets):

```bash
pip install 'feast[openlineage]'
```

## Configuration

Add an `openlineage` section to your `feature_store.yaml`:

```yaml
project: my_project
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db

openlineage:
  enabled: true
  transport_type: http
  transport_url: http://localhost:5000
  transport_endpoint: api/v1/lineage
  namespace: feast
  emit_on_apply: true
  emit_on_materialize: true
```

Once configured, apply and materialization operations emit lineage events automatically, as controlled by `emit_on_apply` and `emit_on_materialize`.

### Environment Variables

You can also configure the integration through environment variables:

```bash
export FEAST_OPENLINEAGE_ENABLED=true
export FEAST_OPENLINEAGE_TRANSPORT_TYPE=http
export FEAST_OPENLINEAGE_URL=http://localhost:5000
export FEAST_OPENLINEAGE_ENDPOINT=api/v1/lineage
export FEAST_OPENLINEAGE_NAMESPACE=feast
```
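
The sketch below shows how these variables line up with the `openlineage` keys in the YAML above. The helper `openlineage_settings_from_env`, its mapping table, and the rule that `true`, `1`, or `yes` means enabled are assumptions for illustration. It is not Feast's loader, and it does not model how Feast combines environment variables with `feature_store.yaml`.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical mapping from the documented environment variables to the
# feature_store.yaml keys they stand in for (not Feast's internal loader).
ENV_TO_KEY = {
    "FEAST_OPENLINEAGE_ENABLED": "enabled",
    "FEAST_OPENLINEAGE_TRANSPORT_TYPE": "transport_type",
    "FEAST_OPENLINEAGE_URL": "transport_url",
    "FEAST_OPENLINEAGE_ENDPOINT": "transport_endpoint",
    "FEAST_OPENLINEAGE_NAMESPACE": "namespace",
}


def openlineage_settings_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Collect the OpenLineage settings present in the environment."""
    source = os.environ if env is None else env
    settings: Dict[str, object] = {}
    for var, key in ENV_TO_KEY.items():
        if var not in source:
            continue  # unset variables are skipped, not defaulted
        value = source[var]
        if key == "enabled":
            # Assumed parsing rule: "true", "1" or "yes" (any case) enables the integration
            settings[key] = value.strip().lower() in {"true", "1", "yes"}
        else:
            settings[key] = value
    return settings


# Example with an explicit mapping instead of the real environment
print(openlineage_settings_from_env({
    "FEAST_OPENLINEAGE_ENABLED": "true",
    "FEAST_OPENLINEAGE_URL": "http://localhost:5000",
}))
# {'enabled': True, 'transport_url': 'http://localhost:5000'}
```

Passing an explicit mapping keeps the function easy to test. Calling it with no argument reads `os.environ`.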

## Usage

Once configured, lineage is tracked automatically:

```python
from datetime import datetime, timedelta

from feast import FeatureStore

# Create the FeatureStore; OpenLineage is initialized automatically if configured
fs = FeatureStore(repo_path="feature_repo")

# Apply operations emit lineage events automatically
# (driver_entity and driver_hourly_stats_view stand for objects defined in your feature repo)
fs.apply([driver_entity, driver_hourly_stats_view])

# Materialize emits a START event, then COMPLETE or FAIL, automatically
fs.materialize(
    start_date=datetime.now() - timedelta(days=1),
    end_date=datetime.now(),
)
```
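
The START/COMPLETE/FAIL sequence follows the usual OpenLineage run lifecycle. One run ID covers the whole run: a START event is sent before the work, and exactly one terminal event follows it. The sketch below shows that pattern with a hypothetical `emit` callback and a stand-in `work` function in place of a real materialization. It is not Feast's implementation, which sends full OpenLineage events through the configured transport.

```python
import uuid
from typing import Callable, Dict, List


def run_with_lineage(job_name: str, work: Callable[[], int],
                     emit: Callable[[Dict[str, str]], None]) -> int:
    """Run `work`, emitting START and then COMPLETE or FAIL under one run ID."""
    run_id = str(uuid.uuid4())  # all events for this run share the same ID
    emit({"eventType": "START", "job": job_name, "runId": run_id})
    try:
        result = work()
    except Exception:
        emit({"eventType": "FAIL", "job": job_name, "runId": run_id})
        raise  # the caller still sees the original error
    emit({"eventType": "COMPLETE", "job": job_name, "runId": run_id})
    return result


# Collect events in a list instead of sending them to a lineage backend
events: List[Dict[str, str]] = []
run_with_lineage("materialize_example", lambda: 42, events.append)
print([e["eventType"] for e in events])  # ['START', 'COMPLETE']
```

The FAIL branch re-raises the exception, so lineage reporting never hides an error from the caller.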

## Configuration Options

| Option | Default | Description |
|--------|---------|-------------|
| `enabled` | `false` | Enable/disable OpenLineage integration |
| `transport_type` | `http` | Transport type: `http`, `file`, `kafka` |
| `transport_url` | - | URL for the HTTP transport (required when `transport_type` is `http`) |
| `transport_endpoint` | `api/v1/lineage` | API endpoint for HTTP transport |
| `api_key` | - | Optional API key for authentication |
| `namespace` | `feast` | Namespace for lineage events (uses project name if set to "feast") |
| `producer` | `feast` | Producer identifier |
| `emit_on_apply` | `true` | Emit events on `feast apply` |
| `emit_on_materialize` | `true` | Emit events on materialization |
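
The defaults in the table can be summarized as a small settings object. The `OpenLineageSettings` dataclass below is a hypothetical mirror of the documented options, not Feast's own config class. Its `validate` method encodes two assumed rules: the transport type must be one of the three listed, and the HTTP transport needs `transport_url`.

```python
from dataclasses import dataclass
from typing import Optional

# Transport types listed in the options table
SUPPORTED_TRANSPORTS = {"http", "file", "kafka"}


@dataclass
class OpenLineageSettings:
    """Hypothetical mirror of the documented `openlineage` options and defaults."""
    enabled: bool = False
    transport_type: str = "http"
    transport_url: Optional[str] = None
    transport_endpoint: str = "api/v1/lineage"
    api_key: Optional[str] = None
    namespace: str = "feast"
    producer: str = "feast"
    emit_on_apply: bool = True
    emit_on_materialize: bool = True

    def validate(self) -> None:
        """Raise ValueError for settings that cannot work (assumed rules)."""
        if self.transport_type not in SUPPORTED_TRANSPORTS:
            raise ValueError(f"unknown transport_type: {self.transport_type!r}")
        if self.enabled and self.transport_type == "http" and not self.transport_url:
            raise ValueError("transport_url is required for the http transport")


# Passes: HTTP transport with a URL
OpenLineageSettings(enabled=True, transport_url="http://localhost:5000").validate()
```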

## Lineage Graph Structure

When you run `feast apply`, Feast creates a lineage graph that mirrors the structure shown in the Feast UI:

```
DataSources ──┐
              ├──→ feast_feature_views_{project} ──→ FeatureViews
Entities ─────┘                                           │
                                                          │
                                                          ▼
                                               feature_service_{name} ──→ FeatureService
```

**Jobs created:**
- `feast_feature_views_{project}`: Shows DataSources + Entities → FeatureViews
- `feature_service_{name}`: Shows specific FeatureViews → FeatureService (one per service)

**Datasets include:**
- Schema with feature names, types, descriptions, and tags
- Feast-specific facets with metadata (TTL, entities, owner, etc.)
- Documentation facets with descriptions
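
The job naming scheme listed under **Jobs created** can be written out directly. This sketch takes a project name and a list of feature service names as inputs. `lineage_job_names` is a hypothetical helper, not part of the Feast API.

```python
from typing import List


def lineage_job_names(project: str, feature_services: List[str]) -> List[str]:
    """Return the job names the documented scheme produces for one project."""
    # One job per project: DataSources + Entities -> FeatureViews
    jobs = [f"feast_feature_views_{project}"]
    # One job per feature service: FeatureViews -> FeatureService
    jobs.extend(f"feature_service_{name}" for name in feature_services)
    return jobs


print(lineage_job_names("openlineage_demo", ["driver_stats_service"]))
# ['feast_feature_views_openlineage_demo', 'feature_service_driver_stats_service']
```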

## Transport Types

### HTTP Transport (Recommended for Production)

```yaml
openlineage:
  enabled: true
  transport_type: http
  transport_url: http://marquez:5000
  transport_endpoint: api/v1/lineage
  api_key: your-api-key  # Optional
```

### File Transport

```yaml
openlineage:
  enabled: true
  transport_type: file
  additional_config:
    log_file_path: openlineage_events.json
```
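
With the file transport, events are written to a local file instead of being sent over the network. The sketch below approximates this by appending each event as one JSON object per line. `append_event` and `read_events` are hypothetical helpers, and the file layout produced by the OpenLineage client's file transport may differ.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def append_event(log_file_path: str, event: Dict[str, Any]) -> None:
    """Append one lineage event to the file as a single JSON line."""
    with open(log_file_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def read_events(log_file_path: str) -> List[Dict[str, Any]]:
    """Read back every event written by append_event, skipping blank lines."""
    with open(log_file_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Write to a throwaway directory so the example leaves no files behind in the repo
path = os.path.join(tempfile.mkdtemp(), "openlineage_events.json")
append_event(path, {"eventType": "START", "job": {"name": "example"}})
append_event(path, {"eventType": "COMPLETE", "job": {"name": "example"}})
print([e["eventType"] for e in read_events(path)])  # ['START', 'COMPLETE']
```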

### Kafka Transport

```yaml
openlineage:
  enabled: true
  transport_type: kafka
  additional_config:
    bootstrap_servers: localhost:9092
    topic: openlineage.events
```

## Custom Feast Facets

The integration adds custom Feast-specific facets to lineage events:

### FeastFeatureViewFacet

Captures metadata about feature views:
- `name`: Feature view name
- `ttl_seconds`: Time-to-live in seconds
- `entities`: List of entity names
- `features`: List of feature names
- `online_enabled` / `offline_enabled`: Whether the feature view is enabled for the online and offline stores
- `description`: Feature view description
- `tags`: Key-value tags
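
To illustrate these fields, here is a hypothetical dataclass that uses the same field names. It highlights one conversion: a feature view TTL given as a `timedelta` becomes an integer `ttl_seconds`. The class is a sketch, not the facet class Feast defines. The feature names and the choice to map a missing TTL to `0` are assumptions.

```python
from dataclasses import asdict, dataclass, field
from datetime import timedelta
from typing import Dict, List, Optional


@dataclass
class FeatureViewFacetSketch:
    """Hypothetical mirror of the FeastFeatureViewFacet fields."""
    name: str
    ttl_seconds: int
    entities: List[str]
    features: List[str]
    online_enabled: bool = True
    offline_enabled: bool = True
    description: str = ""
    tags: Dict[str, str] = field(default_factory=dict)


def ttl_to_seconds(ttl: Optional[timedelta]) -> int:
    """Convert a TTL to whole seconds; None is treated as 0 (an assumption)."""
    return int(ttl.total_seconds()) if ttl else 0


facet = FeatureViewFacetSketch(
    name="driver_hourly_stats",
    ttl_seconds=ttl_to_seconds(timedelta(days=1)),
    entities=["driver_id"],
    features=["conv_rate", "acc_rate"],  # hypothetical feature names
)
print(asdict(facet)["ttl_seconds"])  # 86400
```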

### FeastFeatureServiceFacet

Captures metadata about feature services:
- `name`: Feature service name
- `feature_views`: List of feature view names
- `feature_count`: Total number of features
- `description`: Feature service description
- `tags`: Key-value tags
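
`feature_count` is the total number of features across the service's feature views. Below is a minimal sketch of that computation. It assumes the service is described as a mapping from feature view name to the feature names it selects, which is a hypothetical input shape, and the feature names are illustrative.

```python
from typing import Dict, List


def feature_service_facet(name: str, selected: Dict[str, List[str]],
                          description: str = "") -> Dict[str, object]:
    """Build a dict with the FeastFeatureServiceFacet fields (illustrative only)."""
    return {
        "name": name,
        "feature_views": list(selected),  # feature view names, in input order
        # Total features across all referenced feature views
        "feature_count": sum(len(features) for features in selected.values()),
        "description": description,
        "tags": {},
    }


facet = feature_service_facet(
    "driver_stats_service",
    {"driver_hourly_stats": ["conv_rate", "acc_rate", "avg_daily_trips"]},
)
print(facet["feature_count"])  # 3
```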

### FeastMaterializationFacet

Captures materialization run metadata:
- `feature_views`: Feature views being materialized
- `start_date` / `end_date`: Materialization window
- `rows_written`: Number of rows written
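
Here is a sketch of the data a materialization facet carries, with the window rendered as ISO 8601 strings. The field names come from the list above. The helper name, the ISO formatting, the end-before-start check, and the row count are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def materialization_facet(feature_views: List[str], start_date: datetime,
                          end_date: datetime, rows_written: int) -> Dict[str, object]:
    """Build a dict with the FeastMaterializationFacet fields (illustrative only)."""
    if end_date < start_date:
        raise ValueError("end_date must not be earlier than start_date")
    return {
        "feature_views": feature_views,
        "start_date": start_date.isoformat(),
        "end_date": end_date.isoformat(),
        "rows_written": rows_written,
    }


end = datetime(2024, 1, 2, tzinfo=timezone.utc)
facet = materialization_facet(["driver_hourly_stats"], end - timedelta(days=1), end, 1000)
print(facet["start_date"])  # 2024-01-01T00:00:00+00:00
```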

## Lineage Visualization

Use [Marquez](https://marquezproject.ai/) to visualize your Feast lineage:

```bash
# Start Marquez
docker run -p 5000:5000 -p 3000:3000 marquezproject/marquez

# Configure Feast to emit to Marquez (in feature_store.yaml)
# openlineage:
#   enabled: true
#   transport_type: http
#   transport_url: http://localhost:5000
```

Then open the Marquez UI at http://localhost:3000 to see your feature lineage. Depending on your Marquez version, the `marquezproject/marquez` image may run only the API, which also needs a PostgreSQL database, with the web UI shipped separately. In that case, the Docker Compose quickstart in the Marquez repository (`./docker/up.sh`) starts the API on port 5000 and the UI on port 3000 together.

## Namespace Behavior

- If `namespace` is set to `"feast"` (default): Uses the project name as the namespace (e.g., `my_project`)
- If `namespace` is set to a custom value: Uses `{namespace}/{project}` (e.g., `custom/my_project`)
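
The two rules above fit in a few lines. This sketch restates the documented behavior under a hypothetical function name. It is not Feast's source.

```python
def resolve_namespace(namespace: str, project: str) -> str:
    """Apply the documented namespace rules for lineage events."""
    if namespace == "feast":
        return project  # default namespace: use the project name as-is
    return f"{namespace}/{project}"  # custom namespace: prefix it to the project


print(resolve_namespace("feast", "my_project"))   # my_project
print(resolve_namespace("custom", "my_project"))  # custom/my_project
```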

## Feast to OpenLineage Mapping

| Feast Concept | OpenLineage Concept |
|---------------|---------------------|
| DataSource | InputDataset |
| FeatureView | OutputDataset (of feature views job) / InputDataset (of feature service job) |
| Feature | Schema field |
| Entity | InputDataset |
| FeatureService | OutputDataset |
| Materialization | RunEvent (START/COMPLETE/FAIL) |
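
The sketch below shows how the mapping turns into an event. It assembles a run event for the feature views job as a plain dictionary, with DataSources and Entities as inputs and FeatureViews as outputs. The top-level keys (`eventType`, `eventTime`, `run.runId`, `job`, `inputs`, `outputs`, `producer`, `schemaURL`) follow the OpenLineage RunEvent structure. The spec version in `SCHEMA_URL`, the helper names, and the dataset names are assumptions. The events Feast actually emits also carry the schema and Feast facets described above.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, List

# Placeholder spec URL; use the OpenLineage spec version your backend expects.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"


def dataset(namespace: str, name: str) -> Dict[str, str]:
    """Minimal dataset reference (schema and Feast facets omitted)."""
    return {"namespace": namespace, "name": name}


def feature_views_event(event_type: str, namespace: str, project: str,
                        sources: List[str], entities: List[str],
                        feature_views: List[str]) -> Dict[str, object]:
    """Sketch of a run event for the feature views job of one project."""
    return {
        "eventType": event_type,  # e.g. START, COMPLETE or FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": f"feast_feature_views_{project}"},
        # DataSources and Entities are the job's inputs
        "inputs": [dataset(namespace, n) for n in sources + entities],
        # FeatureViews are the job's outputs
        "outputs": [dataset(namespace, n) for n in feature_views],
        "producer": "feast",
        "schemaURL": SCHEMA_URL,
    }


event = feature_views_event("START", "openlineage_demo", "openlineage_demo",
                            ["driver_stats_source"], ["driver_id"],
                            ["driver_hourly_stats"])
print([d["name"] for d in event["inputs"]])  # ['driver_stats_source', 'driver_id']
```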

docs/reference/type-system.md

Lines changed: 43 additions & 2 deletions
@@ -3,7 +3,7 @@
## Motivation

Feast uses an internal type system to provide guarantees on training and serving data.
6-
Feast supports primitive types, array types, and map types for feature values.
6+
Feast supports primitive types, array types, set types, and map types for feature values.
Null types are not supported, although the `UNIX_TIMESTAMP` type is nullable.
The type system is controlled by [`Value.proto`](https://github.com/feast-dev/feast/blob/master/protos/feast/types/Value.proto) in protobuf and by [`types.py`](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/types.py) in Python.
Type conversion logic can be found in [`type_map.py`](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/type_map.py).
@@ -40,6 +40,23 @@ All primitive types have corresponding array (list) types:
| `Array(Bool)` | `List[bool]` | List of booleans |
| `Array(UnixTimestamp)` | `List[datetime]` | List of timestamps |

### Set Types

All primitive types have corresponding set types for storing unique values (Map has no set counterpart):

| Feast Type | Python Type | Description |
|------------|-------------|-------------|
| `Set(Int32)` | `Set[int]` | Set of unique 32-bit integers |
| `Set(Int64)` | `Set[int]` | Set of unique 64-bit integers |
| `Set(Float32)` | `Set[float]` | Set of unique 32-bit floats |
| `Set(Float64)` | `Set[float]` | Set of unique 64-bit floats |
| `Set(String)` | `Set[str]` | Set of unique strings |
| `Set(Bytes)` | `Set[bytes]` | Set of unique byte strings |
| `Set(Bool)` | `Set[bool]` | Set of unique booleans |
| `Set(UnixTimestamp)` | `Set[datetime]` | Set of unique timestamps |

**Note:** Set types store each value only once. When a list or other iterable is converted to a set, duplicates are removed, and, as with any Python `set`, element order is not preserved.

### Map Types

Map types allow storing dictionary-like data structures:
@@ -60,7 +77,7 @@ from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import (
Int32, Int64, Float32, Float64, String, Bytes, Bool, UnixTimestamp,
63-
Array, Map
80+
Array, Set, Map
)

# Define a data source
@@ -101,6 +118,12 @@ user_features = FeatureView(
Field(name="notification_settings", dtype=Array(Bool)),
Field(name="login_timestamps", dtype=Array(UnixTimestamp)),

# Set types (unique values only)
Field(name="visited_pages", dtype=Set(String)),
Field(name="unique_categories", dtype=Set(Int32)),
Field(name="tag_ids", dtype=Set(Int64)),
Field(name="preferred_languages", dtype=Set(String)),

# Map types
Field(name="user_preferences", dtype=Map),
Field(name="metadata", dtype=Map),
@@ -110,6 +133,24 @@ user_features = FeatureView(
)
```

### Set Type Usage Examples

Sets store unique values and automatically remove duplicates:

```python
# Simple set: "products" appears twice in the literal, but a Python set
# keeps one copy, so the value is {"home", "products", "checkout"}
visited_pages = {"home", "products", "checkout", "products"}

# Integer set: duplicates are dropped, so the value is {1, 2, 3}
unique_categories = {1, 2, 3, 2, 1}

# Converting a list with duplicates to a set
tag_list = [100, 200, 300, 100, 200]
tag_ids = set(tag_list)  # {100, 200, 300}
```

### Map Type Usage Examples

Maps can store complex nested data structures:

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
# Feast OpenLineage Integration Example

This example demonstrates Feast's **native OpenLineage integration** for automatic data lineage tracking.

For full documentation, see the [OpenLineage Reference](../../docs/reference/openlineage.md).

## Prerequisites

```bash
pip install 'feast[openlineage]'
```

## Running the Demo

1. Start Marquez:
   ```bash
   docker run -p 5000:5000 -p 3000:3000 marquezproject/marquez
   ```

2. Run the demo:
   ```bash
   python openlineage_demo.py --url http://localhost:5000
   ```

3. View lineage at http://localhost:3000

## What the Demo Shows

The demo creates a sample feature repository containing:

- **Entity**: `driver_id`
- **DataSource**: `driver_stats_source` (Parquet file)
- **FeatureView**: `driver_hourly_stats` with features such as conversion rate and acceptance rate
- **FeatureService**: `driver_stats_service`, which groups features for serving

When you run the demo, it will:
1. Create the feature store with OpenLineage enabled
2. Apply the features (emits lineage events)
3. Materialize features (emits START/COMPLETE events)
4. Retrieve features (demonstrates online feature retrieval)
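
The first step, creating a feature store with OpenLineage enabled, comes down to pointing the `openlineage` section of `feature_store.yaml` at the URL passed with `--url`. The sketch below renders such a file as text. The function `render_feature_store_yaml`, the project name `openlineage_demo` (borrowed from the job name in the lineage graph), and the other values are assumptions. `openlineage_demo.py` may build its configuration differently.

```python
def render_feature_store_yaml(url: str, project: str = "openlineage_demo") -> str:
    """Render a minimal feature_store.yaml with OpenLineage pointed at `url`."""
    lines = [
        f"project: {project}",
        "registry: data/registry.db",
        "provider: local",
        "online_store:",
        "  type: sqlite",
        "  path: data/online_store.db",
        "openlineage:",
        "  enabled: true",
        "  transport_type: http",
        f"  transport_url: {url}",  # value of the demo's --url flag
        "  transport_endpoint: api/v1/lineage",
    ]
    return "\n".join(lines) + "\n"


print(render_feature_store_yaml("http://localhost:5000"))
```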

## Lineage Graph

After running the demo, you'll see this lineage in Marquez:

```
driver_stats_source ──┐
                      ├──→ feast_feature_views_openlineage_demo ──→ driver_hourly_stats
driver_id ────────────┘                                                      │
                                                                             ▼
                                                           feature_service_driver_stats_service ──→ driver_stats_service
```

## Learn More

- [Feast OpenLineage Reference](../../docs/reference/openlineage.md)
- [OpenLineage Documentation](https://openlineage.io/docs)
- [Marquez Project](https://marquezproject.ai)
