Install FrameX and run your first end-to-end dataframe pipeline.

Getting Started

This guide takes you from install to a complete mini pipeline: load data, transform it, aggregate it, and export results.

1. Install

pip install pyframe-xpy

Optional extras:

pip install pyframe-xpy[distributed]  # Dask + Ray runtime integrations
pip install pyframe-xpy[accel]        # numexpr + numba

2. Import and Create a DataFrame

import framex as fx

df = fx.DataFrame(
    {
        "customer_id": [101, 102, 101, 103, 102],
        "country": ["TH", "US", "TH", "JP", "US"],
        "amount": [120.0, 80.5, 45.0, 220.0, 99.5],
        "is_refund": [False, False, True, False, False],
    }
)

print(df.shape)        # (5, 4)
print(df.columns)      # ['customer_id', 'country', 'amount', 'is_refund']

3. Filter and Add a Derived Column

clean = df.filter(~df["is_refund"])

enriched = clean.assign(
    amount_with_tax=lambda d: d["amount"] * 1.07,
)

4. Group and Aggregate

summary = (
    enriched
    .groupby("country")
    .agg({"amount": ["sum", "mean", "count"]})
    .sort("amount_sum", ascending=False)
)

print(summary.to_pandas())

5. Write and Read Parquet

fx.write_parquet(summary, "country_summary.parquet")
roundtrip = fx.read_parquet("country_summary.parquet")

6. Convert to Pandas or Arrow

pdf = roundtrip.to_pandas()
table = roundtrip.to_arrow()

7. Optional Lazy Mode

For longer transformation chains:

lazy_result = (
    df.lazy()
    .filter(lambda d: ~d["is_refund"])
    .with_column("amount_with_tax", lambda d: d["amount"] * 1.07)
    .groupby("country")
    .agg({"amount_with_tax": "sum"})
    .collect()
)

Move on to Tutorial: ETL Pipeline for a realistic scenario.