Storage, execution, memory, and scheduling model.
Architecture
FrameX is structured as a layered local execution engine:
Storage Model
- Core tabular storage is Arrow (
pyarrow.Table/RecordBatch) - DataFrame partitions are represented as
Partitionobjects - Columnar layout improves analytic scans and interoperability
Execution Model
- Eager by default for Pandas-like ergonomics
- Lazy mode records operations in
LazyFrameand executes withcollect() - Lazy operations currently execute as ordered transformations over the source frame
Concurrency Model
FrameX provides multiple execution backends:
- threads for numeric paths that release the GIL
- processes for Python-heavy/object-heavy tasks
- Ray for local object-store-backed task execution
- Dask for task-graph execution and compatibility with Dask runtime tooling
framex.runtime.executor.detect_backend(...) and config controls determine backend behavior.
Array Acceleration
NDArray ufunc execution supports configurable acceleration paths:
numpy(default)numexprfor expression-heavy CPU kernelsnumbafor JIT-assisted functionscupyfor GPU-capable environments
These are selected via set_array_backend(...) or the config(...) context manager.
Memory and Transport
- Arrow buffers are the primary in-memory representation
- shared memory and memory-mapped strategies are used for efficient transfer patterns
- serializer options are configurable (
arrow,pickle5,pickle)
Interchange and Compatibility
FrameX exposes compatibility as an interface layer:
- dataframe interchange via
__dataframe__ - NumPy protocols via
__array_ufunc__and__array_function__ - explicit conversion methods to Pandas/Arrow/Dask/Ray
This allows internal optimization without requiring full internal Pandas semantics at every layer.