**Phaeton** is a high-performance, memory-efficient data cleaning engine for Python, powered by **Rust**.
> ⚠️ **Project Status:** Phaeton is currently in **Experimental Beta (v0.2.0)**.
> The core streaming engine is functional, but the library is under limited maintenance due to the author's personal schedule.
**Phaeton** is a specialized, Rust-powered preprocessing engine designed to sanitize raw data streams before they reach your analytical environment.
It acts as the strictly typed **"Gatekeeper"** of your data pipeline. Unlike traditional DataFrame libraries that load entire datasets into RAM, Phaeton employs a **zero-copy streaming architecture**. It processes data chunk-by-chunk, filtering noise, fixing encodings, and standardizing formats, which ensures **O(1) memory complexity**.
This allows you to process massive datasets (GBs/TBs) on standard hardware without memory spikes, delivering clean, high-quality data to downstream tools like Pandas, Polars, or ML models.
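The streaming model can be illustrated with a plain-Python sketch (this is an illustration of the idea, not Phaeton's actual API, and the cleaning rules are hypothetical): rows are read and cleaned in small chunks, so peak memory tracks the chunk size rather than the file size.

```python
import csv
import io

def clean_stream(reader, chunk_size=2):
    """Yield cleaned rows chunk-by-chunk; memory use is bounded
    by chunk_size, never by the total number of rows."""
    chunk = []
    for row in reader:
        # Hypothetical cleaning rules for illustration:
        # strip whitespace and normalize empty strings to None.
        cleaned = [field.strip() or None for field in row]
        chunk.append(cleaned)
        if len(chunk) >= chunk_size:
            yield from chunk
            chunk.clear()
    yield from chunk  # flush the final partial chunk

raw = io.StringIO("id,name\n1,  Alice \n2,\n3, Bob\n")
rows = list(clean_stream(csv.reader(raw)))
print(rows[1])  # ['1', 'Alice'] (rows[0] is the header)
```

The same pattern scales to arbitrarily large files: only one chunk is resident in memory at a time.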
> **The Philosophy:** Don't waste memory loading garbage. Clean the stream first, then analyze the gold.
---
* **Streaming Architecture:** Processes files chunk-by-chunk. Memory usage remains flat and low regardless of file size.
* **Parallel Execution:** Utilizes all CPU cores via Rayon (Rust) for heavy lifting (Regex, Fuzzy Matching).
* **Strict Quarantine:** Bad data isn't just dropped; it's quarantined into a separate file with a generated `_phaeton_reason` column for auditing.
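The quarantine idea can be sketched in a few lines of plain Python (the validation rule and function names here are hypothetical; only the `_phaeton_reason` column name comes from the description above): invalid rows are diverted to a separate collection with a reason appended, rather than being silently discarded.

```python
def partition_rows(rows, validate):
    """Split rows into (clean, quarantined); each quarantined row
    gains a trailing _phaeton_reason-style value for auditing."""
    clean, quarantined = [], []
    for row in rows:
        reason = validate(row)
        if reason is None:
            clean.append(row)
        else:
            quarantined.append(row + [reason])  # keep the row plus why it failed
    return clean, quarantined

# Hypothetical rule: the second column must be a non-empty integer string.
def validate(row):
    if not row[1]:
        return "missing value"
    if not row[1].isdigit():
        return "not an integer"
    return None

clean, bad = partition_rows([["a", "1"], ["b", ""], ["c", "x"]], validate)
print(bad)  # [['b', '', 'missing value'], ['c', 'x', 'not an integer']]
```

Writing the `bad` list to a separate file would give the audit trail the quarantine feature describes.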