Commit 9129ff0

update docs wip
1 parent 4d58aca commit 9129ff0

7 files changed

Lines changed: 368 additions & 477 deletions

docs/guides/features/cli.md

Lines changed: 63 additions & 6 deletions
@@ -6,26 +6,83 @@
 The CLI requires [typer](https://typer.tiangolo.com/) to be installed. You can install it with `pip install typer` or `pixi add typer`.
 ```

+We continue with the supermarket data pipeline scenario from the previous guides. The two data loads have been saved as parquet files:
+
+- `previous_load.parquet`: the previous data load
+- `current_load.parquet`: the current data load
+
 ## Basic usage

 ```bash
-diffly left.parquet right.parquet
+diffly previous_load.parquet current_load.parquet
 ```

-This compares two parquet files and prints a formatted summary of the differences.
+This compares two parquet files and prints a formatted summary:
+
+```text
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Diffly Summary ┃
+┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
+Attention: the data frames do not match exactly, but as no primary key columns are
+provided, the row and column matches cannot be computed.
+
+Schemas
+▔▔▔▔▔▔▔
+Schemas match exactly (column count: 10).
+
+Rows
+▔▔▔▔
+The number of rows matches exactly (row count: 12).
+```
+
+Without a primary key, `diffly` can only compare schemas and row counts. To enable row-level comparison, specify a primary key.

 ## Specifying a primary key

 To enable row-level comparison, specify one or more primary key columns:

 ```bash
-diffly left.parquet right.parquet --primary-key id
+diffly previous_load.parquet current_load.parquet --primary-key transaction_id
 ```

-For composite keys, use multiple `--primary-key` options:
+```text
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Diffly Summary ┃
+┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
+Primary key: transaction_id

-```bash
-diffly left.parquet right.parquet --primary-key id --primary-key timestamp
+Schemas
+▔▔▔▔▔▔▔
+Schemas match exactly (column count: 10).
+
+Rows
+▔▔▔▔
+Left count Right count
+12 (no change) 12
+
+┏━┯━┯━┯━┯━┓
+┃-│-│-│-│-┃ 2 left only (16.67%)
+┠─┼─┼─┼─┼─┨╌╌╌┏━┯━┯━┯━┯━┓╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╮
+┃ │ │ │ │ ┃ = ┃ │ │ │ │ ┃ 6 equal (60.00%) │
+┠─┼─┼─┼─┼─┨╌╌╌┠─┼─┼─┼─┼─┨╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌├╴ 10 joined
+┃ │ │ │ │ ┃ ≠ ┃ │ │ │ │ ┃ 4 unequal (40.00%) │
+┗━┷━┷━┷━┷━┛╌╌╌┠─┼─┼─┼─┼─┨╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╯
+┃+│+│+│+│+┃ 2 right only (16.67%)
+┗━┷━┷━┷━┷━┛
+
+Columns
+▔▔▔▔▔▔▔
+┌─────────────────┬─────────┐
+│ discount │ 70.00% │
+│ loyalty_card_id │ 90.00% │
+│ product │ 100.00% │
+│ quantity │ 100.00% │
+│ register_id │ 100.00% │
+│ store_id │ 100.00% │
+│ timestamp │ 100.00% │
+│ total │ 70.00% │
+│ unit_price │ 70.00% │
+└─────────────────┴─────────┘
 ```

 ## Options
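
A note on the `cli.md` hunk above: the new text says that without a primary key only schemas and row counts can be compared. The following minimal Python sketch (standard library only) illustrates why. It is not how `diffly` is implemented; `LEFT`, `RIGHT`, `load`, and `keyless_summary` are hypothetical names, and the two tables are a three-column abridgement of rows taken from the CSV fixtures in this commit.

```python
import csv
import io
from collections import Counter

# Hypothetical, abridged stand-ins for the two loads (not the full fixtures).
LEFT = """transaction_id,product,total
TXN-001,Milk,3.0
TXN-013,Coffee,6.5
"""
RIGHT = """transaction_id,product,total
TXN-001,Milk,3.0
TXN-011,Soap,2.5
"""


def load(text):
    """Parse CSV text into (column names, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames, rows


def keyless_summary(left_text, right_text):
    """Compare only what needs no row alignment: columns and row counts."""
    left_cols, left_rows = load(left_text)
    right_cols, right_rows = load(right_text)
    # Order-insensitive check whether both tables hold the same rows.
    left_bag = Counter(tuple(row.items()) for row in left_rows)
    right_bag = Counter(tuple(row.items()) for row in right_rows)
    return {
        "schemas_match": left_cols == right_cols,
        "column_count": len(left_cols),
        "left_rows": len(left_rows),
        "right_rows": len(right_rows),
        "rows_identical": left_bag == right_bag,
    }


summary = keyless_summary(LEFT, RIGHT)
print(summary)
```

Here the schemas and row counts agree while the contents differ. Without a key there is no way to say which left row should be paired with which right row, so a per-row or per-column match rate cannot be computed. This mirrors the "Attention" message in the documented output.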
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+transaction_id,timestamp,store_id,register_id,product,quantity,unit_price,discount,total,loyalty_card_id
+TXN-001,2024-03-01T09:01:00.000000,S1,R1,Milk,2,1.5,0.0,3.0000000001,LC-1001
+TXN-002,2024-03-01T09:15:00.000000,S1,R1,Bread,1,2.8,0.0,2.8,
+TXN-003,2024-03-01T10:02:00.000000,S1,R2,Eggs,1,3.2,0.0,3.1999999999,LC-1003
+TXN-004,2024-03-01T10:30:00.000000,S1,R2,Butter,3,1.9,0.1,5.6,LC-9999
+TXN-005,2024-03-01T11:00:00.000000,S1,R1,Cheese,1,4.5,0.0,4.5,
+TXN-006,2024-03-01T11:20:00.000000,S2,R4,Apples,4,1.5,0.2,5.8,LC-2001
+TXN-007,2024-03-01T11:45:00.000000,S2,R4,Chicken,2,10.8,1.0,20.6,LC-2002
+TXN-008,2024-03-01T12:10:00.000000,S2,R4,Rice,1,4.2,0.1,4.1,
+TXN-009,2024-03-01T13:00:00.000000,S1,R1,Yogurt,2,1.2,0.0,2.4000000001,LC-1006
+TXN-010,2024-03-01T13:30:00.000000,S1,R2,Juice,3,3.0,0.0,9.0,
+TXN-013,2024-03-01T14:00:00.000000,S1,R1,Coffee,1,6.5,0.0,6.5,LC-1008
+TXN-014,2024-03-01T14:15:00.000000,S1,R2,Bananas,5,1.2,0.0,6.0,
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
transaction_id,timestamp,store_id,register_id,product,quantity,unit_price,discount,total,loyalty_card_id
2+
TXN-001,2024-03-01T09:01:00.000000,S1,R1,Milk,2,1.5,0.0,3.0,LC-1001
3+
TXN-002,2024-03-01T09:15:00.000000,S1,R1,Bread,1,2.8,0.0,2.8,
4+
TXN-003,2024-03-01T10:02:00.000000,S1,R2,Eggs,1,3.2,0.0,3.2,LC-1003
5+
TXN-004,2024-03-01T10:30:00.000000,S1,R2,Butter,3,1.9,0.1,5.6,LC-1004
6+
TXN-005,2024-03-01T11:00:00.000000,S1,R1,Cheese,1,4.5,0.0,4.5,
7+
TXN-006,2024-03-01T11:20:00.000000,S2,R4,Apples,4,0.75,0.0,3.0,LC-2001
8+
TXN-007,2024-03-01T11:45:00.000000,S2,R4,Chicken,2,5.4,0.5,10.3,LC-2002
9+
TXN-008,2024-03-01T12:10:00.000000,S2,R4,Rice,1,2.1,0.0,2.1,
10+
TXN-009,2024-03-01T13:00:00.000000,S1,R1,Yogurt,2,1.2,0.0,2.4,LC-1006
11+
TXN-010,2024-03-01T13:30:00.000000,S1,R2,Juice,3,3.0,0.0,9.0,
12+
TXN-011,2024-03-01T08:00:00.000000,S2,R3,Soap,1,2.5,0.0,2.5,LC-2004
13+
TXN-012,2024-03-01T08:20:00.000000,S2,R3,Pasta,2,1.8,0.0,3.6,LC-2005
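
The numbers in the documented `--primary-key transaction_id` summary can be checked by hand against the two CSV fixtures above. The sketch below does this with the Python standard library only. It is an illustration under stated assumptions, not `diffly`'s implementation: the file names of the two fixtures are not visible in this excerpt, so they are simply called `FIRST` and `SECOND`, and the helper `same` with its absolute tolerance of `1e-6` is an assumption. The tolerance `diffly` actually applies is not stated here; the reported 70% match rate for `total` only suggests that deviations such as `3.0000000001` vs `3.0` count as equal.

```python
import csv
import io
import math

FIRST = """transaction_id,timestamp,store_id,register_id,product,quantity,unit_price,discount,total,loyalty_card_id
TXN-001,2024-03-01T09:01:00.000000,S1,R1,Milk,2,1.5,0.0,3.0000000001,LC-1001
TXN-002,2024-03-01T09:15:00.000000,S1,R1,Bread,1,2.8,0.0,2.8,
TXN-003,2024-03-01T10:02:00.000000,S1,R2,Eggs,1,3.2,0.0,3.1999999999,LC-1003
TXN-004,2024-03-01T10:30:00.000000,S1,R2,Butter,3,1.9,0.1,5.6,LC-9999
TXN-005,2024-03-01T11:00:00.000000,S1,R1,Cheese,1,4.5,0.0,4.5,
TXN-006,2024-03-01T11:20:00.000000,S2,R4,Apples,4,1.5,0.2,5.8,LC-2001
TXN-007,2024-03-01T11:45:00.000000,S2,R4,Chicken,2,10.8,1.0,20.6,LC-2002
TXN-008,2024-03-01T12:10:00.000000,S2,R4,Rice,1,4.2,0.1,4.1,
TXN-009,2024-03-01T13:00:00.000000,S1,R1,Yogurt,2,1.2,0.0,2.4000000001,LC-1006
TXN-010,2024-03-01T13:30:00.000000,S1,R2,Juice,3,3.0,0.0,9.0,
TXN-013,2024-03-01T14:00:00.000000,S1,R1,Coffee,1,6.5,0.0,6.5,LC-1008
TXN-014,2024-03-01T14:15:00.000000,S1,R2,Bananas,5,1.2,0.0,6.0,
"""
SECOND = """transaction_id,timestamp,store_id,register_id,product,quantity,unit_price,discount,total,loyalty_card_id
TXN-001,2024-03-01T09:01:00.000000,S1,R1,Milk,2,1.5,0.0,3.0,LC-1001
TXN-002,2024-03-01T09:15:00.000000,S1,R1,Bread,1,2.8,0.0,2.8,
TXN-003,2024-03-01T10:02:00.000000,S1,R2,Eggs,1,3.2,0.0,3.2,LC-1003
TXN-004,2024-03-01T10:30:00.000000,S1,R2,Butter,3,1.9,0.1,5.6,LC-1004
TXN-005,2024-03-01T11:00:00.000000,S1,R1,Cheese,1,4.5,0.0,4.5,
TXN-006,2024-03-01T11:20:00.000000,S2,R4,Apples,4,0.75,0.0,3.0,LC-2001
TXN-007,2024-03-01T11:45:00.000000,S2,R4,Chicken,2,5.4,0.5,10.3,LC-2002
TXN-008,2024-03-01T12:10:00.000000,S2,R4,Rice,1,2.1,0.0,2.1,
TXN-009,2024-03-01T13:00:00.000000,S1,R1,Yogurt,2,1.2,0.0,2.4,LC-1006
TXN-010,2024-03-01T13:30:00.000000,S1,R2,Juice,3,3.0,0.0,9.0,
TXN-011,2024-03-01T08:00:00.000000,S2,R3,Soap,1,2.5,0.0,2.5,LC-2004
TXN-012,2024-03-01T08:20:00.000000,S2,R3,Pasta,2,1.8,0.0,3.6,LC-2005
"""
KEY = "transaction_id"


def load(text):
    """Index the CSV rows by their primary key value."""
    return {row[KEY]: row for row in csv.DictReader(io.StringIO(text))}


def same(a, b, abs_tol=1e-6):
    """Compare two cells; numeric cells match within an ASSUMED tolerance."""
    try:
        return math.isclose(float(a), float(b), abs_tol=abs_tol)
    except ValueError:
        return a == b  # non-numeric cells (including empty ones) compare as text


first, second = load(FIRST), load(SECOND)
columns = [c for c in next(iter(first.values())) if c != KEY]

# Rows present on one side only, and rows that can be joined on the key.
only_first = sorted(first.keys() - second.keys())
only_second = sorted(second.keys() - first.keys())
joined = sorted(first.keys() & second.keys())

# A joined row is unequal if any non-key column differs.
unequal = [k for k in joined
           if not all(same(first[k][c], second[k][c]) for c in columns)]

# Per-column share of joined rows whose values match.
match_rate = {c: sum(same(first[k][c], second[k][c]) for k in joined) / len(joined)
              for c in columns}

print(len(only_first), len(only_second), len(joined), len(unequal))
print(match_rate)
```

Under these assumptions the sketch finds 2 rows on each side only, 10 joined rows of which 4 are unequal (`TXN-004`, `TXN-006`, `TXN-007`, `TXN-008`), and match rates of 0.7 for `discount`, `total`, and `unit_price` and 0.9 for `loyalty_card_id`, which agrees with the summary shown in `cli.md`.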
