Commit f14e5ed

docs(spark): clarify LicenseException behavior, add SparkContext CAUTION
1 parent ffcf4df commit f14e5ed

1 file changed

Lines changed: 17 additions & 21 deletions

docs/LINQ-to-Spark.md

@@ -120,6 +120,9 @@ using var context = Spark.Connect(SparkMaster.Yarn(), "MyApp", o => {
 });
 ```
 
+> [!CAUTION]
+> **One SparkContext per process.** The JVM shares a single `SparkContext`. Disposing any `Spark.Connect()` context kills the shared SparkContext for ALL instances in the same process. Do not create multiple contexts in the same application — reuse a single context throughout.
+
 ### Reading Data
 
 ```csharp
@@ -137,22 +140,20 @@ var highValue = orders.Where(o => o.Amount > 1000);
 
 ### Pushing In-Memory Data
 
-Push local data to Spark for distributed processing. Automatically batches large data for O(1) memory:
+Push local data to Spark for distributed processing. Use the fluent `.Push(context)` syntax:
 
 ```csharp
-// Small data - fast in-memory path
-var testData = new[] { new Order { Id = 1, Amount = 100 } };
-var query = context.Push(testData);
+// Fluent syntax (recommended) — works with any IEnumerable<T>
+var query = testData.Push(context).Where(x => x.Active);
 
-// Large data - automatically batched (O(1) memory)
-var millionRows = GenerateLargeDataset();
-var query = context.Push(millionRows); // Same API, auto-optimized!
+// Async sources (IAsyncEnumerable<T>) — returns Task<SparkQuery<T>>
+var query = await asyncStream.Push(context);
 
-// Fluent syntax
-var enriched = localData.Push(context).Where(x => x.Active);
+// Custom batch size for large datasets
+var query = largeData.Push(context, batchSize: 50_000);
 
-// Custom batch size
-var query = context.Push(data, batchSize: 50_000);
+// Context method syntax (equivalent, IEnumerable only)
+var query = context.Push(testData);
 ```
 
 ### Grouping and Aggregation
@@ -261,7 +262,7 @@ Process multiple conditions in a single pass using the `Cases` API:
 
 ```csharp
 // 1. Categorize
-var results = query.Cases(
+var results = (await query.Cases(
 x => x.Amount > 1000, // Premium (category 0)
 x => x.Amount > 500 // Standard (category 1)
 // Default: Basic (category 2)
@@ -272,17 +273,12 @@ var results = query.Cases(
 standard => new { Id = standard.Id, Tag = "Regular" },
 basic => new { Id = basic.Id, Tag = "Economy" }
 )
-// Or use object initializers with concrete types:
-// .SelectCase(
-// premium => new OrderTag { Id = premium.Id, Label = "VIP" },
-// standard => new OrderTag { Id = standard.Id, Label = "Regular" }
-// )
 // 3. Dispatch (Write to different tables — async lambdas)
-await .ForEachCase(
+.ForEachCase(
 async vip => await vip.WriteTable("VIP_ORDERS", overwrite: true),
 async reg => await reg.WriteTable("REG_ORDERS", overwrite: true),
 async eco => await eco.WriteTable("ECO_ORDERS", overwrite: true)
-)
+))
 // 4. Extract results (unwraps the tuple to flat items)
 .AllCases()
 .OrderBy(r => r.Id)
@@ -457,13 +453,13 @@ DataLinq.Spark uses a tiered licensing model:
 
 ### Zero-Friction Onboarding
 
-No license key is required to get started. When no license is detected, DataLinq automatically enables the Development tier, which provides full API access with a 1,000-row limit on action methods (`ToArray()`, `Pull()`, `Count()`, `First()`).
+No license key is required to get started. When no license is detected, DataLinq automatically enables the Development tier, which provides full API access with a 1,000-row limit on action methods (`ToArray()`, `Pull()`, `Count()`, `First()`). **Exceeding the limit throws a `LicenseException`** — results are not silently truncated.
 
 ```csharp
 // Works immediately — no license needed!
 var orders = context.Read.Parquet<Order>("/data/orders")
 .Where(o => o.Amount > 100)
-.ToArray(); // Limited to 1,000 rows in Development tier
+.ToArray(); // Throws LicenseException if result exceeds 1,000 rows
 ```
 
 ### Production License
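
Reviewer note on the last hunk: since the updated text says that exceeding the Development-tier limit throws rather than truncates, callers may want to handle that case explicitly. The sketch below is not DataLinq code. `LicenseException` here is a locally defined stand-in, and `ToArrayChecked` and `CountOrFallback` are hypothetical helpers that only imitate the documented behavior (a 1,000-row limit on action methods) so that the calling pattern can be shown in isolation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the library's exception type (assumption: the real type
// ships with DataLinq.Spark and may carry different members).
public sealed class LicenseException : Exception
{
    public LicenseException(string message) : base(message) { }
}

public static class DevTierSketch
{
    // Development-tier row limit as stated in the changed documentation.
    public const int RowLimit = 1_000;

    // Hypothetical imitation of an action method such as ToArray():
    // it throws when the result exceeds the limit instead of truncating.
    public static T[] ToArrayChecked<T>(IEnumerable<T> rows)
    {
        // Read at most one row past the limit so the check stays cheap.
        var buffer = rows.Take(RowLimit + 1).ToArray();
        if (buffer.Length > RowLimit)
            throw new LicenseException(
                $"Result exceeds the {RowLimit}-row Development tier limit.");
        return buffer;
    }

    // Caller-side pattern: catch the exception and retry with an
    // explicitly bounded query rather than assuming silent truncation.
    public static int CountOrFallback(IEnumerable<int> rows)
    {
        try
        {
            return ToArrayChecked(rows).Length;
        }
        catch (LicenseException)
        {
            return ToArrayChecked(rows.Take(RowLimit)).Length;
        }
    }
}
```

With the real library the same shape would presumably apply: wrap the action call in `try`/`catch` and narrow the query (for example with a filter or an explicit row cap) in the handler.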

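Reviewer note on the first hunk: the new CAUTION block asks applications to reuse a single context for the life of the process. One common way to enforce that in .NET is a `Lazy<T>`-backed holder. The sketch below uses a hypothetical `FakeSparkContext` in place of whatever `Spark.Connect()` returns, and `SharedContext` is an illustrative name, not part of the library.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the object returned by Spark.Connect(...).
public sealed class FakeSparkContext : IDisposable
{
    private static int _created;

    // Number of instances constructed so far in this process.
    public static int CreatedCount => _created;

    public bool IsDisposed { get; private set; }

    public FakeSparkContext() => Interlocked.Increment(ref _created);

    public void Dispose() => IsDisposed = true;
}

public static class SharedContext
{
    // Lazy<T> with a factory delegate defaults to
    // LazyThreadSafetyMode.ExecutionAndPublication, so the factory
    // runs at most once even under concurrent first access.
    private static readonly Lazy<FakeSparkContext> _instance =
        new Lazy<FakeSparkContext>(() => new FakeSparkContext());

    // Every caller receives the same context instance.
    public static FakeSparkContext Instance => _instance.Value;

    // Dispose exactly once, at application shutdown, and only if the
    // context was ever created.
    public static void Shutdown()
    {
        if (_instance.IsValueCreated)
            _instance.Value.Dispose();
    }
}
```

The point of the design is that no call site owns the context, so no call site is tempted to wrap it in `using var` and dispose it while other code still depends on it.
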