feat: add keyword argument support for train_from_stream (#114)#155
feat: add keyword argument support for train_from_stream (#114)#155Yegorov wants to merge 4 commits into
train_from_stream (#114)#155Conversation
Greptile SummaryThis PR extends
Confidence Score: 3/5The multi-category keyword API is a useful addition, but the LSI implementation rebuilds the index after each category stream and the missing-argument case silently does nothing across all classifiers. The LSI build_index regression means any caller who streams multiple categories while auto_rebuild is enabled will trigger redundant and increasingly expensive index rebuilds. The silent no-op for partial arguments removes a safety net that existed for free under the old required-parameter contract. lib/classifier/lsi.rb needs the auto_rebuild guard hoisted outside the loop; lib/classifier/bayes.rb and logistic_regression.rb need an explicit guard for the partial-argument case; lib/classifier/streaming.rb base stub should be updated to match the optional signature. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["train_from_stream(category=nil, io=nil, **categories)"] --> B{category && io?}
B -- Yes --> C["{ category => io }"]
B -- No --> D["categories hash"]
C --> E[".each do |cat, stream|"]
D --> E
E --> F[Validate stream]
F --> G[LineReader + Progress]
G --> H[each_batch]
H --> I[Train + update progress]
I --> N{More batches?}
N -- Yes --> H
N -- No --> O{More categories?}
O -- Yes --> E
O -- No --> P[Done]
subgraph LSI_BUG ["LSI only"]
Q["save auto_rebuild, set false"]
R["ensure: restore + build_index per-category"]
end
E -.-> Q
Q --> G
G -.-> R
Prompt To Fix All With AIFix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
lib/classifier/lsi.rb:667-688
**`build_index` called once per category instead of once after all streams**
The `begin/ensure` block (which saves/restores `@auto_rebuild` and calls `build_index`) is now inside the `.each` loop. When multiple categories are provided, `build_index` fires after every stream rather than once at the end. This defeats the original optimisation: if `auto_rebuild` was `true` before the call, you'll pay the full re-index cost for each category, and every intermediate rebuild immediately becomes obsolete. The fix is to hoist the `auto_rebuild` save, the `@auto_rebuild = false` assignment, and the `ensure` block outside the loop so the index is rebuilt exactly once after all categories have been loaded.
### Issue 2 of 3
lib/classifier/bayes.rb:332-333
**Silent no-op when only one of `category`/`io` is supplied**
When a caller passes only `category` (e.g. `train_from_stream(:spam, batch_size: 100)`) or only `io`, the ternary resolves to `categories` which is an empty hash, so the method returns without training or raising. Before this PR, the required positional parameters made such a call a Ruby `ArgumentError`. Consider an explicit guard to preserve that contract.
```suggestion
def train_from_stream(category = nil, io = nil, batch_size: Streaming::DEFAULT_BATCH_SIZE, **categories)
raise ArgumentError, 'Provide either (category, io) or keyword category: io pairs' if category.nil? == io.nil? && category.nil? && categories.empty?
raise ArgumentError, 'Provide both category and io, or use keyword arguments' if category.nil? ^ io.nil?
(category && io ? { category => io } : categories).each do |category, io|
```
### Issue 3 of 3
lib/classifier/streaming.rb:29-30
**Base method signature still requires `category` and `io` as positional arguments**
The `Streaming` module's stub still declares `category` and `io` as required, but every concrete implementation now makes them optional. A caller coding to the base interface would be incorrectly told these are mandatory. Update the stub and its `@rbs` annotation to match.
```suggestion
# @rbs (?(Symbol | String), ?IO, ?batch_size: Integer, **Hash[Symbol, IO]) { (Progress) -> void } -> void
def train_from_stream(category = nil, io = nil, batch_size: DEFAULT_BATCH_SIZE, **categories, &block)
```
Reviews (1): Last reviewed commit: "feat: add keyword argument support for `..." | Re-trigger Greptile |
|
@cardmagic can you take a look, please? |
Closes #114