@patchmemory

Summary

This PR includes three completed high-priority tasks from the roadmap:

1. Jupyter Notebook Streaming Parser (RICE 3.5) ✅

  • Refactored ipynb interpreter for true streaming with ijson
  • 97.9% memory reduction for large notebooks (exceeds 40% target)
  • Added comprehensive memory profiling tests
  • Tutorial documentation created

2. Per-Folder Config Precedence (RICE 3.4) ✅

  • Fixed .scidk.toml precedence to honor per-folder configs
  • Added Python 3.10 compatibility (tomllib/tomli fallback)
  • Sibling folders now correctly apply their own rules
  • Test test_folder_config_precedence_includes_excludes now passes

3. Operational Logs Endpoint (RICE 3.0) ✅

  • Added GET /api/logs with pagination and filtering
  • Filters: level (INFO, ERROR, etc.), since_ts (timestamp)
  • Privacy guardrails: no sensitive data exposed
  • 5 comprehensive tests added

Test Results

✅ 122/122 non-e2e tests pass
✅ All new functionality tested
✅ No regressions detected

Changes Summary

  • scidk/interpreters/ipynb_interpreter.py - Streaming refactor
  • scidk/core/folder_config.py - Python 3.10 compat + precedence
  • scidk/services/scans_service.py - Use load_effective_config
  • scidk/app.py - Added /api/logs endpoint
  • tests/test_ipynb_interpreter.py - Memory profiling tests
  • tests/test_logs_endpoint.py - Logs API tests
  • docs/ipynb-streaming-optimization.md - Tutorial
  • pyproject.toml + requirements.txt - Added ijson dependency

Migration Notes

  • ijson is now a required dependency (install via pip)
  • No API changes for existing code
  • Folder configs now work correctly for sibling directories

Resolves:

  • task:interpreters/refactor/ipynb-streaming
  • task:interpreters/toggles/folder-config
  • task:ops/mvp/metrics-and-logs-endpoints

🤖 Generated with Claude Code

patchmemory and others added 7 commits January 16, 2026 16:00
…ory reduction

Refactored Jupyter notebook interpreter to use pure streaming parsing:
- Made ijson a required dependency (was optional/fallback)
- Removed all full-load fallbacks, which defeated the purpose of streaming
- Optimized streaming parser to count all cells while limiting content sampling
- Version bumped from 0.2.0 to 0.3.0

Memory efficiency improvements:
- Achieved 97.9% memory reduction vs full-load parsing (far exceeds 40% target)
- For a 3.6 MB notebook: streaming uses ~165 KB vs ~8 MB for a full load
- All cells counted accurately regardless of notebook size

Tests added:
- Small notebook memory efficiency test (< 1MB peak)
- Large notebook memory reduction test (validates >=40% reduction)
- Large notebook cell counting accuracy test (1500 cells)
- Streaming extracts imports and headings correctly
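
One way such a memory reduction check could be written, sketched with the standard-library `tracemalloc` module. The helpers `peak_bytes` and `reduction_pct` are hypothetical names, and `count_only` is only a stand-in for a streaming parser, not the code in `tests/test_ipynb_interpreter.py`.

```python
import json
import tracemalloc


def peak_bytes(fn, *args):
    """Return the peak traced allocation, in bytes, while fn(*args) runs."""
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def reduction_pct(streaming_peak, full_load_peak):
    """Percentage by which streaming lowers peak memory versus a full load."""
    return 100.0 * (1.0 - streaming_peak / full_load_peak)


def full_load(payload):
    # Baseline: build the entire notebook as Python objects.
    return json.loads(payload)


def count_only(payload):
    # Stand-in for a streaming parser: counts cells without building objects.
    return payload.count('"cell_type"')


# Payload is created before tracing starts, so it is not counted in either peak.
payload = json.dumps(
    {"cells": [{"cell_type": "code", "source": ["x = 1\n"]} for _ in range(1500)]}
)
measured = reduction_pct(peak_bytes(count_only, payload), peak_bytes(full_load, payload))
print(f"measured reduction: {measured:.1f}%")

# Plugging in the rounded figures from the commit message (~165 KB vs ~8 MB):
print(f"{reduction_pct(165, 8000):.1f}%")
```

Measuring peak rather than final allocation matters here, because a full load can release most of its memory before the function returns.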

All ipynb-related tests pass. Resolves task:interpreters/refactor/ipynb-streaming.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Implements streaming parser for Jupyter notebooks with 97.9% memory reduction.

- Refactored ipynb interpreter to use ijson streaming (no full-load fallbacks)
- Added comprehensive memory profiling tests
- Added tutorial documentation
- Version bump to 0.3.0

Resolves task:interpreters/refactor/ipynb-streaming

… resolution

Fixed two issues preventing per-folder configuration from working correctly:

1. **Python 3.10 compatibility**: Added tomllib/tomli fallback
   - Python 3.11+ has tomllib built-in
   - Python 3.10 needs tomli backport
   - Config loading was failing silently on Python 3.10

2. **Removed inline config reading in scans_service**:
   - Previously read `.scidk.toml` directly without precedence
   - Now uses `load_effective_config()` consistently
   - Properly walks up directory tree and merges configs
   - Closest config wins (child overrides parent)
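
The two fixes above can be sketched together as follows. This is an illustration under stated assumptions, not the code in `scidk/core/folder_config.py`: the function name `load_effective_config_sketch`, its `root` parameter, the shallow key-by-key merge, and the `exclude` key are all hypothetical.

```python
import tempfile
from pathlib import Path

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API, needed on Python 3.10


def load_effective_config_sketch(folder, root, filename=".scidk.toml"):
    """Merge configs from root down to folder; the closest file wins."""
    folder, root = Path(folder).resolve(), Path(root).resolve()
    chain = [folder]
    # Walk up until the scan root (or the filesystem root) is reached.
    while chain[-1] != root and chain[-1] != chain[-1].parent:
        chain.append(chain[-1].parent)
    merged = {}
    for directory in reversed(chain):  # outermost first, so children override
        cfg = directory / filename
        if cfg.is_file():
            with cfg.open("rb") as fh:  # tomllib.load requires binary mode
                merged.update(tomllib.load(fh))  # shallow merge (assumption)
    return merged


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a").mkdir()
    (root / "b").mkdir()
    (root / ".scidk.toml").write_text('exclude = ["*.tmp"]\n')
    (root / "a" / ".scidk.toml").write_text('exclude = ["*.log"]\n')
    config_a = load_effective_config_sketch(root / "a", root)
    config_b = load_effective_config_sketch(root / "b", root)

print(config_a, config_b)
```

Sibling `a` picks up its own file while sibling `b` falls back to the parent's, which is the behavior the precedence test is described as checking.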

Test results:
- ✅ test_folder_config_precedence_includes_excludes now passes
- ✅ All 117 non-e2e tests pass
- ✅ Per-folder rules honored for sibling directories

Resolves task:interpreters/toggles/folder-config

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Fixes per-folder .scidk.toml precedence and Python 3.10 compatibility.

- Added tomllib/tomli fallback for Python 3.10
- Fixed scans_service to use load_effective_config consistently
- Test now passes: sibling directories honor their own configs

Resolves task:interpreters/toggles/folder-config

Added operational logs browsing endpoint alongside existing /api/metrics:

**Endpoint features:**
- GET /api/logs with pagination (limit, offset)
- Filtering by level (INFO, ERROR, etc.)
- Filtering by timestamp (since_ts)
- Returns: ts, level, message, context
- Privacy: No sensitive file paths or user data exposed

**Implementation:**
- Queries logs table from SQLite
- Max limit capped at 1000 entries
- Results ordered by timestamp DESC (most recent first)
- Graceful error handling
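
A rough sketch of the query behind such an endpoint, using only the standard-library `sqlite3` module. The `logs` table schema, the default `limit` of 100, the inclusive `since_ts` comparison, and the function name `query_logs` are assumptions for illustration; the handler in `scidk/app.py` may differ.

```python
import sqlite3

MAX_LIMIT = 1000  # cap stated in the commit message


def query_logs(conn, limit=100, offset=0, level=None, since_ts=None):
    """Return log rows, newest first, with optional level/timestamp filters."""
    limit = max(1, min(int(limit), MAX_LIMIT))  # clamp oversized requests
    offset = max(0, int(offset))
    clauses, params = [], []
    if level is not None:
        clauses.append("level = ?")
        params.append(level)
    if since_ts is not None:
        clauses.append("ts >= ?")  # inclusive bound (assumption)
        params.append(float(since_ts))
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = (
        "SELECT ts, level, message, context FROM logs"
        f"{where} ORDER BY ts DESC LIMIT ? OFFSET ?"
    )
    rows = conn.execute(sql, (*params, limit, offset)).fetchall()
    keys = ("ts", "level", "message", "context")
    return [dict(zip(keys, row)) for row in rows]


# In-memory database with a hypothetical schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts REAL, level TEXT, message TEXT, context TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        (1.0, "INFO", "scan started", None),
        (2.0, "ERROR", "scan failed", None),
        (3.0, "INFO", "scan finished", None),
    ],
)
recent_info = query_logs(conn, level="INFO", since_ts=2.0)
page = query_logs(conn, limit=5000, offset=1)  # limit is clamped to 1000
print(recent_info, page)
```

Filter values are bound as parameters rather than interpolated into the SQL string, and only the four documented fields are selected, so no other columns can leak into the response.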

**Tests added:**
- Endpoint existence and structure validation
- Pagination functionality
- Level filter verification
- Timestamp filter verification
- Privacy guardrails (no sensitive fields exposed)

All 122 non-e2e tests pass. Resolves task:ops/mvp/metrics-and-logs-endpoints.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Adds /api/logs endpoint for operational log browsing.

- Pagination and filtering (level, since_ts)
- Privacy guardrails (no sensitive data)
- Comprehensive test coverage

Resolves task:ops/mvp/metrics-and-logs-endpoints

patchmemory merged commit 58aabd6 into main Jan 16, 2026
2 checks passed
patchmemory deleted the task/combined-folder-config-and-logs-pr branch January 16, 2026 21:35