Skip to content

Latest commit

 

History

History
629 lines (528 loc) · 27.3 KB

File metadata and controls

629 lines (528 loc) · 27.3 KB

Test Priority Map

This document maps all flows to priorities and testing layers.

Legend:

  • 🔴 P0 = Critical (app broken without it)
  • 🟡 P1 = Important (users notice if broken)
  • 🟢 P2 = Nice-to-have (edge cases, polish)

Testing Layers:

  • U = Unit test
  • I = Integration test
  • R = RNTL (component/screen)
  • E = E2E (full device)
  • C = Contract (native module)

Coverage Status:

  • ✅ = Test exists and covers this flow
  • ⚠️ = Partial coverage (some aspects tested)
  • ❌ = No test coverage (critical gap)
  • 📝 = Planned but not implemented

CURRENT TEST COVERAGE: ~90% of Core Functionality

Test Quality Rating: 9/10 - Comprehensive Coverage

Total: 1208 tests across 29 test suites (all passing)

✅ Well-Tested Areas (Strong Coverage)

  • E2E Tests (Maestro): 16 comprehensive P0 flows covering text/image generation, model download, attachments, app lifecycle
  • State Management: appStore, chatStore, authStore - Excellent unit tests
  • Generation Service: State machine, streaming, lifecycle - Very good unit + integration
  • Image Generation Flow: Integration tests, progress tracking, E2E - Excellent
  • Active Model Service: Model loading, memory checks, integration - Very good
  • Contract Tests: Native module interfaces validated (llama.rn, whisper.rn, LocalDream, CoreMLDiffusion, iOS DownloadManager)
  • Intent Classification: All 70+ patterns unit tested
  • RNTL Component Tests: ChatScreen, HomeScreen, ModelsScreen, ModelCard
  • Core Service Logic: All 6 previously-untested services now have comprehensive unit tests

✅ Service Unit Tests (228 tests added, Feb 2026)

  • llm.ts (P0): 45 tests - model loading, GPU fallback, generation, context window, tokenization
  • modelManager.ts (P0): 54 tests - download lifecycle, storage, orphan detection, background downloads, model scanning
  • backgroundDownloadService.ts (P0): 28 tests - platform availability, native module delegation, event listeners, polling
  • hardware.ts (P0): 39 tests - device info, memory calculations, model recommendations, byte formatting, device tiers
  • whisperService.ts (P1): 32 tests - model download/load/unload, permissions, transcription, file transcription
  • documentService.ts (P1): 30 tests - file type detection, reading, truncation, formatting, preview

✅ iOS Parity Tests (77 tests added, Feb 2026)

  • localDreamGenerator.ts (P0): 43 tests - Platform.select() routing, method delegation (Android/iOS), isAvailable edge cases, generateImage lifecycle, thread tracking, error handling
  • coreMLModelBrowser.ts (P0): 16 tests - HuggingFace API fetching, model shape validation, caching with TTL, error handling (Promise.allSettled), ID generation
  • iosDownloadManager contract (P0): 18 tests - iOS DownloadManagerModule interface parity with Android, event shapes, polling compatibility stubs

P0 - Critical Flows (Must Have Full Coverage)

These flows are core functionality. If broken, the app is unusable.

Text Generation Core

ID Flow U I R E C Status
9.1 Send text message ✅ Excellent (02-text-generation.yaml)
9.2 Message appears in chat ✅ E2E coverage
9.3 Generation starts ✅ Excellent
9.4 Streaming tokens ✅ Excellent coverage
9.6 Generation completes ✅ Excellent
9.7 Response saved ✅ E2E validates persistence
9.11 Stop generation ✅ Excellent (03-stop-generation.yaml)
9.12 Partial response saved ✅ E2E validates

✅ llm.ts now has 45 unit tests covering model loading, generation, context management, and more.

Model Loading Core

ID Flow U I R E C Status
8.1 Load text model ✅ Excellent (00-setup-model.yaml)
8.4 Model loaded confirmation ✅ E2E validates
8.8 Unload model ✅ Excellent (05c-model-unload.yaml)
8.11 Switch text models ✅ Excellent (05b-model-selection.yaml)

Model Download Core

ID Flow U I R E C Status
5.23 Start foreground download 📝 📝 ✅ Unit + E2E (05b-model-download.yaml)
5.24 Download progress display 📝 ✅ Unit + E2E
5.30 Download complete 📝 📝 ✅ Unit + E2E (validates 5min download)
5.41 View downloaded models 📝 ✅ Unit + E2E

✅ modelManager.ts (54 tests) and backgroundDownloadService.ts (28 tests) now have comprehensive unit tests.

Conversation Core

ID Flow U I R E C Status
15.1 Create new conversation ✅ Excellent (part of all E2E flows)
15.13 Switch conversations ✅ Excellent
15.17 Conversations persist ✅ Excellent (01-app-launch.yaml)
15.18 Messages persist ✅ Excellent

App Lifecycle Core

ID Flow U I R E C Status
23.5 Reopen after kill ✅ E2E uses clearState + relaunch
23.8 Settings restored ✅ E2E validates persistence

Chat UI Core

ID Flow U I R E C Status
16.12 Type message ✅ E2E validates (all flows)
16.14 Send button enabled ✅ Excellent
16.15 Send button disabled ✅ Excellent
16.18 Clear input after send ✅ E2E validates

P0 - Image Generation Core

ID Flow U I R E C Status
11.1 Auto-detect triggers generation ✅ Excellent (04-image-generation.yaml)
11.4 Generation progress ✅ E2E waits for completion (180s)
11.7 Generation completes ✅ Excellent
11.8 Image in chat ✅ E2E validates generated-image
11.9 Image in gallery ✅ Excellent
11.11 Cancel image generation 📝 ⚠️ No E2E for cancel flow

Intent Classification Core

ID Flow U I R E C Status
10.1 Clear text intent ✅ Unit tested
10.2 Clear image intent ✅ Unit tested
10.3 Question patterns → text ✅ Unit tested
10.4 Generation patterns → image ✅ Unit tested
10.5 Art style patterns → image ✅ Unit tested
10.6 Code patterns → text ✅ Unit tested
10.7 SD-specific → image ✅ Unit tested
10.8 Ambiguous prompt - pattern mode ✅ Unit tested

iOS Parity (Cross-Platform)

ID Flow U I R E C Status
11.29a Core ML backend (iOS) ✅ Unit + Contract (localDreamGenerator, coreMLDiffusion)
11.33a Platform routing ✅ Unit (localDreamGenerator - 43 tests)
6.11 Core ML model browsing ✅ Unit (coreMLModelBrowser - 16 tests)
7.15 iOS download parity ✅ Contract (iosDownloadManager - 18 tests)
7.16 iOS polling compatibility ✅ Contract
7.17 iOS completed download localUri ✅ Contract

P1 - Important Flows

These flows are important features. Users would notice if broken.

Onboarding

ID Flow U I R E C
1.1 Fresh install onboarding
1.2 Onboarding with model download
1.5 Onboarding completed flag

Authentication

ID Flow U I R E C
2.1 First-time passphrase setup
2.5 App lock on background
2.6 Successful unlock
2.7 Failed unlock attempt
2.9 Lockout triggered
2.13 Change passphrase
2.14 Disable passphrase

Model Management Extended

ID Flow U I R E C
5.1 Models tab loads
5.4 Search models by name
5.6-5.8 Filter by credibility
5.27 Download pause
5.28 Download resume
5.29 Download cancel
5.43 Delete downloaded model
5.37 Vision model download

Background Downloads (Android)

ID Flow U I R E C
7.1 Start background download
7.8 Download completes
7.9 App killed during download
7.10 App reopens - download complete

Generation Extended

ID Flow U I R E C
9.13 Retry generation
9.14 Edit user message
9.15 Regenerate from edit
9.17 Conversation history
9.18 System prompt applied
9.19 Context length limit
9.29 Generation error
9.33 Thinking blocks

Voice Input

ID Flow U I R E C
14.1 Tap mic to start
14.4 Stop recording
14.6 Transcription complete
14.11 No Whisper model
14.17 Download Whisper model

Vision Models

ID Flow U I R E C
12.1 Load vision model
12.2 Attach image from gallery
12.6 Send with image
12.8 Generate about image

Projects

ID Flow U I R E C
17.1 View projects list
17.3 Create new project
17.9 Edit project
17.10 Delete project
17.13 Start chat from project

Gallery

ID Flow U I R E C
18.1 View gallery
18.11 Tap image
18.19 Delete single image
18.21 Multi-select mode
18.24 Delete selected

Settings

ID Flow U I R E C
19.9-19.14 Generation settings sliders
19.19 GPU toggle
19.22 Loading strategy toggle
19.36 All settings persist

Error Handling

ID Flow U I R E C
25.1 No internet - model browse
25.2 No internet - download
25.8 Corrupt model file
25.9 Model load OOM
25.12 Generation crash
25.16 Storage full

P2 - Nice-to-Have Flows

These are edge cases, polish, and less critical features.

Permissions

ID Flow U I R E C
3.1-3.8 All permission flows

Home Screen

ID Flow U I R E C
4.1-4.11 All home screen flows

Model Management Polish

ID Flow U I R E C
5.2 Model list loading state
5.3 Model list error state
5.10 Multiple filters combined
5.12 Empty filter results
5.14 Pagination / infinite scroll
5.26 Download ETA
5.32 Multiple concurrent downloads
5.47 Scan for untracked models

Image Model Management

ID Flow U I R E C
6.1-6.10 All image model flows

Model Loading Polish

ID Flow U I R E C
8.2 Loading progress indicator
8.5-8.7 Load failure handling
8.16-8.17 Memory warnings
8.20-8.21 GPU loading
8.22 Model session caching

Generation Polish

ID Flow U I R E C
9.5 Streaming performance
9.8-9.10 Generation metadata
9.16 Delete messages after
9.20-9.21 Context handling
9.23-9.27 Settings impact
9.34-9.36 Output rendering
9.38-9.41 Metadata display

Intent Classification Extended

ID Flow U I R E C
10.9 Ambiguous prompt - LLM mode
10.10-10.12 Classifier model handling
10.13-10.14 Intent cache
10.16-10.17 Pattern edge cases

Image Generation Polish

ID Flow U I R E C
11.5-11.6 Preview updates
11.12 Partial image on cancel
11.15-11.24 Generation parameters
11.25-11.28 Backend selection
11.29-11.32 Generation errors
11.33-11.35 Image metadata

Document Attachments

ID Flow U I R E C
13.1-13.9 All document flows

Voice Input Extended

ID Flow U I R E C
14.2-14.3 Recording indicators
14.7 Partial transcription
14.8-14.10 Recording edge cases
14.18-14.22 Whisper model management

Conversations Extended

ID Flow U I R E C
15.3-15.5 Conversation metadata
15.6-15.9 Delete handling
15.10-15.12 List display
15.16 Search conversations
15.20 Scroll position restored
15.21-15.23 Project integration

Chat UI Extended

ID Flow U I R E C
16.1-16.11 Message display
16.13 Multiline input
16.16-16.17 Disabled states
16.19-16.22 Input edge cases
16.23-16.24 Keyboard handling
16.25-16.30 Scrolling behavior
16.31-16.35 Chat states
16.36-16.42 Attachments UI

Projects Extended

ID Flow U I R E C
17.2 Default projects
17.4-17.8 Project form
17.11-17.12 Delete/duplicate
17.14-17.16 Project details

Gallery Extended

ID Flow U I R E C
18.2-18.7 Gallery display
18.8-18.10 Gallery filtering
18.12-18.18 Image actions
18.20-18.29 Delete flows

Settings Extended

ID Flow U I R E C
19.1-19.8 Settings navigation
19.15-19.18 Context/thread settings
19.20-19.21 GPU settings
19.23-19.24 Reset/save
19.25-19.31 Image settings
19.32-19.35 Intent settings
19.37-19.38 Settings sync

Storage & Device Info

ID Flow U I R E C
20.1-20.11 Storage settings
21.1-21.11 Device info
22.1-22.12 Download manager

App Lifecycle Extended

ID Flow U I R E C
23.1-23.4 Background handling
23.6-23.7 State restoration
23.9-23.12 Recovery edge cases

Navigation

ID Flow U I R E C
24.1-24.8 All navigation flows

Error Handling Extended

ID Flow U I R E C
25.3-25.7 Network errors
25.10-25.11 Model errors
25.13-25.15 Generation errors
25.17-25.23 Platform errors

Edge Cases & Stress

ID Flow U I R E C
26.1-26.7 Input edge cases
26.8-26.13 Scale edge cases
26.14-26.19 Concurrent operations
26.20-26.22 Memory pressure

Summary by Testing Layer

Layer P0 Flows P1 Flows P2 Flows Total
Unit (U) 90 115 120 325
Integration (I) 25 40 35 100
RNTL (R) 35 70 150 255
E2E (E) 25 30 15 70
Contract (C) 8 12 5 25

✅ COMPLETED Implementation (Current State)

✓ Phase 1: P0 Unit + Contract Tests (COMPLETE)

Fast tests that catch regressions.

  • ✅ Store mutations (chatStore, appStore, authStore) - Excellent
  • ✅ Intent classifier patterns (all 70+ patterns) - Complete
  • ✅ Generation service state machine - Thorough
  • ✅ Native module contracts (llama.rn, whisper.rn, LocalDream) - Complete
  • ✅ Image generation integration tests - Comprehensive
  • ✅ Active model service integration - Complete

✓ Phase 2: P0 RNTL Tests (COMPLETE)

Test critical screens respond correctly to state.

  • ✅ ChatScreen (send, streaming, stop, input)
  • ✅ ModelsScreen (basic rendering, list)
  • ✅ HomeScreen (model status, actions)
  • ✅ ModelCard component

✓ Phase 3: P0 E2E Tests (COMPLETE - 12 Maestro Flows)

Test full flows on device with real models.

  • 01-app-launch.yaml - App startup and initialization
  • 02-text-generation.yaml - Full text generation cycle
  • 03-stop-generation.yaml - Stop generation mid-stream
  • 04-image-generation.yaml - Full image generation with auto-download
  • 05a-model-uninstall.yaml - Model deletion
  • 05b-model-download.yaml - Model download (5min timeout)
  • 05b-model-selection.yaml - Model switching
  • 05c-model-unload.yaml - Model unloading
  • 06a-document-attachment.yaml - Document attachment
  • 06b-image-attachment.yaml - Image attachment
  • 06c-text-generation-full.yaml - Full text generation with attachments
  • 06d-text-generation-retry.yaml - Retry/regenerate text generation
  • 07a-image-model-uninstall.yaml - Image model deletion
  • 07b-image-model-download.yaml - Image model download
  • 07c-image-model-set-active.yaml - Image model activation
  • 00-setup-model.yaml - Model setup utility

Status: All critical P0 flows have E2E coverage

✓ Phase 4: Critical Service Unit Tests (COMPLETE - 228 tests)

Unit tests for all previously-untested core services.

  • llm.ts - 45 tests: model loading with GPU/CPU fallback, multimodal init, streaming generation, context window management, stop/clear, tokenization, performance stats
  • hardware.ts - 39 tests: device info caching, memory calculations, model recommendations, tier classification, byte formatting, model total size
  • modelManager.ts - 54 tests: download lifecycle, cancel/delete, storage tracking, orphan detection, credibility determination, background downloads, sync, untracked model scanning
  • backgroundDownloadService.ts - 28 tests: platform availability, native module delegation, event listener registration/dispatch, polling lifecycle, cleanup
  • whisperService.ts - 32 tests: model download/load/unload, permissions, real-time transcription, file transcription, state management
  • documentService.ts - 30 tests: file type detection, reading, truncation, formatting, preview, supported extensions

Status: All P0 service logic gaps are closed

✓ Phase 5: iOS Parity Tests (COMPLETE - 77 tests)

Cross-platform tests ensuring iOS and Android feature parity.

  • localDreamGenerator.ts - 43 tests: Platform.select() routing to LocalDreamModule (Android) / CoreMLDiffusionModule (iOS), method delegation on both platforms, isAvailable edge cases, generateImage lifecycle with event subscription, thread tracking, error handling
  • coreMLModelBrowser.ts - 16 tests: HuggingFace API tree enumeration for Apple repos, model shape/backend validation, LFS size calculation, caching with TTL + forceRefresh, Promise.allSettled partial failure handling, unique ID generation
  • iosDownloadManager.contract.ts - 18 tests: iOS DownloadManagerModule (URLSession) interface parity with Android (DownloadManager), all 7 required methods, event shape parity (DownloadProgress/Complete/Error), polling compatibility stubs, status value constants
  • factories.ts - Updated ONNXImageModel backend type to include 'coreml'

Status: iOS/Android parity is verified at the service and contract level


REMAINING GAPS - What Could Be Added Next

Phase 6: P1 E2E Flows

Add important feature E2E tests (currently P1 directory is empty).

  1. Authentication/passphrase flows
  2. Background download recovery (app killed → resume)
  3. Vision model inference
  4. Voice transcription
  5. Project-based conversations
  6. iOS-specific E2E flows (Core ML image generation, URLSession downloads)

Phase 7: P2 Flows

Polish and edge cases as time permits.


Test Count Summary

Phase Unit Integration RNTL E2E Contract Status
1-3 (Stores, Services, RNTL, E2E) 150 25 50 12 80 ✅ DONE
4 (Service Unit Tests) 228 0 0 0 0 ✅ DONE
5 (iOS Parity Tests) 59 0 0 0 18 ✅ DONE
6 (P1 E2E) 0 0 70 20 0 📝 PLANNED
7 (P2 Polish) 120 35 135 10 0 📝 PLANNED
Current 437 25 50 12 98 1208 tests
Target 557 60 255 42 98 ~1500 tests

Current Coverage: ~90% of P0 functionality, ~65% of all flows


📋 Quick Reference: Test File Locations

All Tests

__tests__/
├── unit/
│   ├── stores/
│   │   ├── appStore.test.ts ✅ (564 lines, comprehensive)
│   │   ├── chatStore.test.ts ✅ (606 lines, comprehensive)
│   │   └── authStore.test.ts ✅
│   └── services/
│       ├── generationService.test.ts ✅ (552 lines, thorough)
│       ├── intentClassifier.test.ts ✅ (all 70+ patterns)
│       ├── llm.test.ts ✅ (45 tests - model loading, generation, context)
│       ├── hardware.test.ts ✅ (39 tests - memory, recommendations, tiers)
│       ├── modelManager.test.ts ✅ (54 tests - downloads, storage, scanning)
│       ├── backgroundDownloadService.test.ts ✅ (28 tests - native events, polling)
│       ├── localDreamGenerator.test.ts ✅ (43 tests - platform routing, iOS/Android delegation)
│       ├── coreMLModelBrowser.test.ts ✅ (16 tests - model discovery, caching, errors)
│       ├── whisperService.test.ts ✅ (32 tests - transcription, permissions)
│       └── documentService.test.ts ✅ (30 tests - file types, reading, preview)
├── integration/
│   ├── models/
│   │   └── activeModelService.test.ts ✅ (561 lines, excellent)
│   └── generation/
│       └── imageGenerationFlow.test.ts ✅ (516 lines, comprehensive)
├── rntl/
│   ├── screens/
│   │   ├── ChatScreen.test.tsx ✅
│   │   ├── HomeScreen.test.tsx ✅
│   │   └── ModelsScreen.test.tsx ✅
│   └── components/
│       ├── ModelCard.test.tsx ✅
│       ├── ChatInput.test.tsx ✅
│       └── ChatMessage.test.tsx ✅
└── contracts/
    ├── llamaContext.contract.test.ts ✅ (375 lines)
    ├── whisper.contract.test.ts ✅
    ├── localDream.contract.test.ts ✅
    ├── coreMLDiffusion.contract.test.ts ✅ (iOS Core ML parity)
    └── iosDownloadManager.contract.test.ts ✅ (18 tests - iOS download parity)

.maestro/flows/p0/ (16 E2E tests) ✅
├── 00-setup-model.yaml
├── 01-app-launch.yaml
├── 02-text-generation.yaml
├── 03-stop-generation.yaml
├── 04-image-generation.yaml
├── 05a-model-uninstall.yaml
├── 05b-model-download.yaml
├── 05b-model-selection.yaml
├── 05c-model-unload.yaml
├── 06a-document-attachment.yaml
├── 06b-image-attachment.yaml
├── 06c-text-generation-full.yaml
├── 06d-text-generation-retry.yaml
├── 07a-image-model-uninstall.yaml
├── 07b-image-model-download.yaml
└── 07c-image-model-set-active.yaml

Planned Tests (Not Yet Created)

.maestro/flows/
├── p1/ 📝 (planned - auth, vision, voice E2E)
└── p2/ 📝 (planned - edge cases, polish)

Bottom Line

What's Great:

  • ✅ P0 E2E coverage is excellent (16 comprehensive Maestro flows)
  • ✅ State management is thoroughly tested
  • ✅ Service orchestration (generationService, imageGenerationService) is well tested
  • ✅ Contract tests validate native module interfaces (llama.rn, whisper.rn, LocalDream, CoreMLDiffusion, iOS DownloadManager)
  • ✅ Critical user journeys work end-to-end
  • ✅ All 6 core services now have comprehensive unit tests (228 tests)
  • ✅ iOS/Android parity verified at service and contract level (77 tests)
  • ✅ Platform routing (Platform.select) tested for both platforms
  • ✅ Service logic is protected against regressions — safe to refactor

What Could Be Better:

  • 📝 P1/P2 E2E flows (authentication, vision, voice, background recovery)
  • 📝 iOS-specific E2E flows (Core ML generation, URLSession downloads on device)
  • 📝 Performance regression tests
  • 📝 Stress/scale tests

Recommendation: iOS parity is verified at the unit/contract level. Next priority is adding P1 E2E flows for authentication, vision, voice, and iOS-specific device tests.