TacitNode is a hybrid edge-to-cloud AI copilot designed to assist industrial field technicians with real-time equipment diagnostics. Built for the Google DeepMind x Cactus Compute Hackathon, it demonstrates intelligent routing between on-device inference and cloud escalation, achieving 3x cost savings and 26x faster response times for routine queries.
Because field technicians often operate in environments with poor or non-existent internet connectivity, TacitNode's core requirement is to run a 100% local AI pipeline for routine tasks, resorting to cloud escalation only for complex diagnostics that require deeper reasoning.
The application is built using Flutter and leverages the Cactus Compute SDK to run large language models (LLMs) and vision-language models (VLMs) natively on Android and iOS devices (tested on Samsung Galaxy S25 Ultra).
Demo Video: Watch the full demo
TacitNode employs a sophisticated Dual-Model Architecture with intelligent routing to balance performance, cost, and capability:
- Tier 1: The Routing Model (`functiongemma-270m`)
  - Role: Acts as the "brain" and orchestrator. A text-only model optimized specifically for function calling.
  - Performance: ~168 tok/s, ~45ms latency
  - Behavior: Analyzes user queries and makes routing decisions via tool calls:
    - `validate_routine_step` → triggers the local vision model for component identification
    - `escalate_to_expert` → routes to the cloud for complex diagnostics
    - `answer_query` → provides direct responses to simple questions
- Tier 2: The Vision Model (`lfm2-vl-450m`)
  - Role: Acts as the "eyes". A lightweight (450M-parameter) vision-language model by Liquid AI.
  - Performance: ~12-15 tok/s for vision inference
  - Behavior: Triggered only when the routing model calls `validate_routine_step` with `component_name: "unknown"`. Analyzes camera frames and returns a component identification.
- Tier 3: The Cloud Fallback (Gemini 2.5 Flash API)
  - Role: Acts as the "expert consultant".
  - Performance: ~1.2s latency, ~$0.0000875 per query
  - Behavior: Triggered via the `escalate_to_expert` tool, or as an automatic fallback when local models fail.
- User: "What do you see?" (points camera at an LED)
- App: Captures a photo and saves it to a temporary file path
- Routing Model: Recognizes the identification request, outputs `validate_routine_step({"component_name": "unknown"})`
- Copilot Service: Intercepts the `"unknown"` placeholder and hands the image file path to the Vision Model
- Vision Model: Analyzes the image and returns `"LED. A red light-emitting diode."`
- App: Displays the result with a green "LOCAL INFERENCE" badge, showing 45ms latency and 168 tok/s
- Metrics Service: Records the query and updates the cost-savings calculation
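The interception step above is the core trick: the text-only routing model emits an `"unknown"` placeholder, and the orchestrator swaps in the vision model's answer. A minimal Python sketch of that dispatch (the app implements this in Dart inside `copilot_service.dart`; the function and parameter names here are illustrative, not the actual API):

```python
def handle_tool_call(tool_name, args, image_path, vision_model, cloud_model):
    """Dispatch a routing-model tool call to the appropriate tier."""
    if tool_name == "validate_routine_step":
        # Text-only router can't see the frame, so it emits the "unknown"
        # placeholder; the orchestrator substitutes the local VLM's answer.
        if args.get("component_name") == "unknown":
            return vision_model(image_path)            # Tier 2: local VLM
        return f"Confirmed: {args['component_name']}"
    if tool_name == "escalate_to_expert":
        return cloud_model(args.get("query", ""))      # Tier 3: Gemini
    if tool_name == "answer_query":
        return args.get("answer", "")                  # Tier 1 answers directly
    raise ValueError(f"unknown tool: {tool_name}")

# Example with a stubbed vision model returning the demo's identification.
result = handle_tool_call(
    "validate_routine_step",
    {"component_name": "unknown"},
    "/tmp/frame.jpg",
    vision_model=lambda path: "LED. A red light-emitting diode.",
    cloud_model=lambda q: "",
)
print(result)  # LED. A red light-emitting diode.
```

Because the interception happens in app code rather than in a second LLM turn, the vision result reaches the UI without another round of routing-model inference.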
The routing decision is made based on:
- Query intent analysis by FunctionGemma
- Keyword detection for visual queries ("what", "identify", "see")
- Automatic fallback if local inference fails
- Offline detection forces local-only mode
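The fallback parts of that decision can be sketched as a small function. This Python sketch mirrors only the keyword detection and the offline/failure overrides listed above, not FunctionGemma's full intent analysis (the app does this in Dart; names are illustrative):

```python
# Keywords the app associates with visual queries, per the list above.
VISUAL_KEYWORDS = ("what", "identify", "see")

def choose_route(query: str, online: bool, local_ok: bool = True) -> str:
    """Pick an inference path for a query."""
    if not online:
        return "local"          # offline detection forces local-only mode
    if not local_ok:
        return "cloud"          # automatic fallback when local inference fails
    q = query.lower()
    if any(k in q for k in VISUAL_KEYWORDS):
        return "local_vision"   # visual query → on-device vision model
    return "local"

print(choose_route("What do you see?", online=True))  # local_vision
```

In the real pipeline these heuristics back up the routing model's tool call rather than replace it.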
- `lib/services/copilot_service.dart` (870+ lines)
  - The core orchestrator managing both Cactus SDK model instances
  - Implements the 1-turn architecture with tool interception
  - Handles model lifecycle, tool definitions, and system prompts
  - Contains fallback logic and error-recovery mechanisms
- `lib/services/cloud_service.dart`
  - Gemini 2.5 Flash API integration with retry logic
  - Handles rate limiting (HTTP 429) with exponential backoff
  - Supports both image+text and text-only escalations
  - Concise prompt engineering for 2-3 sentence responses
- `lib/services/camera_service.dart`
  - Device camera lifecycle management
  - Dual capture: file paths (for local VLMs) and Base64 (for cloud APIs)
  - Frame caching and cleanup
- `lib/services/metrics_service.dart`
  - Session-wide statistics tracking
  - Cost calculation (Gemini 2.5 Flash pricing: $0.125/$0.50 per 1M tokens)
  - Real-time savings computation
  - Detailed logging for debugging
- `lib/services/connectivity_service.dart`
  - Network status monitoring via `connectivity_plus`
  - Offline mode simulation for demos
  - Stream-based connectivity updates
- `lib/screens/copilot_screen.dart`
  - Full-screen camera preview with glassmorphism overlays
  - Orchestrates all UI widgets and state management
  - Handles query processing and response display
  - Manages FAB positioning relative to the debug console
- `lib/widgets/routing_indicator.dart`
  - Animated pulse indicators (green for local, amber for cloud)
  - Smooth transitions between routing states
  - Visual feedback during inference
- `lib/widgets/metrics_overlay.dart`
  - Session statistics dashboard
  - Cost comparison (cloud-only vs hybrid)
  - Displays costs with 5-decimal precision for accuracy
  - Collapsible card with glassmorphism
- `lib/widgets/demo_controls_fab.dart`
  - Expandable FAB with staggered animations
  - Three demo presets: Quick ID, Diagnose, Offline Test
  - Metrics toggle and reset controls
  - Smooth expand/collapse with rotation animation
- `lib/widgets/debug_console.dart`
  - Enhanced JSON viewer with syntax highlighting
  - Collapsible entries (120px collapsed, 336px expanded)
  - Filter chips (All, Routing, Warnings, Errors)
  - Full observability of routing decisions
- `lib/widgets/offline_banner.dart`
  - Displays when offline or simulating offline mode
  - Tap-to-disable for simulated offline mode
  - Clear visual indicator of network status
- `lib/models/routing_decision.dart`
  - Enhanced with performance metrics (latency, TPS, cost)
  - Routing-path tracking
  - Offline query detection
  - Formatted display helpers
- `lib/models/session_metrics.dart`
  - Cumulative statistics (local/cloud/offline query counts)
  - Cost calculations with Gemini 2.5 Flash pricing
  - Average latency tracking
  - Savings-percentage computation
- `lib/models/demo_preset.dart`
  - Predefined demo scenarios
  - Query templates with offline-simulation flags
  - Color-coded for visual distinction
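The per-query cost math behind the metrics service follows from the Gemini 2.5 Flash pricing listed above ($0.125 per 1M input tokens, $0.50 per 1M output tokens). A Python sketch of the calculation (the app does this in Dart in `metrics_service.dart`; the ~300 input / ~100 output token split is an assumption that happens to reproduce the ~$0.0000875-per-query figure cited earlier):

```python
# Gemini 2.5 Flash pricing per 1M tokens, as quoted in the file listing.
INPUT_PRICE_PER_M = 0.125
OUTPUT_PRICE_PER_M = 0.50

def query_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one cloud query."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

cost = query_cost(300, 100)  # assumed typical token counts
print(f"${cost:.7f}")  # $0.0000875
```

Local queries cost $0.00, so session savings are simply the summed cloud cost of queries that stayed on-device.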
- Problem: Initial use of `qwen3-0.6` produced inconsistent JSON, wrapping tool calls in conversational text
- Solution: Switched to `functiongemma-270m` (FunctionGemma), a model fine-tuned specifically for tool calling
- Result: Immediate elimination of JSON parsing errors
- Problem: Used `gemma3-270m`, which doesn't exist in the Cactus registry
- Solution: Corrected to `functiongemma-270m` (the official Cactus slug)
- Result: Model loaded successfully
- Problem: Passed images to FunctionGemma (text-only model), causing inference failures
- Solution: Split pipeline - FunctionGemma processes only text, vision model gets images separately
- Result: Stable routing decisions, no more crashes
- Problem: Model guessed component names without seeing images
- Solution:
  - Made tool arguments `required: true`
  - Added an explicit instruction: "If asked to identify, set component_name to 'unknown'"
  - Implemented interception logic for the `"unknown"` placeholder
- Result: Reliable vision model triggering
- Problem: Model reused previous answers instead of analyzing new images
- Solution: Added `_lm.reset()` at the start of each `processQuery()` call
- Result: Every query is treated as a fresh interaction
- Problem: `lfm2-vl-450m` produced overly specific, incorrect identifications
- Solution:
  - Narrowed the prompt to: "What electronic component, PCB, or circuit board is this?"
  - Added sanitization to remove model tokens (`<|im_end|>`, `<|im_start|>`)
- Result: Accurate, domain-specific identifications
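The sanitization step above can be sketched in a few lines. This is a Python illustration of the idea (the app does it in Dart); it strips the two chat-template tokens named above, which small VLMs sometimes leak into their answers:

```python
import re

# ChatML-style special tokens that lfm2-vl-450m occasionally emits verbatim.
SPECIAL_TOKENS = re.compile(r"<\|im_(?:start|end)\|>")

def sanitize(raw: str) -> str:
    """Remove leaked model tokens and surrounding whitespace."""
    return SPECIAL_TOKENS.sub("", raw).strip()

print(sanitize("LED. A red light-emitting diode.<|im_end|>"))
# LED. A red light-emitting diode.
```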
- Problem: Used the wrong API version (`v1beta`) and model name
- Solution:
  - Changed to the `v1` API endpoint
  - Updated the model from `gemini-2.0-flash` to `gemini-2.5-flash`
  - Verified available models via the API
- Result: Successful cloud escalations
- Problem: Gemini returned lengthy responses that got truncated in the UI
- Solution:
  - Updated the prompt: "Provide CONCISE, actionable diagnosis in 2-3 sentences max"
  - Reduced `maxOutputTokens` from 1024 to 300
- Result: Concise, actionable responses that fit the UI
- Problem: Rapid queries hit Gemini API rate limits
- Solution: Implemented exponential backoff with retry delay parsing from error response
- Result: Graceful handling of rate limits
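The retry strategy just described can be sketched as follows: back off exponentially on HTTP 429, but prefer a server-supplied retry delay when the error response includes one. A hedged Python sketch (the app implements this in Dart in `cloud_service.dart`; `request` and the response shape are stand-ins, not the real Gemini client):

```python
import time

def call_with_backoff(request, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry `request` on HTTP 429 with exponential backoff."""
    for attempt in range(max_retries + 1):
        status, body = request()
        if status != 429:
            return status, body
        if attempt < max_retries:
            # Prefer the retry delay parsed from the error response, if any;
            # otherwise fall back to base_delay * 2^attempt.
            delay = body.get("retry_after") or base_delay * (2 ** attempt)
            sleep(delay)
    return status, body

# Stubbed request: one 429 (with a suggested delay) followed by success.
slept = []
responses = iter([(429, {"retry_after": 0.5}), (200, {"text": "ok"})])
status, body = call_with_backoff(lambda: next(responses), sleep=slept.append)
print(status, slept)  # 200 [0.5]
```

Injecting `sleep` keeps the retry logic testable without real delays.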
- Problem: Metrics displayed `${estimatedCost!.toStringAsFixed(4)}` literally
- Solution: Fixed the string interpolation by escaping the dollar sign: `'\$${estimatedCost!.toStringAsFixed(4)}'`
- Result: Correct cost display
- Problem: 4 decimal places caused rounding errors ($0.00035 showed as $0.0003)
- Solution: Increased precision to 5 decimal places in metrics overlay
- Result: Accurate cost display matching logs
- Problem: Demo controls FAB covered the input bar when the debug console expanded
- Solution:
  - Positioned the FAB relative to console height using `AnimatedPositioned`
  - Calculated: `bottom = consoleHeight + inputBarHeight + margin`
  - Collapsed: 136px + 80px + 20px = 236px
  - Expanded: 336px + 80px + 20px = 436px
- Result: Smooth animation, no overlap
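The offset arithmetic above written out as a function (a Python illustration; the app computes this in Dart when building the `AnimatedPositioned` widget, and the parameter names are illustrative):

```python
def fab_bottom(console_height: int, input_bar: int = 80, margin: int = 20) -> int:
    """Bottom offset (logical px) that keeps the FAB above the console."""
    return console_height + input_bar + margin

print(fab_bottom(136))  # 236  (console collapsed)
print(fab_bottom(336))  # 436  (console expanded)
```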
- Problem: Rounded corners showed the camera background through gaps
- Solution:
  - Moved the console up 16px using `Transform.translate`
  - Increased console heights by 16px to reach the screen bottom
  - Extended the input bar's bottom padding to 24px
- Result: Seamless visual integration
- Problem: Metrics card was partially visible when closed
- Solution: Adjusted the hidden position to `top: -400, right: -300`
- Result: Completely off-screen when closed
- Problem: Both FAB menu and metrics could be open simultaneously
- Solution:
- Added state tracking for FAB expansion
- Implemented mutual exclusion logic
- Each closes the other when opened
- Result: Clean, focused UI
- Problem: No way to exit simulated offline mode
- Solution:
  - Made the offline banner tappable
  - Added a "Tap to disable" hint
  - Calls `toggleOfflineSimulation()` on tap
- Result: Easy control of offline mode
- Problem: Offline queries were not distinguished in metrics
- Solution: Added an `offlineQueries` counter to session metrics
- Result: Accurate offline usage tracking
- Problem: The 640x640 source image produced compressed splash screens
- Solution:
  - Upscaled to 2048x2048 using `sips`
  - Added 20% padding (2560x2560 canvas) for better composition
  - Regenerated all density variants
- Result: Crisp, high-quality splash screens
- Problem: Old placeholder icons were still in use
- Solution:
  - Updated the `flutter_launcher_icons` config with adaptive-icon settings
  - Set the background color to `#0F0F23`
  - Regenerated all icon densities
- Result: Consistent branding across all platforms
Decision: Use 1-turn tool interception instead of multi-turn agent loops
Rationale:
- Small models struggle with state management across turns
- Interception is instant (no second LLM call needed)
- Eliminates hallucination from accumulated context
- 26x faster than cloud, 10x faster than multi-turn
Decision: Call `_lm.reset()` before every query
Rationale:
- Prevents context leakage between queries
- Ensures consistent routing behavior
- Treats each query as isolated interaction
- Eliminates "memory" hallucinations
Decision: Capture frames as both file paths and Base64
Rationale:
- Local VLMs require file paths (Cactus SDK limitation)
- Cloud APIs require Base64 encoding
- Minimal overhead, maximum flexibility
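The dual-capture idea above amounts to producing two views of one captured frame: a file path for the local VLM and a Base64 string for the cloud API. A Python sketch of the shape of that capture result (the app does this in Dart in `camera_service.dart`; `capture_frame` is an illustrative name):

```python
import base64
import os
import tempfile

def capture_frame(jpeg_bytes: bytes) -> dict:
    """Store one frame as both a file path and a Base64 string."""
    fd, path = tempfile.mkstemp(suffix=".jpg")
    with os.fdopen(fd, "wb") as f:
        f.write(jpeg_bytes)  # file path → local VLM (Cactus SDK input)
    return {
        "path": path,
        # Base64 payload → cloud API request body.
        "base64": base64.b64encode(jpeg_bytes).decode("ascii"),
    }

frame = capture_frame(b"\xff\xd8fake-jpeg")
print(frame["path"].endswith(".jpg"))  # True
```

Encoding once at capture time means neither inference path has to re-read or re-encode the frame later.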
Decision: Use `TLog.info()` for metrics instead of `print()`
Rationale:
- Consistent with app's logging framework
- Appears in same stream as other logs
- Easier to filter and debug
- Production-ready approach
- Routing latency: ~45ms
- Vision inference: ~80-120ms (12-15 tok/s)
- Total time: ~165ms for complete identification
- Cost: $0.00
- RAM usage: ~245 MB
- Offline capable: ✅ Yes
- Network latency: ~1,200ms
- Cost: ~$0.0000875 per query
- Offline capable: ❌ No
- Response quality: Higher for complex diagnostics
- Cost savings: 50% (with 50/50 local/cloud split)
- Average latency: ~682ms (50/50 split)
- Typical usage: 67% local, 33% cloud
- Actual savings: 3x cost reduction vs pure cloud
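The blended figures above follow from the per-tier numbers: ~165ms total local time, ~1,200ms cloud latency, and local queries costing $0.00. A Python sketch of the arithmetic (illustrative only; the app computes these in `session_metrics.dart`):

```python
LOCAL_MS, CLOUD_MS = 165, 1200  # per-tier latencies quoted above

def blended_latency(local_share: float) -> float:
    """Average latency for a given fraction of queries served locally."""
    return local_share * LOCAL_MS + (1 - local_share) * CLOUD_MS

def cost_reduction(local_share: float) -> float:
    """Pure-cloud cost divided by hybrid cost (local queries are free)."""
    return 1 / (1 - local_share)

print(round(blended_latency(0.5)))  # ~682 ms at a 50/50 split
print(round(cost_reduction(2 / 3)))  # 3x reduction at ~67% local
```

At a 50/50 split, half the queries are free, which is the 50% savings figure; at the typical 67% local share, only a third of queries incur cloud cost, giving the 3x reduction.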
- Multi-turn conversations: Add chat history for follow-up questions
- Hands-free operation: Implement TTS/STT for voice control
- Enhanced error recovery: More sophisticated fallback strategies
- Model caching: Reduce cold-start time
- Additional models: Test Qwen 1.5B, DeepSeek-R1-Distill
- Grammar constraints: JSON-schema enforcement at C++ level
- Streaming responses: Real-time token display
- Batch processing: Multiple component identification
- Fine-tuned routing model: Domain-specific FunctionGemma
- Federated learning: Improve models from field usage
- Multi-modal fusion: Combine vision, thermal, audio sensors
- Edge deployment: Optimize for lower-end devices
- FunctionGemma: Reliable tool calling with proper prompting
- 1-turn architecture: Fast, predictable, easy to debug
- Explicit resets: Eliminated context leakage issues
- Dual-model split: Clear separation of concerns
- Demo presets: Made demos reliable and repeatable
- Small model constraints: Required strict prompting and domain grounding
- C++ engine quirks: Incomplete history clearing, occasional crashes
- Mobile memory limits: Careful model selection required
- API versioning: Gemini API changes required adaptation
- UI polish: Many iterations to get animations smooth
- Small models need boundaries: Explicit instructions > implicit reasoning
- Reset is critical: Don't trust SDK to clear state completely
- Interception > Loops: For small models, simpler is faster
- Domain grounding: Narrow prompts produce better results
- Fallbacks are essential: Always have a backup plan
- Real-world problem: Addresses actual industrial training gap
- Hybrid architecture: Demonstrates intelligent routing
- Production-ready: Robust error handling, offline support
- Demo-optimized: One-tap presets, visual feedback, metrics
- Technical depth: Full observability, detailed logging
- 3x cost savings vs pure cloud
- 26x faster for local queries
- 100% offline capable for routine tasks
- Real-time performance metrics
- Intelligent routing with automatic fallback
- Show local inference speed (Quick ID preset)
- Demonstrate cloud escalation (Diagnose preset)
- Prove offline capability (Offline Test preset)
- Display metrics dashboard (cost savings)
- Show debug console (technical depth)
- Cactus Compute SDK Documentation
- FunctionGemma Model Card
- Gemini API Documentation
- Flutter Camera Plugin
- Connectivity Plus
Built for the Google DeepMind x Cactus Compute Hackathon
Demonstrating the future of hybrid edge-to-cloud AI systems.