Offline AI assistant for Android. Run LLMs, generate images, search documents — all on-device. No cloud. No subscriptions. No data leaves your phone.
Download APK · Discord · Report Issue
- Text generation — Load any GGUF model (Llama, Mistral, Gemma, Phi, Qwen, etc.) and chat with it locally
- Image generation — Stable Diffusion 1.5 on-device, with inpainting support
- Image tools — Upscale and segment images locally (depth, style transfer, inpainting coming soon)
- RAG — Inject PDFs, Word docs, Excel, EPUB into conversations with semantic search
- Plugins — Web search, file manager, calculator, notepad, date/time, system info, dev utils — all callable by the LLM
- AI memory — The AI remembers facts about you across conversations, with deduplication and a forgetting curve
- Text-to-speech — 10 voices, 5 languages, on-device synthesis
- Encrypted storage — AES-256-GCM with hardware-backed keys for all chat data
- System backup — Export everything as an encrypted `.tnbackup` file
| | Minimum | Recommended |
|---|---|---|
| Android | 10 (API 29) | 12+ |
| RAM | 6 GB | 8–12 GB |
| Storage | 4 GB free | 10 GB free |
| CPU | ARM64 or x86_64 | Snapdragon 8 Gen 1+ |
Download from Google Play or GitHub Releases.
From the in-app Model Store (recommended):
- Open the drawer menu → Model Store
- Add a HuggingFace repository (e.g. `bartowski/Phi-3.5-mini-instruct-GGUF`)
- Pick a quantization and download
Or manually:
- Download a `.gguf` file from HuggingFace
- Use the model picker in ToolNeuron to load it
Select your model, wait for it to load, and start typing. Responses stream in real time.
| Use case | Model | Size |
|---|---|---|
| Quick test | Qwen3.5 0.8B Q4_K_M | ~600 MB |
| General use | Qwen3.5 4B Q4_K_M | ~2.8 GB |
| Power users | Qwen3.5 9B Q4_K_M | ~5.5 GB |
Pick Q4_K_M for a good balance between quality and size. Use Q6_K if your device has the RAM for it.
- Any GGUF model works — load via file picker (no storage permissions needed, uses SAF)
- Configurable parameters: temperature, top-k, top-p, min-p, repeat penalty, context length
- Function calling with grammar-constrained JSON output
- Thinking mode for models that support it
- Per-model configs saved to database
- Stable Diffusion 1.5 (censored and uncensored variants)
- Text-to-image and inpainting
- Configurable steps, CFG scale, seed, negative prompts, schedulers
| Tool | Status |
|---|---|
| Upscaling | Ready |
| Segmentation (MobileSAM) | Ready |
| Depth estimation | Model pending |
| Style transfer | Model pending |
| LaMa inpainting | Model pending |
Create knowledge bases from:
- Files — PDF, Word (.doc/.docx), Excel (.xls/.xlsx), EPUB, TXT
- Text — Paste any text content
- Chat history — Convert past conversations into searchable knowledge
- Neuron Packets — Import encrypted `.neuronRAG` files
The RAG pipeline uses hybrid retrieval: FTS4 BM25 + vector search + Reciprocal Rank Fusion + Maximal Marginal Relevance. Results are injected into the conversation context automatically.
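The Reciprocal Rank Fusion step can be sketched as below; `k = 60` is the conventional RRF constant, not a value taken from ToolNeuron's source, so treat the exact parameters as assumptions:

```kotlin
// Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank) per
// document, so items ranked highly by several retrievers float to the top.
fun reciprocalRankFusion(rankings: List<List<String>>, k: Int = 60): List<String> {
    val scores = mutableMapOf<String, Double>()
    for (ranking in rankings) {
        ranking.forEachIndexed { index, doc ->
            scores[doc] = (scores[doc] ?: 0.0) + 1.0 / (k + index + 1) // rank = index + 1
        }
    }
    return scores.entries.sortedByDescending { it.value }.map { it.key }
}

fun main() {
    val bm25Hits = listOf("docA", "docB", "docC")   // keyword (FTS4 BM25) order
    val vectorHits = listOf("docB", "docD", "docA") // semantic search order
    println(reciprocalRankFusion(listOf(bm25Hits, vectorHits)))
}
```

Documents appearing in both lists outrank those found by only one retriever, which is why RRF works well for combining keyword and vector results without score normalization.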
Encrypted RAGs support admin passwords and read-only user access.
Seven built-in plugins the LLM can call during conversations:
| Plugin | What it does |
|---|---|
| Web Search | Search the web and scrape content |
| File Manager | List, read, create files |
| Calculator | Math expressions and unit conversion |
| Notepad | Save and retrieve notes |
| Date & Time | Current time, timezone conversion, date math |
| System Info | RAM, battery, storage, device details |
| Dev Utils | Hash, encode, format, text transforms |
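A plugin call can be pictured as a small registry dispatch. The names, argument shapes, and handlers below are hypothetical illustrations, not ToolNeuron's actual plugin API:

```kotlin
// Hypothetical plugin registry sketch: the model emits a tool name plus
// arguments (grammar-constrained JSON in the app), and dispatch routes the
// call to a handler and returns the result to the conversation.
typealias ToolHandler = (Map<String, String>) -> String

val registry = mutableMapOf<String, ToolHandler>()

fun registerTool(name: String, handler: ToolHandler) {
    registry[name] = handler
}

fun dispatch(name: String, args: Map<String, String>): String =
    registry[name]?.invoke(args) ?: "error: unknown tool '$name'"

fun main() {
    registerTool("calculator") { args ->
        // toy evaluator that only handles "a+b" expressions
        val (a, b) = args.getValue("expression").split("+").map { it.trim().toInt() }
        (a + b).toString()
    }
    println(dispatch("calculator", mapOf("expression" to "2+3"))) // "5"
}
```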
Inspired by Mem0. After conversations, the LLM extracts facts about you and stores them for future context. Deduplication via Jaccard similarity, with a forgetting curve so stale memories decay. You can view, edit, and delete memories from the Memory screen.
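A minimal sketch of the two mechanisms, assuming word-set Jaccard and an Ebbinghaus-style exponential decay; the app's actual thresholds and decay constants are not documented, so the numbers here are illustrative:

```kotlin
import kotlin.math.exp

// Word-set Jaccard similarity: |A ∩ B| / |A ∪ B|. Two candidate memories
// above some threshold (e.g. 0.7, an assumed value) would count as duplicates.
fun jaccard(a: String, b: String): Double {
    val sa = a.lowercase().split(Regex("\\W+")).filter { it.isNotBlank() }.toSet()
    val sb = b.lowercase().split(Regex("\\W+")).filter { it.isNotBlank() }.toSet()
    if (sa.isEmpty() && sb.isEmpty()) return 1.0
    return sa.intersect(sb).size.toDouble() / sa.union(sb).size
}

// Ebbinghaus-style forgetting curve: retention = e^(-t / strength), so a
// memory's weight decays toward zero unless it is reinforced.
fun retention(daysSinceLastAccess: Double, strength: Double = 30.0): Double =
    exp(-daysSinceLastAccess / strength)

fun main() {
    println(jaccard("user likes green tea", "the user likes green tea")) // 0.8, likely duplicate
    println(retention(0.0))  // 1.0: fresh memory
    println(retention(90.0)) // ~0.05: candidate for pruning
}
```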
On-device TTS via Supertonic (ONNX Runtime). 10 voices (5 female, 5 male), 5 languages (EN, KR, ES, PT, FR). Adjustable speed and quality. Auto-speak option reads responses aloud.
Auto-detects CPU topology (P-cores, E-cores) and recommends thread count, context size, and cache settings. Three modes: Performance, Balanced, Power Saver.
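An illustrative heuristic of this kind (not ToolNeuron's actual tuning logic) might scale the inference thread count by core count and mode:

```kotlin
enum class PowerMode { PERFORMANCE, BALANCED, POWER_SAVER }

// Assumed heuristic: derive a thread count from the number of logical cores
// and the chosen power mode. The real app also inspects P-core/E-core topology.
fun recommendedThreads(logicalCores: Int, mode: PowerMode): Int = when (mode) {
    PowerMode.PERFORMANCE -> (logicalCores - 1).coerceAtLeast(1) // keep one core for the UI
    PowerMode.BALANCED -> (logicalCores / 2).coerceAtLeast(1)
    PowerMode.POWER_SAVER -> (logicalCores / 4).coerceAtLeast(1)
}

fun main() {
    val cores = Runtime.getRuntime().availableProcessors()
    PowerMode.values().forEach { mode ->
        println("$mode on $cores cores -> ${recommendedThreads(cores, mode)} threads")
    }
}
```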
Export everything to an encrypted .tnbackup file (PBKDF2 + AES-256-GCM):
- Chat history, AI memories, personas, knowledge graphs
- Model configs and app settings
- RAG files and AI models (optional, can be large)
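The primitives behind such a backup can be sketched with `javax.crypto`; the iteration count, salt size, and nonce handling here are illustrative assumptions, not the app's exact format:

```kotlin
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.SecretKeyFactory
import javax.crypto.spec.GCMParameterSpec
import javax.crypto.spec.PBEKeySpec
import javax.crypto.spec.SecretKeySpec

// Derive a 256-bit AES key from the backup password with PBKDF2-HMAC-SHA256.
// The iteration count (100_000) is an assumed value for illustration.
fun deriveKey(password: CharArray, salt: ByteArray, iterations: Int = 100_000): SecretKeySpec {
    val spec = PBEKeySpec(password, salt, iterations, 256)
    val bytes = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).encoded
    return SecretKeySpec(bytes, "AES")
}

// AES-256-GCM with a fresh 96-bit nonce per message and a 128-bit auth tag.
fun encrypt(plaintext: ByteArray, key: SecretKeySpec): Pair<ByteArray, ByteArray> {
    val iv = ByteArray(12).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(128, iv))
    return iv to cipher.doFinal(plaintext) // ciphertext has the tag appended
}

fun decrypt(iv: ByteArray, ciphertext: ByteArray, key: SecretKeySpec): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(128, iv))
    return cipher.doFinal(ciphertext) // throws AEADBadTagException if tampered with
}

fun main() {
    val salt = ByteArray(16).also { SecureRandom().nextBytes(it) }
    val key = deriveKey("correct horse".toCharArray(), salt)
    val (iv, ct) = encrypt("chat history".toByteArray(), key)
    println(String(decrypt(iv, ct, key)))
}
```

GCM authenticates as well as encrypts, so a tampered or wrong-password backup fails loudly at decrypt time instead of producing garbage.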
- Zero data collection. No telemetry, no analytics, no crash reporting.
- Everything stays on-device. Conversations, generated images, documents, TTS audio — none of it leaves your phone.
- Encrypted storage. AES-256-GCM with Android KeyStore. On supported devices, keys live in the Trusted Execution Environment.
- No storage permissions. Models load through Android's file picker (SAF). The app can't access arbitrary files.
- Open source. Read the code yourself.
- Android Studio Meerkat (2025.1.1)+
- JDK 17
- Android SDK 36+, NDK 26.x
```bash
git clone https://github.com/Siddhesh2377/ToolNeuron.git
cd ToolNeuron

# Debug
./gradlew assembleDebug
./gradlew installDebug

# Release
./gradlew assembleRelease
```

APKs land in `app/build/outputs/apk/`.
If you hit NDK issues, make sure NDK 26.x is installed via SDK Manager. For memory issues during build, bump the Gradle heap in `gradle.properties`:

```properties
org.gradle.jvmargs=-Xmx4096m
```

| Layer | Technology |
|---|---|
| Language | Kotlin, C++ (JNI) |
| UI | Jetpack Compose |
| Text inference | llama.cpp |
| Image inference | LocalDream (SD 1.5) |
| TTS | Supertonic (ONNX Runtime) |
| Database | Room + UMS (custom binary format) |
| Encryption | AES-256-GCM, Android KeyStore |
| DI | Dagger Hilt |
| Async | Kotlin Coroutines + Flow |
| Module | Purpose |
|---|---|
| `app` | Main Android application |
| `ums` | Unified Memory System — binary record storage with JNI |
| `neuron-packet` | Encrypted RAG packet format with access control |
| `memory-vault` | Legacy encrypted storage (read-only, used for migration) |
| `system_encryptor` | Native encryption primitives |
| `file_ops` | Native file operations |
See CONTRIBUTORS.md for the project ecosystem and related repos.
- Fork the repo
- Create a feature branch: `git checkout -b feature/your-feature`
- Make focused commits with clear messages
- Test on a real device — emulators don't reflect real performance
- Open a PR with a description of what you changed and how you tested it
- Bug fixes and stability
- Device compatibility testing (especially mid-range phones)
- Performance improvements
- Documentation and translations
- New plugins
- Don't submit untested code
- Don't add cloud dependencies or telemetry
- Don't break offline functionality
- Don't add broad storage permissions
If you find a security vulnerability:
- Do not open a public GitHub issue
- Email siddheshsonar2377@gmail.com
- Include reproduction steps
- Allow reasonable time for a fix before disclosure
- llama.cpp — LLM inference engine
- LocalDream — Stable Diffusion on Android
- ONNX Runtime — TTS inference
- Mem0 — AI memory architecture inspiration
- SillyTavern — Character card format reference
- Apache POI, PDFBox-Android, EpubLib — Document parsing
- Jetpack Compose, Room, Hilt, OkHttp, Coil 3, Jsoup
Apache License 2.0 — use it, modify it, distribute it. Attribution appreciated.
Built by Siddhesh Sonar