# Research: PersonalStyleAI Framework

Evaluation of the [PersonalStyleAI-Framework-](https://github.com/whyzhow/PersonalStyleAI-Framework-) for implementation in **Personal OS v2**.

## 1. Pipeline Analysis

The framework follows a modular "Data Alchemy -> Adaptor -> Evolution" pipeline designed to capture and reproduce personal communication styles.

### A. Log Processing (Data Alchemy)
- **Tool**: `src/utils/cleaner.py` and `preprocess_data.py`.
- **Method**: Uses regular expressions to strip noise: URLs, system placeholders such as `[图片]` ("image") and `[表情]` ("emoji"), and excessive whitespace.
- **Format**: Currently assumes a simple alternating chat format (odd lines = user, even lines = personal response).
- **Output**: Generates a standard OpenAI ChatML JSONL format:
```json
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```
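The cleaning step can be sketched with a few regular expressions. This is a hypothetical reconstruction of what `cleaner.py` likely does, not its actual code; the function name and exact patterns are assumptions:

```python
import re

def clean_chat_line(line: str) -> str:
    """Strip common chat-log noise (illustrative sketch, not the framework's code)."""
    line = re.sub(r"https?://\S+", "", line)       # URLs
    line = re.sub(r"\[(图片|表情)\]", "", line)     # image/emoji system placeholders
    line = re.sub(r"\s+", " ", line)               # collapse excessive whitespace
    return line.strip()
```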

### B. Adaptation Layer (Adaptor Centre)
- **Tool**: `src/core/factory.py` and `src/core/providers/`.
- **Method**: A factory pattern provides a unified interface (`LLMProvider`) for both local (Ollama) and cloud (OpenAI, Claude) models.
- **Benefit**: Allows easy switching between style-infused local models and high-reasoning cloud models via a simple configuration change.
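A minimal sketch of the factory pattern described above. The class and method names are illustrative assumptions, not the framework's exact API; real providers would make HTTP/SDK calls where the stubs return placeholder strings:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Unified interface over local and cloud backends (illustrative)."""
    @abstractmethod
    def chat(self, messages: list) -> str: ...

class OllamaProvider(LLMProvider):
    def chat(self, messages):  # would call the local Ollama HTTP API
        return "(local reply)"

class OpenAIProvider(LLMProvider):
    def chat(self, messages):  # would call the OpenAI chat completions API
        return "(cloud reply)"

def get_provider(name: str) -> LLMProvider:
    """Factory: select a backend from a single config value."""
    registry = {"ollama": OllamaProvider, "openai": OpenAIProvider}
    return registry[name]()
```

Switching between a style-infused local model and a high-reasoning cloud model then reduces to changing the `name` string in configuration.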

### C. Fine-tuning (Style Evolution)
- **Tool**: Fine-tuning module in `src/trainers/` (specifically the `LocalLoraTrainer` class).
- **Method**: Uses `peft` for LoRA (Low-Rank Adaptation).
- **Optimization**: Implements 4-bit quantization via `bitsandbytes` and `accelerate` for memory-efficient training on consumer hardware.
- **Weights**: Produces a "style adapter" (PEFT weights) that can be loaded onto a base model (e.g., Llama 3).
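A config fragment showing how the pieces named above typically fit together with `peft` and `bitsandbytes`. The model ID, LoRA rank, and target modules are assumptions for illustration; the framework's actual hyperparameters may differ:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # the 4-bit path mentioned above
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # assumed base model
    quantization_config=bnb,
    device_map="auto",
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,             # assumed values
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)  # only the small adapter weights are trained
```

After training, only the PEFT adapter directory (the "style adapter") needs to be saved and shipped; the base model stays untouched.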

---

## 2. Applicability to Obsidian Diary (600+ entries)

The framework is highly applicable to the goal of "teaching an AI to think like you" using an Obsidian diary, though the pre-processing layer requires customization.

### A. Data Strategy
With 600+ entries, there is sufficient signal to capture linguistic nuances, recurring themes, and personal philosophy.

- **Obsidian Parser**: Needs a custom script to handle Markdown (stripping YAML frontmatter, wikilinks, and callouts).
- **Synthesizing Conversation Pairs**: Since diaries are monologues, training data should be generated by:
1. **Self-Questioning**: Using a high-reasoning model (like GPT-4o) to generate relevant "questions" or "prompts" based on the content of each diary entry.
2. **Topic Extraction**: Using the entry's title or tags as the `user` prompt.
3. **Contextual Continuity**: Training the model to "complete" an entry or "reflect" on a past event.
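The Topic Extraction strategy above can be sketched as a small converter from one diary entry to one ChatML JSONL line. The function name and system prompt are illustrative assumptions:

```python
import json

def diary_to_chatml(title: str, date: str, body: str) -> str:
    """Turn a monologue diary entry into one ChatML JSONL line (illustrative)."""
    record = {
        "messages": [
            {"role": "system", "content": "You write in the diarist's personal voice."},
            {"role": "user", "content": f"Reflect on {title} from {date}."},
            {"role": "assistant", "content": body},  # the diary text is the target
        ]
    }
    return json.dumps(record, ensure_ascii=False)
```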

### B. Suggested Conversion Pipeline
1. Parse `.md` files in `Obsidian/Diary/`.
2. Extract text, cleaning Markdown syntax.
3. For each entry, generate a "System/User" context (e.g., "Reflect on [Topic] from [Date]").
4. Set the diary text as the "Assistant" response.
5. Compile into the framework's JSONL format.
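Step 2 (cleaning Markdown syntax) is the Obsidian-specific part. A minimal sketch, handling only frontmatter, wikilinks, and callout headers; a real vault would need a fuller parser:

```python
import re

def strip_obsidian(md: str) -> str:
    """Minimal Obsidian-flavoured Markdown cleanup (illustrative sketch)."""
    md = re.sub(r"\A---\n.*?\n---\n", "", md, flags=re.S)    # YAML frontmatter
    md = re.sub(r"\[\[([^\]|]+)\|([^\]]+)\]\]", r"\2", md)   # [[target|alias]] -> alias
    md = re.sub(r"\[\[([^\]]+)\]\]", r"\1", md)              # [[link]] -> link
    md = re.sub(r"^> \[!\w+\].*$", "", md, flags=re.M)       # callout header lines
    return md.strip()
```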

---

## 3. Supported Models & Hardware Requirements

### A. Supported Models
- **Local (Fine-tunable)**: Any Causal LM supported by Hugging Face `transformers`.
- **Llama 3 (8B)**: Explicitly mentioned and recommended for style transfer.
- **Mistral/Qwen**: Compatible via the PEFT/LoRA trainer.
- **Cloud (API)**: OpenAI (GPT-4), Claude (v3+), and any Ollama-compatible local endpoint.

### B. Hardware Requirements
- **Training (Fine-tuning)**:
- **GPU**: An NVIDIA CUDA-capable GPU is required (`bitsandbytes` 4-bit quantization depends on CUDA).
- **VRAM**: Due to 4-bit quantization (`load_in_4bit=True`), training a 7B-8B parameter model (like Llama 3) typically requires:
- **Minimum**: 10GB-12GB VRAM (e.g., RTX 3060 12GB).
- **Recommended**: 16GB+ VRAM (e.g., RTX 4080 / RTX 4060 Ti 16GB) for larger context windows or faster training.
- **Inference**: Can run on much lower specs (CPU/GPU) via Ollama or quantization.
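A rough back-of-envelope check on the VRAM figures above, under stated assumptions (4-bit weights at ~0.5 bytes/parameter; a lump-sum allowance for LoRA gradients, optimizer state, activations, and CUDA context that varies with context length and batch size):

```python
params_b = 8.0                # Llama 3 8B, in billions of parameters
weights_gb = params_b * 0.5   # 4-bit quantization: ~0.5 bytes per parameter
overhead_gb = 6.0             # assumed: LoRA grads, optimizer state, activations
print(f"rough training footprint: ~{weights_gb + overhead_gb:.0f} GB VRAM")
```

This lands in the same ballpark as the 10GB-12GB minimum quoted above, which is why a 12GB card is workable but tight.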

## Conclusion

PersonalStyleAI-Framework is a **strong foundation for Personal OS v2**. Its strength lies in its "Style Evolution" (LoRA) module, which can turn a raw Obsidian diary into a persona-aligned digital twin.

**Next Steps for Cortex Integration:**
1. Develop an `obsidian-to-style-ai` parser in `tools/`.
2. Integrate the LoRA trainer into an automated "Style-Update" workflow.
3. Use the `Adaptor Centre` to provide a consistent persona across different LLM backends.