This is the official repository for "HapRepair: Learn to Repair OpenHarmony Apps".
HapRepair is an automated ArkTS code defect repair system based on large language models. It uses Retrieval-Augmented Generation (RAG) together with multiple LLMs to detect and repair performance defects in ArkTS code.
Here's the framework of the system:
- Code defect detection
- RAG-based code repair suggestion generation
- Multi-round code repair
- Code functionality verification
- Multi-model support (GPT, Deepseek, Qwen, etc.)
The system consists of the following main modules:
- Code Repair Module (`fix.py`, `fix_projects.py`)
- Vector Retrieval Module (`save_defects_to_database.py`)
- Prompt Generation Module (`get_prompt.py`)
- Output Processing Module (`output_handler.py`)
- Context Extraction Module (`get_surrounding_context.py`)
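The modules above compose into a single retrieval-augmented repair flow. A minimal sketch of how they fit together is shown below; the function names are illustrative stand-ins, not the actual APIs of the listed scripts:

```python
# Hypothetical sketch of how the main modules cooperate.
# Function names are illustrative; see fix.py, get_prompt.py, etc.
# for the real entry points.

def repair_defect(defect, retrieve_exemplars, build_prompt, call_llm, validate):
    """Run one retrieval-augmented repair attempt for a single defect."""
    context = defect["surrounding_context"]            # from get_surrounding_context.py
    exemplars = retrieve_exemplars(defect, top_k=3)    # vector retrieval (Pinecone)
    prompt = build_prompt(defect, context, exemplars)  # get_prompt.py
    patch = call_llm(prompt)                           # any supported model backend
    return patch if validate(patch) else None          # output_handler.py

# Tiny runnable demo with stub components:
defect = {"rule": "@performance/example-rule", "surrounding_context": "let x = 1"}
patch = repair_defect(
    defect,
    retrieve_exemplars=lambda d, top_k: ["exemplar fix"],
    build_prompt=lambda d, c, e: f"Fix {d['rule']} given {c!r} using {e}",
    call_llm=lambda p: "const x = 1",
    validate=lambda p: bool(p),
)
print(patch)  # -> const x = 1
```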
The control flow graph is generated by ArkAnalyzer.
You can obtain the CFG of a project by running the script `arkanalyzer/tests/CFGTest.ts`.
- OpenAI GPT Series
- Deepseek Chat
- Qwen
- Ollama
- LLaMA
- GPTGod
- Configure environment variables:
  ```
  OPENAI_API_KEY=<your_key>
  OPENAI_API_BASE=<api_base>
  DEEPSEEK_API_KEY=<your_key>
  DEEPSEEK_API_BASE=<api_base>
  PINECONE_API_KEY=<your_key>
  ```
- Install dependencies:
  ```
  pip install -r requirements.txt
  ```
- Run defect detection:
  ```
  python RQ1.py
  ```
- Run code repair:
  ```
  python fix.py
  ```
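Before running the scripts, it can help to verify that the expected environment variables are set. The sketch below assumes `OPENAI_API_KEY` and `PINECONE_API_KEY` are always required while the others depend on the chosen model backend; adjust the split for your setup:

```python
import os

# Keys the pipeline reads (per the configuration step above).
# Which keys are strictly required depends on the backend you use;
# this split is an assumption for illustration.
REQUIRED_KEYS = ["OPENAI_API_KEY", "PINECONE_API_KEY"]
OPTIONAL_KEYS = ["OPENAI_API_BASE", "DEEPSEEK_API_KEY", "DEEPSEEK_API_BASE"]

def check_env(env=os.environ):
    """Return the required variables that are still unset or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

# Demo: only one required key is set in this fake environment.
missing = check_env({"OPENAI_API_KEY": "sk-test"})
print(missing)  # -> ['PINECONE_API_KEY']
```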
Alternatively, you can use CodeLinter in Huawei DevEco Studio to detect code defects, and ArkAnalyzer to obtain the CFG of the code for verifying that a repair preserves its functionality.
We evaluate HapRepair on a curated benchmark of real-world OpenHarmony apps:
- Benchmark: 8,664 performance/security defects across 35 OpenHarmony projects, detected by HomeCheck (the ArkAnalyzer-based static checker ported from Huawei CodeLinter rules) and tracked in `revision/target_projects_haprepair.json`.
- Rule taxonomy: 37 performance rules triggered across the benchmark; 15 (40.5%) are context-dependent and 22 (59.5%) are local/template-sufficient (see `summary/gpt-5.1_rq2_rule_type_context_ratio.md`).
- Pipeline: (1) HomeCheck scans each project and emits performance/security findings, (2) `get_surrounding_context.py` extracts the surrounding context for every finding, (3) `save_defects_to_database.py` indexes curated fix exemplars into a Pinecone vector store, (4) `get_prompt.py` retrieves the top-k nearest exemplars via RAG and assembles a repair prompt, and (5) `fix.py`/`fix_projects.py` drive an iterative multi-round repair loop with `output_handler.py` validating each patch.
- Models evaluated: gpt-5.1 (main), gpt-5-mini, deepseek-chat, qwen3-coder-plus, qwen3-30b-a3b.
- Ablation axes: RAG top-k ∈ {0, 1, 3, 5}, diff strategy ∈ {difflib, gpt-diff, no-diff}, context scope ∈ {surrounding, full-file}.
- Protocol: Up to 6 repair rounds per project; a defect is counted as fixed only when HomeCheck no longer reports it on the rewritten code.
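The evaluation protocol above can be sketched as the following loop, where `scan` and `apply_repairs` are stand-ins for HomeCheck and the repair scripts (both names are illustrative, not real APIs):

```python
MAX_ROUNDS = 6  # up to 6 repair rounds per project

def run_protocol(project, scan, apply_repairs, max_rounds=MAX_ROUNDS):
    """Iterate repair rounds until no defects are reported or rounds run out.

    `scan` returns the set of currently reported defect IDs;
    `apply_repairs` rewrites the project for the given defects.
    A defect counts as fixed only once `scan` no longer reports it
    on the rewritten code.
    """
    defects = scan(project)
    for _round in range(max_rounds):
        if not defects:
            break
        apply_repairs(project, defects)
        defects = scan(project)  # re-verify on the rewritten code
    return defects  # residual (unfixed) defects

# Demo with stubs: each round fixes at least half of the remaining defects.
state = {"defects": set(range(8))}
residual = run_protocol(
    project=state,
    scan=lambda p: set(p["defects"]),
    apply_repairs=lambda p, d: p["defects"].difference_update(
        set(sorted(d)[: len(d) // 2 or 1])
    ),
)
print(len(residual))  # -> 0 (all toy defects fixed within 6 rounds)
```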
Reproduce the main table with:
```
python3 revision/code/delta_check_summarize.py --allow-missing-final-logs
```
Main result across LLMs. All five models converge within five repair iterations. GPT-5.1 / GPT-5-mini / DeepSeek-Chat / Qwen3-Coder-Plus drop from 8,664 initial defects to 236 / 166 / 247 / 353 respectively (97–98% resolution); Qwen3-30B-A3B plateaus higher at 1,348 (84%), underscoring that model capacity still matters for hard, context-heavy rules.
Category-level resolution (gpt-5.1). Performance rules drop from 8,150 → 234 (97%) and security rules from 514 → 2 (100%) after five iterations.
Per-project progression (sampled). HapRepair wipes out all 36 defects in PullLinking on round 1, takes flutter_embedding from 123 → 1, and drives the overall 35-project benchmark from 8,664 → 236 (97%).
Delta-check against "fix-by-deletion". A conservative audit filtering every resolved finding plausibly attributable to large-scale code deletion still leaves a net fix rate of 96.11% (8,327/8,664) — confirming the gains come from real repairs, not code removal (summary/gpt-5.1_delta_check.md).
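The headline percentages are simple ratios over the 8,664 initial defects; as a sanity check, they can be recomputed directly:

```python
INITIAL = 8664  # total defects in the benchmark

def fix_rate(residual, initial=INITIAL):
    """Fraction of initial defects no longer reported after repair."""
    return (initial - residual) / initial

# Figures quoted above:
print(f"{fix_rate(236):.1%}")          # gpt-5.1 overall -> 97.3%
print(f"{fix_rate(INITIAL - 8327):.2%}")  # delta-check audit -> 96.11%
```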
Ablation study (gpt-5.1, round 1). RAG is the dominant factor; Top-3 retrieval is the sweet spot; surrounding context beats full-file; and providing a structural diff is essential.
Full breakdown: `summary/ablation/ablation_summary_gpt-5.1_round1.md`. The auto-generated per-project bars and top-10 rule charts (`scripts/plot_readme_figures.py`) provide an additional view of the same data.
- Relevant API keys must be configured before running the pipeline
- Recommended Python version: 3.10+
- Sufficient GPU memory is required when running local large language models
- Back up your code before running repairs
Issues and Pull Requests are welcome to help improve the project.
This project is licensed under the MIT License - see the LICENSE file for details.




