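Project Overview

This project is a work-in-progress pipeline for automating proofs of Lean 4 theorems. It covers the full workflow, from problem generation through proof decomposition to verification.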

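Core Features

1. Proof Generation

Function: An AI model generates a proof for a given problem.

2. Theorem Name Correction (work in progress)

Function: Repairs theorem names that the AI "hallucinates" while generating a proof.
Current method: Replaces each unknown name with its closest match among related theorems, ranked by BLEU score.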
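
The BLEU-based repair idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tokenization on `.` and `_`, the add-one smoothing, the bigram limit, and the function names `bleu` and `closest_theorem` are hypothetical and are not taken from replace_unknown.py.

```python
import math
import re
from collections import Counter


def tokenize(name):
    """Split a Lean name such as 'Nat.add_comm' into ['Nat', 'add', 'comm']."""
    return [t for t in re.split(r"[._]", name) if t]


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Simplified BLEU: smoothed n-gram precisions times a brevity penalty."""
    cand, ref = tokenize(candidate), tokenize(reference)
    if not cand or not ref:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((cand_counts & ref_counts).values())
        total = sum(cand_counts.values())
        # Add-one smoothing (an assumption) keeps short names from scoring 0.
        log_precision += math.log((overlap + 1) / (total + 1)) / max_n
    brevity = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return brevity * math.exp(log_precision)


def closest_theorem(unknown, known_names):
    """Return the known theorem name that scores highest against `unknown`."""
    return max(known_names, key=lambda name: bleu(unknown, name))


candidates = ["Nat.add_comm", "Nat.mul_comm", "Int.sub_self"]
print(closest_theorem("Nat.add_commutative", candidates))
```

A real repair step would presumably also verify that the replacement type-checks in Lean, since a high similarity score does not guarantee that the theorem fits the goal.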

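3. Dataset Integration

Function: Adds solved problems and their AI-generated proofs to the dataset.

4. Proof Decomposition

Function: Decomposes an AI-generated proof into a "skeleton" plus a series of "holes" to be filled.
Current strategy: Uses the all_tactics output of LeanRepl to recover the relationships between have ... by blocks. The hole of each by block is defined as all single-line tactics that follow its last child have block.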
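
The hole definition above can be sketched on a simplified proof tree. The `HaveBlock` class and the functions `split_hole` and `collect_holes` are hypothetical names for illustration. The real pipeline works from the all_tactics output rather than from a prebuilt tree like this one.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class HaveBlock:
    """A `have ... := by` step together with the body of its by block."""
    statement: str
    body: List["Node"] = field(default_factory=list)


# A by-block body is a list of plain tactics (str) and nested have blocks.
Node = Union[str, HaveBlock]


def split_hole(body):
    """Split one by-block body into (skeleton part, hole tactics).

    The hole is every tactic after the last child have block.
    """
    last_have = -1
    for i, node in enumerate(body):
        if isinstance(node, HaveBlock):
            last_have = i
    return body[: last_have + 1], body[last_have + 1:]


def collect_holes(body):
    """Collect the hole of every by block, inner blocks first."""
    holes = []
    skeleton, hole = split_hole(body)
    for node in skeleton:
        if isinstance(node, HaveBlock):
            holes.extend(collect_holes(node.body))
    holes.append(hole)
    return holes


proof = [
    "intro x",
    HaveBlock("h1 : x + 0 = x", ["simp"]),
    HaveBlock("h2 : 0 <= x * x", ["nlinarith [sq_nonneg x]"]),
    "rw [h1]",
    "linarith",
]
for index, hole in enumerate(collect_holes(proof), start=1):
    print(f"hole_{index}: {hole}")
```

Under this definition, `intro x` stays in the skeleton because it precedes the last child have block, and only `rw [h1]` and `linarith` form the outer hole.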

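5. Hole Filling and Enumeration

Function: For each hole, enumerates all candidate proof scripts.
Current algorithm: N-gram search.
Available tactics: norm_num, linarith, nlinarith, omega, ring, ring_nf, simp, simpa, field_simp, positivity, norm_cast, or rw[related theorem].
Terminal tactics: linarith, nlinarith, omega. These may appear only as the last tactic of a hole, never in an earlier position.
Implementation detail: Enumerated states are managed through the Lean REPL's ProofState, so the search does not have to rerun the proof from the beginning each time.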
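
The terminal-tactic rule can be sketched as a plain sequence generator. This is an illustrative sketch only: `enumerate_sequences` is a hypothetical name, the `rw[...]` candidates are left out, and no Lean process is involved. The real search extends a ProofState one tactic at a time instead of materializing whole sequences.

```python
from itertools import product

TACTICS = [
    "norm_num", "linarith", "nlinarith", "omega", "ring", "ring_nf",
    "simp", "simpa", "field_simp", "positivity", "norm_cast",
]
TERMINAL = {"linarith", "nlinarith", "omega"}


def enumerate_sequences(tactics, terminal, max_len):
    """Yield tactic sequences of length 1..max_len.

    Terminal tactics may only occupy the last position, so every prefix
    is drawn from the non-terminal tactics alone.
    """
    non_terminal = [t for t in tactics if t not in terminal]
    for length in range(1, max_len + 1):
        for prefix in product(non_terminal, repeat=length - 1):
            for last in tactics:
                yield list(prefix) + [last]


sequences = list(enumerate_sequences(TACTICS, TERMINAL, max_len=2))
print(len(sequences))  # 11 of length 1 plus 8 * 11 of length 2 = 99
```

Restricting the prefixes to the 8 non-terminal tactics reduces the number of length-2 candidates from 121 to 88, and the saving grows with sequence length.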

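6. Proof State Management

Function: Uses customized versions of Lean REPL and Lean Interact to clean up proof states, preventing out-of-memory failures during search.
Optimization: A branch is pruned when the search reaches a proof state it has already encountered. In addition, when --llm-pruning is enabled, an LLM judges whether the current step moves closer to the goal. If it does not, the branch is pruned.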
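
Pruning on previously seen proof states can be sketched as a breadth-first search with a visited set. The `search` function and its `apply_tactic` and `is_solved` callbacks are hypothetical. Here a state is any hashable value, whereas in the real pipeline it would come from the Lean REPL (for example, a normalized goal string). The toy integer "tactics" exist only to make the sketch runnable.

```python
from collections import deque


def search(initial_state, tactics, apply_tactic, is_solved, max_depth):
    """Breadth-first search that skips states it has already visited."""
    seen = {initial_state}
    queue = deque([(initial_state, [])])
    while queue:
        state, path = queue.popleft()
        if is_solved(state):
            return path
        if len(path) == max_depth:
            continue
        for tactic in tactics:
            next_state = apply_tactic(state, tactic)
            # Prune when the tactic fails or the state was seen before.
            if next_state is None or next_state in seen:
                continue
            seen.add(next_state)
            queue.append((next_state, path + [tactic]))
    return None


# Toy stand-in for a proof environment: states are integers.
def toy_apply(state, tactic):
    if tactic == "inc":
        return state + 1
    if tactic == "double":
        return state * 2
    if tactic == "noop":
        return state  # always revisits the current state, so it is pruned
    return None


result = search(1, ["inc", "double", "noop"], toy_apply, lambda s: s == 6, max_depth=4)
print(result)
```

An LLM-based pruner would slot into the same loop as one more condition before a state is queued.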

TODOs

Enlarge the search space.
Improve parsing (the current all_tactics parsing has shortcomings).

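How to Run

Quick Start (Demo Dataset)

To test the system quickly, use the demo dataset:

# 1. Migrate the demo data to the unified structure
python migrate_demo.py

# 2. Generate holes (process the first 5 problems)
python decompose_hole_merge_pipeline.py dataset demo 5

# 3. Run the N-gram enumeration pipeline (process the first 3 problems)
python minimal_verification_pipeline_ngram.py dataset demo 3

This workflow takes a few minutes and writes the complete analysis results to decomposition_results/demo/.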

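Detailed Steps

Generate AI proofs: Use generate_putnam.py to generate AI proofs.

Basic usage:
```
python generate_putnam.py
```
Notes:
- The script automatically reads problems from dataset/putnam.jsonl
- Generated proofs are saved to the dataset/putnam/ directory
- Files that already exist are skipped automatically to avoid regenerating them
- Requests run in multiple threads, with at most 50 concurrent requests by default
- Built-in API rate limiting: at most 50 requests per 60 seconds
Example output:
- Generated files: dataset/putnam/putnam_YYYY_XX.lean
- The console reports which files were skipped and which were generated successfully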
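
The limit of 50 requests per 60 seconds could be enforced with a sliding-window limiter such as the sketch below. This is one possible approach, not necessarily what generate_putnam.py does. The class name `SlidingWindowLimiter` and its injectable `clock` and `sleep` parameters are assumptions added so that the logic can be exercised without real waiting.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any `period`-second window."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent acquisitions
        self.lock = threading.Lock()

    def acquire(self):
        """Block until one more call fits into the current window."""
        while True:
            with self.lock:
                now = self.clock()
                # Drop timestamps that have left the window.
                while self.calls and now - self.calls[0] >= self.period:
                    self.calls.popleft()
                if len(self.calls) < self.max_calls:
                    self.calls.append(now)
                    return
                wait = self.period - (now - self.calls[0])
            # Sleep outside the lock so other threads are not blocked.
            self.sleep(wait)


# The limits stated above: 50 requests per 60 seconds.
limiter = SlidingWindowLimiter(max_calls=50, period=60.0)
```

Each worker thread (for instance in a `concurrent.futures.ThreadPoolExecutor` with `max_workers=50`) would call `limiter.acquire()` immediately before sending its API request.
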
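Repair hallucinated theorems (currently needs fixing): Use replace_unknown.py to repair hallucinated theorem names.
Dataset migration: Use migrate_demo.py or dataset_migration.py to add problems and AI-generated answers to the dataset.

Demo dataset migration (small-scale testing):
```
python migrate_demo.py
```
- Reads all .lean files in the demo/ directory
- Automatically splits each file into a header part and a problem part
- Writes the output to the unified_problems/demo/ structure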
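
The header/problem split might look like the following sketch. The rule used here is an assumption: the header ends at the last leading line that starts with a keyword such as `import` or `open`. The prefix list and the function name `split_lean_source` are hypothetical, and migrate_demo.py may use a different criterion.

```python
# Line prefixes treated as header material (an assumed, incomplete list).
HEADER_PREFIXES = ("import ", "open ", "set_option ", "namespace ", "universe ", "variable ")


def split_lean_source(source):
    """Split Lean source text into (header, problem).

    The header runs through the last header-like line that appears before
    the first line of real content. Blank lines and `--` comments do not
    end the header scan on their own.
    """
    lines = source.splitlines()
    split_at = 0
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(HEADER_PREFIXES):
            split_at = i + 1
        elif stripped and not stripped.startswith("--"):
            break  # first line of the problem itself
    header = "\n".join(lines[:split_at])
    problem = "\n".join(lines[split_at:]).strip("\n")
    return header, problem


example = "import Mathlib\nopen Real\n\ntheorem foo : 1 + 1 = 2 := by\n  norm_num\n"
header, problem = split_lean_source(example)
print(header)
print("---")
print(problem)
```
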
Full dataset migration (large datasets):
```
# Migrate the minif2f dataset
python dataset_migration.py minif2f

# Migrate the putnam dataset
python dataset_migration.py putnam

# Migrate the proverbench dataset
python dataset_migration.py proverbench

# Migrate all datasets
python dataset_migration.py all
```
Migration results:
- Everything is stored under the unified_problems/<dataset_name>/ directory
- Each problem has a header.lean file and a problem.lean file
- A metadata file, unified_problems_metadata.json, is generated

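Proof decomposition: Use decompose_hole_merge_pipeline.py to decompose AI-generated answers into skeleton-plus-holes form.

Process an entire dataset (limiting the count is recommended to avoid long runs):

# Demo dataset (process the first 5 files)
python decompose_hole_merge_pipeline.py dataset demo 5

# Putnam dataset (process the first 10 files)
python decompose_hole_merge_pipeline.py dataset putnam 10

# Minif2f dataset (process the first 3 files)
python decompose_hole_merge_pipeline.py dataset minif2f 3

Process a single problem:

# Process a specific problem in the demo dataset
python decompose_hole_merge_pipeline.py problem demo demo_complex_p1

# Process a specific problem in the putnam dataset
python decompose_hole_merge_pipeline.py problem putnam putnam_2007_b6

Output:

Results are saved to decomposition_results/<dataset_name>/decomposed/<problem_id>/
Files included:
- header.lean: imports and declarations
- problem.lean: the original problem
- hole_version.lean: the version with hole_X placeholders
- decomposition.json: metadata containing the original tactic information
A summary report is generated: <dataset_name>_pipeline_results.json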

Run the per-hole enumeration pipeline: Once the dataset is in skeleton-plus-holes form, fill the holes with the N-gram search algorithm.

Process an entire dataset:

# Demo dataset (process the first 3 problems)
python minimal_verification_pipeline_ngram.py dataset demo 3

# Enable LLM pruning (can improve efficiency but increases cost)
python minimal_verification_pipeline_ngram.py dataset demo 3 --llm-pruning

# Disable resume (start from scratch)
python minimal_verification_pipeline_ngram.py dataset demo 3 --no-resume

# Force reprocessing of specific problems
python minimal_verification_pipeline_ngram.py dataset demo --force-reprocess demo_complex_p1,demo_complex_p2

Process a single problem:

# Process a specific problem
python minimal_verification_pipeline_ngram.py problem demo demo_complex_p2

# Process a putnam problem
python minimal_verification_pipeline_ngram.py problem putnam putnam_2007_b6

Process a single hole:

# Debug a specific hole
python minimal_verification_pipeline_ngram.py hole demo demo_complex_p1 hole_3

Output:

N-gram search results: ngram_search_results/
Checkpoint files: ngram_checkpoints/<problem_id>_ngram_checkpoint.json
Summary report: <dataset_name>_minimal_verification_summary_ngram.json
Detailed logs: include the search process and result for each hole

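Available options:

--llm-pruning: enable LLM-based pruning of the search space
--no-resume: disable checkpoint resume
--force-reprocess: force reprocessing of the specified problems (comma-separated)

Show the full help:

python minimal_verification_pipeline_ngram.py --help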

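Important Notes

The datasets are large: minif2f, putnam, and proverbench contain many files, and processing them in full takes a long time
Single problems are slow too: even one problem can take from a few minutes to tens of minutes
Use limits: during testing, pass the limit argument to cap the number of files processed
Checkpointing: the system supports resuming after an interruption, continuing from where it last stopped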
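
Checkpoint-based resume can be sketched as follows. Only the file name pattern `<problem_id>_ngram_checkpoint.json` comes from this README. The JSON layout (a `completed_holes` mapping) and the functions `load_checkpoint`, `save_checkpoint`, and `process_problem` are assumptions for illustration.

```python
import json
import os


def checkpoint_path(directory, problem_id):
    """Build the checkpoint file path for one problem."""
    return os.path.join(directory, f"{problem_id}_ngram_checkpoint.json")


def load_checkpoint(path):
    """Load an existing checkpoint, or start an empty one."""
    if not os.path.exists(path):
        return {"completed_holes": {}}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_checkpoint(path, checkpoint):
    """Write to a temporary file, then rename, so an interruption
    cannot leave a half-written checkpoint behind."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(checkpoint, f, indent=2)
    os.replace(tmp_path, path)


def process_problem(problem_id, holes, solve_hole, directory, resume=True):
    """Solve each hole once, skipping holes recorded in the checkpoint."""
    path = checkpoint_path(directory, problem_id)
    checkpoint = load_checkpoint(path) if resume else {"completed_holes": {}}
    for hole_id in holes:
        if hole_id in checkpoint["completed_holes"]:
            continue  # finished in an earlier run
        checkpoint["completed_holes"][hole_id] = solve_hole(hole_id)
        save_checkpoint(path, checkpoint)
    return checkpoint["completed_holes"]
```

Saving after every hole means an interrupted run loses at most the hole that was in progress. Passing `resume=False` mirrors the effect of --no-resume in this sketch.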
