Description
I am reporting a serious analysis error that occurs when several of the strongest KataGo models analyze classic ancient Chinese games.
This is the famous 3-stone handicap game in which Fan Xiping defeated Huang Yougong. The original SGF and related files are attached in the zip package.
The analysis parameters are the same as in my previous report, except that visits = 50,000.
While analyzing a large capturing race, significant miscalculations were observed:
- With the current strongest model "kata1-zhizi-b28c512nbt-muonfd2":
Black 144: -92.5 points
White 145: -45.4 points
All other candidate moves were evaluated as even worse: no candidate lost fewer than 5 points. This suggests that the evaluation of at least one of these two moves is severely inaccurate and that the search had a blind spot. If the evaluation of the preceding move were accurate, at least one candidate for the immediately following move should lose less than 2 points.
Subsequently:
White 149: -25.8 points, again with all other candidates being significantly worse.
Black 150: -68.5 points (I strongly suspect this score is also incorrect).
Details are shown in Analysis SGF 01 and Figures 01–04.
- Using the previously more stable "adam" model "kata1-b28c512nbt-adam-s11165M-d5387M":
Black 144: -33.7 points
White 145: +71.2 points (the model had clearly missed this move beforehand)
Black 146: +49.9 points
White 147: +34.3 points
These four moves all show major evaluation problems. The evaluation appears to return to normal only from White 149 (+0.9 points) onward, followed by Black 150 (-18 points).
Details are shown in Analysis SGF 02 and Figures 05–10.
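The consistency argument above can be written down as a small check. The following is only a sketch in Python and is not part of KaTrain or KataGo: `MoveEval`, `flag_suspicious`, and the 2-point tolerance are hypothetical names and values chosen for illustration, and the `was_best_candidate` flags are my reading of the figures (set to `False` wherever the numbers above do not say).

```python
from dataclasses import dataclass


@dataclass
class MoveEval:
    label: str                # move label, e.g. "B144"
    point_change: float       # point change shown for the played move
    was_best_candidate: bool  # True if every other candidate scored worse


def flag_suspicious(evals, tolerance=2.0):
    """Return (label, reason) pairs for moves whose score swing suggests
    that the evaluation one move earlier was unreliable."""
    flagged = []
    for e in evals:
        if e.point_change > tolerance:
            # A move should not gain points against an accurate estimate.
            flagged.append((e.label, "gain: earlier search missed this move"))
        elif e.point_change < -tolerance and e.was_best_candidate:
            # Even the best move loses a lot, so the earlier estimate was off.
            flagged.append(
                (e.label, "best move still loses: earlier estimate too optimistic")
            )
    return flagged


# Values copied from the report; unknown flags are set to False (assumption).
zhizi = [
    MoveEval("B144", -92.5, True),
    MoveEval("W145", -45.4, True),
    MoveEval("W149", -25.8, True),
    MoveEval("B150", -68.5, False),
]
adam = [
    MoveEval("B144", -33.7, False),
    MoveEval("W145", 71.2, False),
    MoveEval("B146", 49.9, False),
    MoveEval("W147", 34.3, False),
    MoveEval("W149", 0.9, False),
    MoveEval("B150", -18.0, False),
]

for name, evals in (("zhizi", zhizi), ("adam", adam)):
    for label, reason in flag_suspicious(evals):
        print(f"{name} {label}: {reason}")
```

With these inputs the check flags Black 144, White 145, and White 149 for the Zhizi model, and White 145, Black 146, and White 147 for the adam model.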
Conclusion
I am now unsure which model can correctly analyze this game. If even this well-known classic game has such obvious and severe miscalculations, I would not feel confident using KataGo to analyze the famous "Ten Games of Danghu" (当湖十局).
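Original report (translated from Chinese):
When I use several of KataGo's strongest models to analyze ancient game records, all of them make obvious reading errors. This is the famous game in which Fan Xiping gave Huang Yougong three stones and won; the original game record and related files are in the zip package.
The settings are the same as last time, except that visits is larger, at 50,000.
When analyzing the large capturing race:
1. With the current strongest Zhizi model: Black 144, -92.5 points; White 145, -45.4 points. The other candidates lose even more, and no candidate loses fewer than 5 points. This shows that the analysis of both moves is severely inaccurate and that at least one of them was in a blind spot of the search (for the immediately following move, at least one candidate should have a point difference below 2; otherwise the point estimate for the preceding move is badly wrong). Next, White 149 is -25.8 points, again with the other candidates much worse, which indicates another blind spot. For the following Black 150, -68.5 points, I strongly suspect the point difference is miscalculated. See Analysis SGF 01 and Figures 01–04.
2. Given the above, I went back to the previously stable adam model, with these results:
Black 144, -33.7 points; White 145, +71.2 points, so the adam model had clearly missed this move beforehand. Then Black 146, +49.9 points, and White 147, +34.3 points. Clearly the point estimates for all four moves have major problems. After that, White 149 is +0.9 points and Black 150 is -18 points, so the adam model can be considered back to normal by White 149. See Analysis SGF 02 and Figures 05–10.
I am now unsure which model can analyze this game correctly. If the analysis of this game record has such obvious flaws, I would not dare use KataGo to analyze the Ten Games of Danghu (当湖十局).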
Environment
UI: Latest KaTrain version
Engine: Latest KataGo TRT version 1.16.4
Models tested:
kata1-zhizi-b28c512nbt-muonfd2 (current strongest)
kata1-b28c512nbt-adam-s11165M-d5387M (previously more stable)
Analysis settings:
Single move analysis: 50,000 visits
Quick play: 8,000 visits per move
Wide Root Noise: 0.04
numAnalysisThreads = 6
numSearchThreadsPerAnalysisThread = 32
nnMaxBatchSize = 96
Backend: TensorRT
OS: Windows 11
GPU: RTX 3080
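To reproduce this outside KaTrain, the same search settings could be sent directly to KataGo's JSON analysis engine. The sketch below only builds the query line. `build_query` is a hypothetical helper, the rules and komi values are assumptions (I have not verified what KaTrain used for this game), and the move list is a placeholder that would have to be filled in from the attached SGF.

```python
import json


def build_query(moves, turns, initial_stones=(), max_visits=50000,
                wide_root_noise=0.04, rules="chinese", komi=0.0):
    """Build one KataGo analysis-engine query as a JSON line.

    moves:          list of [color, vertex] pairs, e.g. [["W", "R4"], ...]
    turns:          turn numbers to analyze (0 = position before any move)
    initial_stones: handicap stones as [color, vertex] pairs
    """
    query = {
        "id": "fan-huang-capturing-race",   # arbitrary identifier
        "initialStones": [list(s) for s in initial_stones],
        "moves": [list(m) for m in moves],
        "rules": rules,                      # assumption, not from the report
        "komi": komi,                        # assumption, not from the report
        "boardXSize": 19,
        "boardYSize": 19,
        "analyzeTurns": list(turns),
        "maxVisits": max_visits,             # 50,000 visits as in the report
        "overrideSettings": {"wideRootNoise": wide_root_noise},
    }
    return json.dumps(query)


# Placeholder input: the real moves must come from the attached SGF.
line = build_query(moves=[], turns=[0])
print(line)
```

Each line produced this way would be written to the standard input of a running `katago analysis` process, which would take the thread and batch-size settings listed above from its own config file.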
Attachments
Zip package containing:
Original game SGF
Analysis SGF 01 (Zhizi model)
Analysis SGF 02 (Adam model)
Screenshots Figure 01–10
Bug Report.zip