Hi again,
In this wrapper, I wonder why, for some languages other than English, the Tesseract API with tessdata-best.traineddata returns results per character rather than per word, as it does for English. For example:
Thai
69 confidence: 93.2952651977539 - [63, 74, 74, 85]; ห
70 confidence: 93.29107666015625 - [77, 74, 83, 85]; า
71 confidence: 93.30585479736328 - [75, 64, 100, 93]; ให
72 confidence: 93.0483627319336 - [101, 70, 105, 85]; ้
73 confidence: 93.2821044921875 - [111, 69, 116, 85]; ร
Eng
0 confidence: 96.37889099121094 - [358, 42, 443, 66]; FOCUS
1 confidence: 95.37885284423828 - [147, 263, 328, 294]; LEADERS
2 confidence: 95.37885284423828 - [341, 266, 653, 294]; CONCENTRATE
3 confidence: 90.43708801269531 - [116, 315, 506, 342]; SINGLE-MINDEDLY
Do you have any suggestions for config settings that would make the results word-based or text-line-based?
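For reference, a fallback that does not depend on any Tesseract setting would be to merge the character boxes after recognition. The following is only a minimal sketch under stated assumptions: the boxes are taken to be [left, top, right, bottom] in pixels (the output above does not say), `merge_boxes` is a hypothetical helper that is not part of the wrapper, and `max_gap` is an arbitrary threshold. Since Thai is written without spaces between words, this kind of merging produces runs of adjacent characters (closer to text lines) rather than true words.

```python
from typing import List, Tuple

# Assumed box layout: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]
Item = Tuple[str, Box]


def merge_boxes(items: List[Item], max_gap: int = 8) -> List[Item]:
    """Greedily merge reading-order results into larger runs.

    A result is merged into the previous run when the two boxes overlap
    vertically and the horizontal gap between them is at most max_gap
    pixels (a hypothetical threshold that needs tuning per image).
    """
    merged: List[Item] = []
    for text, (left, top, right, bottom) in items:
        if merged:
            prev_text, (p_left, p_top, p_right, p_bottom) = merged[-1]
            # Positive vertical overlap means both boxes sit on the same line.
            same_line = min(bottom, p_bottom) - max(top, p_top) > 0
            # A negative gap means the boxes overlap horizontally.
            close = left - p_right <= max_gap
            if same_line and close:
                merged[-1] = (
                    prev_text + text,
                    (min(p_left, left), min(p_top, top),
                     max(p_right, right), max(p_bottom, bottom)),
                )
                continue
        merged.append((text, (left, top, right, bottom)))
    return merged


if __name__ == "__main__":
    thai = [
        ("ห", (63, 74, 74, 85)),
        ("า", (77, 74, 83, 85)),
        ("ให", (75, 64, 100, 93)),
        ("้", (101, 70, 105, 85)),
        ("ร", (111, 69, 116, 85)),
    ]
    for text, box in merge_boxes(thai):
        print(text, box)
```

With the five Thai results above and `max_gap=8`, the sketch collapses them into one run whose box is (63, 64, 116, 93). The English results are left unchanged at that threshold, because LEADERS and CONCENTRATE are 13 pixels apart.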