The current bigram model was built from a corpus of just over 1 million tokens (words and punctuation). For the production version we probably need a corpus about five times that size. The current sparseness is illustrated by the prediction context "The j", where "Jewish" and "Jews" both appear in the prediction list, something that would not be expected from a larger, more representative English corpus.
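For illustration, here is a minimal sketch of how such prefix-filtered bigram prediction might work; the function names and the toy corpus are hypothetical, not taken from the project's code. With very little data, rare successors surface in the prediction list, which is the sparseness effect described above.

```python
from collections import Counter, defaultdict

def build_bigram_model(tokens):
    """Count successor frequencies for each token in the corpus."""
    model = defaultdict(Counter)
    for prev, curr in zip(tokens, tokens[1:]):
        model[prev][curr] += 1
    return model

def predict(model, prev_word, prefix, k=5):
    """Rank candidate next words starting with `prefix`,
    ordered by bigram frequency after `prev_word`."""
    matches = [(w, c) for w, c in model[prev_word].items()
               if w.startswith(prefix)]
    matches.sort(key=lambda wc: wc[1], reverse=True)
    return [w for w, _ in matches[:k]]

# Toy corpus: every "j" successor of "the" occurs only once, so all of
# them make the list -- the behaviour the issue reports for "The j".
tokens = "the jewish community and the jews and the judge".split()
model = build_bigram_model(tokens)
print(predict(model, "the", "j"))  # ['jewish', 'jews', 'judge']
```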