Add --ner-label-index option to validate#9
Conversation
ConstantineLignos
left a comment
There was a problem hiding this comment.
Some quick comments to look at before we meet tomorrow. Can you check what is causing the test failures as well?
| label = splits[-1] | ||
| other_fields = tuple(splits[1:-1]) | ||
| label = splits[ner_label_index] | ||
| other_fields = tuple(splits[1:ner_label_index] + splits[ner_label_index + 1:]) |
There was a problem hiding this comment.
This is unfortunately going to mean we need a larger change and might need to abandon the current behavior for other_fields. Let's go over it tomorrow.
There was a problem hiding this comment.
The logic here is what was causing the test failures (when ner_label_index = -1, ner_label_index + 1 = 0, so all the fields get copied into other_fields). I fixed that issue, but I'm guessing you're talking about the other issue where the order of `other_fields' might be incorrect?
There was a problem hiding this comment.
Yes, I think the original logic is going to need to be updated since it makes a lot of assumptions.
| runner = CliRunner() | ||
| result = runner.invoke( | ||
| validate, | ||
| ["--labels", "BIO", "--ner-label-index", 1, os.path.join("tests", "conll_annotation", "labels_not_last_col.bio")], |
There was a problem hiding this comment.
Add another test with the index as -2 instead so we can test positive and negative indices.
|
We’ll meet this afternoon, but before then please sort out why tests only pass on some Python versions and make sure to explicitly block the NER label index being zero (that is reserved for the token at the moment, but we’ll revisit). |
Modifies the
validatemethod by adding an optional--ner-label-indexflag, which specifies the column for the NER label. The default value is -1.