scispacy entity results are hit and miss #32

@EmanuelFaria

I just noticed something strange...

I filtered scispacy.csv two ways, to show only rows matching:

  1. ((sentences containing TNF) AND (entities containing TNF)) (see scispacy_match.pdf attached)
  2. ((sentences containing TNF) AND (entities **NOT** containing TNF)) (see scispacy_mismatch.pdf attached)

The second filter turned up a number of sentences in which TNF appears but was not recognized as an entity. I don't see why scispacy detects TNF in some sentences and not in others.
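
For anyone who wants to reproduce the two filters, here is a minimal sketch using only the standard library. The column names `sentence` and `entities` are assumptions (I have not restated the actual header of scispacy.csv here), and the three sample rows are invented purely for illustration.

```python
import csv
import io

def split_by_term(rows, term="TNF", sentence_col="sentence", entities_col="entities"):
    """Partition rows whose sentence mentions `term` by whether the entities also do.

    The column names are hypothetical; adjust them to the real CSV header.
    """
    match, mismatch = [], []
    for row in rows:
        if term not in row[sentence_col]:
            continue  # sentence does not mention the term at all
        if term in row[entities_col]:
            match.append(row)      # filter 1: sentence AND entities contain the term
        else:
            mismatch.append(row)   # filter 2: sentence contains it, entities do not
    return match, mismatch

# Tiny in-memory stand-in for scispacy.csv (invented rows, assumed columns).
sample = io.StringIO(
    "sentence,entities\n"
    "TNF levels were elevated.,TNF\n"
    "Treatment reduced TNF expression.,Treatment|expression\n"
    "IL-6 was unchanged.,IL-6\n"
)
match, mismatch = split_by_term(csv.DictReader(sample))
print(len(match), len(mismatch))
```

To run it against the real file, pass `csv.DictReader(open("scispacy.csv", newline="", encoding="utf-8"))` instead of the sample. Note that this is a plain substring test, so it also counts longer tokens that merely contain "TNF".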


I also found a number of sentences containing this typo: TNF-<space>𝛼 (TNF- 𝛼).
scispacy caught the "TNF-" but left out the alpha because of the space after the dash (see scispacy_TNF-space.pdf attached). I don't know if there's anything we can do about that, but I thought it should be noted.
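
One possible workaround, offered only as a sketch: close the gap in the text before it reaches scispacy. The helper name `close_hyphen_gap` and the choice of Greek character ranges are my own assumptions, and I have not verified that this changes what scispacy recognizes. It also shifts character offsets, which matters if entity spans are mapped back to the original text.

```python
import re

# Greek and Coptic block plus the mathematical Greek letters (which include 𝛼, U+1D6FC).
_GREEK = r"\u0370-\u03FF\U0001D6A8-\U0001D7CB"

# A hyphen that follows a word character, then whitespace, then a Greek letter.
_HYPHEN_GAP = re.compile(rf"(?<=\w)-\s+(?=[{_GREEK}])")

def close_hyphen_gap(text):
    """Remove whitespace between a trailing hyphen and a following Greek letter."""
    return _HYPHEN_GAP.sub("-", text)

print(close_hyphen_gap("TNF- \U0001D6FC was increased"))
```

Ordinary spaced hyphens such as "a - b" are left alone, because the pattern requires a word character directly before the hyphen and a Greek letter after the gap.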

scispacy_TNF-space.pdf
scispacy_mismatch.pdf
scispacy_match.pdf

scispacy.csv
