Member

@Raphael-Gazzotti Raphael-Gazzotti commented Oct 29, 2025

Extracts metadata from CITATION.cff including:

  • DOI
  • authors (with ORCID, email, affiliation)
  • license
  • title (fullName, shortName)
  • versionIdentifier

Also lays groundwork for extracting additional information such as preferred-citation.

Closes #89
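
For readers unfamiliar with the format, the extraction described above can be sketched roughly as follows. This is not the code from this PR: the function name `extract_cff_metadata` and the shape of the returned dictionary are hypothetical, the input is assumed to be a CITATION.cff file already parsed into a Python mapping, and using the CFF `title` for both `fullName` and `shortName` is an assumption made only for illustration. The CFF keys themselves (`doi`, `authors`, `given-names`, `family-names`, `orcid`, `email`, `affiliation`, `license`, `title`, `version`, `preferred-citation`) come from the Citation File Format schema.

```python
from typing import Any


def extract_cff_metadata(cff: dict[str, Any]) -> dict[str, Any]:
    """Collect the fields listed in the PR description from a parsed CITATION.cff.

    Hypothetical helper for illustration; not the implementation in this PR.
    """
    authors = []
    for person in cff.get("authors", []):
        # Only person-style entries are handled here; missing keys become None.
        authors.append(
            {
                "given_name": person.get("given-names"),
                "family_name": person.get("family-names"),
                "orcid": person.get("orcid"),
                "email": person.get("email"),
                "affiliation": person.get("affiliation"),
            }
        )

    title = cff.get("title")
    return {
        "doi": cff.get("doi"),
        "authors": authors,
        "license": cff.get("license"),
        # Assumption: the CFF title feeds both name properties unchanged.
        "fullName": title,
        "shortName": title,
        "versionIdentifier": cff.get("version"),
        # Kept as-is so later work can extract more from it.
        "preferred_citation": cff.get("preferred-citation"),
    }
```

Returning `None` for absent keys, rather than raising, keeps the sketch tolerant of minimal CITATION.cff files that only declare a title and authors.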

…, license, title -fullName, shortName-, versionIdentifier
@Raphael-Gazzotti Raphael-Gazzotti self-assigned this Oct 29, 2025
@Raphael-Gazzotti Raphael-Gazzotti added the enhancement New feature or request label Oct 29, 2025
@Peyman-N
Member

The pull request seems reasonable. Can you add some tests for it, please?

if 'orcid' in person:
    person_orcid = omcore.ORCID(identifier=person['orcid'])
if 'affiliation' in person:
    person_affiliation = omcore.Affiliation(person['affiliation'])
Member

@apdavison apdavison Oct 29, 2025

This seems incomplete. The affiliation should contain one or more Organization or Consortium instances in its "member_of" property.

Member Author

Indeed, I’ve updated it. However, there’s an implementation choice to make: should we treat everything in the affiliation field as a single organization, or should we implement rules to split it into several organizations?

For now, I’ve gone with the first option and treat the whole affiliation value as a single organization, but it would probably be safer to ignore the affiliation field for the moment.

Member

The format seems to expect a single affiliation, so I think it's reasonable to assume a single organisation.

It will be very difficult to reliably detect multiple organisations. Maybe implement something simple, like splitting the string on ";", since this character would not normally be used within the name or address of a single organisation.
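
A minimal sketch of that suggestion, for illustration only. The `Organization` and `Affiliation` dataclasses below are hypothetical stand-ins that mirror the names used in this thread; they are not the openMINDS classes, and `split_affiliations` is not a function from this PR. The sketch assumes each ";"-separated part names exactly one organisation.

```python
from dataclasses import dataclass


@dataclass
class Organization:
    """Hypothetical stand-in for an organisation record (not the openMINDS class)."""
    full_name: str


@dataclass
class Affiliation:
    """Hypothetical stand-in holding the organisation in a member_of field."""
    member_of: Organization


def split_affiliations(raw: str) -> list[Affiliation]:
    """Split a CFF affiliation string on ';' and wrap each non-empty part."""
    names = [part.strip() for part in raw.split(";")]
    # Skip empty parts caused by trailing or doubled separators.
    return [Affiliation(member_of=Organization(full_name=name)) for name in names if name]
```

A string without ";" yields exactly one affiliation, so the single-organisation case discussed above keeps working without a special branch.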

@Raphael-Gazzotti
Member Author

@Peyman-N I added some tests to cover the new extraction.

Member

@lzehl lzehl left a comment

@apdavison or @Peyman-N, could one of you please double-check? It looks good from my side and can be merged by whoever confirms the PR next.

Development

Successfully merging this pull request may close these issues.

Metadata extraction from CITATION.cff file

4 participants