@qqmyers qqmyers commented Dec 19, 2025

What this PR does / why we need it: This PR contains multiple updates to the OAI-ORE metadata export and archival Bag output:

OAI-ORE

  • now uses URIs for checksum algorithms, as required to create valid JSON-LD
  • a bug causing failures with deaccessioned versions when the deaccession note ("Deaccession Reason" in the UI) was null (which has been allowed via the API) has been fixed
  • the "https://schema.org/additionalType" is updated to "Dataverse OREMap Format v1.0.2" to indicate that the output has changed

Archival Bag

  • for dataset versions with no files, the (empty) manifest-.txt file created will now use the default algorithm defined by the "FileFixityChecksumAlgorithm" setting rather than always defaulting to "md5"
  • a bug causing the bag-info.txt to not have information on contacts when the dataset version has more than one contact has been fixed
  • values used in the bag-info.txt file that may be multi-line (with embedded CR or LF characters) are now properly indented/formatted per the BagIt specification (i.e. Internal-Sender-Identifier, External-Description, Source-Organization, Organization-Address).
  • the name of the dataset is no longer used as a subdirectory under the data directory (dataset names can be long enough to cause failures when unzipping)
  • a new key, "Dataverse-Bag-Version", has been added to bag-info.txt with a value of "1.0", allowing tracking of changes to Dataverse's archival bag generation
  • improvements to handling file retrieval errors and throttling
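
To illustrate the multi-line formatting item above, here is a minimal Python sketch of how a bag-info.txt value with embedded CR/LF characters could be written so that continuation lines are indented. It is not the code in this PR (Dataverse is written in Java). The function name `format_bag_info_entry`, the two-space indent, and the default width are assumptions for illustration; the width of 78 is taken from the wrap limit mentioned in the commit notes.

```python
import textwrap


def format_bag_info_entry(label: str, value: str, width: int = 78,
                          indent: str = "  ") -> str:
    """Format one bag-info.txt entry (hypothetical helper, not Dataverse code).

    Continuation lines start with whitespace so that a BagIt reader treats
    them as part of the same value.
    """
    # splitlines() breaks on CR, LF, and CRLF (plus a few other line
    # boundaries); empty and whitespace-only pieces are dropped.
    pieces = [piece.strip() for piece in value.splitlines() if piece.strip()]
    lines = []
    for i, piece in enumerate(pieces):
        # Only the first output line carries the "Label: " prefix.
        prefix = f"{label}: " if i == 0 else indent
        lines.extend(textwrap.wrap(
            piece,
            width=width,
            initial_indent=prefix,
            subsequent_indent=indent,
            break_long_words=False,
            break_on_hyphens=False,
        ))
    # An empty result means there was nothing to write; the caller decides
    # whether to omit the entry.
    return "\n".join(lines)


print(format_bag_info_entry(
    "External-Description",
    "First paragraph of the description.\r\n\r\nSecond paragraph.",
))
```

With `break_long_words=False`, a single token longer than the width is left intact rather than split, so a line can exceed the limit in that case.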

Which issue(s) this PR closes:

(I assume these don't work to automate closing, but they're here to let DANS see the correlations)

  • Closes #DD-2109
  • Closes #DD-2082
  • Closes #DD1508
  • Closes #DD2112

Special notes for your reviewer: FWIW: Some of the simpler fixes are in a single commit if you want to look at them separately.

Suggestions on how to test this: There are new tests for the multiline wrapping functionality and overall generation of the bag-info.txt file. Testing overall bag generation means configuring an archiver per the guides (probably the local file one) and verifying the output. The main changes to check are in the OAI-ORE file, the bag-info.txt file, and that there is no data/* dir using the dataset title. One could also verify that the manifest file name with no files matches the configured algorithm.
Given that DANS is interested in these, I assume they'll also be testing - could coordinate.
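
A couple of the manual checks above could be scripted against an unzipped bag. The sketch below is only a rough aid, assuming the bag has been extracted so that bag-info.txt and the manifest sit directly under `bag_root`. The helper names are hypothetical, and the mapping from the "FileFixityChecksumAlgorithm" setting value to a manifest file name (lowercased, punctuation removed, e.g. "SHA-256" to manifest-sha256.txt) is an assumption based on the usual BagIt naming convention, not something confirmed by this PR.

```python
import re
from pathlib import Path


def expected_manifest_name(algorithm_setting: str) -> str:
    """Map a setting value such as "SHA-256" to a manifest file name.

    Assumption: the name is the lowercased algorithm with punctuation removed.
    """
    token = re.sub(r"[^a-z0-9]", "", algorithm_setting.lower())
    return f"manifest-{token}.txt"


def find_bag_problems(bag_root: Path, algorithm_setting: str) -> list[str]:
    """Return human-readable problems found in an unzipped bag (hypothetical)."""
    problems = []

    # The manifest name should match the configured algorithm, even for
    # dataset versions with no files.
    manifest = bag_root / expected_manifest_name(algorithm_setting)
    if not manifest.is_file():
        problems.append(f"missing {manifest.name}")

    # bag-info.txt should carry the new Dataverse-Bag-Version key.
    bag_info = bag_root / "bag-info.txt"
    if not bag_info.is_file():
        problems.append("missing bag-info.txt")
    else:
        text = bag_info.read_text(encoding="utf-8")
        has_key = any(line.startswith("Dataverse-Bag-Version:")
                      for line in text.splitlines())
        if not has_key:
            problems.append("bag-info.txt has no Dataverse-Bag-Version entry")

    return problems
```

For example, `find_bag_problems(Path("/tmp/unzipped-bag"), "SHA-256")` (path chosen for illustration) returns an empty list when both checks pass. Checking for a title-named directory under data/ is left out because it depends on how the title was previously turned into a directory name.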

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?: note added

Additional documentation:

qqmyers and others added 30 commits December 6, 2025 18:26
Spec doesn't allow empty lines; dropping whitespace-only lines seems
reasonable as well (users can't see from the Dataverse display whether
an empty line would appear in bag-info.txt or not if we allow
whitespace-only lines, or whitespace beyond the 78 char wrap limit)
affects manifest and pid-mapping files as well as data file placement
Added unit tests for multilineWrap
@qqmyers qqmyers added this to the 6.10 milestone Dec 22, 2025
@qqmyers qqmyers added the Size: 10 (A percentage of a sprint. 7 hours.), GDCC: DANS (related to GDCC work for DANS), TDL (of interest to the Texas Digital Library), and GDCC: QDR (of interest to QDR) labels Dec 22, 2025
@qqmyers qqmyers marked this pull request as ready for review December 22, 2025 22:16
coveralls commented Dec 22, 2025

Coverage Status

coverage: 24.418% (+0.2%) from 24.232%
when pulling b4a3799 on GlobalDataverseCommunityConsortium:OREBag1.0.2
into 96e96f4 on IQSS:develop.
