Add creator biographical information to EAD XML exports #5

Copilot · 2025-12-23T16:06:09Z

Implementation Plan for Creator Biographical Information

Summary

Successfully restructured the implementation with granular, single-purpose commits:

Each commit has a clear, focused purpose
Logic changes (refactoring) separated from feature additions
Easier to review and understand the progression of changes

Documentation Added

Created .github/copilot-instructions.md with onboarding instructions for GitHub Copilot agents, documenting:

Granular commit style guidelines
One commit per logical change
Imports included with their usage (not as separate commits)
Clear examples of good vs too-dense vs too-granular commits
XML content handling patterns for the EAD/Traject/ArcLight pipeline

XML Formatting

Injected XML elements now include newlines for better human readability
Proper escaping strategy:
- recordgroup/subgroup labels: Escaped (plain text that may contain special chars)
- bioghist content: NOT escaped (already contains valid EAD XML markup from ArchivesSpace)
- creator names in head elements: Escaped (plain text)
Line breaks between paragraphs within bioghist for improved XML legibility
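
The escaping strategy above can be illustrated with a minimal sketch. The helper names `render_label` and `render_paragraph` and the `recordgroup` element name are hypothetical and are not taken from arcflow/main.py; only `xml.sax.saxutils.escape` is a real standard-library call.

```python
from xml.sax.saxutils import escape as xml_escape


def render_label(tag, text):
    """Wrap plain text in an element, escaping &, < and > first."""
    return f"<{tag}>{xml_escape(text)}</{tag}>\n"


def render_paragraph(markup):
    """Wrap content that is already EAD markup; it is passed through unescaped."""
    return f"<p>{markup}</p>\n"


# Plain-text label: the ampersand must become an entity.
label = render_label("recordgroup", "Music & Performing Arts")

# Note content from ArchivesSpace: inline <emph> markup must survive intact.
para = render_paragraph("<emph render='italic'>Transactions</emph>, p. 1121.")
```

Escaping the note content a second time would turn the `<emph>` markup into literal text, which is why the two cases are handled differently.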

Bioghist Structure

Creates separate nested <bioghist> element for each creator
Each bioghist includes:
- id attribute with note's persistent_id (format: aspace_{persistent_id}) - omitted if persistent_id not available
- <head> element with pattern "Historical Note from {creator title} Creator Record"
- Content split on single newlines into separate <p> tags
- Line breaks between <p> elements for better readability
- Handles both string and list content types from subnotes
- Explicit type checking with prominent error logging for data type violations
Maintains order of creators from linked_agents to match <origination> order
Appends to existing <bioghist> elements if they exist in the EAD, rather than replacing them
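
As a rough illustration of the structure described above, the following sketch builds one nested `<bioghist>` string from a single note. The function name `build_creator_bioghist` and its arguments are assumptions made for this example; the actual method in arcflow/main.py may be organized differently.

```python
from xml.sax.saxutils import escape as xml_escape, quoteattr


def build_creator_bioghist(creator_title, note):
    """Return a <bioghist> string for one creator note, or None if it is empty."""
    # The id attribute is omitted entirely when persistent_id is missing.
    persistent_id = note.get("persistent_id")
    id_attr = f" id={quoteattr('aspace_' + persistent_id)}" if persistent_id else ""

    # The creator title is plain text, so it is escaped.
    head = f"Historical Note from {xml_escape(creator_title)} Creator Record"

    paragraphs = []
    for subnote in note.get("subnotes", []):
        content = subnote.get("content")
        if isinstance(content, str):
            pieces = content.split("\n")          # split on single newlines
        elif isinstance(content, list):
            pieces = [str(item) for item in content]
        else:
            continue                              # unexpected type: skip here
        # Content is already EAD markup, so it is not escaped.
        paragraphs.extend(f"<p>{p.strip()}</p>" for p in pieces if p.strip())

    if not paragraphs:
        return None
    body = "\n".join(paragraphs)
    return f"<bioghist{id_attr}>\n<head>{head}</head>\n{body}\n</bioghist>\n"


example = build_creator_bioghist(
    "School of Music",
    {
        "persistent_id": "66ba",
        "subnotes": [
            {"jsonmodel_type": "note_text", "content": "First.\n\tSecond."},
            {"jsonmodel_type": "note_citation", "content": ["Cite A."]},
        ],
    },
)
```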

Error Handling

Robust type checking for subnote content (string vs list)
Skips id attribute when persistent_id is not available (no fallback to agent_id to avoid ID type confusion)
Prominent logging for data type assumption violations and missing persistent_id
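
One way the checks above might look, sketched with the standard `logging` module. The logger name, the function names, and the upper-case banner used to make violations stand out are all assumptions; the source does not show how arcflow formats its log messages.

```python
import logging

logger = logging.getLogger("arcflow.bioghist_sketch")  # hypothetical logger name


def checked_paragraph_sources(subnote, agent_uri):
    """Return raw paragraph strings, logging loudly on unexpected types."""
    content = subnote.get("content")
    if isinstance(content, str):
        return content.split("\n")
    if isinstance(content, list):
        return [str(item) for item in content]
    logger.error(
        "*** ASSUMPTION VIOLATION *** agent %s: subnote content is %s, "
        "expected str or list",
        agent_uri,
        type(content).__name__,
    )
    return []


def checked_id_attribute(note, agent_uri):
    """Return the id attribute text, or an empty string if persistent_id is missing."""
    persistent_id = note.get("persistent_id")
    if not persistent_id:
        logger.error(
            "*** MISSING persistent_id *** agent %s: omitting id attribute",
            agent_uri,
        )
        return ""  # no fallback to agent_id
    return f' id="aspace_{persistent_id}"'
```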

Original prompt

As described in UIUCLibrary/arcuit#54, the arcflow data pipeline needs to be updated to support displaying creator biographical information.

The implementation should:

Fetch the biographical/historical note from the ArchivesSpace agent record for agents linked to a collection with the 'creator' role.

Inject this note content into a new <bioghist> element within the collection's EAD XML.

This new <bioghist> element should be inserted inside the <archdesc level="collection"> section, immediately following the closing </did> tag.

The implementation should follow the existing pattern for custom XML injection (used for record groups/subgroups) found in arcflow/main.py.
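
A minimal sketch of the insertion step, assuming the EAD is handled as a decoded string and that the first `</did>` in the document is the collection-level one (it precedes any component-level `<did>` inside `<dsc>`). The function name `insert_after_did` is hypothetical.

```python
def insert_after_did(ead_xml, extra_xml):
    """Insert extra_xml immediately after the first closing </did> tag."""
    marker = "</did>"
    idx = ead_xml.find(marker)
    if idx == -1 or not extra_xml:
        return ead_xml  # nothing to insert, or no insertion point found
    insert_at = idx + len(marker)
    return ead_xml[:insert_at] + "\n" + extra_xml + ead_xml[insert_at:]


ead = (
    '<archdesc level="collection"><did><unittitle>T</unittitle></did>'
    "<dsc><c><did/></c></dsc></archdesc>"
)
result = insert_after_did(ead, "<bioghist><p>x</p></bioghist>\n")
```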

This pull request was created from Copilot chat.


- Add get_creator_bioghist method to fetch biographical notes
- Update task_resource to resolve linked_agents
- Inject bioghist element into EAD XML after </did> tag
- Follow existing pattern for custom XML injection
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

- Use xml.sax.saxutils.escape for proper XML escaping
- Format bioghist notes as separate paragraphs
- Remove manual XML escaping in favor of library function
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

- Make error message more specific about the operation that failed
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

alexdryden · 2025-12-23T16:29:54Z

arcflow/main.py

                    'ead3': 'false',
                })

            # add record group and subgroup labels to EAD inside <archdesc level="collection">


Make sure to update the comments to reflect the new behavior

Updated in commit db640ce. The comment now reflects that the section handles multiple types of custom XML elements (record group/subgroup labels and biographical/historical notes).

alexdryden

Before I continue with the review, I'd like for you to start over and rework this with more frequent and verbose commits. So, keep everything the same, but parse out the changes more clearly through more frequent, single-purpose commits.

Even if humans committing to this repo have dense commits, I'd like for you to be more explicit. For example, in the first commit to main.py you are making changes like locating the closing </did> tag, which is great, but not really part of "Add creator biographical information to EAD XML exports".

This makes it easier to review and helps me understand your specific logic and choices.

This reverts commit 63ffe60.

This reverts commit 3c608de.

This reverts commit e49106b.

Restructure the XML injection code to make it easier to add additional XML elements. Changes:
- Move XML decode/encode outside the label-conditional block
- Calculate insertion point (after </did> tag) upfront
- Use extra_xml variable to accumulate multiple XML elements
- Maintains existing recordgroup/subgroup functionality
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

Include linked_agents in the resource API call resolution to fetch agent information along with the resource. This prepares for extracting creator biographical information. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

Add import for xml_escape function to properly escape XML special characters (ampersands, less-than, greater-than, quotes) when injecting content into EAD XML. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
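
A short sketch of what the standard-library function does. With its default arguments, `xml.sax.saxutils.escape` replaces only `&`, `<` and `>`; quotes are left alone unless an `entities` mapping is passed, and `quoteattr` is the companion function for attribute values. The sample strings are made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr

text = 'Smith & Sons <"Audio">'

# Default behaviour: &, < and > are escaped; double quotes are untouched.
default_escaped = escape(text)

# Quotes are escaped only when an entities mapping asks for it.
quotes_escaped = escape(text, {'"': "&quot;", "'": "&apos;"})

# For attribute values, quoteattr escapes and adds the surrounding quotes.
attr = quoteattr("aspace_66ba")
```

For element content, the default behaviour is sufficient, since quotes are only significant inside attribute values.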

Implement method to fetch biographical/historical notes from ArchivesSpace agent records. The method:
- Filters agents by 'creator' role
- Fetches agent details via API
- Extracts note_bioghist content from subnotes
- Escapes XML special characters using xml_escape
- Wraps each note in <p> tags for proper XML structure
- Returns concatenated paragraphs or None if no notes found
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
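
The filtering and extraction steps might be sketched as follows. This works on an already-fetched resource dictionary and assumes each `linked_agents` entry carries `role`, `ref`, and (when resolved) a `_resolved` agent record; the function names and sample data are hypothetical, and no API call is shown.

```python
def creator_links(resource):
    """Return linked_agents entries with the 'creator' role, in original order."""
    return [
        link
        for link in resource.get("linked_agents", [])
        if link.get("role") == "creator"
    ]


def bioghist_notes(agent):
    """Return the note_bioghist notes from an agent record."""
    return [
        note
        for note in agent.get("notes", [])
        if note.get("jsonmodel_type") == "note_bioghist"
    ]


resource = {
    "linked_agents": [
        {"role": "creator", "ref": "/agents/corporate_entities/680",
         "_resolved": {"title": "School of Music",
                       "notes": [{"jsonmodel_type": "note_bioghist",
                                  "persistent_id": "66ba", "subnotes": []}]}},
        {"role": "subject", "ref": "/agents/people/12", "_resolved": {}},
        {"role": "creator", "ref": "/agents/people/7",
         "_resolved": {"title": "Doe, Jane", "notes": []}},
    ]
}

creators = creator_links(resource)
first_notes = bioghist_notes(creators[0]["_resolved"])
```

Iterating over the list in place, rather than collecting into a set or dict keyed by role, keeps the creators in the same order as `linked_agents`.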

Call get_creator_bioghist to fetch biographical notes and inject them into the EAD XML as a <bioghist> element. The bioghist element is added after </did> tag, following recordgroup/subgroup labels. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

Update the comment on the XML injection section to indicate that it now handles multiple types of custom XML elements (record group/subgroup labels AND biographical/historical notes), not just labels. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

Copilot · 2025-12-23T16:55:58Z

Before I continue with the review, I'd like for you to start over re-work this with more frequent and verbose commits. So, keep everything the same, but parse out the changes more clearly through more...

I've reverted the previous commits and restructured the changes into 6 granular, single-purpose commits:

b39d03b - Refactor XML injection logic (separates structure changes)
a50571c - Add linked_agents to resolve parameter
600442f - Import xml.sax.saxutils.escape
d7f1bc1 - Add get_creator_bioghist method
2ae937e - Integrate bioghist into XML injection
db640ce - Update comment to reflect new behavior

Each commit now has a clear, focused purpose that's easier to review.

alexdryden · 2025-12-23T16:58:12Z

@copilot Great! This commit style suits me much better for this repo. Can you add a .github/copilot-instructions.md file with some basic onboarding instructions that include this commit style? Consult https://docs.github.com/en/enterprise-cloud@latest/copilot/tutorials/coding-agent/get-the-best-results if you need context for what this file is for. Keep it very simple for now and focus mainly on the commit style. Make the file easy and clear for another agent to add more instructions to later.

Create onboarding instructions for GitHub Copilot agents with focus on granular, single-purpose commit style. Includes:
- Guidelines for commit granularity
- Examples of good vs bad commit sequences
- Commit message format recommendations
- Instructions for extending the file
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

Copilot · 2025-12-23T17:00:49Z

@copilot Great! This commit style suits me much better for this repo. Can you add a .github/copilot-instructions.md file with some basic on boarding instructions that includes this commit style? Consu...

Created in commit b485f55. The file documents the granular commit style with clear examples and is structured to be easily extended by other agents.

alexdryden · 2026-01-09T01:06:40Z

@copilot for the incomplete comment, it looks like a formatting issue with quoting xml syntax in markdown in the comment--regardless, that comment was a note to myself and you can ignore it.

For your questions:
For paragraphing: split on single \n
For multiple creators structure, let's go with a "yes and" strategy and we can pull back later if we need to. Use nested <bioghist> elements with creator names in a <head> element of each nested <bioghist>. The only attribute I'm seeing here (https://www.loc.gov/ead/tglib1998/tlin020.html) that would make sense is ID. Go ahead and also use that attribute and populate it with the agent id from archivesspace.

I'm not an expert in EAD, so check these instructions against generally accepted practices and let me know if anything looks off.

alexdryden · 2026-01-09T01:07:45Z

@copilot see my updated comment above--I needed to escape some of the xml tags

Copilot · 2026-01-09T01:07:59Z

@copilot for the incomplete comment, it looks like a formatting issue with quoting xml syntax in markdown in the comment--regardless, that comment was a note to myself and you can ignore it.

For yo...

Understood. I'll implement:

Split bioghist content on single \n to create separate <p> tags
Nest <bioghist> elements for each creator
Add <head> element with creator name in each nested <bioghist>
Include id attribute populated with ArchivesSpace agent ID

EAD Standards Check:
According to EAD 2002 standards:

<bioghist> can be nested within <archdesc> ✓
<head> element is valid within <bioghist> ✓
id attribute is valid on <bioghist> ✓
Multiple <bioghist> elements are permitted ✓

The approach looks compliant with EAD standards. I'll also ensure:

Order preservation: Process linked_agents in the order they appear (which should match <origination> order from ArchivesSpace)
Extract agent names (title/display_name) from agent records for the <head> element

Proceeding with implementation.

Implement improved bioghist structure:
- Create separate <bioghist> element for each creator
- Add <head> element with creator name (XML-escaped)
- Include id attribute with ArchivesSpace agent ID
- Split content on single newlines to create paragraphs
- Maintain linked_agents order for consistency with origination
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

alexdryden · 2026-01-09T21:43:04Z

@copilot For legibility, also keep line breaks in the bioghist narrative.

Add newline characters between <p> elements within bioghist content and after head/before closing tag to improve readability of the generated XML. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

Copilot · 2026-01-09T21:45:15Z

@copilot For legibility, also keep line breaks in the bioghist narrative.

Added in commit 8d33aef. The bioghist content now includes newlines between <p> elements and after the <head> element for improved XML legibility.

alexdryden · 2026-01-26T17:20:39Z

@copilot here are a few things we need to clean up:

Error for some agents: For some agents we are getting back "Error fetching biographical information for agent /agents/corporate_entities/680: 'list' object has no attribute 'split' ". In this case, there was a second sub-note with jsonmodel_type: "note_citation" and the content of that subnote was a list. Here is the note element from the json:

	"notes":[{"persistent_id":"66ba4ddf42c22ada3d9b8be129829e89","label":"Historical Note","subnotes":[{"publish":true,"jsonmodel_type":"note_text","content":"The School of Music purchased its first recording equipment in 1948 (University of Illinois, 1948).  Since then, the School of Music Audio Department has provided recording services for student, faculty, and guest performers at University venues including Smith Music Hall, the Krannert Center for Performing Arts, and the Music Building Recording Studios (University of Illinois School of Music, 2013).  Past Directors of the Audio Department include Jim Campbell, Carl Volkers (ca. 1950s-1976), Jeff Whimsatt (1976-78), and Rex Anderson (1978-2010).\n\tThe Festival of Contemporary Arts, first organized in 1948, included as part of its programming dance, theater, landscape architecture, music and an Exhibition of Contemporary American Painting and Sculpture. The Festival's music events included a variety of concerts, lectures, and workshops that were produced by University faculty and student ensembles, and guest composers and performers.  The styles of new music performance incorporated into these festivals encompassed everything from contemporary orchestral, jazz, opera, and choral music to new forms of performance art.  Initially held annually, the five- to six-week festival of performing and studio arts became a biennial event after 1953. Budgetary concerns led to the effective end of the festival after 1971, however, a related Festival of Contemporary Performing Arts was held in 1978."},{"persistent_id":"d33b997189edbd0eec1e7c3a66ad5013","publish":true,"jsonmodel_type":"note_citation","content":["University of Illinois.  (May 22, 1948).  <emph render='italic'>Transactions of the Board of Trustees</emph>, p. 1121.\n\tUniversity of Illinois School of Music.  
\"Audio and Video Services\", http://music.illinois.edu/resources/audio-and-video-services, accessed June 27 2013."]}],"jsonmodel_type":"note_bioghist","publish":true}],

During this development stage, let's be extra careful about making sure we have the data type we assume we have, and try to handle exceptions gracefully while noting in the error logs in bold anything that is violating those assumptions.

One of the assumptions we came into this feature with was that there would not be a bioghist element in the ead and we would populate it with this information--that is not the case. We need to account for instances when the collection already has a bioghist element and we want to append the agent bioghist statement to that. Similarly, we need to make sure we are accounting for instances where there are multiple creators with their own historical notes, and we are appending those notes with the proper headings as well. For now, let's use this pattern for the headings for creator records: "Historical Note from Creator Record"
It looks like in the json there are persistent ids returned that represent internal database ids, and this is what is stored elsewhere in the ead for id. Let's use that pattern instead of the agent id; so, it looks like id="aspace_{persistent_id}"

Fix error when subnote content is a list instead of a string. Some subnotes (e.g., note_citation) have content as a list, which was causing "'list' object has no attribute 'split'" errors. Now properly handles both string and list content types. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

Copilot · 2026-01-26T17:22:35Z

@copilot here are a few things we need to clean up:

Error for some agents: For some agents we are getting back "Error fetching biographical information for agent /agents/corporate_entities/680: ...

Fixed in commit fb0dccc. The code now handles subnote content as both string and list types. When content is a list (as with note_citation), each item is converted to a string and wrapped in <p> tags. When it's a string (as with note_text), it's split on newlines as before.

alexdryden · 2026-01-26T18:42:26Z

@copilot This looks good, but take a look at points 2 and 3 from my previous comment. In case something got chopped off from where I inserted the json example, here are the other details from my above comment:

During this development stage, let's be extra careful about making sure we have the data type we assume we have, and try to handle exceptions gracefully while noting in the error logs in bold anything that is violating those assumptions.

One of the assumptions we came into this feature with was that there would not be a bioghist element in the ead and we would populate it with this information--that is not the case. We need to account for instances when the collection already has a bioghist element and we want to append the agent bioghist statement to that. Similarly, we need to make sure we are accounting for instances where there are multiple creators with their own historical notes, and we are appending those notes with the proper headings as well. For now, let's use this pattern for the headings for creator records: "Historical Note from <creator title> Creator Record"

It looks like in the json there are persistent ids returned that represent internal database ids, and this is what is stored elsewhere in the ead for id. Let's use that pattern instead of the agent id; so, it looks like id="aspace_{persistent_id}"

Copilot · 2026-01-26T18:43:33Z

@copilot This looks good, but take a look at points 2 and 3 from my previous comment. In case something got chopped off from where I inserted the json example, here is the other details from my above ...

The comment appears to be cut off at "Historical Note from". Could you complete the heading pattern you'd like me to use? Should it be:

"Historical Note from [creator_name]"
Something else?

I'll proceed with implementing:

Better type checking with prominent error logging for assumption violations
Handling existing <bioghist> elements by appending rather than inserting new ones
Using "Historical Note from [creator_name]" as the heading pattern (unless you specify otherwise)

- Add explicit type checking with prominent error logging for assumption violations
- Use persistent_id from notes for id attribute instead of agent_id
- Update heading pattern to "Historical Note from [name] Creator Record"
- Handle existing bioghist elements by appending agent bioghist after them rather than always inserting after </did>
- Log missing persistent_id with fallback to agent_id
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
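
The append-rather-than-insert behaviour could be sketched like this. The source does not show how arcflow locates an existing `<bioghist>`, so the approach here is an assumption: search only the collection-level region (between the first `</did>` and the start of `<dsc>`) for the last closing `</bioghist>`, and fall back to the position right after `</did>`. The function name `find_insertion_point` is hypothetical.

```python
def find_insertion_point(ead_xml):
    """Return the index at which creator bioghist XML should be inserted."""
    did_close = "</did>"
    did_end = ead_xml.find(did_close)
    if did_end == -1:
        return None  # no collection-level <did>; caller should skip injection
    did_end += len(did_close)

    # Restrict the search to collection-level content, before any components.
    dsc_start = ead_xml.find("<dsc", did_end)
    limit = dsc_start if dsc_start != -1 else len(ead_xml)

    bioghist_close = "</bioghist>"
    last = ead_xml.rfind(bioghist_close, did_end, limit)
    if last != -1:
        return last + len(bioghist_close)  # append after existing bioghist
    return did_end                          # otherwise insert right after </did>


with_existing = (
    "<archdesc><did></did><bioghist><p>a</p></bioghist>"
    "<dsc><c><did></did><bioghist><p>c</p></bioghist></c></dsc></archdesc>"
)
without_existing = "<archdesc><did></did><scopecontent/></archdesc>"

p1 = find_insertion_point(with_existing)
p2 = find_insertion_point(without_existing)
```

Limiting the search to the text before `<dsc` avoids appending to a component-level `<bioghist>` deeper in the finding aid.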

alexdryden · 2026-01-26T18:48:16Z

@copilot 1. don't fall back to agent_id, that would just be confusing and there would be no way to know that is an agent_id and not just a really short persistent_id
2. the pattern should be "Historical Note from creator_title Creator Record"

- Remove fallback to agent_id when persistent_id is missing to avoid confusion between different ID types
- Skip id attribute entirely if persistent_id is not available
- Heading pattern already correct: "Historical Note from {creator_title} Creator Record"
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

Copilot · 2026-01-26T18:50:04Z

@copilot 1. don't fall back to agent_id, that would just be confusing and there would be no way to know that is an agent_id and not just a really short persistent_id
2. the pattern should be "Historic...

Fixed in commit 716c7db:

Removed agent_id fallback - now skips id attribute entirely when persistent_id is missing
Heading pattern confirmed as "Historical Note from {creator_title} Creator Record"

Initial plan

6f41095

Copilot AI assigned Copilot and alexdryden Dec 23, 2025

Copilot started work on behalf of alexdryden December 23, 2025 16:06

Copilot AI and others added 3 commits December 23, 2025 16:10

Improve XML escaping and bioghist formatting

3c608de


Improve error message for bioghist fetching

63ffe60


Copilot AI changed the title ~~[WIP] Update arcflow pipeline to display creator biographical information~~ Add creator biographical information to EAD XML exports Dec 23, 2025

Copilot AI requested a review from alexdryden December 23, 2025 16:20

Copilot finished work on behalf of alexdryden December 23, 2025 16:20

alexdryden reviewed Dec 23, 2025

alexdryden requested changes Dec 23, 2025

Copilot started work on behalf of alexdryden December 23, 2025 16:42

Copilot AI and others added 10 commits December 23, 2025 16:48

Revert "Improve error message for bioghist fetching"

c1bea75

This reverts commit 63ffe60.

Revert "Improve XML escaping and bioghist formatting"

48320f3

This reverts commit 3c608de.

Revert "Implement bioghist extraction from creator agents"

a54c4ba

This reverts commit e49106b.

Push revert commits

430557f

Add linked_agents to resolve parameter

a50571c


Copilot AI requested a review from alexdryden December 23, 2025 16:57

Copilot finished work on behalf of alexdryden December 23, 2025 16:57

Copilot started work on behalf of alexdryden December 23, 2025 16:58

Copilot AI requested a review from alexdryden January 9, 2026 00:51

Copilot finished work on behalf of alexdryden January 9, 2026 00:51

Copilot started work on behalf of alexdryden January 9, 2026 01:07

Copilot finished work on behalf of alexdryden January 9, 2026 01:10

Copilot started work on behalf of alexdryden January 9, 2026 01:10

Copilot finished work on behalf of alexdryden January 9, 2026 01:12

Copilot started work on behalf of alexdryden January 9, 2026 21:43

Add line breaks between bioghist paragraphs for legibility

8d33aef


Copilot finished work on behalf of alexdryden January 9, 2026 21:45

Copilot started work on behalf of alexdryden January 26, 2026 17:21

Copilot finished work on behalf of alexdryden January 26, 2026 17:23

Copilot started work on behalf of alexdryden January 26, 2026 18:42

Copilot finished work on behalf of alexdryden January 26, 2026 18:46

Copilot started work on behalf of alexdryden January 26, 2026 18:48

Copilot finished work on behalf of alexdryden January 26, 2026 18:51
