-
Notifications
You must be signed in to change notification settings - Fork 0
Add creator biographical information to EAD XML exports #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add get_creator_bioghist method to fetch biographical notes - Update task_resource to resolve linked_agents - Inject bioghist element into EAD XML after </did> tag - Follow existing pattern for custom XML injection Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
- Use xml.sax.saxutils.escape for proper XML escaping - Format bioghist notes as separate paragraphs - Remove manual XML escaping in favor of library function Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
- Make error message more specific about the operation that failed Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
arcflow/main.py
Outdated
| 'ead3': 'false', | ||
| }) | ||
|
|
||
| # add record group and subgroup labels to EAD inside <archdesc level="collection"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure to update the comments to reflect the new behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in commit db640ce. The comment now reflects that the section handles multiple types of custom XML elements (record group/subgroup labels and biographical/historical notes).
alexdryden
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before I continue with the review, I'd like for you to start over re-work this with more frequent and verbose commits. So, keep everything the same, but parse out the changes more clearly through more frequent, single purpose commits.
Even if humans committing to this repo have dense commits, I'd like for you to be more explicit. For example, in the first commit to main.py you are making changes to the like locating the closing </did> tag, which is great, but not really part of "Add creator biographical information to EAD XML exports".
This makes it easier to review and helps me understand your specific logic and choices.
This reverts commit 63ffe60.
This reverts commit 3c608de.
This reverts commit e49106b.
Restructure the XML injection code to make it easier to add additional XML elements. Changes: - Move XML decode/encode outside the label-conditional block - Calculate insertion point (after </did> tag) upfront - Use extra_xml variable to accumulate multiple XML elements - Maintains existing recordgroup/subgroup functionality Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Include linked_agents in the resource API call resolution to fetch agent information along with the resource. This prepares for extracting creator biographical information. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Add import for xml_escape function to properly escape XML special characters (ampersands, less-than, greater-than, quotes) when injecting content into EAD XML. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Implement method to fetch biographical/historical notes from ArchivesSpace agent records. The method: - Filters agents by 'creator' role - Fetches agent details via API - Extracts note_bioghist content from subnotes - Escapes XML special characters using xml_escape - Wraps each note in <p> tags for proper XML structure - Returns concatenated paragraphs or None if no notes found Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Call get_creator_bioghist to fetch biographical notes and inject them into the EAD XML as a <bioghist> element. The bioghist element is added after </did> tag, following recordgroup/subgroup labels. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Update the comment on the XML injection section to indicate that it now handles multiple types of custom XML elements (record group/ subgroup labels AND biographical/historical notes), not just labels. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
I've reverted the previous commits and restructured the changes into 6 granular, single-purpose commits:
Each commit now has a clear, focused purpose that's easier to review. |
|
@copilot Great! This commit style suits me much better for this repo. Can you add a .github/copilot-instructions.md file with some basic on boarding instructions that includes this commit style? Consult https://docs.github.com/en/enterprise-cloud@latest/copilot/tutorials/coding-agent/get-the-best-results if you need context for what this file is for. Keep it very simple for now and mainly focus the commit style. Make the file so it is easy and clear for another agent to add more instructions later. |
Create onboarding instructions for GitHub Copilot agents with focus on granular, single-purpose commit style. Includes: - Guidelines for commit granularity - Examples of good vs bad commit sequences - Commit message format recommendations - Instructions for extending the file Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Created in commit b485f55. The file documents the granular commit style with clear examples and is structured to be easily extended by other agents. |
|
@copilot for the incomplete comment, it looks like a formatting issue with quoting xml syntax in markdown in the comment--regardless, that comment was a note to myself and you can ignore it. For your questions: I'm not an expert in EAD, so check these instructions against generally accepted practices and let me know if anything looks off. |
|
@copilot see my updated comment above--I needed to escape some of the xml tags |
Understood. I'll implement:
EAD Standards Check:
The approach looks compliant with EAD standards. I'll also ensure:
Proceeding with implementation. |
Implement improved bioghist structure: - Create separate <bioghist> element for each creator - Add <head> element with creator name (XML-escaped) - Include id attribute with ArchivesSpace agent ID - Split content on single newlines to create <p> paragraphs - Maintain linked_agents order for consistency with origination Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
|
@copilot For legibility, also keep line breaks in the bioghist narrative. |
Add newline characters between <p> elements within bioghist content and after head/before closing tag to improve readability of the generated XML. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
|
@copilot here are a few things we need to clean up:
During this development stage, lets be extra careful about making sure we have the data type we assume we have, and try to handle exceptions gracefully while noting in the error logs in bold anything that is violating those assumptions.
|
Fix error when subnote content is a list instead of a string. Some subnotes (e.g., note_citation) have content as a list, which was causing "'list' object has no attribute 'split'" errors. Now properly handles both string and list content types. Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Fixed in commit fb0dccc. The code now handles subnote content as both string and list types. When content is a list (as with note_citation), each item is converted to a string and wrapped in |
|
@copilot This looks good, but take a look at points 2 and 3 from my previous comment. In case something got chopped off from where I inserted the json example, here is the other details from my above comment: During this development stage, lets be extra careful about making sure we have the data type we assume we have, and try to handle exceptions gracefully while noting in the error logs in bold anything that is violating those assumptions. |
The comment appears to be cut off at "Historical Note from". Could you complete the heading pattern you'd like me to use? Should it be:
I'll proceed with implementing:
|
- Add explicit type checking with prominent error logging for assumption violations - Use persistent_id from notes for id attribute instead of agent_id - Update heading pattern to "Historical Note from [name] Creator Record" - Handle existing bioghist elements by appending agent bioghist after them rather than always inserting after </did> - Log missing persistent_id with fallback to agent_id Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
|
@copilot 1. don't fall back to agent_id, that would just be confusing and there would be no way to know that is an agent_id and not just a really short persistent_id |
- Remove fallback to agent_id when persistent_id is missing
to avoid confusion between different ID types
- Skip id attribute entirely if persistent_id is not available
- Heading pattern already correct: "Historical Note from
{creator_title} Creator Record"
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Fixed in commit 716c7db:
|
Implementation Plan for Creator Biographical Information
Summary
Successfully restructured the implementation with granular, single-purpose commits:
Documentation Added
Created
.github/copilot-instructions.mdwith onboarding instructions for GitHub Copilot agents, documenting:XML Formatting
Bioghist Structure
<bioghist>element for each creatoridattribute with note's persistent_id (format:aspace_{persistent_id}) - omitted if persistent_id not available<head>element with pattern "Historical Note from {creator title} Creator Record"<p>tags<p>elements for better readabilitylinked_agentsto match<origination>order<bioghist>elements if they exist in the EAD, rather than replacing themError Handling
idattribute when persistent_id is not available (no fallback to agent_id to avoid ID type confusion)Original prompt
This pull request was created from Copilot chat.
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.