Commit d90f93b

jason-raitz, awilfox, and anarchivist authored
AP-616 add better tind search to bulk xml file (#4)
AP-616
- added client.write_search_results_to_file()
- added method to iterate through search xml results
- added some xml fixtures for a first and last result for a sample search as well as the expected output xml for said search.

Co-authored-by: Anna Wilcox <AWilcox@Wilcox-Tech.com>
Co-authored-by: maría a. matienzo <73732+anarchivist@users.noreply.github.com>
1 parent 144d4aa commit d90f93b

8 files changed

Lines changed: 585 additions & 30 deletions

CHANGELOG.md

Lines changed: 24 additions & 0 deletions
@@ -5,6 +5,30 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [Unreleased]
+
+### Added
+- client method to write search results to an XML file, with validation against expected number of records to be written
+- client method to return an iterator of XML records from a search, to support streaming results for large result sets
+- xml fixture files for testing
+- tests for the new client methods, including edge cases for validation
+
+### Changed
+- README examples
+
+### Deprecated
+- N/A
+
+### Removed
+- N/A
+
+### Fixed
+- N/A
+
+### Security
+- N/A
+
+
 ## [0.1.1]
 
 ### Added
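
The changelog entry for the file-writing method mentions validation against the expected number of records. A minimal sketch of that idea follows, assuming the records arrive as `xml.etree.ElementTree` elements and the expected count is taken from the response's `<total>` element. `write_records` and its parameters are hypothetical names used for illustration; this is not the client's actual implementation.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterable

MARC_NS = "http://www.loc.gov/MARC21/slim"


def write_records(records: Iterable[ET.Element], expected: int, path: Path) -> int:
    """Stream records into one MARCXML <collection> file and check the count.

    Hypothetical helper: the name and signature are assumptions.
    """
    # Serialize MARC elements with a default namespace instead of an ns0: prefix.
    ET.register_namespace("", MARC_NS)
    written = 0
    with open(path, "wb") as fh:
        fh.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
        fh.write(f'<collection xmlns="{MARC_NS}">\n'.encode("utf-8"))
        for record in records:
            # Records are written one at a time, so the full result set
            # never has to be held in memory.
            fh.write(ET.tostring(record, encoding="utf-8"))
            fh.write(b"\n")
            written += 1
        fh.write(b"</collection>\n")
    if written != expected:
        # The file is left in place so the mismatch can be inspected.
        raise ValueError(f"expected {expected} records, wrote {written}")
    return written
```

Checking the count only after the loop keeps the writer streaming; the trade-off is that a mismatch is detected after the file already exists.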

README.md

Lines changed: 4 additions & 0 deletions
@@ -75,8 +75,12 @@ ids = client.fetch_ids_search("collection:'Disabled Students Program Photos'")
 records = client.fetch_search_metadata("collection:'Disabled Students Program Photos'")
 
 # return raw XML or PyMARC records from a paginated search
+# NOTE: for large result sets, use the write_search_results_to_file() method and then parse that file
 xml_results = client.search("collection:'Disabled Students Program Photos'", result_format="xml")
 pymarc_results = client.search("collection:'Disabled Students Program Photos'", result_format="pymarc")
+
+# search Tind with a query and write results to an XML file in the default storage directory
+records_written = client.write_search_results_to_file("Old Emperor Norton", "full_norton_results.xml")
 ```
 
 ## Running tests
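
The README note recommends writing large result sets to a file and then parsing that file. One way to do the parsing step without loading the whole file is sketched below with the standard library's `iterparse`. It assumes the written file is a MARCXML `<collection>` shaped like the fixtures in this commit; `iter_titles` is a hypothetical helper, not part of the client.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

MARC_NS = "{http://www.loc.gov/MARC21/slim}"


def iter_titles(source) -> Iterator[Tuple[str, str]]:
    """Yield (record id, title) pairs from a MARCXML file, one record at a time.

    `source` may be a filename or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != f"{MARC_NS}record":
            continue
        # Control field 001 holds the record id; 245 $a holds the title.
        record_id = elem.findtext(f"{MARC_NS}controlfield[@tag='001']", default="")
        title = elem.findtext(
            f"{MARC_NS}datafield[@tag='245']/{MARC_NS}subfield[@code='a']",
            default="",
        )
        yield record_id, title
        # Drop the record's children once it has been read to keep memory flat.
        elem.clear()
```

With the file from the README example, usage might look like `for record_id, title in iter_titles("full_norton_results.xml"): ...`, adjusting the path to wherever the default storage directory places it.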

pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -61,3 +61,5 @@ allow-init-docstring = true
 skip-checking-raises = true
 style = "sphinx"
 
+[tool.pylint.format]
+max-line-length = 100
Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
+<response>
+<total>3</total>
+<search_id>FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFm04dVFEUk1JUjVpaHgzV2VIdTBOclEAAAAAAbR0aRZZQVkyRloyLVJtRzVJT1ZZTmdmMFpn</search_id>
+<collection xmlns="http://www.loc.gov/MARC21/slim">
+<record>
+<controlfield tag="001">27320</controlfield>
+<controlfield tag="005">20250429135007.0</controlfield>
+<datafield tag="024" ind1="8" ind2="0">
+<subfield code="a">BANC PIC 1996.003:Volume 24:42a--fALB</subfield>
+</datafield>
+<datafield tag="035" ind1=" " ind2=" ">
+<subfield code="a">calher_cubanc_25_132_00180699</subfield>
+</datafield>
+<datafield tag="245" ind1=" " ind2=" ">
+<subfield code="a">Old Emperor Norton in 1876</subfield>
+</datafield>
+<datafield tag="336" ind1=" " ind2=" ">
+<subfield code="a">Image</subfield>
+</datafield>
+<datafield tag="540" ind1=" " ind2=" ">
+<subfield code="a">Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).</subfield>
+</datafield>
+<datafield tag="852" ind1=" " ind2=" ">
+<subfield code="c">The Bancroft Library</subfield>
+</datafield>
+<datafield tag="856" ind1="4" ind2="1">
+<subfield code="u">http://www.oac.cdlib.org/findaid/ark:/13030/tf129005j4</subfield>
+<subfield code="y">View collection guide</subfield>
+</datafield>
+<datafield tag="856" ind1="4" ind2=" ">
+<subfield code="9">c6cf7b70-4acd-466f-ab1a-3ca53b4a2789</subfield>
+<subfield code="s">196176</subfield>
+<subfield code="u">https://digicoll.lib.berkeley.edu/record/27320/files/I0051200A.jpg</subfield>
+</datafield>
+<datafield tag="901" ind1=" " ind2=" ">
+<subfield code="a">ark:/13030/tf9m3nb8s8</subfield>
+<subfield code="f">ark:/13030/tf129005j4</subfield>
+<subfield code="g">25:132</subfield>
+</datafield>
+<datafield tag="902" ind1=" " ind2=" ">
+<subfield code="f">cubanc_25_132_00180699.xml</subfield>
+</datafield>
+<datafield tag="909" ind1="C" ind2="O">
+<subfield code="o">oai:digicoll.lib.berkeley.edu:27320</subfield>
+<subfield code="p">sfg</subfield>
+<subfield code="p">calher:cookscrapbook</subfield>
+<subfield code="q">mcleanCalisphere_oai</subfield>
+</datafield>
+<datafield tag="951" ind1=" " ind2=" ">
+<subfield code="a">ark:/13030/m5s5429m</subfield>
+<subfield code="b">Merritt</subfield>
+</datafield>
+<datafield tag="980" ind1=" " ind2=" ">
+<subfield code="a">CalHer: Cook Scrapbook</subfield>
+</datafield>
+<datafield tag="982" ind1=" " ind2=" ">
+<subfield code="a">Jesse Brown Cook Scrapbooks</subfield>
+<subfield code="b">Jesse Brown Cook Scrapbooks Documenting San Francisco History and Law Enforcement</subfield>
+</datafield>
+</record>
+<record>
+<controlfield tag="001">28819</controlfield>
+<controlfield tag="005">20250429135229.0</controlfield>
+<datafield tag="024" ind1="8" ind2="0">
+<subfield code="a">BANC PIC 1996.003:Volume 24:41b--fALB</subfield>
+</datafield>
+<datafield tag="035" ind1=" " ind2=" ">
+<subfield code="a">calher_cubanc_25_132_00180698</subfield>
+</datafield>
+<datafield tag="245" ind1=" " ind2=" ">
+<subfield code="a">Old Emperor Norton in 1876</subfield>
+</datafield>
+<datafield tag="336" ind1=" " ind2=" ">
+<subfield code="a">Image</subfield>
+</datafield>
+<datafield tag="540" ind1=" " ind2=" ">
+<subfield code="a">Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).</subfield>
+</datafield>
+<datafield tag="852" ind1=" " ind2=" ">
+<subfield code="c">The Bancroft Library</subfield>
+</datafield>
+<datafield tag="856" ind1="4" ind2="1">
+<subfield code="u">http://www.oac.cdlib.org/findaid/ark:/13030/tf129005j4</subfield>
+<subfield code="y">View collection guide</subfield>
+</datafield>
+<datafield tag="856" ind1="4" ind2=" ">
+<subfield code="9">5e1f7508-117f-4120-9acf-88f09c2c20d8</subfield>
+<subfield code="s">170081</subfield>
+<subfield code="u">https://digicoll.lib.berkeley.edu/record/28819/files/I0051199A.jpg</subfield>
+</datafield>
+<datafield tag="901" ind1=" " ind2=" ">
+<subfield code="a">ark:/13030/tf496nb4j6</subfield>
+<subfield code="f">ark:/13030/tf129005j4</subfield>
+<subfield code="g">25:132</subfield>
+</datafield>
+<datafield tag="902" ind1=" " ind2=" ">
+<subfield code="f">cubanc_25_132_00180698.xml</subfield>
+</datafield>
+<datafield tag="909" ind1="C" ind2="O">
+<subfield code="o">oai:digicoll.lib.berkeley.edu:28819</subfield>
+<subfield code="p">sfg</subfield>
+<subfield code="p">calher:cookscrapbook</subfield>
+<subfield code="q">mcleanCalisphere_oai</subfield>
+</datafield>
+<datafield tag="951" ind1=" " ind2=" ">
+<subfield code="a">ark:/13030/m57159jp</subfield>
+<subfield code="b">Merritt</subfield>
+</datafield>
+<datafield tag="980" ind1=" " ind2=" ">
+<subfield code="a">CalHer: Cook Scrapbook</subfield>
+</datafield>
+<datafield tag="982" ind1=" " ind2=" ">
+<subfield code="a">Jesse Brown Cook Scrapbooks</subfield>
+<subfield code="b">Jesse Brown Cook Scrapbooks Documenting San Francisco History and Law Enforcement</subfield>
+</datafield>
+</record>
+<record>
+<controlfield tag="001">29563</controlfield>
+<controlfield tag="005">20250429135339.0</controlfield>
+<datafield tag="024" ind1="8" ind2="0">
+<subfield code="a">BANC PIC 1996.003:Volume 24:41a--fALB</subfield>
+</datafield>
+<datafield tag="035" ind1=" " ind2=" ">
+<subfield code="a">calher_cubanc_25_132_00180697</subfield>
+</datafield>
+<datafield tag="245" ind1=" " ind2=" ">
+<subfield code="a">Old Emperor Norton in 1876</subfield>
+</datafield>
+<datafield tag="336" ind1=" " ind2=" ">
+<subfield code="a">Image</subfield>
+</datafield>
+<datafield tag="540" ind1=" " ind2=" ">
+<subfield code="a">Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).</subfield>
+</datafield>
+<datafield tag="852" ind1=" " ind2=" ">
+<subfield code="c">The Bancroft Library</subfield>
+</datafield>
+<datafield tag="856" ind1="4" ind2="1">
+<subfield code="u">http://www.oac.cdlib.org/findaid/ark:/13030/tf129005j4</subfield>
+<subfield code="y">View collection guide</subfield>
+</datafield>
+<datafield tag="856" ind1="4" ind2=" ">
+<subfield code="9">7f7df611-bbe6-4855-8725-2fbd8f9e3d90</subfield>
+<subfield code="s">199139</subfield>
+<subfield code="u">https://digicoll.lib.berkeley.edu/record/29563/files/I0051198A.jpg</subfield>
+</datafield>
+<datafield tag="901" ind1=" " ind2=" ">
+<subfield code="a">ark:/13030/tf7g5010k5</subfield>
+<subfield code="f">ark:/13030/tf129005j4</subfield>
+<subfield code="g">25:132</subfield>
+</datafield>
+<datafield tag="902" ind1=" " ind2=" ">
+<subfield code="f">cubanc_25_132_00180697.xml</subfield>
+</datafield>
+<datafield tag="909" ind1="C" ind2="O">
+<subfield code="o">oai:digicoll.lib.berkeley.edu:29563</subfield>
+<subfield code="p">sfg</subfield>
+<subfield code="p">calher:cookscrapbook</subfield>
+<subfield code="q">mcleanCalisphere_oai</subfield>
+</datafield>
+<datafield tag="951" ind1=" " ind2=" ">
+<subfield code="a">ark:/13030/m5bp7bwt</subfield>
+<subfield code="b">Merritt</subfield>
+</datafield>
+<datafield tag="980" ind1=" " ind2=" ">
+<subfield code="a">CalHer: Cook Scrapbook</subfield>
+</datafield>
+<datafield tag="982" ind1=" " ind2=" ">
+<subfield code="a">Jesse Brown Cook Scrapbooks</subfield>
+<subfield code="b">Jesse Brown Cook Scrapbooks Documenting San Francisco History and Law Enforcement</subfield>
+</datafield>
+</record>
+</collection>
+</response>
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+<response>
+<total>3</total>
+<search_id>FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFm04dVFEUk1JUjVpaHgzV2VIdTBOclEAAAAAAbR0aRZZQVkyRloyLVJtRzVJT1ZZTmdmMFpn</search_id>
+<collection xmlns="http://www.loc.gov/MARC21/slim"/>
+</response>
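
The two fixtures above share a `search_id`; the first carries records and the last has an empty `<collection/>`. A rough sketch of how an iterator over such pages could work is shown below. It assumes the API signals the end of a result set with an empty collection, as the last-page fixture suggests, and that each page's `search_id` is passed to the next request. `iter_records` and the `fetch_page` callable are hypothetical stand-ins, not the client's real method or the Tind API itself.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

MARC_NS = "{http://www.loc.gov/MARC21/slim}"


def iter_records(fetch_page: Callable[[Optional[str]], str]) -> Iterator[ET.Element]:
    """Yield MARC <record> elements page by page until a page comes back empty.

    `fetch_page` is a hypothetical callable: it takes the previous page's
    search_id (None for the first request) and returns the response XML.
    """
    search_id: Optional[str] = None
    while True:
        root = ET.fromstring(fetch_page(search_id))
        # <response> and <search_id> are un-namespaced; the collection and
        # its records live in the MARC21 slim namespace.
        records = root.findall(f"{MARC_NS}collection/{MARC_NS}record")
        if not records:
            return  # an empty collection is treated as the last page
        yield from records
        search_id = root.findtext("search_id")
```

Because the function is a generator, a caller can write records out as they arrive and compare the running count against `<total>` at the end.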
