Commit 72b530e

ehanson8 committed
2 parents 77e52a6 + 1ee6271 commit 72b530e

2 files changed: +35 -14 lines changed

README.md

Lines changed: 34 additions & 13 deletions
@@ -5,43 +5,64 @@ All of these scripts require a secrets.py file in the same directory that must c
 baseURL='https://dspace.myuni.edu'
 email='dspace_user@.myuni.edu'
 password='my_dspace_password'
-filePath = '/Users/dspace_user/dspace-data-collection/data/' # directory into which to store output files
-handlePrefix = 'http://dspace.myuni.edu/handle/' # handlePrefix may vary from your dspace url (or may not)
+filePath = '/Users/dspace_user/dspace-data-collection/data/'
+handlePrefix = 'http://dspace.myuni.edu/handle/'
 ```
-This secrets.py file will be ignored according to the repository's .gitignore file so that DSpace login details will not be inadvertently exposed through Github
+The 'filePath' is the directory into which output files will be written, and 'handlePrefix' may or may not differ from your DSpace URL depending on your configuration. This secrets.py file will be ignored according to the repository's .gitignore file so that DSpace login details will not be inadvertently exposed through GitHub.

-*Note that all of these scripts skip collection '24' for local reasons. To change this, edit the following portion of the script (typically between line 27-39)
+**Note**: All of these scripts skip collection '24' for local reasons. To change this, edit the following portion of the script (typically between lines 27 and 39).

-Skips collection 24

-for j in range (0, len (collections)):
-    collectionID = collections[j]['id']
-    if collectionID != 24:
-        offset = 0
+Skips collection 24:
+
+for j in range (0, len (collections)):
+    collectionID = collections[j]['id']
+    if collectionID != 24:
+        offset = 0
+

 No collections skipped:

-for j in range (0, len (collections)):
-    collectionID = collections[j]['id']
-    if collectionID != 0:
-        offset = 0
+for j in range (0, len (collections)):
+    collectionID = collections[j]['id']
+    if collectionID != 0:
+        offset = 0
+
+
 #### [compareTwoKeysInCommunity.py](compareTwoKeysInCommunity.py)
+Based on user input, this script extracts the values of two specified keys from a specified community to a CSV file for comparison.

 #### [findBogusUris.py](findBogusUris.py)
+This script extracts the item ID and the value of the key 'dc.identifier.uri' to a CSV file when the value does not begin with the handlePrefix specified in the secrets.py file.

 #### [findDuplicateKeys.py](findDuplicateKeys.py)
+Based on user input, this script extracts item IDs to a CSV file where there are multiple instances of the specified key in the item metadata.

 #### [getCollectionMetadataJson.py](getCollectionMetadataJson.py)
+Based on user input, this script extracts all of the item metadata from the specified collection to a JSON file.

 #### [getCompleteAndUniqueValuesForAllKeys.py](getCompleteAndUniqueValuesForAllKeys.py)
+This script creates a 'completeValueLists' folder and, for each key used in the repository, writes a CSV file of all values for that key with their item IDs. It also creates a 'uniqueValueLists' folder and writes a CSV file for each key listing every unique value and a count of how many times it appears.

 #### [getGlobalLanguageValues.py](getGlobalLanguageValues.py)
+This script extracts all unique language values used by metadata entries in the repository to a CSV file.

 #### [getLanguageValuesForKeys.py](getLanguageValuesForKeys.py)
+This script extracts all unique pairs of keys and language values used by metadata entries in the repository to a CSV file.

 #### [getRecordsAndValuesForKey.py](getRecordsAndValuesForKey.py)
+Based on user input, this script extracts the ID and URI for all items in the repository with the specified key, as well as the value of the specified key, to a CSV file.

 #### [getRecordsWithKeyAndValue.py](getRecordsWithKeyAndValue.py)
+Based on user input, this script extracts the ID and URI for all items in the repository with the specified key-value pair to a CSV file.

 #### [metadataOverview.py](metadataOverview.py)
+This script produces several CSV files containing different information about the structure and metadata of the repository:

+|File Name |Description|
+|--------------------------|--------------------------------------------------------------------------|
+|collectionMetadataKeys.csv | A list of all keys used in each collection with collection name, ID, and handle.|
+|dspaceIDs.csv | A list of every item ID along with the IDs of the collection and community that contain that item.|
+|dspaceTypes.csv | A list of all unique values for the key 'dc.type'.|
+|keyCount.csv | A list of all unique keys used in the repository, with a count of how many times each appears.|
+|collectionStats.csv | A list of all collections in the repository with the collection name, ID, handle, and number of items.|
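
The collection-skipping loop in the README hunk above hard-codes the ID 24, and its "no collections skipped" variant compares against 0, so it would still skip a collection whose ID happened to be 0. One possible alternative is to keep the IDs to skip in a set, so that an empty set skips nothing. This is a minimal sketch, assuming `collections` is a list of dicts with an `'id'` key as in the README snippet. The names `SKIPPED_COLLECTION_IDS` and `collections_to_process` are hypothetical and do not appear in the repository.

```python
# Hypothetical sketch: not code from the repository.
SKIPPED_COLLECTION_IDS = {24}  # assumption: replace with set() to skip nothing


def collections_to_process(collections, skipped=SKIPPED_COLLECTION_IDS):
    """Return the collections whose 'id' is not in the skipped set."""
    return [c for c in collections if c['id'] not in skipped]


# Sample data shaped like the README loop expects (made-up IDs).
sample = [{'id': 23}, {'id': 24}, {'id': 25}]
kept_ids = [c['id'] for c in collections_to_process(sample)]
all_ids = [c['id'] for c in collections_to_process(sample, skipped=set())]
```

With this shape, changing which collections are skipped means editing one set rather than the comparison inside each script's loop.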

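
The check that the README hunk describes for findBogusUris.py amounts to a prefix test against `handlePrefix`. The sketch below illustrates that idea only, assuming the item data is already available as (item ID, URI) pairs. `find_bogus_uris`, the column names, and the sample rows are hypothetical, and the actual script may gather and write its data differently.

```python
import csv
import io

# Assumption: same example value as handlePrefix in the README's secrets.py.
handlePrefix = 'http://dspace.myuni.edu/handle/'


def find_bogus_uris(items, prefix=handlePrefix):
    """Return (item_id, uri) pairs whose URI does not begin with prefix."""
    return [(item_id, uri) for item_id, uri in items
            if not uri.startswith(prefix)]


# Made-up sample rows for illustration.
items = [
    (1, 'http://dspace.myuni.edu/handle/1234/1'),
    (2, 'http://example.org/handle/1234/2'),
]
bogus = find_bogus_uris(items)

# Write the result as CSV to an in-memory buffer instead of a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['itemID', 'uri'])  # hypothetical column names
writer.writerows(bogus)
```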
getCompleteAndUniqueValuesForAllKeys.py

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 filePath = secrets.filePath

 filePathComplete = filePath+'completeValueLists/'
-filePathUnique = filePath+'/uniqueValueLists/'
+filePathUnique = filePath+'uniqueValueLists/'

 startTime = time.time()
 data = json.dumps({'email':email,'password':password})
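
The one-line change above removes a leading slash that produced a doubled separator whenever `filePath` already ended in '/', as it does in the README's secrets.py example. A concatenation-free alternative is `os.path.join`, which inserts a separator only where one is missing. This is a minimal sketch of that option on a POSIX system; `output_dir` is a hypothetical helper, not part of the script.

```python
import os.path

# Assumption: same example value as filePath in the README's secrets.py.
filePath = '/Users/dspace_user/dspace-data-collection/data/'

# String concatenation as it was before the fix: contains 'data//uniqueValueLists/'.
old_unique = filePath + '/uniqueValueLists/'


def output_dir(base, name):
    """Join base and name; the empty last part adds a trailing separator."""
    return os.path.join(base, name, '')


# The result is the same whether or not the base path ends with a slash.
with_slash = output_dir(filePath, 'uniqueValueLists')
without_slash = output_dir(filePath.rstrip('/'), 'uniqueValueLists')
```

The design trade-off is small: concatenation relies on every user ending `filePath` with a slash, while joining tolerates both forms.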
