Commit f8d8840

Merge pull request #11 from QuantGov/dev
Version 0.2
2 parents 8fc27a8 + 5536917 commit f8d8840

6 files changed: +310 -34 lines changed
.gitignore

Lines changed: 5 additions & 6 deletions
@@ -1,9 +1,8 @@
 .DS_Store
-
 *.egg-info
-
-__pycache__
-
+*__pycache__
 build/
-
-dist/
+dist/
+.coverage
+*tox*
+.python-version

README.md

Lines changed: 26 additions & 1 deletion
@@ -1,3 +1,5 @@
+_The current version of RegCensusAPI is only compatible with Python 3.6 and newer._
+
 # RegCensus API

 ## Introduction
@@ -13,7 +15,7 @@ The RegCensus Python library is pip installable:
 $ pip install regcensus
 ```

-Once installed, import the library, using the following (using the `rc` alias to more easily use the library):
+Once installed, import the library, using the following (use the `rc` alias to more easily use the library):

 ```
 import regcensus as rc
@@ -100,6 +102,8 @@ The __get_values__ function is the primary function for obtaining RegData from t
 * filtered (optional) - specify if poorly-performing industry results should be excluded. Default is True.
 * summary (optional) - specify if summary results should be returned, instead of document-level results. Default is True.
 * country (optional) - specify if all values for a country's jurisdiction ID should be returned. Default is False.
+* industryType (optional): level of NAICS industries to include. Default is '3-Digit'.
+* download (optional): if not False, a path location for a downloaded csv of the results.
 * verbose (optional) - value specifying how much debugging information should be printed for each function call. Higher number specifies more information, default is 0.

 In the example below, we are interested in the total number of restrictions and total number of words for the US (get_jurisdictions(38)) for the period 2010 to 2019.
@@ -108,6 +112,14 @@ In the example below, we are interested in the total number of restrictions and
 rc.get_values(series = [1,2], jurisdiction = 38, date = [2010, 2019])
 ```

+### Get all Values for a Country
+
+The `country` argument can be used to get all values for one or multiple series for a specific national jurisdiction. The following line will get you a summary of the national and state-level restriction counts for the United States from 2016 to 2019:
+
+```
+rc.get_values(series = 1, jurisdiction = 38, date = [2016, 2019], country=True)
+```
+
 ### Values by Subgroup

 You can obtain data for any of the three subgroups for each series - agencies, industries, and occupations (when they become available).
@@ -168,5 +180,18 @@ agency_restrictions_ind = agency_by_industry.merge(
     agencies, by='agency_id')
 ```

+## Downloading Data
+
+There are two different ways to download data retrieved from RegCensusAPI:
+
+1. Use the pandas `df.to_csv(outpath)` function, which allows the user to download a csv of the data, with the given outpath. See the pandas [documentation][3] for more features.
+
+2. As of version 0.2.0, the __get_values__ function includes a `download` argument, which allows the user to simply download a csv of the data in the same line as the API call. See below for an example of this call.
+
+```
+rc.get_values(series = [1,2], jurisdiction = 38, date = [2010, 2019], download='regdata2010to2019.csv')
+```
+
 [1]:https://api.quantgov.org/swagger-ui.html
 [2]:https://www.quantgov.org/download-interactively
+[3]:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html

regcensus/api.py

Lines changed: 47 additions & 19 deletions
@@ -12,7 +12,8 @@

 def get_values(series, jurisdiction, date, filtered=True, summary=True,
                documentType=3, agency=None, industry=None, dateIsRange=True,
-               country=False, industryType='3-Digit', verbose=0):
+               country=False, industryType='3-Digit',
+               download=False, verbose=0):
     """
     Get values for a specific jurisdition and series

@@ -23,12 +24,17 @@ def get_values(series, jurisdiction, date, filtered=True, summary=True,
         summary (optional): Return summary instead of document level data
         filtered (optional): Exclude poorly-performing industry results
         documentType (optional): ID for type of document
-        agency (optional): Agency ID
+        agency (optional): Agency ID (use 'all' for all agencies,
+            only works for a single jurisdiction)
         industry (optional): Industry code using the jurisdiction-specific
             coding system (use 'all' for all industries)
         dateIsRange (optional): Indicating whether the time parameter is range
             or should be treated as single data points
         country (optional): Get all values for country ID
+        industryType (optional): Level of NAICS industries to include,
+            default is '3-Digit'
+        download (optional): If not False, a path location for a
+            downloaded csv of the results
         verbose (optional): Print out the url of the API call

     Returns: pandas dataframe with the values and various metadata
@@ -60,6 +66,9 @@ def get_values(series, jurisdiction, date, filtered=True, summary=True,
         pp.pprint(list_jurisdictions())
         return

+    # Allows for all agency data to be returned
+    if str(agency).lower() == 'all':
+        agency = list(list_agencies(jurisdiction).values())
     # If multiple agencies are given, parses the list into a string
     if type(agency) == list:
         url_call += f'&agency={",".join(str(i) for i in agency)}'
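The hunk above expands `agency='all'` into the full list of agency IDs for the jurisdiction, so the existing list-joining branch can build a single comma-separated parameter. A minimal sketch of that expansion follows. It assumes a hypothetical in-memory `AGENCIES` mapping in place of the real `list_agencies(jurisdiction)` API call. The agency names and IDs are invented, and the single-ID and `None` branches are assumptions, since that part of the function is not shown in the diff.

```python
# Hypothetical stand-in for list_agencies(jurisdiction): name -> ID.
# The names and IDs below are invented for illustration.
AGENCIES = {"Agency A": 101, "Agency B": 102, "Agency C": 103}


def agency_query(agency):
    """Build the '&agency=...' URL fragment in the style of the patched code."""
    # 'all' (in any capitalization) expands to every known agency ID
    if str(agency).lower() == 'all':
        agency = list(AGENCIES.values())
    # A list is joined into one comma-separated parameter
    if isinstance(agency, list):
        return f'&agency={",".join(str(i) for i in agency)}'
    # Assumed behavior: a single ID is passed through unchanged
    if agency is not None:
        return f'&agency={agency}'
    # Assumed behavior: no agency filter adds nothing to the URL
    return ''
```

Because the comparison goes through `str(agency).lower()`, values such as `'ALL'` or `'All'` are accepted as well, and non-string inputs (an integer ID, a list, `None`) never match by accident.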
@@ -127,10 +136,16 @@ def get_values(series, jurisdiction, date, filtered=True, summary=True,
         print(f'API call: {url_call}')

     # Puts flattened JSON output into a pandas DataFrame
-    output = pd.io.json.json_normalize(requests.get(url_call).json())
+    output = json_normalize(requests.get(url_call).json())
     # Prints error message if call fails
     if (output.columns[:3] == ['title', 'status', 'detail']).all():
         print('WARNING:', output.iloc[0][-1])
+        return
+    elif download:
+        if type(download) == str:
+            clean_columns(output).to_csv(download, index=False)
+        else:
+            print("Valid outpath required to download.")
     # Returns clean data if no error
     else:
         return clean_columns(output)
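One consequence of the branch structure above is that a call with `download` set writes the file (or prints a message) and returns `None`. The DataFrame is only returned when `download` is falsy. A minimal sketch of that control flow, using the standard-library `csv` module and a list of dicts as a stand-in for the cleaned DataFrame (the function name `finish` and the `rows` argument are hypothetical, and the error branch is omitted):

```python
import csv


def finish(rows, download=False):
    """Mimic the tail of get_values: write a CSV or return the data.

    `rows` is a non-empty list of dicts standing in for the cleaned DataFrame.
    """
    if download:
        if isinstance(download, str):
            # Write the rows to the given path; nothing is returned
            with open(download, 'w', newline='') as f:
                writer = csv.DictWriter(f, fieldnames=list(rows[0]))
                writer.writeheader()
                writer.writerows(rows)
        else:
            # Truthy but not a path, e.g. download=True
            print("Valid outpath required to download.")
        return None
    # No download requested: hand the data back to the caller
    return rows
```

Callers who want both the file and the data in memory would, under this reading of the diff, call the function without `download` and write the CSV themselves with `df.to_csv(outpath)`.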
@@ -144,21 +159,23 @@ def get_series(seriesID=''):

     Returns: pandas dataframe with the metadata
     """
-    output = pd.io.json.json_normalize(
+    output = json_normalize(
         requests.get(URL + f'/series/{seriesID}').json())
     return clean_columns(output)


-def get_agencies(agencyID=''):
+def get_agencies(jurisdictionID):
     """
-    Get metadata for all or one specific agency
+    Get metadata for all agencies of a specific jurisdiction

-    Args: agencyID (optional): ID for the agency
+    Args: jurisdictionID: ID for the jurisdiction

     Returns: pandas dataframe with the metadata
     """
-    output = pd.io.json.json_normalize(
-        requests.get(URL + f'/agencies/{agencyID}').json())
+    output = json_normalize(
+        requests.get(
+            URL + (f'/agencies/jurisdiction?'
+                   f'jurisdictions={jurisdictionID}')).json())
     return clean_columns(output)

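The `get_agencies` change above swaps the `/agencies/{agencyID}` path for a query against `/agencies/jurisdiction` and makes the jurisdiction ID a required argument, so a bare `get_agencies()` call that worked in 0.1.4 would now raise a `TypeError`. A small sketch of how the new URL is assembled, using `urllib.parse.urlencode` instead of the f-string concatenation in the commit. The base URL here is a placeholder, because the real `URL` constant is defined outside the lines shown in this diff.

```python
from urllib.parse import urlencode

# Placeholder base URL; the real `URL` constant lives elsewhere in api.py
URL = 'https://api.example.org'


def agencies_url(jurisdiction_id):
    """Build the per-jurisdiction agencies endpoint used after this change."""
    # urlencode also escapes any characters that are unsafe in a query string
    query = urlencode({'jurisdictions': jurisdiction_id})
    return f'{URL}/agencies/jurisdiction?{query}'
```

For a plain integer ID the result matches what the f-string in the commit produces. The two approaches only differ for values that need escaping.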

@@ -170,7 +187,7 @@ def get_jurisdictions(jurisdictionID=''):

     Returns: pandas dataframe with the metadata
     """
-    output = pd.io.json.json_normalize(
+    output = json_normalize(
         requests.get(URL + f'/jurisdictions/{jurisdictionID}').json())
     return clean_columns(output)

@@ -185,12 +202,12 @@ def get_periods(jurisdictionID='', documentType=3):
     Returns: pandas dataframe with the dates
     """
     if jurisdictionID:
-        output = pd.io.json.json_normalize(
+        output = json_normalize(
             requests.get(
                 URL + (f'/periods?jurisdiction={jurisdictionID}&'
                        f'documentType={documentType}')).json())
     else:
-        output = pd.io.json.json_normalize(
+        output = json_normalize(
             requests.get(URL + f'/periods/available').json())
     return clean_columns(output)

@@ -203,9 +220,9 @@ def get_industries(jurisdictionID):

     Returns: pandas dataframe with the metadata
     """
-    output = pd.io.json.json_normalize(
-        requests.get(
-            URL + f'/industries?jurisdiction={jurisdictionID}').json())
+    output = json_normalize(
+        requests.get(
+            URL + f'/industries?jurisdiction={jurisdictionID}').json())
     return clean_columns(output)


@@ -220,11 +237,11 @@ def get_documents(jurisdictionID, documentType=3):

     Returns: pandas dataframe with the metadata
     """
-    output = pd.io.json.json_normalize(
+    output = json_normalize(
         requests.get(
             URL + (f'/documents?jurisdiction={jurisdictionID}&'
                    f'documentType={documentType}')
-        ).json())
+        ).json())
     return clean_columns(output)


@@ -246,11 +263,14 @@ def list_series():
     return dict(sorted({s["seriesName"]: s["seriesID"] for s in json}.items()))


-def list_agencies():
+def list_agencies(jurisdictionID):
     """
+    Args: jurisdictionID: ID for the jurisdiction
+
     Returns: dictionary containing names of agencies and associated IDs
     """
-    json = requests.get(URL + '/agencies').json()
+    json = requests.get(
+        URL + f'/agencies/jurisdiction?jurisdictions={jurisdictionID}').json()
     return dict(sorted({
         a["agencyName"]: a["agencyID"]
         for a in json if a["agencyName"]}.items()))
@@ -281,3 +301,11 @@ def clean_columns(df):
     """Removes JSON prefixes from column names"""
     df.columns = [c.split('.')[-1] for c in df.columns]
     return df
+
+
+def json_normalize(output):
+    """Backwards compatability for old versions of pandas"""
+    try:
+        return pd.json_normalize(output)
+    except AttributeError:
+        return pd.io.json.json_normalize(output)
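The new `json_normalize` helper tries the top-level `pd.json_normalize` (available since pandas 1.0) and falls back to the older `pd.io.json.json_normalize` location. One possible variation, sketched below, looks the attribute up with `getattr` instead of wrapping the call in `try`/`except`. This is not the committed code: the function name `normalize_with_fallback` is hypothetical, and the two `SimpleNamespace` objects are stand-ins for new-style and old-style pandas modules so that the sketch runs without pandas installed.

```python
from types import SimpleNamespace


def normalize_with_fallback(pd, data):
    """Prefer the top-level function, fall back to the legacy location."""
    func = getattr(pd, 'json_normalize', None)
    if func is None:
        # Older layout: the function lives under pd.io.json
        func = pd.io.json.json_normalize
    return func(data)


# Stand-ins for a new-style and an old-style pandas module (not real pandas)
new_pd = SimpleNamespace(json_normalize=lambda d: ('top-level', d))
old_pd = SimpleNamespace(
    io=SimpleNamespace(json=SimpleNamespace(
        json_normalize=lambda d: ('legacy', d))))
```

The difference from the committed helper is that the lookup and the call are separated. With `try`/`except AttributeError` around the call, an `AttributeError` raised while normalizing malformed input would also trigger the fallback. With the `getattr` lookup, only a missing function does.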

setup.cfg

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+[tool:pytest]
+addopts = --flake8 --cov
+flake8-ignore =
+    *.py F541 W503 W504
+    tests/* F401

setup.py

Lines changed: 8 additions & 8 deletions
@@ -3,14 +3,14 @@


 setup(
-    name='regcensus',
-    version='0.1.4',
-    description='Python package for accessing data from the QuantGov API',
-    url='https://github.com/QuantGov/regcensus-api-python',
-    author='QuantGov',
-    author_email='quantgov.info@gmail.com',
-    packages=setuptools.find_packages(),
-    classifiers=[
+    name='regcensus',
+    version='0.2.0',
+    description='Python package for accessing data from the QuantGov API',
+    url='https://github.com/QuantGov/regcensus-api-python',
+    author='QuantGov',
+    author_email='quantgov.info@gmail.com',
+    packages=setuptools.find_packages(),
+    classifiers=[
         "Programming Language :: Python :: 3",
         "License :: OSI Approved :: MIT License",
         "Operating System :: OS Independent",
