
Commit a56c34a

initial commit
0 parents


11 files changed: +814 -0 lines changed


.gitignore

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
.DS_Store

*.egg-info

__pycache__

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2020 Mercatus Center at George Mason University

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
# RegCensus API

## Introduction

RegCensusAPI is an API client that connects to the RegData regulatory restrictions data produced by the Mercatus Center at George Mason University. RegData uses machine learning algorithms to quantify the number of regulatory restrictions in a jurisdiction. Currently, RegData is available for three countries: Australia, Canada, and the United States. Regulatory restrictions data are also available for jurisdictions within these countries (provinces in Canada and states in Australia and the US). You can find out more about RegData at http://www.quantgov.org.

This Python API client connects to the API hosted on the [QuantGov website][1]. More advanced users who want to interact with the API directly can use the link above to pull data from the RegData API. R users can access the same features in the R package __regcensusAPI__.

## Installing and Importing __RegCensus__

The RegCensus Python library is pip installable:

```
$ pip install regcensus
```

Once installed, import the library using the `rc` alias, which keeps calls short:

```
import regcensus as rc
```

## Structure of the API

The API organizes data around __topics__, which are divided into __series__. Within each series are __values__, the quantities of ultimate interest. Values are available by three subgroups: agency, industry, and occupation. Presently, no series has an occupation subgroup; that subgroup is reserved for future use. Topics broadly define the data available. For example, RegData for regulatory restrictions falls under the broad topic "Regulatory Restrictions." Within the Regulatory Restrictions topic, a number of series are available, including Total Restrictions, Total Wordcount, and Total "Shall."

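
The topic, series, and value hierarchy can be pictured as nested records. The sketch below is a hypothetical in-memory model for illustration only: the class and field names (`Topic`, `Series`, `Value`, `subgroup`, `subgroup_id`) are assumptions and do not describe the objects regcensus returns, which are data frames.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Value:
    """One observation for a jurisdiction and year (hypothetical model)."""
    jurisdiction_id: int
    year: int
    value: float
    # Optional breakdown: "agency" or "industry" (occupation is reserved).
    subgroup: Optional[str] = None
    subgroup_id: Optional[str] = None


@dataclass
class Series:
    """A series, such as Total Restrictions, holding its values."""
    series_id: int
    name: str
    values: List[Value] = field(default_factory=list)


@dataclass
class Topic:
    """A topic, such as Regulatory Restrictions, grouping related series."""
    topic_id: int
    name: str
    series: List[Series] = field(default_factory=list)


# Illustrative instance; the IDs mirror the get_values example later in this README.
restrictions = Topic(
    topic_id=1,
    name="Regulatory Restrictions",
    series=[
        Series(series_id=1, name="Total Restrictions"),
        Series(series_id=2, name="Total Wordcount"),
    ],
)
```

Only the nesting is meant to carry over; in practice you work with the data frames the client returns.
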
A fundamental concept in RegData is the "document." In RegData, a set of documents represents a body of regulations for which restriction counts have been produced. For example, to produce data on regulatory restrictions imposed by the US federal government, RegData uses the Code of Federal Regulations (CFR) as the source documents. Within the CFR, RegData treats the title-part combination as the unit of regulation. The CFR is organized into 50 titles; each title is divided into parts, which may (but need not) be further divided into subparts, and parts are divided into sections. Choosing this unit of analysis is critical for interpreting the data RegData produces. Regulatory restriction data for US states follow the same strategy but use each state's regulatory code.

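
To make the title-part unit concrete: a CFR section citation such as `40 CFR 60.1` refers to section 60.1 within part 60 of title 40. The sketch below rolls hypothetical section-level counts up to title-part documents. The function names and the input counts are invented for illustration and are not part of regcensus or of RegData's own pipeline.

```python
import re
from collections import Counter

# Matches citations of the form "<title> CFR <part>.<section>", e.g. "40 CFR 60.1".
CITATION = re.compile(r"^\s*(\d+)\s+CFR\s+(\d+)\.(\d+)\s*$")


def title_part(citation: str) -> tuple:
    """Return the (title, part) document unit for a CFR section citation."""
    match = CITATION.match(citation)
    if match is None:
        raise ValueError(f"unrecognized CFR citation: {citation!r}")
    title, part, _section = match.groups()
    return int(title), int(part)


def roll_up(section_counts: dict) -> Counter:
    """Sum section-level counts into title-part totals."""
    totals = Counter()
    for citation, count in section_counts.items():
        totals[title_part(citation)] += count
    return totals


# Hypothetical section-level restriction counts.
counts = {"40 CFR 60.1": 3, "40 CFR 60.2": 5, "40 CFR 61.1": 2}
print(roll_up(counts))  # Counter({(40, 60): 8, (40, 61): 2})
```

Sections 60.1 and 60.2 collapse into one document, (40, 60), which is the level at which the counts would then be reported.
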
When requesting data through the API, you must specify the document type and indicate a preference for *summary* or *document-level* data. By default, the RegCensus API returns summarized data for the period of interest, so if you do not specify the *summary* preference, you receive data summarized for each period. The __get_series_period__ helper function (described below) returns the periods available for each series.

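
As a rough illustration of the two preferences, the sketch below collapses hypothetical document-level rows into one summary row per jurisdiction, series, and year. It assumes that a summary value is the sum of the document-level values; that is an assumption made for illustration, not a description of how the API computes its summaries, and the field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical document-level rows: one row per document per year.
rows = [
    {"jurisdiction_id": 38, "series_id": 1, "year": 2018, "document": "40-60", "value": 8},
    {"jurisdiction_id": 38, "series_id": 1, "year": 2018, "document": "40-61", "value": 2},
    {"jurisdiction_id": 38, "series_id": 1, "year": 2019, "document": "40-60", "value": 9},
]


def summarize(document_rows):
    """Sum document-level values per (jurisdiction, series, year), assuming additivity."""
    totals = defaultdict(float)
    for row in document_rows:
        key = (row["jurisdiction_id"], row["series_id"], row["year"])
        totals[key] += row["value"]
    return dict(totals)


print(summarize(rows))
# {(38, 1, 2018): 10.0, (38, 1, 2019): 9.0}
```
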
The RegCensus API defines a number of periods depending on the series. For example, the total restrictions series for federal regulations uses two main periods: daily and annual. The daily data give the number of regulatory restrictions issued by the US federal government on a particular date. The same data are available on an annual basis.

There are six helper functions to retrieve information about the key components of RegData. These functions provide the following information: topics, documents, jurisdictions, series, agencies, and years with data. The list functions begin with __list__. For example, to view the list of topics, call __list_topics__. When a topic ID parameter is supplied, the function returns the details of that specific topic.

```
rc.list_topics()
```

Each topic comprises one or more *series*. The __list_series__ function returns the list of all series when no series ID is provided.

Other helper functions give you a tour of RegData. To see the jurisdictions with data in RegData, call __list_jurisdictions__. This function returns the complete list of jurisdictions.

```
rc.list_jurisdictions(jurisdictionID = 38)
```

The __get_series_period__ function returns a list of all series and the years with data available.

The output from this function can serve as a reference for the valid values that can be passed to the parameters of the __get_values__ function. The function returns one record for each unique combination of series and jurisdiction available in RegData. It takes an optional jurisdiction ID argument.

```
rc.get_series_period(jurisdictionID = 38)
```

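
Because that output lists which series, jurisdictions, and years have data, it can be used to check a request before calling __get_values__. The sketch below assumes the output has been converted to a list of records with hypothetical keys `series_id`, `jurisdiction_id`, and `years`; the actual columns returned by __get_series_period__ may differ, so inspect them before adapting this. The availability records shown are made up.

```python
def missing_years(availability, series_id, jurisdiction_id, years):
    """Return the requested years that have no data for this series and jurisdiction.

    `availability` is a list of dicts with hypothetical keys
    "series_id", "jurisdiction_id", and "years" (an iterable of ints).
    """
    available = set()
    for record in availability:
        if record["series_id"] == series_id and record["jurisdiction_id"] == jurisdiction_id:
            available.update(record["years"])
    return sorted(set(years) - available)


# Made-up availability records, for illustration only.
availability = [
    {"series_id": 1, "jurisdiction_id": 38, "years": range(1970, 2020)},
    {"series_id": 92, "jurisdiction_id": 38, "years": range(1990, 2020)},
]

print(missing_years(availability, 92, 38, range(1985, 1992)))
# [1985, 1986, 1987, 1988, 1989]
```

An empty result means every requested year is covered, so the same arguments can be passed on to __get_values__.
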
## Metadata

The __get_*__ functions return details about RegData metadata. These metadata are not included in the output of the __get_values__ function, which is described later.

### Jurisdictions

Use the __get_jurisdictions__ function to return a data frame with all the jurisdictions. When you supply the jurisdiction ID parameter, the function returns the details of just that jurisdiction. Use the output from __get_jurisdictions__ to merge with data from the __get_values__ function.

```
rc.get_jurisdictions()
```

### Agencies

The __get_agencies__ function returns a data frame of all agencies with data in RegData. If an ID is supplied, the function returns the details of the single agency with that ID. The data frame includes characteristics of the agencies. Currently, agency data are available only for federal RegData.

```
rc.get_agencies()
```

Use the value of the agency_id field when pulling values with the __get_values__ function.

### Industries

The __get_industries__ function returns a data frame of industries with data in the API. Presently, the only classification system available is the North American Industry Classification System (NAICS). NAICS is used for both North American countries covered (Canada and the United States) and for Australia, even though Australia uses the Australian and New Zealand Standard Industrial Classification (ANZSIC) system; presently, industry regulations for Australia are based on NAICS. As RegData expands to other countries, the industry codes will be country specific and will also contain a mapping to the Standard Industrial Classification (SIC) system.

```
rc.get_industries(38)
```

## Values

The __get_values__ function is the primary function for obtaining RegData from the RegCensus API. It takes the following parameters:

* jurisdiction (required) - value or list of jurisdiction IDs
* series (required) - value or list of series IDs
* date (required) - value or list of years
* agency (optional) - value or list of agency IDs
* industry (optional) - value or list of industry codes
* dateIsRange (optional) - specify whether the years provided for the date parameter form a range. Default is True.
* filtered (optional) - specify whether poorly performing industry results should be excluded. Default is True.
* summary (optional) - specify whether summary results should be returned instead of document-level results. Default is True.
* country (optional) - specify whether all values for a country's jurisdiction ID should be returned. Default is False.
* verbose (optional) - number specifying how much debugging information is printed for each function call; a higher number prints more information. Default is 0.

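
To make the "value or list" and __dateIsRange__ conventions concrete, here is a minimal sketch of how such arguments could be normalized on the client side. The helper names `as_list` and `expand_dates` are hypothetical and this is not the library's internal code; it only assumes, as the 2010 to 2018 example that follows suggests, that with `dateIsRange=True` a two-element list such as `[2010, 2018]` means every year from 2010 through 2018 inclusive.

```python
def as_list(value):
    """Wrap a single value in a list; pass lists and tuples through as lists."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def expand_dates(date, date_is_range=True):
    """Return the years a `date` argument refers to (hypothetical helper).

    With date_is_range=True, a two-element list is read as an inclusive
    [start, end] range; otherwise the years are used as given.
    """
    years = as_list(date)
    if date_is_range and len(years) == 2:
        start, end = sorted(years)
        return list(range(start, end + 1))
    return years


print(expand_dates([2010, 2018]))                       # 2010 through 2018
print(expand_dates([2010, 2018], date_is_range=False))  # [2010, 2018]
print(as_list(38))                                      # [38]
```
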
In the example below, we request the total number of restrictions and the total number of words (series 1 and 2; see `get_topics(1)`) for the US (jurisdiction 38; see `get_jurisdictions(38)`) for the period 2010 to 2018.

```
rc.get_values(series = [1,2], jurisdiction = 38, date = [2010, 2018])
```

### Values by Subgroup

You can obtain data for any of the three subgroups of each series: agencies, industries, and occupations (once occupation data become available).

#### Values by Agencies

To obtain the restrictions for a specific agency (or agencies), the series ID supplied must be in the list of series available by agency. To recap, the list of series available for an agency is returned by the __list_series(id,by='agency')__ function, and the list of agencies with data is returned by the __get_agencies__ function.

```
# Identify all agencies
rc.list_agencies()

# Call get_values() for agencies 81 and 84 and series 91
rc.get_values(series = 91, jurisdiction = 38, date = [1990, 2018], agency = [81, 84])
```

#### Values by Agency and Industry

Some agency series may also have data by industry. For example, under the Total Restrictions topic, RegData includes industry-relevant restrictions, which estimate the number of restrictions that apply to a given industry. These are available in both the main series (Total Restrictions) and the subgroup series (Restrictions by Agency).

To pull industry-relevant restrictions for an agency, call __get_values__ with the *industry* parameter. The industry parameter is of type string, and valid values include the industry codes in the classification system returned by the __get_industries(jurisdiction)__ function.

In the example below, using series 92 (Restrictions by Agency and Industry), we request data for industries 111 and 33:

```
rc.get_values(series = 92, jurisdiction = 38, date = [1990, 2000], industry = ['111', '33'], agency = 66)
```

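
The two codes in this example sit at different levels of the NAICS hierarchy. In NAICS, the number of digits indicates the level: `33` is a two-digit sector code (part of Manufacturing, sectors 31-33) and `111` is a three-digit subsector (Crop Production). Treating codes as strings makes that structure easy to work with, for example by prefix matching. The helpers below are a hypothetical illustration of the NAICS convention, not part of the regcensus package.

```python
# Number of digits in a NAICS code -> level of the hierarchy.
NAICS_LEVELS = {
    2: "sector",
    3: "subsector",
    4: "industry group",
    5: "NAICS industry",
    6: "national industry",
}


def naics_level(code: str) -> str:
    """Return the NAICS hierarchy level implied by a code's digit count."""
    if not code.isdigit() or len(code) not in NAICS_LEVELS:
        raise ValueError(f"not a 2- to 6-digit NAICS code: {code!r}")
    return NAICS_LEVELS[len(code)]


def within(code: str, parent: str) -> bool:
    """True if `code` falls under `parent` in the NAICS hierarchy (prefix match)."""
    return code.startswith(parent)


print(naics_level("111"), naics_level("33"))  # subsector sector
print(within("336111", "33"))                 # True
```
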
### Merging with Metadata

To minimize the network bandwidth required to use RegCensusAPI, the data returned by the __get_values__ function contain minimal metadata. Once you pull the values with __get_values__, you can use the pandas library to attach the metadata.

Suppose we want to attach the agency names and other agency characteristics to the data from the last code snippet. First, pull the list of agencies into a separate data frame. Then merge it with the values data frame, using the *agency_id* column as the key.

We can merge the agency data with the values data as in the code snippet below.

```
import regcensus as rc

agencies = rc.get_agencies()
agency_by_industry = rc.get_values(
    series = 92,
    jurisdiction = 38,
    date = [1990, 2000],
    industry = ['111', '33'],
    agency = [66, 111])
# pandas merges on a shared key column with on=, not by= (by= is R's merge syntax)
agency_restrictions_ind = agency_by_industry.merge(
    agencies, on='agency_id')
```

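
If pandas merges are new to you, the following sketch shows with plain Python and made-up rows what an inner merge on *agency_id* does: each values row is paired with the agency row that has the same key, and rows without a match are dropped, which matches pandas' default `how='inner'`. Pass `how='left'` to `merge` instead if you want to keep values rows that have no matching agency. The `inner_merge` helper and the row contents are hypothetical.

```python
def inner_merge(left, right, key):
    """Join two lists of dicts on `key`, keeping only rows whose key appears in both.

    Unlike pandas, overlapping non-key columns are not suffixed; values from
    `right` simply overwrite those from `left`.
    """
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    merged = []
    for row in left:
        for match in index.get(row[key], []):
            merged.append({**row, **match})
    return merged


# Made-up rows standing in for get_values() and get_agencies() output.
value_rows = [
    {"agency_id": 66, "industry": "111", "value": 10},
    {"agency_id": 111, "industry": "33", "value": 4},
    {"agency_id": 999, "industry": "33", "value": 1},
]
agency_rows = [
    {"agency_id": 66, "agency_name": "Agency A"},
    {"agency_id": 111, "agency_name": "Agency B"},
]

for row in inner_merge(value_rows, agency_rows, "agency_id"):
    print(row)
```

The row for agency 999 has no match in the agency table, so it is dropped.
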
[1]:http://ec2-3-89-6-158.compute-1.amazonaws.com:8080/regdata/swagger-ui.html

build/lib/regcensus/__init__.py

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
__all__ = [
    'get_values',
    'get_topics',
    'get_series',
    'get_agencies',
    'get_jurisdictions',
    'get_periods',
    'get_industries',
    'get_documents',
    'list_topics',
    'list_series',
    'list_agencies',
    'list_jurisdictions',
    'list_industries'
]

from .api import (
    get_values,
    get_topics,
    get_series,
    get_agencies,
    get_jurisdictions,
    get_periods,
    get_industries,
    get_documents,
    list_topics,
    list_series,
    list_agencies,
    list_jurisdictions,
    list_industries
)

0 commit comments
