-
Notifications
You must be signed in to change notification settings - Fork 52
Add samples #173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add samples #173
Conversation
|
@ldecicco-USGS tagging you in case you have some time to take a look and offer feedback on this function. I used the |
jzemmels
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look good to me. The actual function code looks boilerplate (in a good way). I've added some comments/suggestions to the documentation that you might consider addressing/implementing.
dataretrieval/samples.py
Outdated
| The start date if using a date range. Takes the format YYYY-MM-DD. | ||
| The logic is inclusive, i.e. it will also return results that | ||
| match the date. If left as None, will pull all data on or before | ||
| activityStartDateUpper, of populated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand this sentence -- "of populated" what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! "if populated"!
dataretrieval/samples.py
Outdated
| Example: "Sample-Routine, regular". | ||
| characteristicGroup : string or list of strings, optional | ||
| Characteristic group is a broad category describing one or more | ||
| of results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo?
| of results. | |
| results. |
dataretrieval/samples.py
Outdated
| Characteristic group is a broad category describing one or more | ||
| of results. | ||
| Example: "Organics, PFAS" | ||
| characteristc : string or list of strings, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| characteristc : string or list of strings, optional | |
| characteristic : string or list of strings, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really was on a roll with these typos last week. Thanks for catching.
dataretrieval/samples.py
Outdated
| pointLocationLatitude and pointLocationLongitude | ||
| projectIdentifier : string or list of strings, optional | ||
| Designator used to uniquely id a data collection project in | ||
| organization context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is an "organization context?" Is there a good example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this was lifted directly from the Samples API descriptions or dataRetrieval. I'm going to rewrite and let me see if I can add an example.
| "actgroup", "count" | ||
| projects - "project", "projectmonitoringlocationweight" | ||
| organizations - "organization", "count" | ||
| activityMediaName : string or list of strings, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda crazy to me that the only doc pages on this type of parameter are from dataRetrieval or related packages. The WQP documentation just says something to the effect of "look at the output for more information." Why include it as a query-able parameter if the parameter's possible values aren't communicated to the user? This isn't something to solve in this PR, or even in the package overall, but I feel like this is something seriously lacking in the API documentation we expect users to read.
I found the "sample media" endpoint through the codeservice link you provided that looks to provide the actual values for these more obscure parameters. But pinging one endpoint to learn about the parameter values of a different endpoint seems overly complicated to me. Non-rhetorically: are these parameters so volatile that we couldn't provide static documentation somewhere that describes the values they can take?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you make great points. I believe @ldecicco-USGS has incorporated some of these types of lookup tables to prevent users from passing invalid options to the API, and there are some basic lookup tables in dataRetrieval too. I don't see it as too much trouble to have a set of functions called "[input name]_lookup" that a user can run to return all viable options for Samples. Doubly useful is Laura's approach of checking a user's query before it is made. I will add this as an issue (if it isn't there already) and we can tackle this as a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! There's actually a web service to check available parameters:
https://github.com/DOI-USGS/dataRetrieval/blob/main/R/read_USGS_samples.R#L334
The main bit:
service_options <- c("characteristicgroup", "states", "counties",
"countries", "sitetype", "samplemedia",
"characteristics", "observedproperty")
check_group_req <- httr2::request("https://api.waterdata.usgs.gov") |>
httr2::req_url_path_append("samples-data",
"codeservice",
service)For instance:
So, https://api.waterdata.usgs.gov/samples-data/codeservice/samplemedia shows all the available sample media options. It's probably not described in the main documentation because (I think) it's dynamically populated from the values in the data base. So, if a data provider suddenly adds "Jello", the full documentation doesn't need a manual update. I think there's a codeservice Swagger somewhere.....
Here it is:
https://api.waterdata.usgs.gov/samples-data/codeservice/docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Laura, I agree: while some of the options may not change very often, I still think it is a smart move to make the options a service rather than a "hard-coded" list....data managers are adding new PFAS compounds every day, for instance.
This PR links to that codeservice in the samples function documentation, but I do think Joe's point is valid that we can be nicer to the user in python and avoid sending them to the Swagger doc to find all available options. I bet it's a similar number of lines of code in python!
Co-authored-by: Joe Zemmels (he/him) <jzemmels@gmail.com>
45f54fa to
c9cbe67
Compare
|
I made some suggestions in the following PR: ehinman#3 |
Co-authored-by: Elise Hinman <121896266+ehinman@users.noreply.github.com>
Refactor samples
This PR adds a function, documentation, testing, etc. for pulling USGS water quality data from the Samples database.
Additional features needed:
Additional features that would be helpful: