2 changes: 2 additions & 0 deletions NEWS.md
@@ -1,3 +1,5 @@
**12/04/2025:** The `get_continuous()` function, which provides access to measurements collected via automated sensors at high frequency (often 15-minute intervals) at a monitoring location, was added to the `waterdata` module. This is an early version of the continuous endpoint and should be used with caution while the API team improves its performance. In the future, we anticipate adding one or more endpoints specifically for handling large data requests, so power users may want to hold off on heavy development against the new continuous endpoint.

**11/24/2025:** `dataretrieval` is pleased to offer a new module, `waterdata`, which gives users access to USGS's modernized [Water Data APIs](https://api.waterdata.usgs.gov/). The Water Data API endpoints include daily values, instantaneous values, field measurements (modernized groundwater levels service), time series metadata, and discrete water quality data from the Samples database. Though there will be a period of overlap, the functions within `waterdata` will eventually replace the `nwis` module, which currently provides access to the legacy [NWIS Water Services](https://waterservices.usgs.gov/). More example workflows and functions are coming soon. Check `help(waterdata)` for more information.

**09/03/2024:** The groundwater levels service has switched endpoints, and `dataretrieval` was updated accordingly in [`v1.0.10`](https://github.com/DOI-USGS/dataretrieval-python/releases/tag/v1.0.10). Older versions using the discontinued endpoint will return 503 errors for `nwis.get_gwlevels` or the `service='gwlevels'` argument. Visit [Water Data For the Nation](https://waterdata.usgs.gov/blog/wdfn-waterservices-2024/) for more information.
40 changes: 26 additions & 14 deletions README.md
@@ -6,14 +6,16 @@

## Latest Announcements

:mega: **11/24/2025:** `dataretrieval` now features the new `waterdata` module,
:mega: **12/04/2025:** `dataretrieval` now features the new `waterdata` module,
which provides access to USGS's modernized [Water Data
APIs](https://api.waterdata.usgs.gov/). The Water Data API endpoints include
daily values, instantaneous values, field measurements, time series metadata,
daily values, **instantaneous values**, field measurements, time series metadata,
and discrete water quality data from the Samples database. This new module will
eventually replace the `nwis` module, which provides access to the legacy [NWIS
Water Services](https://waterservices.usgs.gov/).

Check out the [NEWS](NEWS.md) file for all updates and announcements.

**Important:** Users of the Water Data APIs are strongly encouraged to obtain an
API key for higher rate limits and greater access to USGS data. [Register for
an API key](https://api.waterdata.usgs.gov/signup/) and set it as an
@@ -24,8 +26,6 @@ import os
os.environ["API_USGS_PAT"] = "your_api_key_here"
```

Check out the [NEWS](NEWS.md) file for all updates and announcements.

## What is dataretrieval?

`dataretrieval` simplifies the process of loading hydrologic data into Python.
@@ -61,9 +61,9 @@ pip install git+https://github.com/DOI-USGS/dataretrieval-python.git

The `waterdata` module provides access to modern USGS Water Data APIs.

The example below retrieves daily streamflow data for a specific monitoring
location for water year 2025, where a "/" between two dates in the "time"
input argument indicates a desired date range:
Some basic usage examples include retrieving daily streamflow data for a
specific monitoring location, where the `/` in the `time` argument separates
the start and end dates of the desired range:

```python
from dataretrieval import waterdata
@@ -79,8 +79,7 @@ print(f"Retrieved {len(df)} records")
print(f"Site: {df['monitoring_location_id'].iloc[0]}")
print(f"Mean discharge: {df['value'].mean():.2f} {df['unit_of_measure'].iloc[0]}")
```
Fetch daily discharge data for multiple sites from a start date to present
using the following code:
Retrieving streamflow at multiple locations from October 1, 2024 to the present:

```python
df, metadata = waterdata.get_daily(
@@ -91,18 +90,31 @@ df, metadata = waterdata.get_daily(

print(f"Retrieved {len(df)} records")
```
The following example downloads location information for all monitoring
locations that are categorized as stream sites in the state of Maryland:
Retrieving location information for all monitoring locations categorized as
stream sites in the state of Maryland:

```python
# Get monitoring location information
locations, metadata = waterdata.get_monitoring_locations(
df, metadata = waterdata.get_monitoring_locations(
state_name='Maryland',
site_type_code='ST' # Stream sites
)

print(f"Found {len(locations)} stream monitoring locations in Maryland")
print(f"Found {len(df)} stream monitoring locations in Maryland")
```
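The location results can then feed a data request directly. Below is a minimal
sketch (not part of the package examples) that assumes the site identifiers are
in an `id` column, as described for the monitoring-locations endpoint; verify
the column name against the frame your query actually returns:

```python
# Hypothetical follow-on: request daily discharge for a few of the
# Maryland stream sites found above. The "id" column name is an
# assumption; check df.columns for the identifier field.
site_ids = df['id'].head(5).tolist()

daily, daily_md = waterdata.get_daily(
    monitoring_location_id=site_ids,
    parameter_code='00060',        # Discharge
    time='2024-10-01/2025-09-30',  # Water year 2025
)

print(f"Retrieved {len(daily)} daily records for {len(site_ids)} sites")
```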
Finally, retrieving continuous (a.k.a. "instantaneous") data for one location.
We *strongly advise* breaking up continuous data requests into smaller time periods and collections to avoid timeouts and other issues; a chunking sketch follows the example below:

```python
# Get continuous data for a single monitoring location and water year
df, metadata = waterdata.get_continuous(
monitoring_location_id='USGS-01646500',
parameter_code='00065', # Gage height
time='2024-10-01/2025-09-30'
)
print(f"Retrieved {len(df)} continuous gage height measurements")
```
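Because continuous records accumulate quickly (96 values per day at 15-minute
intervals), one way to follow the advice above is to split a long request into
shorter windows and concatenate the results. A minimal sketch, assuming pandas
is installed and that `get_continuous` accepts the same `start/end` style
`time` strings shown above:

```python
import pandas as pd

# Split a multi-year request into water-year windows (hypothetical
# helper, not part of the package). Use smaller windows (e.g. monthly)
# if requests still time out.
windows = [
    ('2022-10-01', '2023-09-30'),
    ('2023-10-01', '2024-09-30'),
    ('2024-10-01', '2025-09-30'),
]

chunks = []
for start, end in windows:
    chunk, _ = waterdata.get_continuous(
        monitoring_location_id='USGS-01646500',
        parameter_code='00065',  # Gage height
        time=f'{start}/{end}',
    )
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)
print(f"Retrieved {len(df)} continuous records across {len(windows)} water years")
```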

Visit the
[API Reference](https://doi-usgs.github.io/dataretrieval-python/reference/waterdata.html)
for more information and examples on available services and input parameters.
@@ -202,13 +214,13 @@ print(f"Found {len(flowlines)} upstream tributaries within 50km")

### Modern USGS Water Data APIs (Recommended)
- **Daily values**: Daily statistical summaries (mean, min, max)
- **Instantaneous values**: High-frequency continuous data
- **Field measurements**: Discrete measurements from field visits
- **Monitoring locations**: Site information and metadata
- **Time series metadata**: Information about available data parameters
- **Latest daily values**: Most recent daily statistical summary data
- **Latest instantaneous values**: Most recent high-frequency continuous data
- **Samples data**: Discrete USGS water quality data
- **Instantaneous values** (*COMING SOON*): High-frequency continuous data

### Legacy NWIS Services (Deprecated)
- **Daily values (dv)**: Legacy daily statistical data
2 changes: 2 additions & 0 deletions dataretrieval/waterdata/__init__.py
@@ -13,6 +13,7 @@
from .api import (
_check_profiles,
get_codes,
get_continuous,
get_daily,
get_field_measurements,
get_latest_continuous,
@@ -30,6 +31,7 @@

__all__ = [
"get_codes",
"get_continuous",
"get_daily",
"get_field_measurements",
"get_latest_continuous",
165 changes: 165 additions & 0 deletions dataretrieval/waterdata/api.py
@@ -204,6 +204,171 @@ def get_daily(

return get_ogc_data(args, output_id, service)

def get_continuous(
monitoring_location_id: Optional[Union[str, List[str]]] = None,
parameter_code: Optional[Union[str, List[str]]] = None,
statistic_id: Optional[Union[str, List[str]]] = None,
properties: Optional[List[str]] = None,
time_series_id: Optional[Union[str, List[str]]] = None,
continuous_id: Optional[Union[str, List[str]]] = None,
approval_status: Optional[Union[str, List[str]]] = None,
unit_of_measure: Optional[Union[str, List[str]]] = None,
qualifier: Optional[Union[str, List[str]]] = None,
value: Optional[Union[str, List[str]]] = None,
last_modified: Optional[str] = None,
time: Optional[Union[str, List[str]]] = None,
limit: Optional[int] = None,
convert_type: bool = True,
) -> Tuple[pd.DataFrame, BaseMetadata]:
"""
Continuous data provide instantaneous water conditions.

This is an early version of the continuous endpoint that is feature-complete
and is being made available for limited use. Geometries are not included
with the continuous endpoint. If the "time" input is left blank, the service
will return the most recent year of measurements. Users may request no more
than three years of data with each function call.

Continuous data are collected at a high frequency, typically 15-minute
intervals. Depending on the specific monitoring location, the data may be
transmitted automatically via telemetry and be available on WDFN within
minutes of collection; at other locations, delivery may be delayed because
the monitoring location does not have the capacity to automatically
transmit data. Continuous data are described by parameter name and
parameter code (pcode). These data might also be referred to as
"instantaneous values" or "IV".

Parameters
----------
monitoring_location_id : string or list of strings, optional
A unique identifier representing a single monitoring location. This
corresponds to the id field in the monitoring-locations endpoint.
Monitoring location IDs are created by combining the agency code of
the agency responsible for the monitoring location (e.g. USGS) with
the ID number of the monitoring location (e.g. 02238500), separated
by a hyphen (e.g. USGS-02238500).
parameter_code : string or list of strings, optional
Parameter codes are 5-digit codes used to identify the constituent
measured and the units of measure. A complete list of parameter
codes and associated groupings can be found at
https://help.waterdata.usgs.gov/codes-and-parameters/parameters.
statistic_id : string or list of strings, optional
A code corresponding to the statistic an observation represents.
Continuous data are nearly always associated with statistic id
00011. Using a different code (such as 00003 for mean) will
typically return no results. A complete list of codes and their
descriptions can be found at
https://help.waterdata.usgs.gov/code/stat_cd_nm_query?stat_nm_cd=%25&fmt=html.
properties : string or list of strings, optional
A list of requested columns to be returned from the query.
Available options are: geometry, id, time_series_id,
monitoring_location_id, parameter_code, statistic_id, time, value,
unit_of_measure, approval_status, qualifier, last_modified
time_series_id : string or list of strings, optional
A unique identifier representing a single time series. This
corresponds to the id field in the time-series-metadata endpoint.
continuous_id : string or list of strings, optional
A universally unique identifier (UUID) representing a single version of
a record. It is not stable over time. Every time the record is refreshed
in our database (which may happen as part of normal operations and does
not imply any change to the data itself) a new ID will be generated. To
uniquely identify a single observation over time, compare the time and
time_series_id fields; each time series will only have a single
observation at a given time.
approval_status : string or list of strings, optional
Some of the data that you have obtained from this U.S. Geological Survey
database may not have received Director's approval. Any such data values
are qualified as provisional and are subject to revision. Provisional
data are released on the condition that neither the USGS nor the United
States Government may be held liable for any damages resulting from its
use. This field reflects the approval status of each record, and is either
"Approved", meaining processing review has been completed and the data is
approved for publication, or "Provisional" and subject to revision. For
more information about provisional data, go to:
https://waterdata.usgs.gov/provisional-data-statement/.
unit_of_measure : string or list of strings, optional
A human-readable description of the units of measurement associated
with an observation.
qualifier : string or list of strings, optional
This field indicates any qualifiers associated with an observation, for
instance if a sensor may have been impacted by ice or if values were
estimated.
value : string or list of strings, optional
The value of the observation. Values are transmitted as strings in
the JSON response format in order to preserve precision.
last_modified : string, optional
The last time a record was refreshed in our database. This may happen
due to regular operational processes and does not necessarily indicate
anything about the measurement has changed. You can query this field
using date-times or intervals, adhering to RFC 3339, or using ISO 8601
duration objects. Intervals may be bounded or half-bounded (double-dots
at start or end).
Examples:

* A date-time: "2018-02-12T23:20:50Z"
* A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z"
* Half-bounded intervals: "2018-02-12T00:00:00Z/.." or "../2018-03-18T12:31:12Z"
* Duration objects: "P1M" for data from the past month or "PT36H" for the last 36 hours

Only features that have a last_modified that intersects the value of
datetime are selected.
time : string, optional
The date an observation represents. You can query this field using
date-times or intervals, adhering to RFC 3339, or using ISO 8601
duration objects. Intervals may be bounded or half-bounded (double-dots
at start or end). Only features that have a time that intersects the
value of datetime are selected. If a feature has multiple temporal
properties, it is the decision of the server whether only a single
temporal property is used to determine the extent or all relevant
temporal properties.
Examples:

* A date-time: "2018-02-12T23:20:50Z"
* A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z"
* Half-bounded intervals: "2018-02-12T00:00:00Z/.." or "../2018-03-18T12:31:12Z"
* Duration objects: "P1M" for data from the past month or "PT36H" for the last 36 hours

limit : numeric, optional
The optional limit parameter is used to control the subset of the
selected features that should be returned in each page. The maximum
allowable limit is 10000. It may be beneficial to set this number lower
if your internet connection is spotty. The default (None) will set the
limit to the maximum allowable limit for the service.
convert_type : boolean, optional
If True, the function will convert time fields to datetimes and the
qualifier field to strings

Returns
-------
df : ``pandas.DataFrame`` or ``geopandas.GeoDataFrame``
Formatted data returned from the API query.
md: :obj:`dataretrieval.utils.Metadata`
A custom metadata object

Examples
--------
.. code::

>>> # Get instantaneous gage height data from a
>>> # single site from a single year
>>> df, md = dataretrieval.waterdata.get_continuous(
... monitoring_location_id="USGS-02238500",
... parameter_code="00065",
... time="2021-01-01T00:00:00Z/2022-01-01T00:00:00Z",
... )
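>>> # A minimal additional sketch (not from the original docs):
>>> # request only the most recent month of data using an ISO 8601
>>> # duration for "time", as described in the parameter notes above
>>> df, md = dataretrieval.waterdata.get_continuous(
...     monitoring_location_id="USGS-02238500",
...     parameter_code="00065",
...     time="P1M",
... )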
"""
service = "continuous"
output_id = "continuous_id"

# Build argument dictionary, omitting None values
args = {
k: v
for k, v in locals().items()
if k not in {"service", "output_id"} and v is not None
}

return get_ogc_data(args, output_id, service)


def get_monitoring_locations(
monitoring_location_id: Optional[List[str]] = None,
20 changes: 0 additions & 20 deletions dataretrieval/waterdata/utils.py
@@ -773,23 +773,3 @@ def get_ogc_data(
metadata = BaseMetadata(response)
return return_list, metadata


# def _get_description(service: str):
# tags = _get_collection().get("tags", [])
# for tag in tags:
# if tag.get("name") == service:
# return tag.get("description")
# return None

# def _get_params(service: str):
# url = f"{_base_url()}collections/{service}/schema"
# resp = requests.get(url, headers=_default_headers())
# resp.raise_for_status()
# properties = resp.json().get("properties", {})
# return {k: v.get("description") for k, v in properties.items()}

# def _get_collection():
# url = f"{_base_url()}openapi?f=json"
# resp = requests.get(url, headers=_default_headers())
# resp.raise_for_status()
# return resp.json()
24 changes: 16 additions & 8 deletions tests/waterdata_test.py
@@ -10,6 +10,7 @@
_check_profiles,
get_samples,
get_daily,
get_continuous,
get_monitoring_locations,
get_latest_continuous,
get_latest_daily,
@@ -142,7 +143,7 @@ def test_get_daily_properties():
assert df.parameter_code.unique().tolist() == ["00060"]

def test_get_daily_no_geometry():
df, md = get_daily(
df, _ = get_daily(
monitoring_location_id="USGS-05427718",
parameter_code="00060",
time="2025-01-01/..",
@@ -152,6 +153,18 @@
assert df.shape[1] == 11
assert isinstance(df, DataFrame)

def test_get_continuous():
df, _ = get_continuous(
monitoring_location_id="USGS-06904500",
parameter_code="00065",
time="2025-01-01/2025-12-31"
)
assert isinstance(df, DataFrame)
assert "geometry" not in df.columns
assert df.shape[1] == 11
assert df['time'].dtype == 'datetime64[ns, UTC]'
assert "continuous_id" in df.columns

def test_get_monitoring_locations():
df, md = get_monitoring_locations(
state_name="Connecticut",
@@ -162,7 +175,7 @@ def test_get_monitoring_locations():
assert hasattr(md, 'query_time')

def test_get_monitoring_locations_hucs():
df, md = get_monitoring_locations(
df, _ = get_monitoring_locations(
hydrologic_unit_code=["010802050102", "010802050103"]
)
assert set(df.hydrologic_unit_code.unique().tolist()) == {"010802050102", "010802050103"}
@@ -177,12 +190,7 @@ def test_get_latest_continuous():
assert df.statistic_id.unique().tolist() == ["00011"]
assert hasattr(md, 'url')
assert hasattr(md, 'query_time')
try:
datetime.datetime.strptime(df['time'].iloc[0], "%Y-%m-%dT%H:%M:%S+00:00")
out=True
except:
out=False
assert out
assert df['time'].dtype == 'datetime64[ns, UTC]'

def test_get_latest_daily():
df, md = get_latest_daily(