diff --git a/NEWS.md b/NEWS.md index 2efdc76..a3aa836 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,5 @@ +**12/04/2025:** The `get_continuous()` function was added to the `waterdata` module. It provides access to measurements collected by automated sensors at a high frequency (often 15-minute intervals) at a monitoring location. This is an early version of the continuous endpoint and should be used with caution as the API team improves its performance. In the future, we anticipate adding one or more endpoints specifically for handling large data requests, so it may make sense for power users to hold off on heavy development using the new continuous endpoint. + **11/24/2025:** `dataretrieval` is pleased to offer a new module, `waterdata`, which gives users access USGS's modernized [Water Data APIs](https://api.waterdata.usgs.gov/). The Water Data API endpoints include daily values, instantaneous values, field measurements (modernized groundwater levels service), time series metadata, and discrete water quality data from the Samples database. Though there will be a period of overlap, the functions within `waterdata` will eventually replace the `nwis` module, which currently provides access to the legacy [NWIS Water Services](https://waterservices.usgs.gov/). More example workflows and functions coming soon. Check `help(waterdata)` for more information. **09/03/2024:** The groundwater levels service has switched endpoints, and `dataretrieval` was updated accordingly in [`v1.0.10`](https://github.com/DOI-USGS/dataretrieval-python/releases/tag/v1.0.10). Older versions using the discontinued endpoint will return 503 errors for `nwis.get_gwlevels` or the `service='gwlevels'` argument. Visit [Water Data For the Nation](https://waterdata.usgs.gov/blog/wdfn-waterservices-2024/) for more information. 
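The NEWS entry above urges caution with large continuous requests, and the README changes in this diff advise breaking requests into smaller time periods. A minimal sketch of that idea follows, assuming a hypothetical helper named `split_time_range` (it is not part of `dataretrieval`) and assuming the service accepts date-only `start/end` intervals as in the README example. How the service treats the end date of a date-only interval is not confirmed here, so adjacent windows may need adjusting to avoid gaps or duplicate rows.

```python
from datetime import date, timedelta
from typing import List


def split_time_range(start: date, end: date, days_per_chunk: int = 30) -> List[str]:
    """Split [start, end] into consecutive 'YYYY-MM-DD/YYYY-MM-DD' intervals.

    Hypothetical helper for illustration only; not part of dataretrieval.
    """
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    if end < start:
        raise ValueError("end must not be earlier than start")

    intervals = []
    chunk_start = start
    while chunk_start <= end:
        # Each window covers days_per_chunk calendar days, clipped to `end`.
        chunk_end = min(chunk_start + timedelta(days=days_per_chunk - 1), end)
        intervals.append(f"{chunk_start.isoformat()}/{chunk_end.isoformat()}")
        # The next window starts the day after this one ends.
        chunk_start = chunk_end + timedelta(days=1)
    return intervals


# Water year 2025 split into windows of at most 90 days.
intervals = split_time_range(date(2024, 10, 1), date(2025, 9, 30), days_per_chunk=90)
```

Each string in `intervals` could then be passed as the `time` argument of `waterdata.get_continuous`, and the resulting data frames combined with `pandas.concat`. The 90-day window is an arbitrary choice for the sketch, not a documented limit.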
diff --git a/README.md b/README.md index 0acb073..465fb5f 100644 --- a/README.md +++ b/README.md @@ -6,14 +6,16 @@ ## Latest Announcements -:mega: **11/24/2025:** `dataretrieval` now features the new `waterdata` module, +:mega: **12/04/2025:** `dataretrieval` now features the new `waterdata` module, which provides access to USGS's modernized [Water Data APIs](https://api.waterdata.usgs.gov/). The Water Data API endpoints include -daily values, instantaneous values, field measurements, time series metadata, +daily values, **instantaneous values**, field measurements, time series metadata, and discrete water quality data from the Samples database. This new module will eventually replace the `nwis` module, which provides access to the legacy [NWIS Water Services](https://waterservices.usgs.gov/). +Check out the [NEWS](NEWS.md) file for all updates and announcements. + **Important:** Users of the Water Data APIs are strongly encouraged to obtain an API key for higher rate limits and greater access to USGS data. [Register for an API key](https://api.waterdata.usgs.gov/signup/) and set it as an @@ -24,8 +26,6 @@ import os os.environ["API_USGS_PAT"] = "your_api_key_here" ``` -Check out the [NEWS](NEWS.md) file for all updates and announcements. - ## What is dataretrieval? `dataretrieval` simplifies the process of loading hydrologic data into Python. @@ -61,9 +61,9 @@ pip install git+https://github.com/DOI-USGS/dataretrieval-python.git The `waterdata` module provides access to modern USGS Water Data APIs. 
-The example below retrieves daily streamflow data for a specific monitoring -location for water year 2025, where a "/" between two dates in the "time" -input argument indicates a desired date range: +Some basic usage examples include retrieving daily streamflow data for a +specific monitoring location, where the `/` in the `time` argument indicates +the desired range: ```python from dataretrieval import waterdata @@ -79,8 +79,7 @@ print(f"Retrieved {len(df)} records") print(f"Site: {df['monitoring_location_id'].iloc[0]}") print(f"Mean discharge: {df['value'].mean():.2f} {df['unit_of_measure'].iloc[0]}") ``` -Fetch daily discharge data for multiple sites from a start date to present -using the following code: +Retrieving streamflow at multiple locations from October 1, 2024 to the present: ```python df, metadata = waterdata.get_daily( @@ -91,18 +90,31 @@ df, metadata = waterdata.get_daily( print(f"Retrieved {len(df)} records") ``` -The following example downloads location information for all monitoring -locations that are categorized as stream sites in the state of Maryland: +Retrieving location information for all monitoring locations categorized as +stream sites in the state of Maryland: ```python # Get monitoring location information -locations, metadata = waterdata.get_monitoring_locations( +df, metadata = waterdata.get_monitoring_locations( state_name='Maryland', site_type_code='ST' # Stream sites ) -print(f"Found {len(locations)} stream monitoring locations in Maryland") +print(f"Found {len(df)} stream monitoring locations in Maryland") ``` +Finally, retrieving continuous (a.k.a. "instantaneous") data +for one location. 
We *strongly advise* breaking up continuous data requests into smaller time periods and collections to avoid timeouts and other issues: + +```python +# Get continuous data for a single monitoring location and water year +df, metadata = waterdata.get_continuous( + monitoring_location_id='USGS-01646500', + parameter_code='00065', # Gage height + time='2024-10-01/2025-09-30' +) +print(f"Retrieved {len(df)} continuous gage height measurements") +``` + Visit the [API Reference](https://doi-usgs.github.io/dataretrieval-python/reference/waterdata.html) for more information and examples on available services and input parameters. @@ -202,13 +214,13 @@ print(f"Found {len(flowlines)} upstream tributaries within 50km") ### Modern USGS Water Data APIs (Recommended) - **Daily values**: Daily statistical summaries (mean, min, max) +- **Instantaneous values**: High-frequency continuous data - **Field measurements**: Discrete measurements from field visits - **Monitoring locations**: Site information and metadata - **Time series metadata**: Information about available data parameters - **Latest daily values**: Most recent daily statistical summary data - **Latest instantaneous values**: Most recent high-frequency continuous data - **Samples data**: Discrete USGS water quality data -- **Instantaneous values** (*COMING SOON*): High-frequency continuous data ### Legacy NWIS Services (Deprecated) - **Daily values (dv)**: Legacy daily statistical data diff --git a/dataretrieval/waterdata/__init__.py b/dataretrieval/waterdata/__init__.py index 7f68bfd..39b758f 100644 --- a/dataretrieval/waterdata/__init__.py +++ b/dataretrieval/waterdata/__init__.py @@ -13,6 +13,7 @@ from .api import ( _check_profiles, get_codes, + get_continuous, get_daily, get_field_measurements, get_latest_continuous, @@ -30,6 +31,7 @@ __all__ = [ "get_codes", + "get_continuous", "get_daily", "get_field_measurements", "get_latest_continuous", diff --git a/dataretrieval/waterdata/api.py 
b/dataretrieval/waterdata/api.py index 4e6c6c4..63f7b81 100644 --- a/dataretrieval/waterdata/api.py +++ b/dataretrieval/waterdata/api.py @@ -204,6 +204,171 @@ def get_daily( return get_ogc_data(args, output_id, service) +def get_continuous( + monitoring_location_id: Optional[Union[str, List[str]]] = None, + parameter_code: Optional[Union[str, List[str]]] = None, + statistic_id: Optional[Union[str, List[str]]] = None, + properties: Optional[List[str]] = None, + time_series_id: Optional[Union[str, List[str]]] = None, + continuous_id: Optional[Union[str, List[str]]] = None, + approval_status: Optional[Union[str, List[str]]] = None, + unit_of_measure: Optional[Union[str, List[str]]] = None, + qualifier: Optional[Union[str, List[str]]] = None, + value: Optional[Union[str, List[str]]] = None, + last_modified: Optional[str] = None, + time: Optional[Union[str, List[str]]] = None, + limit: Optional[int] = None, + convert_type: bool = True, +) -> Tuple[pd.DataFrame, BaseMetadata]: + """ + Continuous data provide instantaneous water conditions. + + This is an early version of the continuous endpoint that is feature-complete + and is being made available for limited use. Geometries are not included + with the continuous endpoint. If the "time" input is left blank, the service + will return the most recent year of measurements. Users may request no more + than three years of data with each function call. + + Continuous data are collected at a high frequency, typically 15-minute + intervals. Depending on the specific monitoring location, the data may be + transmitted automatically via telemetry and be available on WDFN within + minutes of collection, while other times the delivery of data may be delayed + if the monitoring location does not have the capacity to automatically + transmit data. Continuous data are described by parameter name and + parameter code (pcode). These data might also be referred to as + "instantaneous values" or "IV". 
+ + Parameters + ---------- + monitoring_location_id : string or list of strings, optional + A unique identifier representing a single monitoring location. This + corresponds to the id field in the monitoring-locations endpoint. + Monitoring location IDs are created by combining the agency code of + the agency responsible for the monitoring location (e.g. USGS) with + the ID number of the monitoring location (e.g. 02238500), separated + by a hyphen (e.g. USGS-02238500). + parameter_code : string or list of strings, optional + Parameter codes are 5-digit codes used to identify the constituent + measured and the units of measure. A complete list of parameter + codes and associated groupings can be found at + https://help.waterdata.usgs.gov/codes-and-parameters/parameters. + statistic_id : string or list of strings, optional + A code corresponding to the statistic an observation represents. + Continuous data are nearly always associated with statistic id + 00011. Using a different code (such as 00003 for mean) will + typically return no results. A complete list of codes and their + descriptions can be found at + https://help.waterdata.usgs.gov/code/stat_cd_nm_query?stat_nm_cd=%25&fmt=html. + properties : string or list of strings, optional + A list of requested columns to be returned from the query. + Available options are: geometry, id, time_series_id, + monitoring_location_id, parameter_code, statistic_id, time, value, + unit_of_measure, approval_status, qualifier, last_modified + time_series_id : string or list of strings, optional + A unique identifier representing a single time series. This + corresponds to the id field in the time-series-metadata endpoint. + continuous_id : string or list of strings, optional + A universally unique identifier (UUID) representing a single version of + a record. It is not stable over time. 
Every time the record is refreshed + in our database (which may happen as part of normal operations and does + not imply any change to the data itself) a new ID will be generated. To + uniquely identify a single observation over time, compare the time and + time_series_id fields; each time series will only have a single + observation at a given time. + approval_status : string or list of strings, optional + Some of the data that you have obtained from this U.S. Geological Survey + database may not have received Director's approval. Any such data values + are qualified as provisional and are subject to revision. Provisional + data are released on the condition that neither the USGS nor the United + States Government may be held liable for any damages resulting from its + use. This field reflects the approval status of each record, and is either + "Approved", meaning processing review has been completed and the data is + approved for publication, or "Provisional" and subject to revision. For + more information about provisional data, go to: + https://waterdata.usgs.gov/provisional-data-statement/. + unit_of_measure : string or list of strings, optional + A human-readable description of the units of measurement associated + with an observation. + qualifier : string or list of strings, optional + This field indicates any qualifiers associated with an observation, for + instance if a sensor may have been impacted by ice or if values were + estimated. + value : string or list of strings, optional + The value of the observation. Values are transmitted as strings in + the JSON response format in order to preserve precision. + last_modified : string, optional + The last time a record was refreshed in our database. This may happen + due to regular operational processes and does not necessarily indicate + anything about the measurement has changed. You can query this field + using date-times or intervals, adhering to RFC 3339, or using ISO 8601 + duration objects. 
Intervals may be bounded or half-bounded (double-dots + at start or end). + Examples: + + * A date-time: "2018-02-12T23:20:50Z" + * A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z" + * Half-bounded intervals: "2018-02-12T00:00:00Z/.." or "../2018-03-18T12:31:12Z" + * Duration objects: "P1M" for data from the past month or "PT36H" for the last 36 hours + + Only features that have a last_modified that intersects the value of + datetime are selected. + time : string, optional + The date an observation represents. You can query this field using + date-times or intervals, adhering to RFC 3339, or using ISO 8601 + duration objects. Intervals may be bounded or half-bounded (double-dots + at start or end). Only features that have a time that intersects the + value of datetime are selected. If a feature has multiple temporal + properties, it is the decision of the server whether only a single + temporal property is used to determine the extent or all relevant + temporal properties. + Examples: + + * A date-time: "2018-02-12T23:20:50Z" + * A bounded interval: "2018-02-12T00:00:00Z/2018-03-18T12:31:12Z" + * Half-bounded intervals: "2018-02-12T00:00:00Z/.." or "../2018-03-18T12:31:12Z" + * Duration objects: "P1M" for data from the past month or "PT36H" for the last 36 hours + + limit : numeric, optional + The optional limit parameter is used to control the subset of the + selected features that should be returned in each page. The maximum + allowable limit is 10000. It may be beneficial to set this number lower + if your internet connection is spotty. The default (None) will set the + limit to the maximum allowable limit for the service. + convert_type : boolean, optional + If True, the function will convert the data to dates and qualifier to + string vector + + Returns + ------- + df : ``pandas.DataFrame`` or ``geopandas.GeoDataFrame`` + Formatted data returned from the API query. 
+ md: :obj:`dataretrieval.utils.Metadata` + A custom metadata object + + Examples + -------- + .. code:: + + >>> # Get instantaneous gage height data from a + >>> # single site from a single year + >>> df, md = dataretrieval.waterdata.get_continuous( + ... monitoring_location_id="USGS-02238500", + ... parameter_code="00065", + ... time="2021-01-01T00:00:00Z/2022-01-01T00:00:00Z", + ... ) + """ + service = "continuous" + output_id = "continuous_id" + + # Build argument dictionary, omitting None values + args = { + k: v + for k, v in locals().items() + if k not in {"service", "output_id"} and v is not None + } + + return get_ogc_data(args, output_id, service) + def get_monitoring_locations( monitoring_location_id: Optional[List[str]] = None, diff --git a/dataretrieval/waterdata/utils.py b/dataretrieval/waterdata/utils.py index a4e9780..1bcc58a 100644 --- a/dataretrieval/waterdata/utils.py +++ b/dataretrieval/waterdata/utils.py @@ -773,23 +773,3 @@ def get_ogc_data( metadata = BaseMetadata(response) return return_list, metadata - -# def _get_description(service: str): -# tags = _get_collection().get("tags", []) -# for tag in tags: -# if tag.get("name") == service: -# return tag.get("description") -# return None - -# def _get_params(service: str): -# url = f"{_base_url()}collections/{service}/schema" -# resp = requests.get(url, headers=_default_headers()) -# resp.raise_for_status() -# properties = resp.json().get("properties", {}) -# return {k: v.get("description") for k, v in properties.items()} - -# def _get_collection(): -# url = f"{_base_url()}openapi?f=json" -# resp = requests.get(url, headers=_default_headers()) -# resp.raise_for_status() -# return resp.json() diff --git a/tests/waterdata_test.py b/tests/waterdata_test.py index 816bc11..2745d34 100755 --- a/tests/waterdata_test.py +++ b/tests/waterdata_test.py @@ -10,6 +10,7 @@ _check_profiles, get_samples, get_daily, + get_continuous, get_monitoring_locations, get_latest_continuous, get_latest_daily, @@ -142,7 
+143,7 @@ def test_get_daily_properties(): assert df.parameter_code.unique().tolist() == ["00060"] def test_get_daily_no_geometry(): - df, md = get_daily( + df, _ = get_daily( monitoring_location_id="USGS-05427718", parameter_code="00060", time="2025-01-01/..", @@ -152,6 +153,18 @@ def test_get_daily_no_geometry(): assert df.shape[1] == 11 assert isinstance(df, DataFrame) +def test_get_continuous(): + df, _ = get_continuous( + monitoring_location_id="USGS-06904500", + parameter_code="00065", + time="2025-01-01/2025-12-31" + ) + assert isinstance(df, DataFrame) + assert "geometry" not in df.columns + assert df.shape[1] == 11 + assert df['time'].dtype == 'datetime64[ns, UTC]' + assert "continuous_id" in df.columns + def test_get_monitoring_locations(): df, md = get_monitoring_locations( state_name="Connecticut", @@ -162,7 +175,7 @@ def test_get_monitoring_locations(): assert hasattr(md, 'query_time') def test_get_monitoring_locations_hucs(): - df, md = get_monitoring_locations( + df, _ = get_monitoring_locations( hydrologic_unit_code=["010802050102", "010802050103"] ) assert set(df.hydrologic_unit_code.unique().tolist()) == {"010802050102", "010802050103"} @@ -177,12 +190,7 @@ def test_get_latest_continuous(): assert df.statistic_id.unique().tolist() == ["00011"] assert hasattr(md, 'url') assert hasattr(md, 'query_time') - try: - datetime.datetime.strptime(df['time'].iloc[0], "%Y-%m-%dT%H:%M:%S+00:00") - out=True - except: - out=False - assert out + assert df['time'].dtype == 'datetime64[ns, UTC]' def test_get_latest_daily(): df, md = get_latest_daily(