Tutorial
Tools
To make requests to these URLs, you must use an HTTP request library that submits credentials allowing you to read our private datasets. Hakai IT maintains three libraries that do this: hakai-api-client-py, hakai-api-client-r, and hakai-api-client-matlab. For instructions on how to install and use them, see their individual documentation.
Getting data
You can filter the data returned from any of these endpoints by adding querystring parameters to your request URL. A querystring is the text you place after a "?" in your URL to pass parameters to the server. Typically, a querystring consists of one or more key=value pairs joined with "&" symbols. An example is http://example.com?foo=bar&year=2017, where the URL is http://example.com and the querystring is ?foo=bar&year=2017. In the context of this API, querystrings are used to filter and sort data. You can read about all the possible querystring parameters for this API in the querying data documentation. For a gentler introduction to data filtering, see the data filtering crash course section below.
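As a concrete illustration, a simple querystring like the one above can be assembled with Python's standard library. The keys foo and year are just placeholders; note that this approach only suits plain key=value pairs, since this API's comparison operators (such as date>=2016-01-01, shown later) are written directly into the URL rather than encoded.

```python
from urllib.parse import urlencode

# Build key=value pairs joined with "&" symbols
params = {"foo": "bar", "year": "2017"}
querystring = urlencode(params)  # 'foo=bar&year=2017'

# Place the querystring after a "?" in the URL
url = "http://example.com?" + querystring
print(url)  # http://example.com?foo=bar&year=2017
```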
Data filtering crash course
This short tutorial assumes you are using the hakai-api-client-python library, although the instructions should be simple to adapt for the R client as well.
Say you wanted to get all chlorophyll data for the year 2016 that was collected on the KWAK survey. To do this, you could use the following workflow:
1. Request some data to see what attributes it has.
Just make a request to the chlorophyll data endpoint without any querystring parameters to see what kind of results you get back. The attribute names in the result will be what you use to filter this particular dataset.
# Example using hakai_api_client_python library
from hakai_api import Client
client = Client()
response = client.get("https://hecate.hakai.org/api/eims/views/output/chlorophyll")
data = response.json()
print(data)
# [
# {
# "no": "1",
# "action": "",
# "event_pk": 7065,
# "rn": "1",
# "date": "2012-05-17",
# "work_area": "CALVERT",
# "survey": "KWAK",
# "sampling_bout": 1,
# "site_id": "PRUTH",
# "lat": 51.6554,
# "long": -128.0913,
# "gather_lat": null,
# "gather_long": null,
# "collection_method": null,
# "line_out_depth": 5,
# "pressure_transducer_depth": null,
# "volume": 250,
# "collected": "2012-05-17T17:48:00.000Z",
# "preserved": "2012-05-17T07:00:00.000Z",
# "analyzed": null,
# "lab_technician": null,
# "project_specific_id": null,
# "hakai_id": "CHL3793",
# "is_blank": null,
# "is_solid_standard": null,
# "filter_size_mm": null,
# "filter_type": "20",
# "acetone_volume_ml": 10,
# "flurometer_serial_no": null,
# "calibration": null,
# "acid_ratio_correction_factor": null,
# "acid_coefficient": null,
# "calibration_slope": null,
# "before_acid": 87.7,
# "after_acid": 46.9,
# "acid_flag": null,
# "chla": 4.7820864,
# "chla_flag": "SVC",
# "chla_final": 4.7820864,
# "phaeo": 1.869350392,
# "phaeo_flag": "SVC",
# "phaeo_final": 1.869350392,
# "analyzing_lab": "HAKAI",
# "row_flag": "Results",
# "quality_level": "Principal Investigator",
# "comments": "",
#     "quality_log": "1: Results QC'd by BH; Given new Hakai Ids\r2: Given new Hakai Ids\r3: Results QC'd by BH; Given new Hakai Ids; analyzed pre- 2014-12-05"
# },
# ...etc.
# ]
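If you only want the attribute names rather than the full records, printing the keys of the first record is a quick shortcut. A minimal sketch, using a trimmed-down record in place of a live API call:

```python
# A trimmed-down record like those returned by the chlorophyll endpoint
record = {
    "event_pk": 7065,
    "date": "2012-05-17",
    "work_area": "CALVERT",
    "survey": "KWAK",
    "site_id": "PRUTH",
    "chla": 4.7820864,
}

# The keys are the attribute names you can use in querystring filters
print(sorted(record.keys()))
# ['chla', 'date', 'event_pk', 'site_id', 'survey', 'work_area']
```

With real response data, the equivalent would be `sorted(data[0].keys())`.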
2. Add some filters to request a subset of all the data.
From the previous printout, we see that the data has keys like no, action, event_pk, etc. We can provide querystring parameters with our request so that the data we get back matches our criteria. Notice that in the following code, we've added querystring parameters to do data filtering.
# Request data where survey is KWAK and the date falls sometime in 2016.
response = client.get("https://hecate.hakai.org/api/eims/views/output/chlorophyll?survey=KWAK&date>=2016-01-01&date<2017-01-01")
data = response.json()
print(data)
# [
# {
# "no": "8751",
# "action": "",
# "event_pk": 30395,
# "rn": "1",
# "date": "2016-01-16",
# "work_area": "CALVERT",
# "survey": "KWAK",
# "sampling_bout": 1,
# "site_id": "KC1",
# "lat": 51.6545,
# "long": -128.1289,
# "gather_lat": null,
# "gather_long": null,
# "collection_method": null,
# "line_out_depth": 0,
# "pressure_transducer_depth": null,
# "volume": 250,
# "collected": "2016-01-17T03:35:13.000Z",
# "preserved": "2016-01-16T00:50:30.000Z",
# "analyzed": "2016-01-26T18:44:21.000Z",
# "lab_technician": "Bryn,Emma",
# "project_specific_id": null,
# "hakai_id": "CHL4605",
# "is_blank": null,
# "is_solid_standard": null,
# "filter_size_mm": null,
# "filter_type": "Bulk GF/F",
# "acetone_volume_ml": 10,
# "flurometer_serial_no": "720001154",
# "calibration": "2015-08-06T07:00:00.000Z",
# "acid_ratio_correction_factor": 1.364,
# "acid_coefficient": 3.748,
# "calibration_slope": 0.0005069,
# "before_acid": 109869.3,
# "after_acid": 78069.69,
# "acid_flag": null,
# "chla": 1.33788238698795,
# "chla_flag": "AV",
# "chla_final": 1.33788238698795,
# "phaeo": 3.02402733269205,
# "phaeo_flag": "AV",
# "phaeo_final": 3.02402733269205,
# "analyzing_lab": "HAKAI",
# "row_flag": "Results",
# "quality_level": "Principal Investigator",
# "comments": "",
#     "quality_log": "1: Results QC'd by BH"
# },
# ...19 more rows
# ]
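The filtering happens on the server, but it is easy to sanity-check a result client-side. A short sketch that verifies every returned record satisfies the survey and date criteria, using placeholder records in place of a live response (ISO 8601 dates compare correctly as strings):

```python
# Placeholder records standing in for the API response
data = [
    {"survey": "KWAK", "date": "2016-01-16", "site_id": "KC1"},
    {"survey": "KWAK", "date": "2016-06-02", "site_id": "PRUTH"},
]

# Every record should satisfy survey=KWAK&date>=2016-01-01&date<2017-01-01
assert all(
    r["survey"] == "KWAK" and "2016-01-01" <= r["date"] < "2017-01-01"
    for r in data
)
print("all records match the filters")
```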
3. Remove the restriction on the number of records being returned.
In the previous step, if you look at the returned result, you'll notice only 20 rows of data are returned even though we expect many more. This is due to a default limit that prevents the API from returning more than 20 records per request. To remove this limit, add the querystring parameter limit=-1. Please note that you may receive a lot of data once this limit is removed, so filter your results in some way first to prevent your script from crashing under too much data at once.
# Request all of the 2016 KWAK chlorophyll data with no record limit
response = client.get("https://hecate.hakai.org/api/eims/views/output/chlorophyll?survey=KWAK&date>=2016-01-01&date<2017-01-01&limit=-1")
data = response.json()  # data now contains all the 2016 KWAK chlorophyll data
# This will print roughly 316 chlorophyll records from the KWAK survey collected in 2016
print(data)
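Once everything is in memory, it is worth checking how much data came back and how it breaks down before doing anything heavier. A sketch using the standard library, with placeholder records standing in for the live response:

```python
from collections import Counter

# Placeholder records standing in for the full 2016 KWAK response
data = [
    {"site_id": "KC1", "chla_final": 1.34},
    {"site_id": "KC1", "chla_final": 0.98},
    {"site_id": "PRUTH", "chla_final": 2.11},
]

# How many records, and how are they distributed across sites?
print(len(data), "records")                  # 3 records
print(Counter(r["site_id"] for r in data))  # Counter({'KC1': 2, 'PRUTH': 1})
```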
Note: Default limit
By default, all URLs return at most 20 records. The reason is that if all records were returned, you might, for example, request a million rows of CTD data at once and crash both your script and the API service. Fortunately, the API will restart itself if this happens, but you may need to force close your script to recover. To avoid this, add filters to your request, as in the tutorial above, to reduce the number of records returned before adding the querystring parameter limit=-1 to turn off the default record limit. For more details about the ways you can filter data, see the querying data documentation.
Why is there a default limit?
The default limit stops people from requesting too much data at once. Requesting too much data in a single call can crash the API service. If that happens, the service restarts automatically, but you will wait a long time and eventually receive an error message instead of the data you were expecting. To avoid this, add filters to your request, as in the tutorial above, to reduce the number of records returned.