Finding CTD Data
A common question, with many possible answers, comes from people who cannot find CTD data. This document outlines some options for tracking down the data and resolving issues.
The data does not exist
The person asking might be mistaken about a survey having happened on certain dates. If you can't find any trace of the data they are asking for, it is worth checking with the oceanography research technicians (hereafter 'the techs') whether the survey exists. At the time of writing, Eva Jordison eva.jordison@hakai.org and Chris Mackenzie chris.mackenzie@hakai.org are good people to ask, though you can always message the hakai-oceanography channel on Slack or email all oceanography staff at oceanography@hakai.org.
The inquirer has not been granted access to the organization that manages the data
The user will not be able to see data that belongs to an organization they are not in the Google group for (see api.access_groups for the Google group / workarea mapping). If they want to be added to a group, they can fill out the data access request form: https://docs.google.com/forms/d/e/1FAIpQLScOGfcxpbt42CINCfK7VbXt8dNah1iz23UVzJ1EHokhwdkLRg/viewform. Each organization will approve or deny requests, after which you can add them to the appropriate group through the Google Admin Console.
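If you just need to double-check the mapping, the table can be queried directly. A minimal sketch (selecting all columns, since the exact column names are not listed in this document):

    -- List the Google group / workarea mappings used for access control
    SELECT *
    FROM api.access_groups;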
The data has not been uploaded yet
Usually this is only a possibility for very recent data. Some places to check are recent casts (sort by start_dt in ctd.ctd_file_cast) or recently uploaded data (sort by uploaded_date in ctd.ctd_file). Uploading is also the techs' responsibility, and the inquirer should follow up with them about it.
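As a rough sketch, a quick check for recent activity could look like this (adjust the limit or add filters as needed):

    -- Most recent casts detected in uploaded files
    SELECT *
    FROM ctd.ctd_file_cast
    ORDER BY start_dt DESC
    LIMIT 20;

    -- Most recently uploaded files
    SELECT *
    FROM ctd.ctd_file
    ORDER BY uploaded_date DESC
    LIMIT 20;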
The data has not been annotated yet
Similar to the above, though here you can look at annotated_date in ctd.ctd_file.
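Assuming annotated_date stays null until a file has been annotated, a sketch for finding uploaded-but-not-annotated files would be:

    -- Files uploaded but not yet annotated (sketch)
    SELECT *
    FROM ctd.ctd_file
    WHERE annotated_date IS NULL
    ORDER BY uploaded_date DESC;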
For data that has not been uploaded or annotated yet, I've been trying to encourage people to use the CTD Annotation Review on the download page of the portal (https://hecate.hakai.org/portal2/download), as they can usually answer these types of questions themselves with the information provided there. Key fields to look for are date, workarea, device serial number, uploaded date, annotated date, and is annotated. Helpful search fields like survey and site will only be filled out if the file has been annotated.
Cast detection failure
The file was uploaded, but the cast detection script under hakai-api/src/routes/ctd/utils/rbr/detectCasts.js did not find a cast that was expected. The detection script has been pretty stable over the past few years, so it is most likely that the data collection did not follow the protocol (for example, the cast was too shallow or the soak too short).
In the event that the data was good, the file should be added to the tests for hakai-api/src/routes/ctd/utils/rbr/detectCasts.js and the script should be edited to accommodate the edge case without failing any of the existing tests.
If you want to manually add a cast, you can add an entry directly to the database in ctd.ctd_cast. You can use Ruskin to get the data for start_dt, bottom_dt, start_depth, and bottom_depth, and get the ctd_file_pk from ctd.ctd_file. These need to be exact/existing data points or the "view chart" feature on the annotation page will not work.
Clock failure
Sometimes the CTD clock will not be accurate (daylight saving time change, operator forgot to sync, battery dislodged and clock reset, etc.) and the data will not show up on the correct date/time. Refer to hakai-database/docs/sync-ctd-and-form-times.md.
A note on solo data
The solo instruments are very similar to RBR CTD instruments, except they only measure depth and are constantly recording. Their purpose is to confirm the depth the Niskin bottles are at when the water samples are collected.
The workflow between RBR Solo and CTD instruments is similar and the upload, annotate, cast detection, and clock information described above all apply to solos.
Key differences: once the solo data is annotated, the bottom depth is used in the EIMS schema as pressure_transducer_depth and is only ever used in conjunction with other water sample data (such as nutrients). Solo data is never used on its own, so there is no download page for solos. There is also no data processing involved for solos.
The data is not processing
Monitoring unprocessed casts
A nightly cronjob checks for unprocessed casts and sends an email to eims.support@hakai.org. See the bash scripts referenced in hecate's crontab.
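The real query lives in those scripts; conceptually it is something along these lines (the processed_datetime column name is hypothetical, so check the scripts for the actual condition):

    -- Conceptual sketch of the nightly check for unprocessed casts
    -- (processed_datetime is a hypothetical column name)
    SELECT *
    FROM ctd.ctd_cast
    WHERE processed_datetime IS NULL;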
Processing errors
The data may have failed processing, in which case it will not show up under 'processed data' on the CTD Download Page. You can check the error message in process_error under ctd.ctd_cast or ctd.ctd_file_cast and collaborate with Jessy Barrette jessy.barrette@hakai.org to determine if there is an issue with the data or the processing tool.
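A sketch for pulling up the failures:

    -- Casts with a processing error recorded
    SELECT *
    FROM ctd.ctd_cast
    WHERE process_error IS NOT NULL;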
Site location missing
The processing script needs location data in order to process a cast, which can be provided as the collected lat/long from the tablet in the field, or retrieved from the official site locations list. If neither of these is available, the cast will fail with an error similar to this:
error: An error occured while processing cast 17432: Error: Matlab processing script for cast 17432 failed with error Error using processHakaiRBRCast (line 196)
Can't find the site ARMS11 in our list and no lat/long position is available for this data
The official station list can be found here: https://hakai.maps.arcgis.com/apps/webappviewer/index.html?id=38e1b1da8d16466bbe5d7c7a713d2678
Sites are pulled into the database table eims.site from the AGOL layer with the script /hakai-api/cli-tools/get-sampling-sites-from-agol.js.
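To confirm whether a site has actually made it into the database, you can check eims.site directly. A sketch, assuming the site name is stored in a column called name (adjust to the real column if it differs):

    -- Check whether a site exists in eims.site ("name" column is assumed)
    SELECT *
    FROM eims.site
    WHERE name ILIKE 'ARMS11';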
Will McInnes will.mcinnes@hakai.org maintains the station list, and new sites with coordinates can be sent to him. If Will is not available, other contacts that might be able to help are Matt Foster matthew.foster@hakai.org, data@hakai.org, or the Geospatial team geospatial.team@hakai.org.
Processing script not running
Seabird uses a flag file, running.flg, to indicate that the script is running, so that only one instance runs at a time. I'm not 100% sure why, but sometimes this file does not get cleaned up and needs to be deleted (rm running.flg) so that the script will start running again.
Other Info
RBR files can have multiple casts per file. Usually these are all from the same day, but if the data is not downloaded at the end of the day it might roll over and include casts from multiple days. This is advised against, as it can create annotation problems and it is more confusing to work with so much data from one file. Seabird instruments create a new file for each cast.
The RBR filename will indicate the serial number of the instrument and the date and time the file was downloaded off the instrument. This date is almost always the day of or day after the data was collected, but not necessarily. For example, 065679_20210719_1011 was downloaded from instrument 065679 on 2021-07-19 at 10:11AM.
The Seabird filename indicates the serial number of the instrument and the date it was downloaded, along with the cast number from that day. For example SBE19plus_01907674_2021_07_20_0002 was downloaded from instrument 01907674 on 2021-07-20 and was the second cast of the day.