Finding CTD Data

A common question, with many possible answers, comes from people who cannot find CTD data. This document outlines some options for tracking down the data and resolving the underlying issue.

The data does not exist

The person asking might be mistaken about a survey having happened on certain dates. If you can't find any trace of the data they are asking for, check with the oceanography research technicians (hereafter referred to as 'the techs') whether the survey exists. At time of writing, Eva Jordison (eva.jordison@hakai.org) and Chris Mackenzie (chris.mackenzie@hakai.org) are good people to ask, though you can always message on Slack (hakai-oceanography channel) or email all oceanography staff at oceanography@hakai.org.

The inquirer has not been granted access to the organization that manages the data

Users cannot see data that belongs to an organization unless they are in that organization's Google group (see api.access_groups for the Google group / workarea mapping). If they want to be added to a group, they can fill out the data access request form: https://docs.google.com/forms/d/e/1FAIpQLScOGfcxpbt42CINCfK7VbXt8dNah1iz23UVzJ1EHokhwdkLRg/viewform. Each organization will approve or deny requests, after which you can add approved users to the appropriate group through the Google Admin Console.

The data has not been uploaded yet

This is usually only a possibility for very recent data. Check for recent casts (sort by start_dt in ctd.ctd_file_cast) or recently uploaded files (sort by uploaded_date in ctd.ctd_file). Uploading is the techs' responsibility, so the inquirer should follow up with them about it.
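The two checks above can be sketched as queries. This is a minimal sketch: only the table and column names come from this document, the LIMIT is arbitrary, and the in-memory SQLite database with its rows is a hypothetical stand-in so the queries can be tried without touching the real database.

```python
import sqlite3

# Newest casts first, and newest uploads first. LIMIT 20 is an arbitrary choice.
RECENT_CASTS_SQL = "SELECT * FROM ctd.ctd_file_cast ORDER BY start_dt DESC LIMIT 20"
RECENT_UPLOADS_SQL = "SELECT * FROM ctd.ctd_file ORDER BY uploaded_date DESC LIMIT 20"


def demo_connection():
    """Build a throwaway stand-in for the two tables (made-up rows, minimal columns)."""
    conn = sqlite3.connect(":memory:")
    # Attaching a second in-memory database named "ctd" lets the
    # schema-qualified table names work in SQLite.
    conn.execute("ATTACH DATABASE ':memory:' AS ctd")
    conn.execute("CREATE TABLE ctd.ctd_file (ctd_file_pk INTEGER, uploaded_date TEXT)")
    conn.execute("CREATE TABLE ctd.ctd_file_cast (start_dt TEXT)")
    conn.executemany(
        "INSERT INTO ctd.ctd_file VALUES (?, ?)",
        [(1, "2021-07-19 10:30:00"), (2, "2021-07-21 09:00:00")],
    )
    conn.executemany(
        "INSERT INTO ctd.ctd_file_cast VALUES (?)",
        [("2021-07-19 08:00:00",), ("2021-07-20 08:15:00",)],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for row in conn.execute(RECENT_UPLOADS_SQL):
        print(row)
```

If the cast or file the inquirer expects is not near the top of either result, the data has most likely not been uploaded.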

The data has not been annotated yet

As with data that has not been uploaded, this usually applies only to very recent data. Check annotated_date in ctd.ctd_file.

For data not uploaded or annotated yet, I've been encouraging people to use the CTD Annotation Review on the download page of the portal (https://portal.hakai.org/portal2/download), as they can usually answer these types of questions themselves with the information provided there. Key fields to look for are date, workarea, device serial number, uploaded date, annotated date, and is annotated. Helpful search fields like survey and site will only be filled out if the file has been annotated.

Cast detection failure

The file was uploaded, but the cast detection script under hakai-api/src/routes/ctd/utils/rbr/detectCasts.js did not find a cast that was expected. The detection script has been pretty stable over the past few years, so it is most likely that the data collection did not follow the protocol (for example, the cast was too shallow or the soak too short).

If the data was good, the file should be added to the tests for hakai-api/src/routes/ctd/utils/rbr/detectCasts.js and the script should be edited to accommodate the edge case without failing any of the existing tests.

If you want to manually add a cast, you can add an entry directly to the database in ctd.ctd_cast. You can use Ruskin to get the data for the start_dt, bottom_dt, start_depth and bottom_depth, and get the ctd_file_pk from ctd.ctd_file. These need to be exact/existing data points or the "view chart" feature on the annotation page will not work.
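Because the values must match existing data points exactly, it can help to check them before inserting the row. The sketch below assumes the samples have been exported from Ruskin as (timestamp, depth) pairs; that export format, the function name, and the check that the bottom comes after the start are all assumptions for illustration, not part of the existing tooling.

```python
from datetime import datetime


def check_cast_points(samples, start_dt, start_depth, bottom_dt, bottom_depth):
    """Return a list of problems; an empty list means both points exist exactly.

    `samples` is an iterable of (datetime, depth) pairs, assumed to come from
    a Ruskin export of the file (the export step itself is not shown).
    """
    by_time = dict(samples)
    problems = []
    points = (("start", start_dt, start_depth), ("bottom", bottom_dt, bottom_depth))
    for label, dt, depth in points:
        if dt not in by_time:
            problems.append(f"{label}_dt {dt} is not a recorded sample time")
        elif by_time[dt] != depth:
            problems.append(
                f"{label}_depth {depth} does not match recorded depth {by_time[dt]}"
            )
    # Assumed sanity check: the bottom of the cast is reached after the start.
    if bottom_dt <= start_dt:
        problems.append("bottom_dt must be after start_dt")
    return problems
```

An empty result means the four values can be entered as they are; anything else should be corrected against the Ruskin data first.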

Clock failure

Sometimes the CTD clock will not be accurate (daylight saving time change, operator forgot to sync, battery dislodged and clock reset, etc.) and the data will not show up on the correct date/time. Refer to hakai-database/docs/sync-ctd-and-form-times.md.

A note on solo data

The solo instruments are very similar to RBR CTD instruments, except they only measure depth and are constantly recording. Their purpose is to confirm the depth of the Niskin bottles when the water samples are collected.

The workflow between RBR Solo and CTD instruments is similar and the upload, annotate, cast detection, and clock information described above all apply to solos.

The key difference is that once solo data is annotated, the bottom depth is used in the EIMS schema as pressure_transducer_depth, and only in conjunction with other water sample data (such as nutrients). Solo data is never used on its own, so there is no download page for solos. There is also no data processing involved for solos.

The data is not processing

Monitoring unprocessed casts

A nightly cron job checks for unprocessed casts and sends an email to eims.support@hakai.org. See the bash scripts referenced in hecate's crontab.

Processing errors

The data may have failed processing, in which case it will not show up under 'processed data' on the CTD Download Page. You can check the error message in process_error under ctd.ctd_cast or ctd.ctd_file_cast and work with Jessy Barrette (jessy.barrette@hakai.org) to determine whether the issue is with the data or the processing tool.

Site location missing

The processing script needs location data in order to process a cast. This can be provided as the lat/long collected on the tablet in the field, or retrieved from the official site locations list. If neither of these is available, the cast will fail with an error similar to this:

error: An error occured while processing cast 17432: Error: Matlab processing script for cast 17432 failed with error Error using processHakaiRBRCast (line 196)
Can't find the site ARMS11 in our list and no lat/long position is available for this data
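The location lookup described above can be sketched as follows. This is not the Matlab processing code: the function name and the site_list mapping are hypothetical, and the order of precedence (collected position first, then the site list) is an assumption.

```python
def resolve_location(site, collected_lat=None, collected_long=None, site_list=None):
    """Return (lat, long) for a cast, or raise if no position can be found.

    `site_list` maps site name -> (lat, long) and stands in for the official
    site locations list. Precedence between the two sources is assumed.
    """
    # Source 1: the position collected on the tablet in the field.
    if collected_lat is not None and collected_long is not None:
        return (collected_lat, collected_long)
    # Source 2: the official site locations list.
    if site_list and site in site_list:
        return site_list[site]
    # Neither source is available, which is the failure shown above.
    raise LookupError(
        f"Can't find the site {site} in our list "
        "and no lat/long position is available for this data"
    )
```

In practice this means the error can be resolved from either side: by supplying the collected position for the cast, or by getting the site added to the official list.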

The official station list can be found here: https://hakai.maps.arcgis.com/apps/webappviewer/index.html?id=38e1b1da8d16466bbe5d7c7a713d2678

Sites are pulled into the database table eims.site from the AGOL layer with this script: /hakai-api/cli-tools/get-sampling-sites-from-agol.js.

Will McInnes (will.mcinnes@hakai.org) maintains the station list, and new sites with coordinates can be sent to him. If Will is not available, other contacts who might be able to help are Matt Foster (matthew.foster@hakai.org), data@hakai.org, or the Geospatial team (geospatial.team@hakai.org).

Processing script not running

Seabird processing uses a flag file, running.flg, to indicate that the script is running, so that only one instance runs at a time. I'm not 100% sure why, but sometimes this file does not get cleaned up and needs to be deleted (rm running.flg) so that the script will start running again.
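A cautious way to automate the cleanup is to delete the flag only when it is clearly stale, so that a run that is genuinely in progress is left alone. The sketch below is an illustration only: the function name and the six-hour threshold are assumptions, and the flag path has to be supplied by the caller.

```python
import os
import time


def remove_stale_flag(flag_path, max_age_hours=6.0, now=None):
    """Delete flag_path if it is older than max_age_hours; return True if removed.

    `now` is a Unix timestamp and defaults to the current time; it is a
    parameter only so the age check can be exercised without waiting.
    """
    if not os.path.exists(flag_path):
        return False
    now = time.time() if now is None else now
    age_hours = (now - os.path.getmtime(flag_path)) / 3600.0
    if age_hours < max_age_hours:
        # Possibly a live run; leave the flag in place.
        return False
    os.remove(flag_path)
    return True
```

The threshold should be comfortably longer than a normal processing run, otherwise a second instance could start while the first is still working.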

Other Info

RBR files can have multiple casts per file. Usually these are all from the same day, but if the data is not downloaded at the end of the day, the file might roll over and contain casts from multiple days. This is advised against, as it can create annotation problems and working with so much data from one file is more confusing. Seabird instruments create a new file for each cast.

The RBR filename indicates the serial number of the instrument and the date and time the file was downloaded off the instrument. This date is almost always the day of or the day after the data was collected, but not necessarily. For example, 065679_20210719_1011 was downloaded from instrument 065679 on 2021-07-19 at 10:11 AM.
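The naming pattern can be read programmatically. This is a minimal sketch inferred from the single example above; the function name is hypothetical, and real filenames may carry an extension or other suffix, which the pattern tolerates.

```python
import re
from datetime import datetime

# <serial>_<YYYYMMDD>_<HHMM>, as inferred from the example filename.
RBR_NAME = re.compile(r"^(?P<serial>\d+)_(?P<date>\d{8})_(?P<time>\d{4})")


def parse_rbr_filename(name):
    """Return (serial, download datetime) parsed from an RBR filename."""
    match = RBR_NAME.match(name)
    if match is None:
        raise ValueError(f"not an RBR-style filename: {name!r}")
    downloaded = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M")
    # The serial is kept as a string so leading zeros are preserved.
    return match["serial"], downloaded
```

Remember that the parsed datetime is when the file was downloaded, not when the casts were collected.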

The Seabird filename indicates the serial number of the instrument and the date it was downloaded, along with the cast number from that day. For example, SBE19plus_01907674_2021_07_20_0002 was downloaded from instrument 01907674 on 2021-07-20 and was the second cast of the day.
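The Seabird pattern can be parsed in a similar way. Again this is a sketch inferred from the single example above: the function name is hypothetical, and treating the leading SBE19plus part as a free-form prefix is an assumption.

```python
import re
from datetime import date

# <prefix>_<serial>_<YYYY>_<MM>_<DD>_<cast number>, inferred from the example.
SEABIRD_NAME = re.compile(
    r"^(?P<prefix>[^_]+)_(?P<serial>\d+)"
    r"_(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})_(?P<cast>\d+)"
)


def parse_seabird_filename(name):
    """Return a dict with the serial, download date, and cast number."""
    match = SEABIRD_NAME.match(name)
    if match is None:
        raise ValueError(f"not a Seabird-style filename: {name!r}")
    return {
        "prefix": match["prefix"],
        "serial": match["serial"],  # string, so leading zeros are preserved
        "downloaded": date(int(match["year"]), int(match["month"]), int(match["day"])),
        "cast_number": int(match["cast"]),
    }
```

Because Seabird instruments write one file per cast, the cast number together with the date identifies a single cast.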