---
title: "CoastConnect Kelp Drone Mapping: Analysis Methods and Report Card Generation"
bibliography: bib/kelp-drone-report-card.bib
csl: apa
format:
html:
toc: true
toc-depth: 3
toc-location: left
number-sections: true
code-fold: show
code-tools: true
code-line-numbers: true
theme: cosmo
embed-resources: true
fig-width: 10
fig-height: 6
execute:
echo: true
warning: true
message: true
error: false
---
# Introduction
This document provides a comprehensive workflow for generating report cards from the CoastConnect Kelp Drone Mapping program. The report card system leverages drone-based imagery to monitor kelp forest canopy extent over time, providing critical insights for coastal ecosystem management and conservation.
The analysis workflow produces three primary outputs:
1. **Long-term trends** spanning the entire time series
2. **Recent trends** focused on the last five years
3. **Status assessments** using a percentile-based approach adapted from U.S. West Coast monitoring programs [@friederDevelopingStatusTrends2025]
# Data Requirements and Specifications
## Required Data Structure
The analysis requires a CSV file with the following structure and column specifications. Note that the code chunks below expect lowercase column names and tide-corrected canopy columns (`canopy_macro_m2_tidecor` and `canopy_nereo_m2_tidecor`), so rename or correct your columns accordingly:
- **Site** (text): Unique site identifier or name
- **Dom_sp** (text): Dominant kelp species at the site (`macro` for *Macrocystis* or `nereo` for *Nereocystis*)
- **Date** (YYYY-MM-DD): Survey date in ISO format
- **Year** (YYYY): Calendar year as a four-digit integer
- **Month**: Month of survey
- **Tide_height_m**: Tidal height during the drone survey, measured in meters
- **Site_m2**: Total area of the survey site in square meters
- **Canopy_macro_m2**: Total *Macrocystis* canopy area within the site, measured in square meters
- **Canopy_nereo_m2**: Total *Nereocystis* canopy area within the site, measured in square meters
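As a sketch, the first rows of a conforming file might look like the following. The site name and all values here are hypothetical, included only to illustrate the expected structure:

```r
# Hypothetical example rows matching the column specification above.
example_rows <- data.frame(
  Site = c("North_Reef", "North_Reef"),
  Dom_sp = c("macro", "macro"),
  Date = c("2019-07-14", "2020-07-02"),
  Year = c(2019L, 2020L),
  Month = c(7L, 7L),
  Tide_height_m = c(0.8, 1.1),
  Site_m2 = c(52000, 52000),
  Canopy_macro_m2 = c(6120, 5480),
  Canopy_nereo_m2 = c(0, 0)
)
str(example_rows)
```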
## Important Methodological Considerations
::: {.callout-note}
Reminder!
1. Canopy area should have been “clipped” to the site area of interest (AOI) to make sure the same survey areas are being compared over time.
2. At least five years of data are needed to run this trend and status analysis.
:::
::: {.callout-note}
### Tide Correction Methodology
Tidal height during a drone survey affects the area of kelp canopy present at the ocean surface. Previous research has explored the relationship between tide height and kelp canopy area [@timmerCapturingAccurateKelp2024], and a correction factor has been developed based on that work. You may choose to apply this correction factor; however, take your site conditions into consideration. The Central Coast is typically described as having slack tide conditions for one hour on either side of low tide, and this relationship may not be appropriate for sites that experience stronger current regimes.
The correction equation is:
```
corrected_canopy_m2 = canopy_extent_m2 * (1 + (0.257 * tide_height_m))
```
:::
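In R, the correction in the callout above can be applied directly. This is a minimal sketch; the function name is illustrative rather than part of the workflow below:

```r
# Tide correction based on Timmer et al. (2024): inflate observed canopy
# area by 25.7% per metre of tide height at survey time.
tide_correct <- function(canopy_extent_m2, tide_height_m) {
  canopy_extent_m2 * (1 + 0.257 * tide_height_m)
}

# A 1,000 m2 canopy surveyed at a +1.2 m tide:
tide_correct(1000, 1.2) # 1308.4
```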
## Report Card Outputs
### Site-Level Analysis
For each monitoring site, the workflow produces:
1. **Trend analysis**: Statistical assessment using either linear regression or the Mann-Kendall test, depending on whether parametric assumptions are met
- **Long-term trend**: Analysis of the entire time series
- **Recent trend**: Analysis of the most recent five years
2. **Status assessment**: Classification using the Frieder et al. (2025) percentile-based method, which categorizes current conditions relative to a baseline reference period, defined as the first five years of the time series. Status categories include:
   - Far Above Expectations
   - Above Expectations
   - Meeting Expectations
   - Below Expectations
   - Far Below Expectations
### Regional-Level Report Card
Regional analyses aggregate site-level data by dominant species:
- Pool sites by species
- Report on species-level trends
- Pool all kelp together and report on the trend (PacRim method)
This multi-scale approach enables both fine-scale site management and broader regional ecosystem assessments.
# Setup
The analysis relies on several R packages for data manipulation, statistical analysis, and visualization. The `tidyverse` suite provides data wrangling capabilities, while the `Kendall` package enables non-parametric trend testing when parametric assumptions are violated.
```{r}
#| label: setup
#| message: false
#| warning: false
library(tidyverse)
library(Kendall) # For Mann-Kendall trend test
# Set paths
data_file <- "data/drone_site_level_data.csv"
output_dir <- "output/Kelp"
# Create output directory if needed
if (!dir.exists(output_dir)) {
dir.create(output_dir, recursive = TRUE)
}
cat("=== KELP FOREST MONITORING ANALYSIS ===\n")
cat("Data file:", data_file, "\n")
cat("Output directory:", output_dir, "\n\n")
```
# Data Loading and Validation
The first step in any ecological analysis is rigorous data validation. This section loads the input data, verifies that all required columns are present, and performs quality checks to identify potential issues such as missing years, insufficient temporal coverage, or data gaps.
```{r}
#| label: data-load
#| message: false
cat("=== LOADING AND VALIDATING DATA ===\n")
kelp_data <- read.csv(data_file)
# Check required columns
required_cols <- c(
"site",
"year",
"dom_sp",
"canopy_macro_m2_tidecor",
"canopy_nereo_m2_tidecor",
"tide_height_m"
)
missing_cols <- setdiff(required_cols, names(kelp_data))
if (length(missing_cols) > 0) {
stop(
"ERROR: Missing required columns: ",
paste(missing_cols, collapse = ", ")
)
}
cat("✓ All required columns present\n")
cat("Total observations:", nrow(kelp_data), "\n")
cat("Sites:", paste(unique(kelp_data$site), collapse = ", "), "\n")
cat("Years:", min(kelp_data$year), "to", max(kelp_data$year), "\n\n")
# Check data quality
sites_with_gaps <- kelp_data %>%
group_by(site) %>%
summarize(
n_years = n_distinct(year),
year_range = max(year) - min(year) + 1,
has_gaps = n_years < year_range
) %>%
filter(has_gaps)
if (nrow(sites_with_gaps) > 0) {
cat("⚠ WARNING: The following sites have gaps in their time series:\n")
print(sites_with_gaps)
cat("\n")
}
# Check minimum data requirements
sites_summary <- kelp_data %>%
group_by(site) %>%
summarize(
n_years = n_distinct(year),
first_year = min(year),
last_year = max(year),
n_obs = n()
)
insufficient_data <- sites_summary %>% filter(n_years < 5)
if (nrow(insufficient_data) > 0) {
cat("⚠ WARNING: The following sites have fewer than 5 years of data:\n")
print(insufficient_data)
cat("These sites will have limited trend analysis capability\n\n")
}
# Create dominant species lookup with full names
dominant_species <- kelp_data %>%
select(site, dom_sp) %>%
distinct() %>%
mutate(
dominant = case_when(
dom_sp == "macro" ~ "Macrocystis",
dom_sp == "nereo" ~ "Nereocystis",
TRUE ~ dom_sp
)
)
cat("Dominant species by site:\n")
print(dominant_species)
cat("\n")
# Add dominant species values to data
kelp_data <- kelp_data %>%
left_join(
dominant_species %>% select(site, dom_sp, dominant),
by = c("site", "dom_sp")
) %>%
mutate(
kelp_extent = ifelse(
dom_sp == "macro",
canopy_macro_m2_tidecor,
canopy_nereo_m2_tidecor
)
)
```
The validation process identifies sites with temporal gaps or insufficient data for robust trend analysis. These diagnostics are essential for interpreting results and understanding the confidence we can place in subsequent statistical tests.
# Long-Term Trend Analysis
Long-term trend analysis provides insight into decadal-scale patterns in kelp canopy extent. The choice of statistical method depends critically on whether the data meet the assumptions of parametric linear regression. When assumptions of normality and homoscedasticity are violated, the non-parametric Mann-Kendall test provides a robust alternative.
## Statistical Methodology
Linear regression assumes:
1. **Linearity**: The relationship between year and canopy extent is approximately linear
2. **Independence**: Observations are independent (a potential concern in time series)
3. **Homoscedasticity**: Variance of residuals is constant across the time series
4. **Normality**: Residuals are normally distributed
When these assumptions are not met, the Mann-Kendall test offers a distribution-free alternative that assesses monotonic trend without assuming linearity or normality. It is paired with Sen's slope estimator, the median slope across all pairs of observations, which is robust to outliers.
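As a worked illustration of Sen's slope with toy numbers (not program data): for observations (2019, 100), (2020, 160), and (2021, 130), the pairwise slopes are 60, 15, and -30 m²/year, and their median is 15 m²/year:

```r
# Toy illustration of Sen's slope: the median of all pairwise slopes.
years <- c(2019, 2020, 2021)
extent <- c(100, 160, 130) # hypothetical canopy areas (m2)
pairs <- combn(seq_along(years), 2) # all index pairs with i < j
slopes <- (extent[pairs[2, ]] - extent[pairs[1, ]]) /
  (years[pairs[2, ]] - years[pairs[1, ]])
slopes # 60 15 -30
median(slopes) # 15
```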
```{r}
#| label: longterm-trends
cat("=== PART 1: LONG-TERM TREND ANALYSIS ===\n\n")
# Test assumptions and run appropriate trend test for each site
longterm_trend_results <- list()
for (i in 1:nrow(sites_summary)) {
site_name <- sites_summary$site[i]
n_years <- sites_summary$n_years[i]
cat(sprintf("Analyzing %s (%d years of data)...\n", site_name, n_years))
# Get site data
site_data <- kelp_data %>%
filter(site == site_name) %>%
arrange(year) %>%
select(year, kelp_extent) %>%
filter(!is.na(kelp_extent))
if (nrow(site_data) < 3) {
cat(" ⚠ Insufficient data for trend analysis (need ≥3 observations)\n\n")
longterm_trend_results[[site_name]] <- data.frame(
site = site_name,
n_obs = nrow(site_data),
method = "Insufficient data",
slope = NA,
p_value = NA,
r_squared = NA,
direction = "Insufficient data"
)
next
}
# Test linear regression assumptions
# 1. Fit linear model
lm_model <- lm(kelp_extent ~ year, data = site_data)
# 2. Test for normality of residuals (Shapiro-Wilk)
shapiro_test <- shapiro.test(residuals(lm_model))
# 3. Test for homoscedasticity (visual check via residual plot would be better,
# but for automation we'll use a simple variance ratio test)
# Split residuals into first half and second half
n_obs <- nrow(site_data)
if (n_obs >= 6) {
first_half_var <- var(residuals(lm_model)[1:floor(n_obs / 2)])
second_half_var <- var(residuals(lm_model)[(floor(n_obs / 2) + 1):n_obs])
var_ratio <- max(first_half_var, second_half_var) /
min(first_half_var, second_half_var)
homoscedastic <- var_ratio < 4 # Rule of thumb: ratio should be < 4
} else {
homoscedastic <- TRUE # Skip test if too few observations
}
# Decide which method to use
assumptions_met <- (shapiro_test$p.value > 0.05) & homoscedastic
if (assumptions_met) {
# Use linear regression
method <- "Linear regression"
slope <- coef(lm_model)[2]
p_value <- summary(lm_model)$coefficients[2, 4]
r_squared <- summary(lm_model)$r.squared
cat(sprintf(" Method: Linear regression (assumptions met)\n"))
cat(sprintf(
" Slope: %.3f m²/year, p = %.4f, R² = %.3f\n",
slope,
p_value,
r_squared
))
} else {
# Use Mann-Kendall test
method <- "Mann-Kendall"
mk_test <- MannKendall(site_data$kelp_extent)
# Estimate slope using Sen's slope (median of all pairwise slopes)
sen_slope <- function(x, y) {
slopes <- c()
for (i in 1:(length(x) - 1)) {
for (j in (i + 1):length(x)) {
slopes <- c(slopes, (y[j] - y[i]) / (x[j] - x[i]))
}
}
return(median(slopes))
}
slope <- sen_slope(site_data$year, site_data$kelp_extent)
p_value <- mk_test$sl
r_squared <- NA # Mann-Kendall doesn't produce R²
cat(sprintf(" Method: Mann-Kendall (assumptions not met)\n"))
if (shapiro_test$p.value <= 0.05) {
cat(sprintf(
" - Normality violated (Shapiro p = %.4f)\n",
shapiro_test$p.value
))
}
if (!homoscedastic) {
cat(sprintf(
" - Homoscedasticity violated (variance ratio = %.2f)\n",
var_ratio
))
}
cat(sprintf(" Sen's slope: %.3f m²/year, p = %.4f\n", slope, p_value))
}
# Determine direction
if (p_value < 0.05) {
direction <- ifelse(slope > 0, "Increasing", "Decreasing")
} else {
direction <- "No significant trend"
}
cat(sprintf(" Direction: %s\n\n", direction))
# Store results
longterm_trend_results[[site_name]] <- data.frame(
site = site_name,
n_obs = nrow(site_data),
method = method,
slope = slope,
p_value = p_value,
r_squared = ifelse(is.na(r_squared), NA, r_squared),
direction = direction
)
}
# Combine results
longterm_trends_df <- bind_rows(longterm_trend_results)
cat("=== LONG-TERM TREND SUMMARY ===\n")
print(longterm_trends_df)
cat("\n")
# Save long-term trend results
write.csv(
longterm_trends_df,
file.path(output_dir, "longterm_trends.csv"),
row.names = FALSE
)
cat("✓ Long-term trends saved to longterm_trends.csv\n\n")
```
The long-term trend results provide the foundation for understanding overall trajectories at each site. Sites showing significant trends warrant particular management attention, while sites with no significant trend may represent stable kelp forests or may lack sufficient statistical power due to high interannual variability.
# Recent Trend Analysis
While long-term trends reveal decadal patterns, recent trends (focused on the most recent five years) provide early warning signals of emerging changes. Recent trends may diverge from long-term patterns, potentially indicating regime shifts, recovery from disturbance, or responses to recent environmental changes.
```{r}
#| label: recent-trends
cat("=== PART 2: RECENT TREND ANALYSIS (LAST 5 YEARS) ===\n\n")
# Determine the last 5 years
max_year <- max(kelp_data$year)
recent_years <- (max_year - 4):max_year
cat("Recent period:", min(recent_years), "to", max(recent_years), "\n\n")
# Filter to recent data
recent_data <- kelp_data %>% filter(year %in% recent_years)
# Run trend analysis for recent period
recent_trend_results <- list()
for (i in 1:nrow(sites_summary)) {
site_name <- sites_summary$site[i]
# Get recent site data
site_data <- recent_data %>%
filter(site == site_name) %>%
arrange(year) %>%
select(year, kelp_extent) %>%
filter(!is.na(kelp_extent))
n_recent <- nrow(site_data)
cat(sprintf(
"Analyzing %s (%d recent observations)...\n",
site_name,
n_recent
))
if (n_recent < 3) {
cat(" ⚠ Insufficient recent data for trend analysis\n\n")
recent_trend_results[[site_name]] <- data.frame(
site = site_name,
n_obs = n_recent,
method = "Insufficient data",
slope = NA,
p_value = NA,
r_squared = NA,
direction = "Insufficient data"
)
next
}
# Test linear regression assumptions
lm_model <- lm(kelp_extent ~ year, data = site_data)
shapiro_test <- shapiro.test(residuals(lm_model))
if (n_recent >= 6) {
first_half_var <- var(residuals(lm_model)[1:floor(n_recent / 2)])
second_half_var <- var(residuals(lm_model)[
(floor(n_recent / 2) + 1):n_recent
])
var_ratio <- max(first_half_var, second_half_var) /
min(first_half_var, second_half_var)
homoscedastic <- var_ratio < 4
} else {
homoscedastic <- TRUE
}
assumptions_met <- (shapiro_test$p.value > 0.05) & homoscedastic
if (assumptions_met) {
# Linear regression
method <- "Linear regression"
slope <- coef(lm_model)[2]
p_value <- summary(lm_model)$coefficients[2, 4]
r_squared <- summary(lm_model)$r.squared
cat(sprintf(" Method: Linear regression\n"))
cat(sprintf(
" Slope: %.3f m²/year, p = %.4f, R² = %.3f\n",
slope,
p_value,
r_squared
))
} else {
# Mann-Kendall
method <- "Mann-Kendall"
mk_test <- MannKendall(site_data$kelp_extent)
sen_slope <- function(x, y) {
slopes <- c()
for (i in 1:(length(x) - 1)) {
for (j in (i + 1):length(x)) {
slopes <- c(slopes, (y[j] - y[i]) / (x[j] - x[i]))
}
}
return(median(slopes))
}
slope <- sen_slope(site_data$year, site_data$kelp_extent)
p_value <- mk_test$sl
r_squared <- NA
cat(sprintf(" Method: Mann-Kendall\n"))
cat(sprintf(" Sen's slope: %.3f m²/year, p = %.4f\n", slope, p_value))
}
# Determine direction
if (p_value < 0.05) {
direction <- ifelse(slope > 0, "Increasing", "Decreasing")
} else {
direction <- "No significant trend"
}
cat(sprintf(" Direction: %s\n\n", direction))
recent_trend_results[[site_name]] <- data.frame(
site = site_name,
n_obs = n_recent,
method = method,
slope = slope,
p_value = p_value,
r_squared = ifelse(is.na(r_squared), NA, r_squared),
direction = direction
)
}
# Combine results
recent_trends_df <- bind_rows(recent_trend_results)
cat("=== RECENT TREND SUMMARY ===\n")
print(recent_trends_df)
cat("\n")
# Save recent trend results
write.csv(
recent_trends_df,
file.path(output_dir, "recent_trends.csv"),
row.names = FALSE
)
cat("✓ Recent trends saved to recent_trends.csv\n\n")
```
Comparing long-term and recent trends can reveal important ecological dynamics. A site with a long-term decline but recent increase may be recovering, while a site with a long-term increase but recent decline may be experiencing new stressors. These patterns inform adaptive management strategies.
# Status Assessment
Status assessment translates raw canopy measurements into management-relevant categories by comparing current conditions to historical baselines. This approach follows the U.S. West Coast percentile method developed by Frieder et al. (2025), which classifies sites based on where current values fall within the distribution of baseline observations.
## Percentile-Based Classification
The status assessment uses site-specific reference periods consisting of the first five years of monitoring at each location. Current observations are assigned scores based on their percentile rank within the reference distribution:
- **Score +2** (Far Above Expectations): Current value ≥ 90th percentile of baseline
- **Score +1** (Above Expectations): Current value ≥ 75th and < 90th percentile of baseline
- **Score 0** (Meeting Expectations): Current value ≥ 25th and < 75th percentile of baseline
- **Score -1** (Below Expectations): Current value ≥ 10th and < 25th percentile of baseline
- **Score -2** (Far Below Expectations): Current value < 10th percentile of baseline
This percentile-based approach inherently accounts for site-specific variability and historical context, making it more ecologically meaningful than absolute thresholds.
```{r}
#| label: status-assessment
cat("=== PART 3: STATUS ASSESSMENT ===\n\n")
# Helper functions for status assessment
calculate_score_from_percentile <- function(current_val, ref_vals) {
if (is.na(current_val)) {
return(NA_real_)
}
ref_clean <- na.omit(ref_vals)
if (length(ref_clean) < 2) {
return(NA_real_)
}
percentile <- ecdf(ref_clean)(current_val) * 100
if (percentile >= 90) {
return(2)
} else if (percentile >= 75) {
return(1)
} else if (percentile >= 25) {
return(0)
} else if (percentile >= 10) {
return(-1)
} else {
return(-2)
}
}
translate_score_to_category <- function(avg_score) {
if (is.na(avg_score)) {
return("Insufficient Data")
} else if (avg_score < -1.5) {
return("Far Below Expectations")
} else if (avg_score < -0.5) {
return("Below Expectations")
} else if (avg_score < 0.5) {
return("Meeting Expectations")
} else if (avg_score < 1.5) {
return("Above Expectations")
} else {
return("Far Above Expectations")
}
}
# Define site-specific reference periods (first 5 years)
site_reference_years <- kelp_data %>%
group_by(site) %>%
arrange(year) %>%
summarize(
first_year = min(year),
reference_years = list(sort(unique(year))[1:min(5, length(unique(year)))]),
n_ref_years = min(5, length(unique(year))),
.groups = "drop"
)
cat("Site-specific reference periods:\n")
for (i in 1:nrow(site_reference_years)) {
cat(sprintf(
" %s: %s (n=%d years)\n",
site_reference_years$site[i],
paste(site_reference_years$reference_years[[i]], collapse = ", "),
site_reference_years$n_ref_years[i]
))
}
cat("\n")
# Get current year
current_year <- max(kelp_data$year)
cat("Current assessment year:", current_year, "\n\n")
# Build reference data for each site
reference_data_list <- list()
for (i in 1:nrow(site_reference_years)) {
site_name <- site_reference_years$site[i]
ref_years <- site_reference_years$reference_years[[i]]
dom_sp_code <- dominant_species %>% filter(site == site_name) %>% pull(dom_sp)
dom_sp_name <- dominant_species %>%
filter(site == site_name) %>%
pull(dominant)
site_ref_raw <- kelp_data %>%
filter(site == site_name, year %in% ref_years)
# Extract correct column based on dominant species
if (dom_sp_code == "macro") {
ref_values <- site_ref_raw$canopy_macro_m2_tidecor
ref_median <- median(site_ref_raw$canopy_macro_m2_tidecor, na.rm = TRUE)
} else if (dom_sp_code == "nereo") {
ref_values <- site_ref_raw$canopy_nereo_m2_tidecor
ref_median <- median(site_ref_raw$canopy_nereo_m2_tidecor, na.rm = TRUE)
}
reference_data_list[[i]] <- data.frame(
site = site_name,
dominant = dom_sp_name,
ref_median = ref_median,
ref_values = I(list(ref_values)),
n_obs = nrow(site_ref_raw)
)
}
reference_data <- bind_rows(reference_data_list)
# Get current year data
current_data_list <- list()
for (i in 1:nrow(dominant_species)) {
site_name <- dominant_species$site[i]
dom_sp_code <- dominant_species$dom_sp[i]
dom_sp_name <- dominant_species$dominant[i]
site_current_raw <- kelp_data %>%
filter(site == site_name, year == current_year)
if (dom_sp_code == "macro") {
current_val <- site_current_raw$canopy_macro_m2_tidecor[1]
} else if (dom_sp_code == "nereo") {
current_val <- site_current_raw$canopy_nereo_m2_tidecor[1]
}
current_data_list[[i]] <- data.frame(
site = site_name,
dominant = dom_sp_name,
current_value = current_val
)
}
current_data <- bind_rows(current_data_list)
# Calculate site-level status
site_assessment <- current_data %>%
left_join(
reference_data %>% select(site, ref_median, ref_values),
by = "site"
) %>%
left_join(site_reference_years %>% select(site, reference_years), by = "site")
for (i in 1:nrow(site_assessment)) {
current_val <- site_assessment$current_value[i]
ref_median <- site_assessment$ref_median[i]
ref_vals <- site_assessment$ref_values[[i]]
score <- calculate_score_from_percentile(current_val, ref_vals)
site_assessment$status_pct[i] <- (current_val / ref_median) * 100
if (!is.na(current_val) && length(na.omit(ref_vals)) >= 2) {
site_assessment$percentile[i] <- ecdf(na.omit(ref_vals))(current_val) * 100
} else {
site_assessment$percentile[i] <- NA_real_
}
site_assessment$score[i] <- score
}
site_status_final <- site_assessment %>%
mutate(
status_category = sapply(score, translate_score_to_category),
ref_years_display = sapply(reference_years, function(x) {
paste(x, collapse = ", ")
})
) %>%
select(
site,
dominant,
ref_years_display,
current_value,
ref_median,
status_pct,
percentile,
score,
status_category
)
cat("=== SITE-LEVEL STATUS ===\n")
print(site_status_final)
cat("\n")
# Species-level summary (site-averaged)
species_summary <- site_status_final %>%
group_by(dominant) %>%
summarize(
N_sites = n(),
N_valid_scores = sum(!is.na(score)),
Average_score = mean(score, na.rm = TRUE),
Median_status_pct = median(status_pct, na.rm = TRUE),
Status_category = translate_score_to_category(mean(score, na.rm = TRUE)),
.groups = "drop"
) %>%
rename(Species = dominant)
cat("=== SPECIES-LEVEL STATUS (SITE-AVERAGED) ===\n")
print(species_summary)
cat("\n")
# Pooled species analysis
pooled_analysis_list <- list()
for (sp in unique(dominant_species$dominant)) {
species_sites <- dominant_species %>% filter(dominant == sp) %>% pull(site)
dom_sp_code <- dominant_species %>%
filter(dominant == sp) %>%
pull(dom_sp) %>%
first()
# Get reference data
species_ref_data <- kelp_data %>%
filter(site %in% species_sites) %>%
left_join(
site_reference_years %>% select(site, reference_years),
by = "site"
) %>%
rowwise() %>%
filter(year %in% reference_years[[1]]) %>%
ungroup()
# Get current data
species_current_data <- kelp_data %>%
filter(site %in% species_sites, year == current_year)
if (dom_sp_code == "macro") {
ref_values <- species_ref_data$canopy_macro_m2_tidecor
current_values <- species_current_data$canopy_macro_m2_tidecor
} else {
ref_values <- species_ref_data$canopy_nereo_m2_tidecor
current_values <- species_current_data$canopy_nereo_m2_tidecor
}
ref_values_clean <- na.omit(ref_values)
current_values_clean <- na.omit(current_values)
# Calculate scores
scores <- sapply(current_values_clean, function(val) {
percentile <- ecdf(ref_values_clean)(val) * 100
if (percentile >= 90) {
return(2)
} else if (percentile >= 75) {
return(1)
} else if (percentile >= 25) {
return(0)
} else if (percentile >= 10) {
return(-1)
} else {
return(-2)
}
})
avg_score <- mean(scores)
pooled_analysis_list[[sp]] <- data.frame(
Species = sp,
N_sites = length(species_sites),
N_ref_obs = length(ref_values_clean),
N_current_obs = length(current_values_clean),
Ref_median = median(ref_values_clean),
Current_median = median(current_values_clean),
Percent_of_baseline = (median(current_values_clean) /
median(ref_values_clean)) *
100,
Average_score_pooled = avg_score,
Status_category_pooled = translate_score_to_category(avg_score)
)
}
pooled_species_summary <- bind_rows(pooled_analysis_list)
cat("=== SPECIES-LEVEL STATUS (POOLED) ===\n")
print(pooled_species_summary)
cat("\n")
# Save status results
write.csv(
site_status_final,
file.path(output_dir, "site_status.csv"),
row.names = FALSE
)
write.csv(
species_summary,
file.path(output_dir, "species_status_site_averaged.csv"),
row.names = FALSE
)
write.csv(
pooled_species_summary,
file.path(output_dir, "species_status_pooled.csv"),
row.names = FALSE
)
cat("✓ Status assessment results saved\n\n")
```
The status assessment provides a scientifically defensible framework for communicating kelp forest condition to stakeholders and managers. By grounding assessments in site-specific historical context, this approach avoids the pitfalls of arbitrary thresholds while remaining intuitive and actionable.
# Visualizations
Visual representations of trends and status are essential for effective communication with diverse stakeholders. The visualization workflow generates bar plots with trend lines, reference baselines, and uncertainty bounds for each monitoring site.
```{r}
#| label: visualizations
#| fig-width: 12
#| fig-height: 8
cat("=== CREATING VISUALIZATIONS ===\n\n")
# Create bar plots with trend lines for each site (long-term)
cat("Creating long-term trend visualizations...\n")
for (i in 1:nrow(sites_summary)) {
site_name <- sites_summary$site[i]
# Get site data
site_data <- kelp_data %>%
filter(site == site_name) %>%
arrange(year) %>%
select(year, kelp_extent, dominant) %>%
filter(!is.na(kelp_extent))
if (nrow(site_data) < 3) {
cat(sprintf(" ⚠ Skipping %s (insufficient data)\n", site_name))
next
}
# Get trend results
trend_result <- longterm_trends_df %>% filter(site == site_name)
# Determine color based on dominant species
bar_color <- ifelse(
site_data$dominant[1] == "Macrocystis",
"darkgreen",
"brown"
)
# Create plot
# png(
# file.path(output_dir, paste0("longterm_trend_", site_name, ".png")),
# width = 800,
# height = 600
# )
par(mar = c(5, 5, 4, 2))
# Bar plot
bp <- barplot(
site_data$kelp_extent,
names.arg = site_data$year,
col = bar_color,
border = "black",
main = paste(site_name, "- Long-term Trend"),
xlab = "Year",
ylab = "Kelp Canopy Area (m²)",
las = 2,
ylim = c(0, max(site_data$kelp_extent) * 1.1)
)
# Add trend line using the method selected for this site
if (trend_result$method == "Linear regression") {
lm_model <- lm(kelp_extent ~ year, data = site_data)
pred_vals <- predict(lm_model)
lines(bp, pred_vals, col = "red", lwd = 2, lty = 2)
} else if (trend_result$method == "Mann-Kendall") {
# For Mann-Kendall, draw Sen's slope line
slope <- trend_result$slope
intercept <- median(site_data$kelp_extent) - slope * median(site_data$year)
pred_vals <- intercept + slope * site_data$year
lines(bp, pred_vals, col = "red", lwd = 2, lty = 2)
}
# Add statistics text
stat_text <- sprintf(
"Method: %s\nSlope: %.3f m²/yr\np-value: %.4f%s\n%s",
trend_result$method,
trend_result$slope,
trend_result$p_value,
ifelse(
!is.na(trend_result$r_squared),
sprintf("\nR² = %.3f", trend_result$r_squared),
""
),
trend_result$direction
)
legend("topright", legend = stat_text, bty = "n", cex = 0.9)
# dev.off()
cat(sprintf(" ✓ %s\n", site_name))
}
cat("\n✓ All visualizations created\n\n")
```
Each visualization includes color-coding by dominant species (dark green for *Macrocystis*, brown for *Nereocystis*), trend lines derived from the appropriate statistical method, and summary statistics in the legend. These plots provide an immediate visual assessment of site condition and trajectory.
# Summary and Outputs
The complete analysis workflow produces a suite of data products suitable for both technical analysis and stakeholder communication.
```{r}
#| label: summary
cat("=== ANALYSIS COMPLETE ===\n\n")
cat("OUTPUT FILES:\n")
cat(" 1. longterm_trends.csv - Long-term trend statistics for each site\n")
cat(" 2. recent_trends.csv - Recent (5-year) trend statistics for each site\n")
cat(" 3. site_status.csv - Current status assessment for each site\n")
cat(
" 4. species_status_site_averaged.csv - Species-level status (site-averaged)\n"
)
cat(
" 5. species_status_pooled.csv - Species-level status (pooled observations)\n"
)
cat(
" 6. longterm_trend_[site].png - Bar plot with trend line for each site\n\n"
)
cat("SUMMARY STATISTICS:\n")
cat(sprintf(" Total sites analyzed: %d\n", nrow(sites_summary)))
cat(sprintf(" Date range: %d-%d\n", min(kelp_data$year), max(kelp_data$year)))
cat(sprintf(" Current assessment year: %d\n", current_year))
cat("\nLONG-TERM TRENDS:\n")
trend_summary <- table(longterm_trends_df$direction)
for (direction in names(trend_summary)) {
cat(sprintf(" %s: %d sites\n", direction, trend_summary[direction]))
}
cat("\nRECENT TRENDS:\n")
recent_trend_summary <- table(recent_trends_df$direction)
for (direction in names(recent_trend_summary)) {
cat(sprintf(" %s: %d sites\n", direction, recent_trend_summary[direction]))
}
cat("\nSTATUS ASSESSMENT:\n")
status_summary <- table(site_status_final$status_category)
for (category in names(status_summary)) {
cat(sprintf(" %s: %d sites\n", category, status_summary[category]))
}
cat("\n=== WORKFLOW COMPLETE ===\n")
```
## Output File Descriptions
The workflow generates six types of output files:
1. **longterm_trends.csv**: Contains site-level statistics for the entire time series, including the statistical method used (linear regression or Mann-Kendall), slope estimate, p-value, and trend direction.
2. **recent_trends.csv**: Parallel structure to long-term trends but focused on the most recent five years, enabling comparison of recent versus historical trajectories.
3. **site_status.csv**: Site-level status classifications including reference period definitions, current values, baseline medians, percentile ranks, and categorical assessments.
4. **species_status_site_averaged.csv**: Species-level summaries derived by averaging scores across all sites dominated by each species, providing a regional perspective on species-specific patterns.
5. **species_status_pooled.csv**: Alternative species-level summary that pools all observations across sites before calculating status, emphasizing regional patterns while de-emphasizing site-level heterogeneity.
6. **longterm_trend_[site].png**: Individual visualization for each site showing annual canopy area, trend line, and summary statistics.
These outputs support both detailed site-level management and broader regional assessments, facilitating multi-scale conservation planning.
# Session Information
For computational reproducibility and troubleshooting, the following session information documents the R version, operating system, and package versions used in this analysis.
```{r}
#| label: session-info
#| code-fold: true
sessionInfo()
```