Stratified Sampling Validation • rfaR

Purpose

Validate that stratified_sampler() correctly creates stratified bins and weights for all three supported distribution transformations: Extreme Value Type I (EV1), Normal, and Uniform. Tests cover bin boundary accuracy against the stratified_example dataset, weight accuracy, weight summation, output dimensions, and bin ordering.

Background

Stratified sampling divides the probability space into bins to ensure adequate coverage of rare events. The choice of transformation determines how bins are distributed:

Uniform — Equal probability width per bin.
Normal — Uniform spacing in standard normal (z-score) space.
EV1 (Gumbel) — Spacing in Gumbel reduced variate space.

Test 1: Bin Spacing Visualization

The following plot illustrates how each transformation distributes bin boundaries along the z-variate axis. Tick marks show the z-lower boundary of each bin. EV1 stratification produces wider spacing at common events (left) and tighter spacing at rare events (right), concentrating sampling effort where it matters most for dam safety/risk analysis.

ev1 <- stratified_sampler(dist = "EV1")
normal <- stratified_sampler(dist = "Normal")
uniform <- stratified_sampler(dist = "Uniform")

ev1 <- stratified_sampler(dist = "EV1")
normal <- stratified_sampler(dist = "Normal")
uniform <- stratified_sampler(dist = "Uniform")

# Shared x-axis range
xlim <- range(c(ev1$Zlower, ev1$Zupper,
                normal$Zlower, normal$Zupper,
                uniform$Zlower, uniform$Zupper))

bin_spacing <- bind_rows(
  data.frame(dist = "EV1",     xmin = ev1$Zlower,     xmax = ev1$Zupper),
  data.frame(dist = "Normal",  xmin = normal$Zlower,  xmax = normal$Zupper),
  data.frame(dist = "Uniform", xmin = uniform$Zlower, xmax = uniform$Zupper)) |>
  mutate(dist = factor(dist, levels = c("Uniform", "Normal", "EV1")))

# Plot
aep_breaks <- c(9.9e-1, 9e-1, 5e-1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8, 1e-9, 1e-10)
aep_labels <- c("0.99", "0.9", "0.5", "0.1", "1/100", "1/1K",
           "1/10K", "1/100K", "1/1M", "1/10M", "1/100M", "1/1B", "1/10B")
zbreaks <- qnorm(1 - aep_breaks)

ggplot(bin_spacing) +
  geom_rect(aes(xmin = xmin, xmax = xmax,
                ymin = as.numeric(dist) - 0.25,
                ymax = as.numeric(dist) + 0.25,
                fill = dist),
            color = "grey70", linewidth = .7, alpha = 0.8) +
  scale_y_continuous(breaks = 1:3, labels = levels(bin_spacing$dist)) +
  scale_x_continuous(breaks = zbreaks, labels = aep_labels)+
  scale_fill_manual(values = c("Uniform" = "#008B45FF",
                               "Normal"  = "#EE0000FF",
                               "EV1"     = "#3B4992FF")) +
  labs(x = "Annual Exceedance Probability (AEP)",
       y = NULL,
       title = "Stratified Bin Spacing by Distribution Transformation",
       fill = NULL) +
  theme_bw() +
  theme(legend.position  = "top",
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_blank())

The weight distribution shows how each transformation allocates probability across bins. Uniform produces nearly equal weights. Normal concentrates weight in central bins. EV1 places the most weight in the first few bins (frequent events) with rapidly decreasing weight for rare event bins, reflecting the heavy-tailed nature of flood distributions.

Test 2: Z-Lower Bin Boundaries Match Validation Data

Compare computed z-lower bin boundaries against the stratified_example dataset.

ev1_idx <- which(example_stratified$distribution == "ev1")
normal_idx <- which(example_stratified$distribution == "normal")
uniform_idx <- which(example_stratified$distribution == "uniform")

# Convert validation data to z-space
z_lower_valid <- numeric(nrow(example_stratified))
z_lower_valid[uniform_idx] <- qnorm(1 - example_stratified$lower[uniform_idx])
z_lower_valid[normal_idx] <- example_stratified$lower[normal_idx]
z_lower_valid[ev1_idx] <- qnorm(exp(-exp(-example_stratified$lower[ev1_idx])))

diff_ev1 <- ev1$Zlower - z_lower_valid[ev1_idx]
diff_normal <- normal$Zlower - z_lower_valid[normal_idx]
diff_uniform <- uniform$Zlower - z_lower_valid[uniform_idx]

Z-Lower Bin Boundary Differences vs. Validation Data
Distribution	Max Abs. Difference
EV1	9.0e-10
Normal	5.0e-10
Uniform	4.9e-09

Acceptance Criterion

Metric	Value
Tolerance	10^{-6}
Result	PASS

Test 3: Weights Match Validation Data

Compare computed bin weights against the stratified_example dataset.

wt_diff_ev1 <- ev1$Weights - example_stratified$weight[ev1_idx]
wt_diff_normal <- normal$Weights - example_stratified$weight[normal_idx]
wt_diff_uniform <- uniform$Weights - example_stratified$weight[uniform_idx]

Weight Differences vs. Validation Data
Distribution	Max Abs. Difference
EV1	4.53e-08
Normal	3.79e-08
Uniform	5.00e-10

Acceptance Criterion

Metric	Value
Tolerance	10^{-6}
Result	PASS

Test 4: Weights Sum to 1

sum_ev1 <- sum(ev1$Weights)
sum_normal <- sum(normal$Weights)
sum_uniform <- sum(uniform$Weights)

Weight Summation Check
Distribution	Sum of Weights	\|1 - Sum\|
EV1	1	0
Normal	1	0
Uniform	1	0

Acceptance Criterion

Metric	Value
Tolerance	10^{-10}
Result	PASS

Test 5: Output Dimensions

Verify that stratified_sampler() returns the correct number of bins, events, and vector lengths.

test_custom <- stratified_sampler(Nbins = 10, Mevents = 100)

dim_results <- data.frame(
  Parameter = c("Nbins", "Mevents", "length(normOrd)", "length(Zlower)",
                "length(Zupper)", "length(Weights)"),
  Expected = c(10, 100, 1000, 10, 10, 10),
  Actual = c(test_custom$Nbins, test_custom$Mevents, length(test_custom$normOrd),
             length(test_custom$Zlower), length(test_custom$Zupper), length(test_custom$Weights))
)

Output Dimension Verification (Nbins=10, Mevents=100)
Parameter	Expected	Actual	Pass
Nbins	10	10	TRUE
Mevents	100	100	TRUE
length(normOrd)	1000	1000	TRUE
length(Zlower)	10	10	TRUE
length(Zupper)	10	10	TRUE
length(Weights)	10	10	TRUE

Acceptance Criterion

Metric	Value
Result	PASS

Test 6: Bin Ordering (Zlower < Zupper)

Verify that all bin lower bounds are strictly less than upper bounds.

Bin Ordering Verification
Distribution	All Zlower < Zupper
EV1	TRUE
Normal	TRUE
Uniform	TRUE

Acceptance Criterion

pass_order <- all(order_results$All_Ordered)

Metric	Value
Result	PASS

Summary

Test	Description	Result
1	Bin spacing visualization	(Visual)
2	Z-lower boundaries match validation data	PASS
3	Weights match validation data	PASS
4	Weights sum to 1	PASS
5	Output dimensions correct	PASS
6	Bin ordering (Zlower < Zupper)	PASS