US Army Corps of Engineers, Institute for Water Resources, Risk Management Center

Working with RMC-BestFit

In this quick start guide, we will demonstrate how to create a project, enter input data, select a model using the distribution fitting analysis, and perform a Bayesian estimation analysis.

Create a Project

To begin, let’s create a new project. When you open RMC-BestFit, a Blank Project file is automatically created, as shown in Figure. The blank project is stored in your local temp directory. You may begin working with the blank project file immediately.

RMC-BestFit Blank Project.
Figure: RMC-BestFit Blank Project.

To save changes to the blank project, click the Save button on the tool bar or under the File menu. This will open the Save Project As… prompt. Enter the desired file name and click the save button in the bottom right. Now you are ready to continue working with RMC-BestFit.

Save Project As….
Figure: Save Project As….

A new project can also be created by clicking New Project… under the File menu, or by clicking the New Project button located on the tool bar as shown in Figure and Figure. If this is the first time you're using RMC-BestFit, your recent projects list will be empty.

Create New Project from the File Menu.
Figure: Create New Project from the File Menu.
Create New Project from the Tool Bar.
Figure: Create New Project from the Tool Bar.

The project properties will be shown in the Properties Window, which is typically located on the right-hand side of the main window. You may edit the project name and description.

Project Properties.
Figure: Project Properties.

Input Data

In RMC-BestFit, the input data must be entered as block annual maxima, which are assumed to be independent and identically distributed (iid). RMC-BestFit supports three different data types:

  1. Systematic Data: Data that are collected at regular, prescribed intervals under a defined protocol. In a maximum likelihood context, these values are treated as exact measurements. Low outlier tests can be performed on the systematic data to ensure homogeneity.
  2. Interval Data: Data whose magnitudes are not known exactly, but are known to fall within a range or interval. In a maximum likelihood context, these values are treated as interval-censored.
  3. Perception Thresholds: Data points that occurred during a period of years and have magnitudes that are below a threshold value, but unknown by how much. In a maximum likelihood context, these values are treated as left-censored.

The Distribution Fitting Analysis chapter provides greater detail on how these data types are treated in a likelihood context.

Create New Input Data

Let’s begin by creating a new input dataset. Right-click on the Input Data folder header and click Create New… as shown in Figure. Next, give the Input Data a name and click OK.

Create New Input Data.
Figure: Create New Input Data.

Once the new Input Data is created, it will be automatically opened into the Tabbed Documents area, and the input data properties will be displayed in the Properties Window. From here, you can set the Description, Unit Label, Plotting Position Parameter, and perform a Low Outlier Test.

For this example, we will be using 3-day inflow volumes for Blakely Mountain Dam, which is located near Hot Springs, Arkansas. This dataset includes systematic data, historical data dating back to 1870, and paleoflood information dating back 5,000 years. As shown in Figure, set the Unit Label to be “3-Day Flow (CFS)” and you will notice that this automatically updates the Y-axis labels on the Chronology and Frequency plots.

Input Data Properties.
Figure: Input Data Properties.

Systematic Data

Systematic data are collected at regular, prescribed intervals under a defined protocol. In a maximum likelihood context, these values are treated as exact measurements. The systematic dataset for Blakely Mountain Dam is provided in Table. Note that the systematic data do not need to be continuous; e.g., the Blakely Mountain dataset is missing data from 1931 to 1935. Gaps in the data can be accounted for using thresholds, which will be demonstrated later in the Perception Thresholds section.

To enter data, first click the Add Row(s) button located on the left side of the table tool bar. This will add a blank row to the bottom of the table. Next, you can either manually enter your dataset, or copy and paste the dataset into the table as shown in Figure. Once you have entered all of the data, you will see that the plotting positions are automatically calculated and the data is plotted in the Chronology and Frequency plot as illustrated in Figure.

Add Systematic Data.
Figure: Add Systematic Data.
Table: Systematic Dataset of 3-Day Inflows at Blakely Mountain Dam near Hot Springs, Arkansas.
Year   Flow (CFS)     Year   Flow (CFS)     Year   Flow (CFS)
1923   57,985         1959   17,032         1990   30,122
1924   14,607         1960   45,014         1991   32,586
1925    8,403         1961   12,637         1992   34,357
1926   23,479         1962   17,037         1993   32,586
1927   66,629         1963    9,074         1994   45,065
1928   30,925         1964   36,066         1995   29,547
1929   25,046         1965   15,261         1996   14,310
1930   37,772         1966   23,886         1997   44,107
1936    9,074         1967   22,432         1998   15,676
1937   21,382         1968   55,536         1999   18,922
1938   43,769         1969   43,938         2000   17,981
1939   50,678         1970   22,490         2001   40,097
1940    7,360         1971   26,955         2002   42,471
1941   15,617         1972   53,417         2003   17,451
1942   23,189         1973   36,920         2004   14,964
1943   12,637         1974   42,584         2005   20,798
1944   30,064         1975   39,587         2006   19,157
1945   52,914         1976   12,996         2007   33,729
1946   22,607         1977   30,294         2008   58,319
1947   14,191         1978    8,952         2009   35,041
1948   19,098         1979   28,109         2010   32,700
1949   35,383         1980   12,637         2011   35,212
1950   25,046         1981   15,142         2012   34,585
1951   13,355         1982   16,624         2013   46,921
1952   20,330         1983   81,464         2014   15,795
1953   25,701         1984   13,952         2015   42,189
1954   26,897         1985   40,946         2016   55,982
1955   20,019         1986   29,317         2017   34,585
1956   18,240         1987   24,930         2018   48,324
1957   21,084         1988   42,076         -      -
1958   28,886         1989   26,551         -      -
Paste Data into Table.
Figure: Paste Data into Table.
Systematic Data.
Figure: Systematic Data.

You can view the data as a chronology plot or as a frequency plot by toggling the radio buttons located underneath the plot as shown in Figure. Clicking the Chronology Plot radio button will display the data in chronological order, with the years on the X-axis. The Frequency Plot radio button will display the data as a nonparametric frequency plot based on the Hirsch-Stedinger plotting positions.

View Chronology or Frequency Plot.
Figure: View Chronology or Frequency Plot.

Data Table Features

You can edit the data table by using the tool bar located above the table, or by right-clicking within the table. You can interact with the table in the following ways:

  • Add rows(s) to the bottom of the table.
  • Insert row(s) into the table.
  • Delete row(s) from the table.
  • Select all table cells.
  • Copy the selected cells.
  • Copy the selected cells with the table headers.
  • Paste from the clipboard into the table.
  • Sort a column in ascending or descending order.
  • Clear all table sorting.

You can sort a column by right-clicking on the column header as shown below.

Sort Input Data Tables.
Figure: Sort Input Data Tables.

Data Validation

The input data tables have built-in validation. The Systematic Data have the following requirements:

  • The year must be unique.
  • The year must be between -100,000 and +100,000.
  • The value must be non-negative, i.e., greater than or equal to zero.

When you enter invalid data, the table cell will turn red, and provide a tooltip indicating the source of the error. In addition, an error message will appear in the Message Window indicating that you must resolve all errors in the data table.

Input Data Validation.
Figure: Input Data Validation.

Plotting Positions

The input data can be plotted as a Chronology Plot or as a nonparametric Frequency Plot as shown in Figure. A frequency plot, or probability plot, is a plot of magnitude versus probability. The probability assigned to each data point is commonly determined using a plotting position formula. Plotting positions are a method for constructing an empirical frequency distribution: the formula computes the exceedance probability of a data point based on the rank of that data point in a sample of a given size. Plotting positions typically carry significant uncertainty due to sampling error from small sample sizes.

A rank-order method is used to plot the annual maxima data. This involves ordering the data from the largest event to the smallest event, assigning a rank of 1 to the largest event and a rank of n to the smallest event, and using the rank (i) of the event to obtain a probability plotting position. Many plotting position formulae are special cases of the general formula:

P_i = \frac{i - \alpha}{n + 1 - 2\alpha}

where:

i = the rank of the event
n = the sample size
α = a constant greater than or equal to 0 and less than 1
P_i = the exceedance probability for an event with rank i

The value of α determines how well the calculated plotting positions will fit a given theoretical probability distribution.
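
As an illustration, the general formula above can be computed directly. The sketch below is not part of RMC-BestFit; the function name and interface are hypothetical:

```python
def plotting_positions(values, alpha=0.0):
    """Exceedance-probability plotting positions P_i = (i - alpha) / (n + 1 - 2*alpha).

    alpha=0.0 corresponds to Weibull (the RMC-BestFit default),
    0.375 to Blom, 0.44 to Gringorten, 0.50 to Hazen, etc.
    """
    n = len(values)
    # Rank 1 goes to the largest event, rank n to the smallest.
    ranked = sorted(values, reverse=True)
    return [(x, (i - alpha) / (n + 1 - 2 * alpha))
            for i, x in enumerate(ranked, start=1)]

# First five systematic years from the Blakely Mountain dataset:
flows = [57_985, 14_607, 8_403, 23_479, 66_629]
for x, p in plotting_positions(flows):   # Weibull: P_i = i / (n + 1)
    print(f"{x:>7,d}  AEP = {p:.3f}")
```

With α = 0 and n = 5, the largest event plots at 1/6 ≈ 0.167 exceedance probability, the smallest at 5/6 ≈ 0.833.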

RMC-BestFit uses the Hirsch-Stedinger (H-S) plotting position formula (Hirsch and Stedinger, 1987; U.S. Geological Survey, 2018), which is an extension of the general formula above that is also capable of incorporating threshold-censored data. The H-S plotting positions are used to visually and quantitatively assess the goodness-of-fit of the fitted distributions (see the Goodness-of-Fit Measures section for more detail).

You can choose from the following parameter options:

  • Weibull (α = 0.0): Recommended as the default value because it is unbiased for all distributions.
  • Median (α = 0.3175): Provides median exceedance probabilities for all distributions.
  • Blom (α = 0.375): Recommended for Normal, Gamma, 2-parameter Log-Normal, 3-parameter Log-Normal, and Log-Pearson Type III distributions.
  • Cunnane (α = 0.40): Recommended for Generalized Extreme Value and Log-Gumbel distributions, approximately quantile unbiased.
  • Gringorten (α = 0.44): Recommended for Exponential, Gumbel and Weibull distributions.
  • Hazen (α = 0.50): Recommended when the parameters of the parent distribution are unknown.

As you can see, each plotting position parameter has a different motivation. Some attempt to achieve unbiasedness in quantile estimates across multiple distributions, while other formulas are optimized for use with a particular theoretical probability distribution. Choosing a plotting position parameter is similar to choosing a probability distribution to represent a particular set of data. It is often better to select a plotting position parameter that is flexible and makes the fewest assumptions. For this reason, the Weibull parameter is set as the default value in RMC-BestFit, which is consistent with current practice.

When you hover over the plotting position drop-down, you will see a tooltip providing the recommended use for the given parameter as shown below.

Set Plotting Position Parameter.
Figure: Set Plotting Position Parameter.

Low Outlier Test

For the distribution fitting or Bayesian estimation analyses to be theoretically valid, the input data must be independent and identically distributed. As a means to ensure homogeneity, RMC-BestFit provides the Multiple Grubbs-Beck test (MGBT) for low outliers, which is consistent with the Bulletin 17C guidelines. In RMC-BestFit, the MGBT is only applied to systematic data, which are considered exact measurements. Interval- and threshold-censored data are not included in the test.

To run the MGBT, make sure the Multiple Grubbs-Beck Test checkbox is checked, and click the Run Test command button. When the test is complete, a message box will appear that reports how many low outliers were identified. In addition, the MGBT Critical Value will be displayed in the Threshold Value textbox. There should be zero low outliers identified for this dataset, so the threshold value should also be set to zero as shown in Figure.

Run Multiple Grubbs-Beck Low Outlier Test.
Figure: Run Multiple Grubbs-Beck Low Outlier Test.

If desired, you also have the option to enter a value for the low outlier threshold. When you run the test, any data value below this threshold will be identified as a low outlier. The Threshold Value cannot be set to a value that would censor more than 50 percent of the values within the data set. To use this option, uncheck the Multiple Grubbs-Beck Test checkbox and enter the preferred value.

Values that are identified as low outliers will be checked in the Is Low Outlier column of the systematic data table. The low outliers will be displayed as a red X in the chronology and frequency plots.

The low outlier threshold value, whether identified by the MGBT or entered manually, is automatically treated as a left-censored threshold in the distribution fitting and Bayesian estimation analyses. For example, if the low outlier threshold value is 8,000 and there are eight data points below the threshold identified as low outliers, then this is treated as equivalent to a left-censored threshold with eight values below and zero above. However, RMC-BestFit does not include the low outlier threshold in the H-S plotting position routine. Conceptually, the low outlier test removes exact data points and replaces them with a threshold-censored value, which represents a loss of information. If the low outlier threshold were included in the H-S routine, it would make the plotting positions rarer, signaling an increase in information. This is counterintuitive, and for this reason RMC-BestFit does not include the low outlier threshold in the H-S plotting position routine.
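
The bookkeeping described above can be sketched in a few lines. This is an illustration only, not RMC-BestFit's internals; the function name and tuple layout are hypothetical:

```python
def censor_low_outliers(systematic, threshold):
    """Split systematic data at a low-outlier threshold.

    Values below the threshold are dropped as exact observations and
    summarized as a left-censored block: a period of h years with
    k = h values below the threshold and zero above, mirroring the
    treatment described above.
    """
    retained = [x for x in systematic if x >= threshold]
    n_below = len(systematic) - len(retained)
    # (threshold value, years in period, count below threshold)
    censored_block = (threshold, n_below, n_below)
    return retained, censored_block
```

For instance, with a threshold of 8,000 and eight values below it, the censored block would record eight censored years and zero exceedances, matching the example in the text.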

Interval Data

Interval data have magnitudes that are not known exactly, but are known to fall within a range or interval. A paleoflood study was performed for the Blakely Mountain watershed, where two major historical floods were identified: one occurred in 1882 and another occurred sometime around the year 1020.

Table: Interval Data for 3-Day Inflows at Blakely Mountain Dam near Hot Springs, Arkansas.
Year   Lower     Most Likely   Upper
1020   105,000   110,000       115,000
1882    66,000    76,000        86,000

You can add the interval data in the same manner as was done for the systematic data. First click the Add Row(s) button located on the left side of the table tool bar. This will add a blank row to the bottom of the interval data table. Next, you can either manually enter the data, or copy and paste the interval data into the table. Once you have entered the data, you will see that the plotting positions are automatically calculated and the intervals are plotted in the Chronology and Frequency plot as vertical bars (see Figure).

Interval Data.
Figure: Interval Data.

The Interval Data have the following requirements:

  • The year must be unique.
  • The year must be between -100,000 and +100,000.
  • The year cannot overlap with any data point in the Systematic Data table.
  • The lower, most likely, and upper values must be non-negative, i.e., greater than or equal to zero.
  • The lower value must be less than the most likely value, and the upper value must be greater than the most likely value.

Perception Thresholds

The term perception threshold originates from Bulletin 17C (U.S. Geological Survey, 2018). For the purposes of RMC-BestFit, a Perception Threshold defines a threshold level over a period of years. Data points that occurred during the threshold period have magnitudes that are below the threshold value, but it is unknown by how much. Conversely, if an event with a magnitude larger than the threshold value had occurred during the threshold period, we would have evidence that it occurred; i.e., data points larger than the threshold would have been perceived by us.

There were four perception thresholds identified for the Blakely Mountain Dam 3-day inflow volume dataset (see Table). The paleoflood study determined that 3-day inflows have not exceeded 220,000 cfs in the last ~5,000 years. Flood volumes did not exceed 104,000 cfs during the years between the paleoflood in 1019 and 1869. From 1870 to 1922, the beginning of the systematic dataset, 3-day flood volumes did not exceed 65,000 cfs, except for the large 1882 event. Finally, 3-day flood volumes during the missing years of systematic data, 1931 to 1935, also did not exceed 65,000 cfs.

Table: Threshold Data for 3-Day Inflows at Blakely Mountain Dam near Hot Springs, Arkansas.
Start Year   End Year   Value
-2890        1018       220,000
1019         1869       104,000
1870         1922        65,000
1931         1935        65,000

Threshold data is entered in the same manner as was done for the systematic and interval data. Click the Add Row(s) button located on the left side of the table tool bar. Next, you can either manually enter the data, or copy and paste the threshold data into the table. Once you have entered the data, you will see that the plotting positions are automatically calculated and the thresholds are plotted in the Chronology plot as a shaded area (see Figure).

Perception Thresholds.
Figure: Perception Thresholds.

The Perception Threshold Data have the following requirements:

  • The start and end years must be between -100,000 and +100,000.
  • The start year must be less than or equal to the end year.
  • The start and end years of the thresholds must be entered in ascending order.
  • The threshold periods cannot overlap with each other.
  • The threshold values must be non-negative, i.e., greater than or equal to zero.

Summary Statistics

RMC-BestFit provides summary statistics for the systematic data and for all of the data, including low outliers, intervals, and perception thresholds (see Figure). Summary statistics for the systematic data are based on the sample moments and percentile estimates, while summary statistics for all data are based on the nonparametric H-S plotting positions. The central moments of the nonparametric distribution are estimated using numerical integration. The nonparametric distribution functions are provided in the equations below. Percentiles are estimated using the inverse cumulative distribution function.

f(x) = \frac{p_{i+1} - p_i}{x_{i+1} - x_i}

where f(x) is the probability density function (PDF) of the variable X; there is an array of continuous values {x} = {x_1, x_2, ..., x_n} for x_i ≤ x < x_{i+1}, with non-exceedance probabilities {p} = {p_1, p_2, ..., p_n}, where 0 ≤ p_i ≤ 1.

F(x) = p_i + (p_{i+1} - p_i) \left( \frac{x - x_i}{x_{i+1} - x_i} \right)
F^{-1}(p) = x_i + (x_{i+1} - x_i) \left( \frac{p - p_i}{p_{i+1} - p_i} \right)

where F(x) is the cumulative distribution function (CDF) of the variable X; F^{-1}(p) is the inverse CDF; and there is an array of continuous values {x} = {x_1, x_2, ..., x_n} for x_i ≤ x ≤ x_{i+1}, with non-exceedance probabilities {p} = {p_1, p_2, ..., p_n}, where 0 ≤ p_i ≤ 1 and p_i ≤ p ≤ p_{i+1}.
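
A minimal sketch of the piecewise-linear interpolation defined by these equations (the function names are hypothetical, not RMC-BestFit's API):

```python
from bisect import bisect_right

def cdf(xs, ps, x):
    """Piecewise-linear CDF F(x) through the points (x_i, p_i)."""
    # Clamp to the end segments so the last interval handles x >= x_n.
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    return ps[i] + (ps[i + 1] - ps[i]) * (x - xs[i]) / (xs[i + 1] - xs[i])

def inv_cdf(xs, ps, p):
    """Inverse CDF x = F^{-1}(p) by the same linear interpolation."""
    i = min(max(bisect_right(ps, p) - 1, 0), len(ps) - 2)
    return xs[i] + (xs[i + 1] - xs[i]) * (p - ps[i]) / (ps[i + 1] - ps[i])
```

Given points (0, 0.0), (10, 0.5), (20, 1.0), for example, `cdf` returns 0.25 at x = 5 and `inv_cdf` returns 15 at p = 0.75.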

The summary statistics provide a preview of what to expect when performing distribution fitting or Bayesian estimation. For example, in the case of Blakely Mountain Dam, we can see that the inclusion of historical and paleoflood data slightly increased the skewness of the data. The systematic data have a skewness (of log) of -0.2388, whereas the nonparametric analysis, which includes all of the data, has a skewness (of log) of -0.2151. We should expect similar behavior when fitting the Log-Pearson Type III distribution.

Input Data Summary Statistics.
Figure: Input Data Summary Statistics.

Distribution Fitting Analysis

The Distribution Fitting Analysis in RMC-BestFit uses the method of Maximum Likelihood Estimation (MLE) to fit several univariate probability distributions to the user-specified Input Data. You can use the distribution fitting analysis results to inform model selection for use in the Bayesian estimation analysis. For each fitted distribution, RMC-BestFit provides three goodness-of-fit measures: the Akaike Information Criteria (AIC), the Bayesian Information Criteria (BIC), and Root-Mean Squared Error (RMSE). These measures indicate how well the distribution fits the input data, with a smaller value representing a better fit.

Fitting distributions with RMC-BestFit requires four steps:

  • Define Input Data.
  • Run the fitting analysis.
  • Interpret the results.
  • Select a distribution to use in the Bayesian Estimation Analysis.

Further details of these steps are discussed in the following sections.

Create New Distribution Fitting Analysis

Let’s create a new Distribution Fitting Analysis. Right-click on the Distribution Fitting Analysis folder header and click Create New… as shown in Figure. Next, give the fitting analysis a name and click OK.

Create New Distribution Fitting Analysis.
Figure: Create New Distribution Fitting Analysis.

Once the new Distribution Fitting Analysis is created, it will be automatically opened into the Tabbed Documents area, and the fitting analysis properties will be displayed in the Properties Window. From here, you can set the Description, Input Data, Output Frequency Ordinates, and Fit Distributions.

Define Input Data

Click the Input Data drop-down and select the desired data for the fitting analysis as shown in Figure. You can set the Output Frequency Ordinates by clicking on the Options tab at the top of the Properties Window as shown in Figure. The output frequency ordinates are the annual exceedance probabilities (AEP) used for plotting the fitted distributions on the frequency plot. The default frequency ordinates range from 0.99 to 1E-6 AEP.

Distribution Fitting Analysis Properties.
Figure: Distribution Fitting Analysis Properties.
Distribution Fitting Analysis Output Frequency Ordinates.
Figure: Distribution Fitting Analysis Output Frequency Ordinates.

Run the Fitting Analysis

After you have selected the Input Data, click the Fit Distributions command button to run the distribution fitting analysis. The runtime typically takes less than a second. When the analysis is complete, you will see a table of goodness-of-fit measures and all of the distributions plotted on the Frequency Plot as shown in Figure.

Run the Distribution Fitting Analysis.
Figure: Run the Distribution Fitting Analysis.
Distribution Fitting Analysis Graphical Results.
Figure: Distribution Fitting Analysis Graphical Results.

Maximum Likelihood Estimation (MLE)

In the distribution fitting analysis, parameters are estimated using the MLE method. The MLE method formulates a likelihood function using sample data D = (X_1, ..., X_n) and the parameters θ of the probability distribution, and solves for the parameter values that maximize the likelihood function (Rao and Hamed, 2000; Jongejan, 2018). The likelihood function gives the probability of the data conditional on the distribution parameters:

L_S(D \mid \theta) = \prod_{i=1}^{n_S} f(X_i \mid \theta)

where:

D = the sample of systematically recorded annual discharge maxima (X_1, ..., X_{n_S})
f(·) = the probability density function (PDF) of the variable X

Censored data can be incorporated into the MLE method by augmenting the likelihood function. Left-censored threshold data has the following likelihood function:

L_L(D \mid \theta) = \prod_{i=1}^{n_L} \binom{h}{k} F(X_0 \mid \theta)^{h - k}

where:

X_0 = the threshold
h = the threshold period
k = the number of observations that exceeded the threshold during the period
\binom{h}{k} = the binomial coefficient
F(·) = the cumulative distribution function (CDF) of the variable X

The binomial coefficient can be dropped from the likelihood function above because it is held constant as θ is varied. Interval-censored data have the following likelihood function:

L_I(D \mid \theta) = \prod_{i=1}^{n_I} \left[ F(X_{U_i} \mid \theta) - F(X_{L_i} \mid \theta) \right]

where there are n_I observations known to lie between upper and lower bounds, X_U and X_L. The overall likelihood function is then constructed by multiplying the components:

L(D \mid \theta) = L_S(D \mid \theta) \cdot L_L(D \mid \theta) \cdot L_I(D \mid \theta)

These likelihood formulations for censored data are consistent with those presented in Stedinger (1983), Kuczera (1999), and O'Connell et al. (2002).
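
To make the construction concrete, the sketch below assembles the combined log-likelihood for a Gumbel distribution, chosen here only because its PDF and CDF have simple closed forms; RMC-BestFit fits many distributions, and these helper names are hypothetical:

```python
import math

def gumbel_logpdf(x, mu, beta):
    """Log PDF of the Gumbel (extreme value type I) distribution."""
    z = (x - mu) / beta
    return -math.log(beta) - z - math.exp(-z)

def gumbel_cdf(x, mu, beta):
    """CDF of the Gumbel distribution: exp(-exp(-(x - mu)/beta))."""
    return math.exp(-math.exp(-(x - mu) / beta))

def log_likelihood(theta, systematic=(), thresholds=(), intervals=()):
    """ln L(D|theta) = ln L_S + ln L_L + ln L_I, per the equations above.

    thresholds: (X0, h, k) tuples -- threshold value, period length,
    and exceedances during the period (binomial coefficient dropped,
    as noted above). intervals: (XL, XU) lower/upper bound pairs.
    """
    mu, beta = theta
    ll = sum(gumbel_logpdf(x, mu, beta) for x in systematic)
    ll += sum((h - k) * math.log(gumbel_cdf(x0, mu, beta))
              for x0, h, k in thresholds)
    ll += sum(math.log(gumbel_cdf(xu, mu, beta) - gumbel_cdf(xl, mu, beta))
              for xl, xu in intervals)
    return ll
```

Each censored term contributes a log-probability (≤ 0), so adding threshold or interval blocks can only lower the total log-likelihood for fixed parameters.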

From the perspective of Bayesian estimation, MLE is a special case of maximum a posteriori (MAP) that assumes a uniform prior distribution for each model parameter. Therefore, if we assume uniform priors in the Bayesian Estimation Analysis, we will get the same posterior mode as the MLE method used in the Distribution Fitting Analysis (any differences in results would be attributed to convergence errors).

RMC-BestFit uses the Nelder-Mead method (also commonly called the downhill simplex method or amoeba method) to perform MLE for every distribution. The Nelder-Mead method finds the parameter set that maximizes the likelihood function using a direct search method.
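
For illustration, a compact Nelder-Mead minimizer might look like the following; in MLE the objective would be the negative log-likelihood. This is a simplified sketch (inside contraction only), not RMC-BestFit's implementation:

```python
def nelder_mead(f, x0, step=0.5, tol=1e-12, max_iter=1000):
    """Minimal Nelder-Mead (downhill simplex) minimizer, for illustration."""
    n = len(x0)
    # Initial simplex: x0 plus one perturbed vertex per dimension.
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                            for i in range(n)]
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        refl = [centroid[j] + (centroid[j] - worst[j]) for j in range(n)]
        if f(refl) < f(best):
            # Reflection is the new best point: try expanding further.
            expd = [centroid[j] + 2.0 * (centroid[j] - worst[j]) for j in range(n)]
            simplex[-1] = expd if f(expd) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            # Reflection failed: contract toward the worst vertex.
            contr = [centroid[j] + 0.5 * (worst[j] - centroid[j]) for j in range(n)]
            if f(contr) < f(worst):
                simplex[-1] = contr
            else:  # shrink the whole simplex toward the best vertex
                simplex = [best] + [[(v[j] + best[j]) / 2 for j in range(n)]
                                    for v in simplex[1:]]
    return min(simplex, key=f)

# Illustration on a simple quadratic; converges near (3, -1).
opt = nelder_mead(lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2, [0.0, 0.0])
```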

Interpret the Results

Once the Distribution Fitting Analysis is complete, you should evaluate the results. RMC-BestFit provides goodness-of-fit measures and comparison graphs to help you evaluate the fits and select the best probability distribution to use in the Bayesian Estimation Analysis.

Goodness-of-Fit Measures

RMC-BestFit provides three goodness-of-fit measures: AIC, BIC, and RMSE. These measures indicate how well the distribution fits the input data, with a smaller value representing a better fit. The goodness-of-fit statistics are used for two purposes:

  1. Model selection is the process of picking one fitted distribution over another;
  2. Fit validation is the process of determining whether a fitted distribution agrees well with the data.

AIC and BIC are used for model selection among a finite set of models (here, the term model is synonymous with probability distribution). The model with the lowest AIC or BIC is preferred. When comparing multiple models, additional parameters often yield larger optimized log-likelihood values, so AIC and BIC penalize more complex models, i.e., models with additional parameters. For BIC, however, the penalty is a function of the sample size, and so it is typically more severe than that of AIC. The formulas for AIC and BIC are shown below. To address potential over-fitting, RMC-BestFit implements a small-sample-size correction for AIC.

AIC = 2k - 2\ln(\hat{L}) + \frac{2k^2 + 2k}{n - k - 1}
BIC = \ln(n)k - 2\ln(\hat{L})

where:

k = the number of parameters
n = the sample size
L̂ = the maximum value of the likelihood function for the model
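
These formulas translate directly to code; a small sketch with hypothetical function names:

```python
import math

def aic_c(log_l, k, n):
    """AIC with the small-sample correction shown above (often called AICc)."""
    return 2 * k - 2 * log_l + (2 * k ** 2 + 2 * k) / (n - k - 1)

def bic(log_l, k, n):
    """Bayesian Information Criterion: ln(n)*k - 2*ln(L-hat)."""
    return math.log(n) * k - 2 * log_l
```

For n > e² ≈ 7.4 observations, ln(n) > 2, so each extra parameter costs more under BIC than under uncorrected AIC, matching the note above about BIC's harsher penalty.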

RMSE provides a measure for fit validation, with smaller values indicating a better fit. RMSE is computed based on how well the probability distribution agrees with the plotting positions of the input data. Because you can set the plotting position parameter based on preference or theoretical motives, this measure has the potential to be biased. To minimize this issue, the default plotting position coefficient in the input data interface is set to Weibull (α = 0), which is unbiased. The formula for RMSE is as follows:

RMSE = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}}

where:

n = the sample size
ŷ_i = the predicted value for item i
y_i = the observed value for item i
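
A direct implementation of the RMSE formula (hypothetical helper name):

```python
import math

def rmse(predicted, observed):
    """Root-mean-squared error between fitted quantiles and plotting positions."""
    n = len(observed)
    return math.sqrt(sum((yh - y) ** 2 for yh, y in zip(predicted, observed)) / n)
```

A perfect fit gives an RMSE of zero; larger discrepancies between the fitted distribution and the plotting positions inflate it quadratically.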

You can sort each column of the goodness-of-fit table in ascending or descending order by right-clicking the column header as shown in Figure. As you move your cursor over the table, a tooltip will appear showing the fitted parameters of the distribution as shown in Figure. You may check or uncheck distributions in this table to add or remove the fitted distributions from the comparison plots.

Distribution Fitting Analysis Goodness-of-Fit Table.
Figure: Distribution Fitting Analysis Goodness-of-Fit Table.
Check or Uncheck Distributions from the Goodness-of-Fit Table.
Figure: Check or Uncheck Distributions from the Goodness-of-Fit Table.

Comparison Plots

RMC-BestFit provides five types of graphs to help you visually assess the quality of the distribution fits. A comparison graph plots the input data and fitted distributions on the same graph, allowing you to visually compare them. The graphs allow you to determine whether the fitted distribution matches the input data in critical areas. For example, for flood frequency analyses, it is important to have good agreement in the extreme, right-hand tail of the distribution.

Frequency Plot

A Frequency Plot is a plot of magnitude versus annual exceedance probability (AEP). AEP is typically plotted on the X-axis using a Normal or Gumbel probability scale to linearize the plot and exaggerate the extreme right-hand tail of the data. The frequency plot compares the fitted distributions to the plotting positions of the input data as shown in Figure.

For demonstration purposes, let’s only evaluate the three-parameter distributions. Uncheck all of the two-parameter distributions in the goodness-of-fit table, then sort the RMSE column in ascending order as shown in Figure. We see that the Log-Pearson Type III (LPIII) distribution provides the smallest RMSE, and the Frequency Plot shows that it fits the input data very well. The Pearson Type III (PIII) and Generalized Extreme Value (GEV) distributions also fit the data well and have relatively small RMSE values.

Distribution Fitting Analysis Frequency Plot.
Figure: Distribution Fitting Analysis Frequency Plot.
PDF Plot

A PDF Plot compares the probability density function (PDF) of the fitted distribution to a histogram of the input data as shown in Figure. The Frequency Plot and PDF Plot are usually the most informative comparisons. With the PDF plot, it is easy to see where the largest discrepancies are and whether the general shapes of the data and fitted distributions agree. From the PDF plot, we can see that the Generalized Pareto (GPA) distribution does not fit the data as well as the others.

Distribution Fitting Analysis PDF Plot.
Figure: Distribution Fitting Analysis PDF Plot.
CDF Plot

The CDF Plot compares the cumulative distribution function (CDF) of the fitted distribution to the plotting positions of the input data as shown in Figure. The CDF plot has an insensitive scale and is not very useful for comparing the location, spread, and shape of the distributions; the PDF plot is much better suited for that purpose. In many cases, the CDF plot will not provide a good visual measure of goodness-of-fit.

Distribution Fitting Analysis CDF Plot.
Figure: Distribution Fitting Analysis CDF Plot.
P-P Plot

The Probability-Probability (P-P) plot graphs the F(x) of the model (distribution) versus the input data plotting positions. The closer the plot resembles the diagonal 1:1 line, the better the fit. The P-P Plot can be useful if you are interested in closely matching cumulative probabilities, as it shows the differences between the middle of the fitted distributions and the input data. From the P-P plot, we can see that the GPA distribution has the poorest agreement with the data.

Distribution Fitting Analysis P-P Plot.
Figure: Distribution Fitting Analysis P-P Plot.
Q-Q Plot

The Quantile-Quantile (Q-Q) plot graphs the inverse CDF of the model versus the percentile values of the input data. Again, the closer the plot resembles the diagonal 1:1 line, the better the fit. The Q-Q Plot can be useful if you are interested in closely matching quantiles, as it shows the differences between the tails of the fitted distributions and the input data. From the Q-Q plot, we can see that the PIII, LPIII, and GEV distributions all fit the right-hand tail of the data very well. The Generalized Logistic (GLO) distribution has the poorest agreement with the right-hand tail of the data, followed by the GPA distribution as the second least favorable fit.

Distribution Fitting Analysis Q-Q Plot.
Figure: Distribution Fitting Analysis Q-Q Plot.

Tabular Results

RMC-BestFit reports the parameters and basic statistics (mean, standard deviation, skewness, etc.) for each of the fitted distributions, which can also be compared to the same statistics for the input data. Quantiles for the specified output frequency AEPs are also provided in the Tabular Results.

Distribution Fitting Analysis Tabular Results.
Figure: Distribution Fitting Analysis Tabular Results.

Selecting a Probability Distribution

Finally, you need to select a probability distribution to carry forward to the Bayesian Estimation Analysis. The following steps can help you choose an appropriate distribution:

  • Start by reviewing the descriptions of the probability distributions found in the Appendix. Then, look at the variable in question. Does the data have bounds? Is it symmetric or skewed? Which distributions are theoretically appropriate for the data?
  • Select candidate distributions that best characterize the variable.
  • Then, use the Distribution Fitting Analysis results to select the distribution that best describes your Input Data.

In the above example, we evaluated how well the three-parameter distributions fit 3-day inflow data at Blakely Mountain Dam. It was clear that the PIII, LPIII, and GEV distributions produced better fits than the GLO and GPA distributions. Flow data can span several orders of magnitude (e.g., 1,000 cfs to 1,000,000 cfs), is typically skewed, and cannot have negative values. If the skewness of the data exceeds 2 in absolute value (|Cₛ| > 2), then the maximum likelihood estimation method cannot produce a solution for the PIII and LPIII distributions. Real-space flow data can often have skews much larger than 2. Therefore, LPIII better characterizes flow data than the PIII distribution (U.S. Geological Survey, 1982; U.S. Geological Survey, 2018). As such, the PIII distribution is not considered appropriate for characterizing the flow data.
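The |Cₛ| > 2 screen can be illustrated with a quick skewness check. The lognormal sample below is a hypothetical stand-in for real flow data; the point is only that log-transforming a heavily skewed flow series pulls the skew back into the range where PIII maximum likelihood estimation can converge.

```python
import numpy as np
from scipy import stats

# Hypothetical skewed flow sample (cfs); illustrates the skewness screen only.
np.random.seed(7)
flows = stats.lognorm.rvs(s=1.2, scale=5000, size=60)

cs_real = stats.skew(flows, bias=False)           # real-space skewness
cs_log = stats.skew(np.log10(flows), bias=False)  # log-space skewness

# MLE for PIII fails when |Cs| > 2; the log transform used by LPIII
# typically brings the sample skew back inside the feasible range.
print(round(cs_real, 2), round(cs_log, 2), abs(cs_log) < 2)
```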

When considering the statistical and graphical goodness-of-fit performance, and the appropriateness of the distribution, the LPIII distribution fits the Input Data better than the GEV distribution. The LPIII distribution will now be carried forward to the Bayesian estimation analysis.

Bayesian Estimation Analysis

RMC-BestFit performs Bayesian estimation using a Markov Chain Monte Carlo (MCMC) algorithm to estimate distribution parameters given the specified input data and parent distribution. The Bayesian estimation method produces the most likely estimate for parameters (posterior mode) and a characterization of the parameter uncertainty.

To perform Bayesian Estimation with RMC-BestFit, there are four steps required:

  • Define Input Data and select the parent probability distribution.
  • Run the Bayesian Estimation Analysis.
  • Diagnose convergence.
  • Explore the results.

Further details of these steps are discussed in the following sections.

Create New Bayesian Analysis

Let’s create a new Bayesian Estimation Analysis. Right-click on the Bayesian Estimation Analysis folder header and click Create New… as shown in Figure. Next, give the Bayesian analysis a name and click OK.

Create New Bayesian Estimation Analysis.
Figure: Create New Bayesian Estimation Analysis.

Once the new Bayesian Estimation Analysis is created, it will be automatically opened into the Tabbed Documents area, and the Bayesian analysis properties will be displayed in the Properties Window. From here, you can set the required inputs.

Bayesian Analysis Framework

In Bayesian analysis, the values of the parent probability distribution parameters θ = (θ₁, θ₂, …, θₚ) converge to a distribution rather than to a single best value. The uncertainty in the parameters is represented by a prior probability distribution P(θ), which is established based on information available a priori. This prior distribution is not derived from the observed data D = (X₁, …, Xₙ), but instead comes from other sources that can be either subjective (e.g., expert opinion) or objective (e.g., previous statistical analyses, or regional information). After the prior distributions and the observed data are specified, Bayes' theorem (Equation) is used to combine the a priori information about the parameters with the observed data, using the likelihood P(D | θ) (Equation).

P(\theta \mid D) = \frac{P(D \mid \theta) \cdot P(\theta)}{\int P(D \mid \theta) \cdot P(\theta) \, d\theta}
P(D \mid \theta) = \prod_{i=1}^{n} f(X_i \mid \theta)

where:

P(θ | D) = the posterior probability density function (PDF) of θ
P(θ) = the prior PDF of θ
P(D | θ) = the likelihood function.

The posterior cumulative distribution function (CDF) of XX now follows from the total probability theorem:

F(X) = \int F(X \mid \theta, D) \cdot P(\theta \mid D) \, d\theta

which is a probability-weighted sum of the CDFs under different posterior parameter sets. Equation is known as the Bayesian posterior predictive distribution and is equivalent to the expected probability of exceedance concept first presented by Beard (1960).

Stedinger (1983) and Kuczera (1999) refer to this integral as the design flood distribution, and it is considered the optimal estimator of an exceedance probability.
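In practice, the predictive integral is approximated by averaging the parent CDF over posterior parameter draws. The normal "parent" distribution and the synthetic posterior samples below are illustrative stand-ins, not RMC-BestFit output:

```python
import numpy as np
from scipy import stats

# Sketch of the posterior predictive CDF: average the parent CDF over
# posterior parameter sets. Parameters here are in log space.
np.random.seed(3)
post_mu = np.random.normal(4.0, 0.05, size=5000)     # posterior draws of mean
post_sigma = np.random.normal(0.3, 0.02, size=5000)  # posterior draws of sd

x = 4.9  # quantile (log space) at which to evaluate the predictive CDF

# F(x) = (1/m) * sum_j F(x | theta_j): a probability-weighted mix of CDFs.
predictive_cdf = np.mean(stats.norm.cdf(x, loc=post_mu, scale=post_sigma))

# Compare with plugging in only the posterior means (ignores uncertainty).
plugin_cdf = stats.norm.cdf(x, loc=4.0, scale=0.3)
print(predictive_cdf, plugin_cdf)
```

In the upper tail, mixing over parameter uncertainty lowers the CDF (raises the exceedance probability) relative to a single plug-in parameter set, which is why the predictive distribution has heavier tails.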

In most cases, there is not a closed-form solution to the denominator of Equation. Therefore, Monte Carlo simulation techniques such as Markov Chain Monte Carlo (MCMC) are required. The RMC-BestFit software employs an adaptive Differential Evolution Markov Chain (DE-MCz) population-based sampler (ter Braak and Vrugt, 2008), which has proven to be very efficient at arriving at the posterior distribution.
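A toy version of a population-based differential evolution sampler can make the idea concrete. This sketch follows the basic DE-MC scheme (difference-of-chains proposals with Metropolis acceptance) on a stand-in two-parameter posterior; it is a teaching toy, not RMC-BestFit's DE-MCz implementation.

```python
import numpy as np

np.random.seed(0)

def log_post(theta):
    # Stand-in posterior: independent standard normals.
    return -0.5 * np.sum(theta ** 2)

d, n_chains, n_iter = 2, 6, 4000
gamma = 2.38 / np.sqrt(2 * d)    # recommended jump parameter
b = 0.001                        # small uniform proposal noise U(-b, +b)

chains = np.random.uniform(-3, 3, size=(n_chains, d))
logp = np.array([log_post(c) for c in chains])
samples = []

for it in range(n_iter):
    for i in range(n_chains):
        # Pick two distinct other chains to form the difference vector.
        others = [j for j in range(n_chains) if j != i]
        r1, r2 = np.random.choice(others, size=2, replace=False)
        g = 1.0 if np.random.rand() < 0.2 else gamma  # 20% jump threshold
        prop = (chains[i] + g * (chains[r1] - chains[r2])
                + np.random.uniform(-b, b, size=d))
        lp = log_post(prop)
        if np.log(np.random.rand()) < lp - logp[i]:   # Metropolis accept
            chains[i], logp[i] = prop, lp
    if it >= n_iter // 2:        # discard warm up, keep the second half
        samples.append(chains.copy())

draws = np.concatenate(samples)
print(draws.mean(axis=0), draws.std(axis=0))
```

For this easy target the pooled draws recover a mean near zero and a standard deviation near one in both dimensions, mirroring how the population of chains collectively explores the posterior.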

Figure illustrates the basic steps in Bayesian analysis. The Bayesian approach offers a framework that is well suited to incorporating different sources of information, such as systematic records, historical data, regional information, and other information, along with related uncertainties (Viglione et al., 2013). The Bayesian approach allows you to formally include your own expertise in the analysis by choosing a priori distributions. The ability to combine this information with the observed data is all the more important because, in practice, the observed records are usually of limited size.

Diagram Illustrating the Basic Steps in Bayesian Analysis.
Figure: Diagram Illustrating the Basic Steps in Bayesian Analysis.

Define Inputs

You must select the Input Data and the parent probability distribution to use in the Bayesian Estimation Analysis. The parent distribution describes the parent population of your Input Data, which is assumed to be a sample from the parent population. By default, the parent distribution is set as the Generalized Extreme Value (GEV) distribution.

To select your data, click the Input Data drop-down on the General tab and select the desired data for the Bayesian analysis as shown in Figure. Next, set the parent distribution to be the distribution selected from the Distribution Fitting Analysis, which in this case was Log-Pearson Type III (LPIII). The prior distributions for parameters and quantiles can also be set from the General tab.

Bayesian Estimation Analysis General Properties.
Figure: Bayesian Estimation Analysis General Properties.

Default Flat Priors

After you have selected the Input Data and parent distribution, RMC-BestFit automatically develops default flat (uniform) priors for the selected distribution, given the data. The goal of this routine is to develop prior distributions that have minimal impact on the posterior distributions. This approach is sometimes referred to as vague priors, or weakly informative priors. Weakly informative priors contain information to keep the posterior within reasonable bounds without fully capturing one's scientific knowledge about the underlying parameter (Gelman et al., 2014). There are two approaches to developing a weakly informative prior, as described by Gelman et al. (2014):

  1. Start with some version of an uninformative prior distribution and then add enough information so that inferences are constrained to be reasonable.
  2. Start with a strong, highly informative prior and broaden it to account for uncertainty in one’s prior beliefs and in the applicability of any historically based prior distributions to new data.

RMC-BestFit develops default flat priors by first considering the parent distribution and parameter support, and then peeking at the data to determine broad upper and lower constraints for the parameters. This ensures the prior distributions for parameters are somewhat centered near the likelihood, but with a much larger spread. The typical end-user of RMC-BestFit will likely not have much advanced training in Bayesian statistics. Therefore, the routine for default flat priors ensures you will get reasonable results out of the box.

The default flat priors are shown in Figure. You may uncheck the Use Default Flat Priors checkbox to customize the priors. See the Informative Priors section for details on how to set informative priors for parameters and quantiles.

Default Flat Prior Distributions for Parameters.
Figure: Default Flat Prior Distributions for Parameters.

Simulation Options

For typical applications, the default simulation options should provide reasonable results out of the box. The Bayesian Estimation Analysis has the following simulation options (see Figure) available for advanced users:

  • Number of Chains: The Bayesian Estimation Analysis utilizes an adaptive Differential Evolution Markov Chain (DE-MCz) population-based sampler (ter Braak and Vrugt, 2008), in which multiple chains are run in parallel. It is recommended that the number of chains be 2 times the number of parent distribution parameters.
  • Thinning Interval: Determines how often the Markov Chain Monte Carlo (MCMC) evolutions will be recorded. Thinning can be used to reduce autocorrelation in the posterior distributions. A thinning interval of 20 means that every 20th iteration will be recorded.
  • Warm Up Evolutions: The number of thinned warm up MCMC evolutions to discard at the beginning of the simulation. It is recommended that the warm up be half the length of the number of evolutions; e.g., if the number of evolutions is 4,000, then the warm up length should be 2,000.
  • Evolutions: The number of thinned MCMC evolutions to simulate. If the thinning interval is 10 and the number of evolutions is 1,000, there will be a total of 10,000 iterations in the simulation. It is recommended to simulate at least 3,000 thinned evolutions.
  • PRNG Seed: The pseudo random number generator (PRNG) seed used within the Monte Carlo simulation. The PRNG ensures repeatability.
  • Initial Population: Determines the length of the initial population vector. It is recommended that the initial population be at least 100 times the number of parent distribution parameters in length.
  • Jump Parameter (γ): The jump parameter allows the simulation to jump from one mode region to another in the target distribution. It is recommended to set γ = 2.38/√(2d), where d is the number of parent distribution parameters.
  • Jump Threshold: Determines how often the jump parameter (γ) switches to 1.0. It is recommended that the jump threshold be set to 0.20, which will result in adaptation 20% of the time.
  • Noise Parameter (b): A random noise is added to the proposal in the MCMC simulation. The noise follows a uniform distribution U(-b, +b). It is recommended that b be very small, such as 0.001.
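The arithmetic behind the recommended settings above can be worked through in a few lines, using the LPIII distribution (three parameters) as an example:

```python
import math

# Worked arithmetic for the simulation options (LPIII: d = 3 parameters).
d = 3                      # number of parent distribution parameters
n_chains = 2 * d           # recommended: 2x the number of parameters
thin = 20                  # thinning interval
evolutions = 4000          # thinned evolutions to simulate
warm_up = evolutions // 2  # recommended: half the number of evolutions
init_pop = 100 * d         # recommended initial population length

total_iterations = evolutions * thin   # raw MCMC iterations per chain
gamma = 2.38 / math.sqrt(2 * d)        # recommended jump parameter

print(n_chains, warm_up, init_pop, total_iterations, round(gamma, 3))
```

For d = 3 this gives 6 chains, a warm up of 2,000, an initial population of 300, 80,000 raw iterations per chain, and γ ≈ 0.972.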

The simulation options are automatically set with default settings. You can uncheck the Use Defaults checkbox to customize the settings.

Output Options

The Bayesian Estimation Analysis has the following output options (see Figure):

  • Credible Interval: Sets the width of the credible interval. For a 90% credible interval, the value of interest lies within the interval with 90% probability.
  • Output Length: The number of posterior parameter sets to output. It is recommended to output 10,000 parameter sets to ensure an accurate 90% credible interval.
  • Output Frequency Ordinates: The annual exceedance probabilities (AEP) used for plotting the results on the frequency plot. The default frequency ordinates range from 0.99 to 1E-6 AEP.
Bayesian Estimation Analysis Simulation and Output Options.
Figure: Bayesian Estimation Analysis Simulation and Output Options.

Run the Bayesian Analysis

After you have defined all of your inputs and settings, click the Estimate command button to run the Bayesian analysis. The runtime is typically on the order of 30 seconds, depending on your computer configuration. A progress bar will appear to the left of the Estimate command button as shown in Figure. When the analysis is complete, you will see the frequency curve with credible intervals appear in the Frequency Results plot located in the Tabbed Documents area.

Run the Bayesian Estimation Analysis.
Figure: Run the Bayesian Estimation Analysis.

Diagnose Convergence

After you have run the Bayesian Estimation Analysis, RMC-BestFit provides several useful plots for diagnosing the simulation convergence. The default simulation options (see Simulation Options) will typically ensure that you get reasonable results. However, there may be situations where you would like to adjust the simulation options to reduce runtimes while still achieving accurate results.

The Bayesian Estimation Analysis will open to the Frequency Results tab by default because this information is most useful for a typical user. However, let's start from the bottom and work upward by first clicking on the Markov Chain Traces tab as shown in Figure.

Markov Chain Traces

Trace plots provide an important tool for assessing the mixing of a Markov chain. The trace plot is a time series plot of the Markov Chain iterations. The trace plot shows the evolution of a parameter vector over the iterations of one or all chains. You can select the distribution parameter and which chain(s) to view. In addition, you can include the warm up evolutions. Let’s select the Skew (of log)(γ) parameter and All Chains and check the Include Warm Up checkbox, as shown in Figure.

We can see that for the first 100 or so evolutions, the sampler is warming up and has not yet converged. After that point, we see that the MCMC sampler seems to be mixing well because all of the chains are exploring the same region of the parameter space.

What you want to look for is any anomaly in the traces. Are there any chains significantly different from the others? Is there multimodality? In other words, do the traces jump from one modal region to another? These would be signs of poor mixing, and would indicate that you need to increase the number of evolutions.

Markov Chain Traces.
Figure: Markov Chain Traces.

Autocorrelation

Another way to check for convergence is to look at the autocorrelations between the MCMC samples. MCMC samples are dependent, so there will be correlation among each consecutive iteration. This will not affect the validity of inference on the posterior samples so long as the sampler has time to fully explore the posterior distribution. However, autocorrelation will affect the efficiency of the sampler. In other words, highly correlated chains require more samples to produce the same level of precision for an estimate. Since autocorrelation tends to decrease as the lag increases, thinning samples will reduce the final autocorrelation in the sample while also reducing the total number of saved MCMC iterations required.

Select the Autocorrelation tab and select the Skew (of log)(γ) parameter as shown in Figure. This plot shows the lag-k autocorrelation between every sample and the sample k steps before. The autocorrelation should decrease as k increases. When the autocorrelation is zero, the samples can be considered independent. Conversely, if the autocorrelation remains high for higher values of k, then this indicates a high degree of correlation between samples and slow mixing.

The Thinning Interval is set to 20 by default in the simulation options. We can see that this ensures we get good mixing and near zero autocorrelation.
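The lag-k autocorrelation plotted on this tab can be computed directly from a chain's trace. The AR(1) series below is a synthetic stand-in for an MCMC trace; it shows how thinning by 20 drives the retained samples toward independence.

```python
import numpy as np

np.random.seed(4)

def autocorr(x, k):
    # Sample autocorrelation between x[t] and x[t+k].
    x = x - x.mean()
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

# AR(1) with coefficient 0.8: lag-k autocorrelation decays as 0.8**k.
n, phi = 20000, 0.8
trace = np.zeros(n)
for t in range(1, n):
    trace[t] = phi * trace[t - 1] + np.random.normal()

print(round(autocorr(trace, 1), 2), round(autocorr(trace, 20), 2))

# Thinning keeps every 20th value, leaving near-zero lag-1 autocorrelation.
thinned = trace[::20]
print(round(autocorr(thinned, 1), 2))
```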

Autocorrelation.
Figure: Autocorrelation.

Mean Likelihood

Another way to check for convergence is to see if the log-likelihood function is stable. Select the Mean Likelihood tab. We can see that during the first 100 or so evolutions, the sampler is warming up and has not yet converged. As we previously saw in the Markov Chain trace plot, all of the chains are exploring the same region of the parameter space, which leads to a very stable mean log-likelihood.

Mean Likelihood.
Figure: Mean Likelihood.

It is recommended that you do not rely on a single diagnostic measure. It is important to use a weight-of-evidence approach and view many different diagnostics. After examining these plots, we can be confident that the MCMC simulation has converged. Now let’s explore the results.

Explore Results

RMC-BestFit provides several tools for exploring the results of the Bayesian analysis. The Bayesian Estimation Analysis will open to the Frequency Results tab by default as shown in Figure.

Frequency Results

The posterior predictive distribution, posterior mode distribution, and the credible intervals will be plotted on the Graphical Results tab. By default, the posterior parent distribution is plotted as a frequency curve, with annual exceedance probabilities plotted on the X-axis using a Normal scale, and magnitude on the Y-axis using a logarithmic scale. You may edit the plot properties, flip the axes, or change the axis scales as desired. RMC-BestFit will save and persist all of the changes you make to the plot.

Graphical Frequency Results.
Figure: Graphical Frequency Results.

Click the Tabular Results tab to view the frequency curve results and the posterior mode summary statistics as shown in Figure. All of the output posterior parameter sets and associated log-likelihood values are available on the Parameter Sets tab as shown in Figure. You may sort these tables and copy all of the values for external use.

Tabular Frequency Results.
Figure: Tabular Frequency Results.
Parameter Sets.
Figure: Parameter Sets.

Kernel Density

RMC-BestFit provides kernel density plots of the marginal posterior distribution of parameters. Kernel density estimates are closely related to histograms, but provide smoothness and continuity. Select the Kernel Density tab, select the Skew (of log)(γ) parameter, and check the Show Prior Distribution checkbox, as shown in Figure. Recall that the prior for skew was set as a uniform distribution U(-2, +2).

Underneath the plot, you can expand the Summary Statistics for the marginal distribution. Here we can see that the mean of the skew (of log) parameter is -0.3337 with a standard deviation of 0.1756. You will also notice that there is a statistic labeled Rhat. This is often referred to as the Gelman-Rubin statistic, which assesses the mixing of the Markov chains using the between- and within-chain variances (Gelman et al., 2014). A Rhat equal to 1.0 indicates that the posterior distribution has likely converged.
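The statistic can be sketched from its definition: a basic (non-split) Gelman-Rubin R-hat built from the between- and within-chain variances. The chains below are synthetic draws from one target, so R-hat lands near 1.0; RMC-BestFit's exact variant may differ in detail.

```python
import numpy as np

np.random.seed(5)

def rhat(chains):
    # chains: (m, n) array of m chains with n draws each.
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_plus = (n - 1) / n * W + B / n      # pooled variance estimate
    return np.sqrt(var_plus / W)

converged = np.random.normal(-0.33, 0.17, size=(6, 2000))
print(round(rhat(converged), 3))

# A stuck chain inflates the between-chain variance, pushing R-hat above 1.
stuck = converged.copy()
stuck[0] += 1.0
print(round(rhat(stuck), 2))
```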

Kernel Density.
Figure: Kernel Density.

Histogram

RMC-BestFit also provides histograms of the marginal posterior distribution of parameters. These plots provide the same information as the kernel density plots. The histogram bins are set using the Rice rule k = 2∛n, where k is the number of bins, and n is the number of posterior parameter sets.
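The Rice rule is a one-liner; rounding the result up to a whole bin count is an assumption here, as the text does not say how RMC-BestFit rounds.

```python
import math

# Rice rule bin count: k = 2 * n^(1/3), rounded up (rounding is assumed).
def rice_bins(n):
    return math.ceil(2 * n ** (1 / 3))

# With the recommended 10,000 output parameter sets:
print(rice_bins(10000))  # 44 bins
```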

Histogram.
Figure: Histogram.

Bivariate

RMC-BestFit provides the option to view the marginal posterior density functions for any pair of parameters as a two-dimensional heat map, with red indicating the highest density and blue indicating the lowest density. Bivariate plots illustrate the dependence among the parent distribution parameters.

Typically, the higher order parameters are dependent on the lower order parameters. Set the X Parameter to be the Std Dev (of log) (σ) and the Y Parameter to be Skew (of log)(γ), as shown in Figure. We can see that the standard deviation is negatively correlated with the skew: smaller standard deviations result in higher skews, whereas higher standard deviations result in lower skews. This tradeoff in parameters is typical of the LPIII distribution.

Bivariate.
Figure: Bivariate.

Informative Priors

An informative prior provides specific, scientific information about the parameter. Prior information can be obtained from regional analysis, causal modeling, or expert elicitation. In flood frequency, an example of an informative prior would be the use of a regional skew (Kuczera, 1983) for the LPIII distribution as described in Bulletin 17B (U.S. Geological Survey, 1982) and 17C (U.S. Geological Survey, 2018).

For the Blakely Mountain Dam, regional skew information was obtained from a USGS regional study of Arkansas, Oklahoma, and Louisiana (Wagner et al., 2016). From the USGS study, the regional skew was determined to be -0.17 with a mean-square error (MSE) of 0.12. This information can be incorporated into the Bayesian analysis by setting the prior for the skew parameter of LPIII to be a Normal distribution with a mean of -0.17 and a standard deviation of √0.12 ≈ 0.35.
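The MSE-to-standard-deviation conversion, and the spread of the resulting prior, can be checked in a couple of lines; the 90% interval shown is just a sanity check on how wide this prior is.

```python
import math
from scipy import stats

# Regional-skew prior: MSE of 0.12 converts to sd = sqrt(0.12) ~ 0.346,
# rounded to 0.35 in the text above.
mse = 0.12
sd = math.sqrt(mse)
prior = stats.norm(loc=-0.17, scale=sd)

# 5th and 95th percentiles: 90% of the prior mass on the skew parameter.
lo, hi = prior.ppf(0.05), prior.ppf(0.95)
print(round(sd, 3), round(lo, 2), round(hi, 2))
```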

First, uncheck the Use Default Flat Priors checkbox located on the General tab of the Properties Window. Then, click the distribution button for the skew (of log) parameter. A distribution selector will pop open to the left of the button, as shown in Figure.

Informative Prior on Skew.
Figure: Informative Prior on Skew.

Next, select the Normal distribution and set the mean to be -0.17 and the standard deviation to be 0.35 as shown below.

Select Distribution for Skew.
Figure: Select Distribution for Skew.

Click away from the distribution selector popup and it will automatically close. You will now see that the prior distribution for skew has been set to N(-0.17, 0.35). Now, click the Estimate command button to perform the Bayesian analysis using the informative prior.

Prior Distributions for Parameters.
Figure: Prior Distributions for Parameters.

When the simulation is complete, click on the Kernel Density tab and select the Skew (of log)(γ) parameter. When we used the default flat prior, the mean of the skew (of log) was -0.3337 with a standard deviation of 0.1756. Now, with the informative prior, we can see that the mean of the skew (of log) is -0.3062 with a standard deviation of 0.1616. The inclusion of the regional skew information has made the skew parameter less negative and reduced the variance.

As the sample size increases, the influence of the prior distribution on posterior inferences will decrease because the data likelihood will dominate. Taking this into account, regional prior information is most valuable when the at-site sample sizes are small relative to the effective sample size of the regional information.

View Results of Using an Informative Prior.
Figure: View Results of Using an Informative Prior.

Priors on Quantiles

Modeled rainfall-runoff results can be incorporated into the Bayesian analysis by defining a prior distribution for a flood quantile. This approach is referred to as causal information expansion, which analyzes the generating mechanisms of floods in the catchment of interest (Merz and Blöschl, 2008).

A regional precipitation-frequency analysis was performed for the Blakely Mountain watershed, and 3-day basin-average rainfall-frequency events were routed using a calibrated HEC-HMS model. The main benefit of modeling regional rainfall-frequency information is that available rainfall records are often much longer than the at-site flood records. Therefore, the regional information combined with causal rainfall-runoff modeling can provide important prior information on the flood quantiles.

Use Single Quantile

Prior distributions for flood quantiles can be set in one of two ways. First, check the Enable Priors on Quantiles checkbox located on the General tab of the Properties Window. By default, the Use Single Quantile checkbox will also be checked. For more details on this single quantile approach, please refer to Viglione et al. (2013) and Skahill et al. (2016).

From the analysis performed with HEC-HMS, the distribution of rainfall-runoff at the 1E-4 AEP was determined to be Normal with a mean of 155,000 cfs and a standard deviation of 22,000 cfs. This rare AEP was selected because, to add information to the fit, the quantile needed to be rarer than the paleoflood event while not being so rare as to be overly influential.

Click the distribution button and a distribution selector will pop open to the left of the button as shown in Figure. The Normal distribution is the only option available for priors on quantiles. Next, set the mean to be 155,000 and the standard deviation to be 22,000. Click away from the distribution selector popup and it will automatically close. Now, select the 0.0001 AEP as shown below in Figure.

Select Distribution for a Quantile.
Figure: Select Distribution for a Quantile.

You will now see that the prior distribution for the quantile has been set to N(155000, 22000). Now, click the Estimate command button to perform the Bayesian analysis using the informative quantile prior. When the analysis is complete, you will see the frequency curve with credible intervals appear in the Frequency Results plot located in the Tabbed Documents area. The mean of the quantile prior will be displayed as a green square, and the 5th and 95th percentiles of the prior will be shown as a vertical error bar, as shown in Figure.
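The endpoints of that error bar are just the 5th and 95th percentiles of the quantile prior, which can be verified directly:

```python
from scipy import stats

# The error bar on the frequency plot spans the 5th and 95th percentiles
# of the N(155000, 22000) quantile prior (values in cfs).
prior = stats.norm(loc=155000, scale=22000)
lo, hi = prior.ppf(0.05), prior.ppf(0.95)
print(round(lo), round(hi))
```

This places the bar roughly from 118,800 to 191,200 cfs around the 155,000 cfs mean.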

Prior Distributions for a Single Quantile.
Figure: Prior Distributions for a Single Quantile.
Frequency Results Using a Prior Distribution for a Single Quantile.
Figure: Frequency Results Using a Prior Distribution for a Single Quantile.

If we click on the Kernel Density tab and select the Skew (of log)(γ) parameter, we can see that the mean of the skew (of log) is -0.2384 with a standard deviation of 0.1465. The inclusion of the causal rainfall-runoff information has made the skew parameter significantly less negative, reduced the variance, and reduced the width of the resulting credible intervals of the Frequency Results.

Summary Statistics Results for the Skew Parameter after Using an Informative Prior on a Single Quantile.
Figure: Summary Statistics Results for the Skew Parameter after Using an Informative Prior on a Single Quantile.
Use Multiple Quantiles

The second way to set prior distributions for flood quantiles is to uncheck the Use Single Quantile checkbox. This option for setting priors on distribution quantiles follows the approach used by Coles and Tawn (1996). The number of quantile priors must equal the number of distribution parameters; for example, the LPIII distribution requires three quantile priors.

In the single quantile approach shown above, the choice of AEP for the prior information has a large influence on the resulting fit and credible intervals. If a more frequent quantile is chosen, such as 1E-2, more weight is given to the historical and paleoflood data and the quantile prior has less influence on the fit, whereas the choice of the 1E-4 quantile gives significant weight to the prior. Choosing a rare quantile implies that we have high confidence in the regional precipitation-frequency analysis and modeled rainfall-runoff results. In general, the rarer the chosen quantile for the prior information, the more influence it will have on the posterior results.

As a general rule of thumb, if you are using NOAA Atlas 14 (A14, https://hdsc.nws.noaa.gov/hdsc/pfds/pfds_map_cont.html) precipitation-frequency data with a three-parameter distribution, such as LPIII, then you should enter quantile priors for 1E-1, 1E-2, and 1E-3. However, if you have performed a custom regional precipitation-frequency analysis that is believed to be of higher quality than A14, you should enter priors for 1E-2, 1E-3, and 1E-4.

In the case of Blakely Mountain Dam, a custom regional analysis was performed for the nearby Trinity River Basin in Texas, which is described in MetStat, Inc. (2018). The regional frequency analysis performed for the Trinity River Basin also included the geographical region where the Blakely Mountain watershed is located in Arkansas. This study incorporated advanced techniques, such as storm typing; therefore, the results are considered to provide a better extrapolation to rare exceedance probabilities than A14.

Using the results from the HEC-HMS analysis, the distribution of rainfall-runoff for the 1E-2, 1E-3, and 1E-4 AEP quantiles are shown in Figure. After you have entered these prior distributions, click the Estimate command button to perform the Bayesian analysis. The Frequency Results plot is shown below in Figure.

Prior Distributions for Multiple Quantiles.
Figure: Prior Distributions for Multiple Quantiles.
Frequency Results Using a Prior Distribution for Multiple Quantiles.
Figure: Frequency Results Using a Prior Distribution for Multiple Quantiles.

If we click on the Kernel Density tab and select the Skew (of log)(γ) parameter, we can see that the mean of the skew (of log) is -0.2779 with a standard deviation of 0.1500. The use of multiple quantiles has still made the skew parameter less negative and reduced the variance. However, the effect is less noticeable compared to what we saw with the single quantile option. With this in mind, the single quantile option has the potential to underestimate the true parameter and quantile variance. The multiple quantile prior method provides a more complete treatment of the priors and is considered the better choice when the data are available.

Summary Statistics Results for the Skew Parameter after Using an Informative Prior on Multiple Quantiles.
Figure: Summary Statistics Results for the Skew Parameter after Using an Informative Prior on Multiple Quantiles.

As we saw in the regional skew example, as the at-site sample size increases, the influence of the prior distribution on the posterior will decrease because the data likelihood will dominate. Taking this into account, prior information on quantiles is most valuable when the at-site sample sizes are small relative to the effective sample size of the regional precipitation-frequency and causal rainfall-runoff information. For further information on setting informative priors for quantiles, please see Coles and Tawn (1996), Smith (2005), Viglione et al. (2013), and Skahill et al. (2016).

You have now finished the example application with RMC-BestFit. Save the project by selecting File > Save, or by clicking the Save button on the main window Tool Bar.