12. Pair-Stat Tool

12.1. Introduction

The Pair-Stat tool provides verification statistics for forecast and observation data that has already been paired in time and space. While no smoothing, regridding, or interpolation methods apply to the forecast and observation pairs, the Pair-Stat tool filters and groups the pairs temporally and spatially. It then computes continuous, categorical, and probabilistic verification statistics. The categorical and probabilistic statistics are generally derived by applying a threshold to the forecast and observation values. Confidence intervals - representing the uncertainty in the verification measures - are computed for the verification statistics.

Scientific and statistical aspects of the Pair-Stat tool are discussed in the following section. Practical aspects of the Pair-Stat tool are described in Section 12.3.

12.2. Scientific and Statistical Aspects

The statistics and measures computed by the Pair-Stat tool are a subset of those computed by the Point-Stat tool, which are described briefly in Section 11.2.4 and in more detail in Appendix C, Section 35. Additionally, Section 11.2.5 describes the methods for computing confidence intervals that are applied to some of the measures computed by the Pair-Stat tool; more detail on confidence intervals is provided in Appendix D, Section 36.

12.3. Practical Information

The Pair-Stat tool performs verification for forecast and observation data that has already been paired in time and space. The paired data is supplied in one of the supported input formats: the MET MPR Line Type written by the Point-Stat tool (-format mpr command line option), Python embedding (-format python command line option), or IODA (Interface for Observation Data Access) files described in the IODA2NC Tool (-format ioda command line option). The Stat-Analysis tool also processes MPR data, so the functionality of Pair-Stat and Stat-Analysis overlaps in this respect. Based on configuration file settings, paired data for each requested verification task is extracted from one or more input files, subsetted temporally and spatially, and used to compute and write a variety of statistics and measures. If forecast and/or observation climatology data is provided in the configuration file, it is interpolated to the location of each pair and used in the computation of statistics.

If no matched pairs are found for a particular verification task, no statistics are computed or written to the output.

12.3.1. pair_stat Usage

The usage statement for the Pair-Stat tool is shown below:

Usage: pair_stat
       -pairs file_1 ... file_n | file_list
       -format type
       -config config_file
       [-out base]
       [-log file]
       [-v level]

pair_stat has three required arguments and accepts three optional ones.

12.3.1.1. Required Arguments for pair_stat

The -pairs argument defines one or more input files containing forecast/observation pairs. It may be set as a list of file names (file_1 … file_n) or as an ASCII file containing a list of file names (file_list), as described in Section 4.1.1 (required).

This option can be used multiple times but all inputs must follow the same -format type, described below.

For -format python, the -pairs file defines the path to a Python embedding script to be run followed by any arguments to that script and enclosed in single or double quotes.

The -format type argument defines the input pairs file format and may be set to “mpr”, “python”, or “ioda” (required).

The -config config_file argument is a PairStatConfig file containing the desired configuration settings (required).

12.3.1.2. Optional Arguments for pair_stat

The -out base option overrides the default output file base (./pair_stat) (optional).

Each output file begins with this output file base followed by the “.stat” or “_TYPE.txt” suffix, where TYPE is a specific output line type. Users should set -out base appropriately to avoid overwriting output generated by previous runs of the Pair-Stat tool.

The -log file option directs output and errors to the specified log file. All messages will be written to that file as well as to standard output and standard error. Thus, users can save the messages without having to redirect the output on the command line. The default behavior is no log file.

The -v level option indicates the desired level of verbosity. The value of “level” will override the default setting of 2. Setting the verbosity to 0 will make the tool run with no log messages, while increasing the verbosity will increase the amount of logging.

An example of the pair_stat calling sequence is shown below:

pair_stat \
-pairs point_stat_run1_mpr.txt point_stat_run2.stat \
-format mpr \
-config PairStatConfig

In this example, the Pair-Stat tool reads matched pair data (-format mpr) from point_stat_run1_mpr.txt and point_stat_run2.stat files and applies the configuration options specified in the PairStatConfig file.
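
When pair_stat is launched from a script, the same calling sequence can be assembled programmatically. The following is a minimal Python sketch and is not part of MET: the helper name build_pair_stat_command is hypothetical, and it only builds an argument list from the usage statement in Section 12.3.1 without running the tool.

```python
def build_pair_stat_command(pairs, fmt, config,
                            out_base=None, log_file=None, verbosity=None):
    """Return the pair_stat argument list implied by the usage statement.

    Hypothetical helper for illustration only; it does not run pair_stat.
    """
    if fmt not in ("mpr", "python", "ioda"):
        raise ValueError(f"unsupported -format type: {fmt}")
    if not pairs:
        raise ValueError("at least one -pairs input is required")

    # Required arguments: -pairs, -format, and -config.
    cmd = ["pair_stat", "-pairs", *pairs, "-format", fmt, "-config", config]

    # Optional arguments are appended only when requested.
    if out_base is not None:
        cmd += ["-out", out_base]
    if log_file is not None:
        cmd += ["-log", log_file]
    if verbosity is not None:
        cmd += ["-v", str(verbosity)]
    return cmd


example_cmd = build_pair_stat_command(
    ["point_stat_run1_mpr.txt", "point_stat_run2.stat"],
    "mpr",
    "PairStatConfig",
)
print(" ".join(example_cmd))
```

If such a list were passed to a process launcher like Python's subprocess.run, a -format python script and its arguments would be supplied as a single list element, which corresponds to enclosing them in quotes on the command line.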

12.3.2. pair_stat Configuration File

The default configuration file for the Pair-Stat tool, named PairStatConfig_default, can be found in the installed share/met/config directory. Users are encouraged to make a copy prior to modifying its contents. The configuration file options are described in the subsections below.

Note that environment variables may be used when editing configuration files, as described in Section 5.1.1.

model             = "FCST";
desc              = "NA";
point_weight_flag = NONE;
tmp_dir           = "/tmp";
version           = "VN.N";

The configuration options listed above are common to multiple MET tools and are described in Section 5.

fcst = {
  pairs = [
    {
      name  = "TMP";
      level = [ "Z2" ];
    }
  ];
}

obs = {
  pairs = [
    {
      name = "TMP";
      level = [ "Z2" ];
    }
  ];
}

The fcst and obs entries are dictionaries containing the pairs entry which is an array of dictionaries. The fcst.pairs and obs.pairs arrays must have the same non-zero length. The formatting of fcst.pairs is the same as fcst.field, described in Section 5.2.13. The name and level entries in each array entry vary based on the input file format:

For the mpr and python formats, set name and level based on the desired values of the input MET MPR Line Type. Set fcst.pairs.name and obs.pairs.name to the desired values of the FCST_VAR and OBS_VAR columns, respectively. Set fcst.pairs.level and obs.pairs.level to one or more desired values of the FCST_LEV and OBS_LEV columns, respectively. Only MPR lines whose variable names match those requested and whose level strings appear in the list of requested level strings will be used for that verification task.

For the ioda format, the name entry specifies the input IODA NetCDF variable to be read. The level entry does not apply. IODA files contain multiple variables indexed by a point location dimension. Data read from the fcst.pairs.name and obs.pairs.name variables define the matched pairs for that verification task.

Note

IODA files typically use NetCDF4 groups, and the name entry should specify both the group and variable names, formatted as name = "/GROUP_NAME/VARIABLE_NAME"; (e.g. name = "/hofx/air_temperature";).
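
For the mpr and python formats, the name and level selection described above can be illustrated with a short Python sketch. This is not MET source code: the function name select_pairs and the sample lines are made up for illustration, MPR lines are represented as plain dictionaries keyed by column name, and exact string equality is assumed for the comparisons.

```python
def select_pairs(mpr_lines, fcst_name, fcst_levels, obs_name, obs_levels):
    """Keep MPR lines whose variable names match the request and whose
    level strings appear in the requested level lists (illustrative only)."""
    return [
        line for line in mpr_lines
        if line["FCST_VAR"] == fcst_name
        and line["OBS_VAR"] == obs_name
        and line["FCST_LEV"] in fcst_levels
        and line["OBS_LEV"] in obs_levels
    ]


# Made-up sample lines containing only the four columns used for matching.
sample_lines = [
    {"FCST_VAR": "TMP", "FCST_LEV": "Z2", "OBS_VAR": "TMP", "OBS_LEV": "Z2"},
    {"FCST_VAR": "TMP", "FCST_LEV": "P500", "OBS_VAR": "TMP", "OBS_LEV": "P500"},
    {"FCST_VAR": "UGRD", "FCST_LEV": "Z10", "OBS_VAR": "UGRD", "OBS_LEV": "Z10"},
]

# Mirrors a request with name = "TMP" and level = [ "Z2" ] for both
# fcst.pairs and obs.pairs.
selected = select_pairs(sample_lines, "TMP", ["Z2"], "TMP", ["Z2"])
print(len(selected))
```

Only the first sample line satisfies all four conditions, so it alone would contribute a matched pair to that verification task.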

convert(x)    = ...
censor_thresh = [];
censor_val    = [];
cat_thresh    = [ NA ];
cnt_thresh    = [ NA ];
cnt_logic     = UNION;
wind_thresh   = [ NA ];
wind_logic    = UNION;

The configuration options listed above are used to process the input paired data and define thresholds for filtering data and for categorical verification. They can be specified separately for each verification task inside each fcst.pairs or obs.pairs array entry. They are common to multiple MET tools and are described in Section 5.

mpr_column = [];
mpr_thresh = [];

The configuration options listed above filter the input paired data using numeric thresholds. They can be specified separately for each verification task inside each obs.pairs array entry. They are common to multiple MET tools and are described in Section 5.

mpr_str_inc = [];
mpr_str_exc = [];

The configuration options listed above filter the input paired data by specifying input column strings to be included or excluded. They can be specified separately for each verification task inside each obs.pairs array entry. The options define arrays of dictionaries with each dictionary containing a key and val entry. The key entry defines the input MPR column name to be filtered and the val entry defines a comma-separated list of values.

For example, the following settings include only MPR lines with the OBTYPE column set to ADPUPA and exclude any MPR lines with the VX_MASK column set to DTC165 or DTC166.

mpr_str_inc = [ { key = "OBTYPE";  val = "ADPUPA";        } ];
mpr_str_exc = [ { key = "VX_MASK"; val = "DTC165,DTC166"; } ];
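
The effect of these two settings can be sketched in Python. This is an illustration of the filtering rule as described above, not the MET implementation: the function name passes_string_filters and the sample lines are hypothetical, and combining multiple inclusion entries with a logical AND is an assumption.

```python
def passes_string_filters(line, str_inc, str_exc):
    """Return True if an MPR line (a dict keyed by column name) survives
    the inclusion and exclusion string filters (illustrative only)."""
    # Assumption: every inclusion entry must be satisfied.
    for entry in str_inc:
        if line.get(entry["key"]) not in entry["val"].split(","):
            return False
    # A match against any exclusion entry rejects the line.
    for entry in str_exc:
        if line.get(entry["key"]) in entry["val"].split(","):
            return False
    return True


mpr_str_inc = [{"key": "OBTYPE", "val": "ADPUPA"}]
mpr_str_exc = [{"key": "VX_MASK", "val": "DTC165,DTC166"}]

# Made-up sample lines containing only the two filtered columns.
sample_lines = [
    {"OBTYPE": "ADPUPA", "VX_MASK": "FULL"},    # kept
    {"OBTYPE": "ADPUPA", "VX_MASK": "DTC165"},  # excluded by VX_MASK
    {"OBTYPE": "ADPSFC", "VX_MASK": "FULL"},    # not included by OBTYPE
]
results = [passes_string_filters(line, mpr_str_inc, mpr_str_exc)
           for line in sample_lines]
print(results)
```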

fcst_lead       = [];
obs_lead        = [];

fcst_valid_beg  = "";
fcst_valid_end  = "";
fcst_valid_inc  = [];
fcst_valid_exc  = [];
fcst_valid_hour = [];

obs_valid_beg   = "";
obs_valid_end   = "";
obs_valid_inc   = [];
obs_valid_exc   = [];
obs_valid_hour  = [];

fcst_init_beg   = "";
fcst_init_end   = "";
fcst_init_inc   = [];
fcst_init_exc   = [];
fcst_init_hour  = [];

obs_init_beg    = "";
obs_init_end    = "";
obs_init_inc    = [];
obs_init_exc    = [];
obs_init_hour   = [];

The configuration options listed above filter the input paired data temporally. They can be specified separately for each verification task inside each obs.pairs array entry. They are also supported for the Stat-Analysis tool and are described in Section 17.3.2.

eclv_points     = 0.05;
hss_ec_value    = NA;
rank_corr_flag  = FALSE;
ci_alpha        = [ 0.05 ];
boot            = { interval = PCTILE; rep_prop = 1.0; n_rep = 1000;
                    rng = "mt19937"; seed = ""; }
seeps_p1_thresh = >=0.1&&<=0.85;

The configuration options listed above define verification logic options. They can be specified separately for each verification task inside each obs.pairs array entry. They are common to multiple MET tools and are described in Section 5.

climo_mean  = { ... }
climo_stdev = { ... }
climo_cdf   = { ... }

The configuration options listed above specify climatological input data. They can be specified separately inside the fcst and obs dictionaries. They are common to multiple MET tools and are described in Section 5.

The length of the climo_mean.field and climo_stdev.field arrays must match and can either be zero (for no climatological data) or match the fcst.pairs and obs.pairs array length.
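
The array length rules stated in this section can be summarized in a small Python sketch. It is a hypothetical consistency check written for illustration, not a MET utility: the function name check_array_lengths and its message strings are assumptions, and it takes only the array lengths as input.

```python
def check_array_lengths(n_fcst_pairs, n_obs_pairs, n_climo_mean, n_climo_stdev):
    """Return a list of problems with the configured array lengths.

    An empty list means the lengths are consistent (illustrative only).
    """
    problems = []
    # fcst.pairs and obs.pairs must have the same non-zero length.
    if n_fcst_pairs == 0 or n_fcst_pairs != n_obs_pairs:
        problems.append(
            "fcst.pairs and obs.pairs must have the same non-zero length")
    # The climatology arrays must match each other ...
    if n_climo_mean != n_climo_stdev:
        problems.append(
            "climo_mean.field and climo_stdev.field lengths must match")
    # ... and be either empty or as long as the pairs arrays.
    elif n_climo_mean not in (0, n_fcst_pairs):
        problems.append(
            "climatology arrays must be empty or match the pairs array length")
    return problems


print(check_array_lengths(2, 2, 0, 0))
print(check_array_lengths(2, 2, 1, 1))
```

The first call describes two verification tasks with no climatological data and reports no problems; the second supplies one climatology field for two tasks and is flagged.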

Note

When interpolating climatological data to pair data locations, the grid on which the climatological data is defined is used with the nearest neighbor interpolation method and that cannot currently be overridden. A future enhancement may make these interpolation options configurable.

mask = {
  grid  = [ "FULL", "DTC165", "DTC166" ];
  poly  = [];
  sid   = [];
  llpnt = [];
}

The mask dictionary defines how the input paired data is aggregated spatially when computing statistics. For each verification task, output statistics will be computed for each masking region defined. This dictionary is common to multiple MET tools and is described in Section 5.

Note

The mask.grid and mask.poly options are currently defined relative to a reference grid. Since no grid applies to the input paired data, a global 1/10 degree reference grid is used by default and that grid cannot currently be overridden. A future enhancement may eliminate the use of a reference grid in this context.

output_flag = {
   fho    = BOTH;
   ctc    = BOTH;
   cts    = BOTH;
   mctc   = BOTH;
   mcts   = BOTH;
   cnt    = BOTH;
   sl1l2  = BOTH;
   sal1l2 = BOTH;
   vl1l2  = BOTH;
   vcnt   = BOTH;
   val1l2 = BOTH;
   pct    = BOTH;
   pstd   = BOTH;
   pjc    = BOTH;
   prc    = BOTH;
   eclv   = BOTH;
   mpr    = BOTH;
   seeps  = NONE;
   seeps_mpr = NONE;
}

The output_flag dictionary controls the type of output that the Pair-Stat tool generates. These output types are a subset of those supported by the Point-Stat tool and are described in Section 11.3.2.

12.3.3. pair_stat Output

The Pair-Stat tool produces output in STAT and, optionally, ASCII format. The ASCII output duplicates the STAT output but has the data organized by line type. The output file names begin with ./pair_stat or the string specified with the -out base command line option. Users should set the -out base command line option appropriately to avoid overwriting output generated by previous runs of the Pair-Stat tool.

The output STAT file is named by appending .stat to the output base string.

The output ASCII files are named by appending _TYPE.txt to the output base string for each output line TYPE set to BOTH in the output_flag configuration dictionary.
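
The naming rules above can be sketched in Python as follows. This is an illustration rather than MET code: the function name output_file_names is hypothetical, the sketch assumes the STAT file is always written, and it assumes TYPE appears in the file name exactly as the key is spelled in the output_flag dictionary.

```python
def output_file_names(base, output_flag):
    """List the output file names implied by the base string and the
    output_flag settings (illustrative only)."""
    # The STAT file name is the base string with ".stat" appended.
    names = [base + ".stat"]
    # One ASCII file per line type whose flag is set to BOTH.
    for line_type, setting in output_flag.items():
        if setting == "BOTH":
            names.append(f"{base}_{line_type}.txt")
    return names


example_names = output_file_names(
    "./pair_stat",
    {"cnt": "BOTH", "mpr": "BOTH", "seeps": "NONE"},
)
print(example_names)
```

Passing a different base string, as would be set with the -out base command line option, changes the prefix of every name in the list, which is how separate runs avoid overwriting each other's output.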