.. _series-analysis:

********************
Series-Analysis Tool
********************

Introduction
============

The Series-Analysis Tool accumulates statistics separately for each horizontal grid location over a series. Often, this series is over time or height, though any type of series is possible. This differs from the Grid-Stat tool in that Grid-Stat verifies all grid locations together as a group. Thus, the Series-Analysis Tool can be used to find verification information specific to certain locations or see how model performance varies over the domain.

Practical Information
=====================

The Series-Analysis tool performs verification of gridded model fields using matching gridded observation fields. It computes a variety of user-selected statistics. These statistics are a subset of those produced by the Grid-Stat tool, with options for statistic types, thresholds, and conditional verification as discussed in :numref:`grid-stat`. However, these statistics are computed separately for each grid location and accumulated over some series such as time or height, rather than accumulated over the whole domain for a single time or height as is done by Grid-Stat.

This tool computes statistics for exactly one series each time it is run. Multiple series may be processed by running the tool multiple times. The length of the series to be processed is determined by the first of the following that is greater than one: the number of forecast fields in the configuration file, the number of observation fields in the configuration file, the number of input forecast files, and the number of input observation files. Several examples of defining series are described below.

To define a time series of forecasts where the valid time changes for each time step, set the forecast and observation fields in the configuration file to single values and pass the tool multiple forecast and observation files. The tool will loop over the forecast files, extract the specified field from each, and then search the observation files for a matching record with the same valid time.

To define a time series of forecasts that all have the same valid time, set the forecast and observation fields in the configuration file to single values. Pass the tool multiple forecast files and a single observation file containing the verifying observations. The tool will loop over the forecast files, extract the specified field from each, and then retrieve the verifying observations.

To define a series of vertical levels all contained in a single input file, set the forecast and observation fields to a list of the vertical levels to be used. Pass the tool single forecast and observation files containing the vertical level data. The tool will loop over the forecast field entries, extract that field from the input forecast file, and then search the observation file for a matching record.
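
The series-length rule stated above can be sketched in Python as follows. This is only an illustrative sketch, not the tool's source code: the function name series_length and its argument names are hypothetical, and returning 1 when no count exceeds one is an assumption that the text above does not address.

.. code-block:: python

  def series_length(n_fcst_fields, n_obs_fields, n_fcst_files, n_obs_files):
      """Return the first count greater than one, checked in the documented order."""
      for count in (n_fcst_fields, n_obs_fields, n_fcst_files, n_obs_files):
          if count > 1:
              return count
      # Assumption: with no count above one, treat the series as a single entry.
      return 1

  # Changing valid times: single fields, 30 forecast and 30 observation files.
  print(series_length(1, 1, 30, 30))
  # Same valid time: single fields, 5 forecast files, 1 observation file.
  print(series_length(1, 1, 5, 1))
  # Vertical levels: 10 field entries, single forecast and observation files.
  print(series_length(10, 10, 1, 1))

The three calls mirror the three series definitions described above; in each case the series length is taken from the first input count that exceeds one.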

series_analysis Usage
---------------------

The usage statement for the Series-Analysis tool is shown below:

.. code-block:: none

  Usage: series_analysis
         -fcst  file_1 ... file_n | file_list
         -obs   file_1 ... file_n | file_list
         [-both file_1 ... file_n | file_list]
         [-aggr file]
         [-paired]
         -out file
         -config file
         [-log file]
         [-v level]
         [-compress level]

series_analysis has four required arguments and accepts several optional ones.

Required Arguments series_stat
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. The **-fcst file_1 ... file_n | file_list** option specifies the gridded forecast files or ASCII file list of file names to be used, as described in :numref:`ascii_file_lists`.

2. The **-obs file_1 ... file_n | file_list** option specifies the gridded observation files or ASCII file list of file names to be used, as described in :numref:`ascii_file_lists`.

3. The **-out file** option specifies the NetCDF output file that will contain the computed statistics.

4. The **-config file** option specifies the Series-Analysis configuration file containing the desired settings.

Optional Arguments for series_analysis
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

5. To use the same set of files for both the forecasts and the observations, specify them with the optional **-both file_1 ... file_n | file_list** option. This is useful when reading the NetCDF matched pair output of the Grid-Stat tool, which contains both forecast and observation data.

6. The **-aggr** option specifies the path to an existing Series-Analysis output file. When computing statistics for the input forecast and observation data, Series-Analysis aggregates the partial sums (SL1L2, SAL1L2 line types) and contingency table counts (CTC, MCTC, and PCT line types) with data provided in the aggregate file. This option enables Series-Analysis to run iteratively and update existing partial sums, counts, and statistics with new data.

.. note:: When the **-aggr** option is used, only statistics that are derivable from partial sums and contingency table counts can be requested. Runtimes are generally much slower when aggregating data since it requires many additional NetCDF variables containing the scalar partial sums and contingency table counts to be read and written.

7. The **-paired** option indicates that the **-fcst** and **-obs** file lists are already paired, meaning there is a one-to-one correspondence between the files in those lists. This option affects how missing data is handled. When **-paired** is not used, missing or incomplete files result in a runtime error with no output file being created. When **-paired** is used, missing or incomplete files result in a warning with output being created using the available data.

8. The **-log file** option writes log messages to the specified file.

9. The **-v level** option overrides the default level of logging (2).

10. The **-compress level** option specifies the desired level of compression (deflate level) for NetCDF variables. Valid levels range from 0 to 9. The value of "level" overrides the default setting of 0 from the configuration file or the environment variable MET_NC_COMPRESS. A level of 0 disables compression of the NetCDF output. Lower values give faster compression, and higher values give better compression.
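
The restriction noted for the **-aggr** option, that only statistics derivable from partial sums and counts can be requested, can be illustrated for a single grid point. The sketch below assumes that partial sums are stored as means over the matched pairs, following the SL1L2 column names (TOTAL, FBAR, OBAR, FOBAR, FFBAR, OOBAR), and that aggregation weights each mean by its pair count. The function names are hypothetical and this is not the tool's implementation.

.. code-block:: python

  import math

  KEYS = ("FBAR", "OBAR", "FOBAR", "FFBAR", "OOBAR")

  def partial_sums(pairs):
      """Build an SL1L2-style record (means over matched pairs) for one grid point."""
      n = len(pairs)
      return {
          "TOTAL": n,
          "FBAR": sum(f for f, o in pairs) / n,
          "OBAR": sum(o for f, o in pairs) / n,
          "FOBAR": sum(f * o for f, o in pairs) / n,
          "FFBAR": sum(f * f for f, o in pairs) / n,
          "OOBAR": sum(o * o for f, o in pairs) / n,
      }

  def aggregate(old, new):
      """Combine two records by weighting each mean by its pair count."""
      total = old["TOTAL"] + new["TOTAL"]
      merged = {"TOTAL": total}
      for key in KEYS:
          merged[key] = (old["TOTAL"] * old[key] + new["TOTAL"] * new[key]) / total
      return merged

  def rmse(rec):
      """RMSE from partial sums: sqrt(mean(f^2) - 2*mean(f*o) + mean(o^2))."""
      mse = rec["FFBAR"] - 2.0 * rec["FOBAR"] + rec["OOBAR"]
      return math.sqrt(max(mse, 0.0))

  # Made-up (forecast, observation) pairs from an earlier run and a new run.
  earlier = [(1.0, 2.0), (3.0, 3.0)]
  later = [(2.0, 4.0)]

  merged = aggregate(partial_sums(earlier), partial_sums(later))
  direct = partial_sums(earlier + later)
  print(rmse(merged), rmse(direct))

The aggregated record yields the same RMSE as processing all pairs at once, without rereading the earlier data. A statistic that needs the individual pairs, such as a rank correlation, cannot be updated from these sums alone.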

An example of the series_analysis calling sequence is shown below:

.. code-block:: none

  series_analysis \
  -fcst   myfcstfilelist.txt \
  -obs    myobsfilelist.txt \
  -config SeriesAnalysisConfig \
  -out    out/my_series_statistics.nc

In this example, the Series-Analysis tool will process the list of forecast and observation files specified in the text file lists into statistics for each grid location using settings specified in the configuration file. Series-Analysis will create an output NetCDF file containing requested statistics.
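
When the tool is driven from a script, the same calling sequence can be assembled programmatically. The following sketch is one possible approach in Python: the helper name build_command and its parameters are hypothetical, the file names are those of the example above, and only options listed in the usage statement are used.

.. code-block:: python

  def build_command(fcst_list, obs_list, config, out, paired=False, verbosity=None):
      """Assemble a series_analysis argument list from the documented options."""
      cmd = ["series_analysis",
             "-fcst", fcst_list,
             "-obs", obs_list,
             "-config", config,
             "-out", out]
      if paired:
          # Treat the forecast and observation lists as already paired.
          cmd.append("-paired")
      if verbosity is not None:
          # Override the default logging level.
          cmd += ["-v", str(verbosity)]
      return cmd

  cmd = build_command("myfcstfilelist.txt", "myobsfilelist.txt",
                      "SeriesAnalysisConfig", "out/my_series_statistics.nc")
  print(" ".join(cmd))

The resulting list can be passed to subprocess.run from the Python standard library on a system where series_analysis is installed.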

series_analysis Output
----------------------

The Series-Analysis tool produces NetCDF files containing output statistics for each grid location from the input files. The output statistics available from each output line type are detailed in Chapter 5, since they are also produced by the Grid-Stat tool. A subset of these can be produced by this tool, with the most notable exceptions being the wind vector and neighborhood statistics. Users can inventory the contents of the Series-Analysis output files using the ncdump -h command to view header information. Additionally, ncview or the Plot-Data-Plane tool can be used to visualize the output. An example of Series-Analysis output is shown in :numref:`series-analysis_Glibert_precip` below.

.. _series-analysis_Glibert_precip:

.. figure:: figure/series-analysis_Glibert_precip.png

   An example of the Gilbert Skill Score for precipitation forecasts at each grid location for a month of files.

series_analysis Configuration File
----------------------------------
The default configuration file for the Series-Analysis tool named **SeriesAnalysisConfig_default** can be found in the installed *share/met/config* directory. The contents of the configuration file are described in the subsections below.

Note that environment variables may be used when editing configuration files, as described in :numref:`config_env_vars`.

____________________

.. code-block:: none

  model          = "FCST";
  desc           = "NA";
  obtype         = "ANALYS";
  regrid         = { ... }
  fcst           = { ... }
  obs            = { ... }
  climo_mean     = { ... }
  climo_stdev    = { ... }
  ci_alpha       = [ 0.05 ];
  boot           = { interval = PCTILE; rep_prop = 1.0; n_rep = 1000;
                     rng = "mt19937"; seed = ""; }
  mask           = { grid = ""; poly = ""; }
  gradient       = { dx = [ 1 ]; dy = [ 1 ]; }
  hss_ec_value   = NA;
  rank_corr_flag = TRUE;
  tmp_dir        = "/tmp";
  version        = "VN.N";

The configuration options listed above are common to many MET tools and are described in :numref:`config_options`.

____________________

.. code-block:: none

  block_size = 1024;

Number of grid points to be processed concurrently. Set smaller to use less memory but increase the number of passes through the data. The amount of memory the Series-Analysis tool consumes is determined by the size of the grid, the length of the series, and the block_size entry defined above. The larger this entry is set the faster the tool will run, subject to the amount of memory available on the machine. If set less than or equal to 0, it is automatically reset to the number of grid points, and they are all processed concurrently.
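
The trade-off between memory use and the number of passes can be estimated with a short calculation. The sketch below assumes that the number of passes equals the number of grid points divided by block_size, rounded up; the function name plan_blocks and the grid dimensions are hypothetical.

.. code-block:: python

  import math

  def plan_blocks(nx, ny, block_size):
      """Return the effective block size and the number of passes through the data."""
      n_points = nx * ny
      if block_size <= 0:
          # Documented behavior: reset to the number of grid points.
          block_size = n_points
      # Assumption: one pass per block of grid points, with a partial last block.
      n_passes = math.ceil(n_points / block_size)
      return block_size, n_passes

  # A hypothetical 100 x 100 grid with the default block_size of 1024.
  print(plan_blocks(100, 100, 1024))
  # A block_size of 0 processes all grid points in a single pass.
  print(plan_blocks(100, 100, 0))

Under this assumption, the default setting requires 10 passes over a 10,000 point grid, while a setting of 0 requires a single pass and holds the series for every grid point in memory at once.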

____________________

.. code-block:: none

  vld_thresh = 1.0;

Ratio of valid matched pairs for the series of values at each grid point required to compute statistics. Set to a lower proportion to allow some missing values. Setting it to 1.0 requires that every data point be valid over the series to compute statistics.
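
The check described above can be sketched for a single grid point as follows. This is an illustration only: missing values are represented by None, the function name enough_valid is hypothetical, and comparing the ratio with "greater than or equal to" is an assumption.

.. code-block:: python

  def enough_valid(fcst_series, obs_series, vld_thresh):
      """Return True if the ratio of valid matched pairs meets vld_thresh."""
      pairs = list(zip(fcst_series, obs_series))
      # A pair is valid only when both the forecast and the observation are present.
      n_valid = sum(1 for f, o in pairs if f is not None and o is not None)
      ratio = n_valid / len(pairs)
      # Assumption: statistics are computed when the ratio is at least vld_thresh.
      return ratio >= vld_thresh

  # Made-up series of length 5 at one grid point, with one missing observation.
  fcst = [1.2, 0.8, 1.5, 0.9, 1.1]
  obs = [1.0, None, 1.4, 1.0, 1.2]

  print(enough_valid(fcst, obs, 1.0))
  print(enough_valid(fcst, obs, 0.75))

With four of five pairs valid, the ratio is 0.8, so this grid point would be skipped at the default setting of 1.0 but processed with vld_thresh set to 0.75.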

____________________

.. code-block:: none

  output_stats = {
     fho    = [];
     ctc    = [];
     cts    = [];
     mctc   = [];
     mcts   = [];
     cnt    = ["RMSE", "FBAR", "OBAR"];
     sl1l2  = [];
     sal1l2 = [];
     pct    = [];
     pstd   = [];
     pjc    = [];
     prc    = [];
     grad   = [];
  }

The output_stats dictionary controls the type of output that the Series-Analysis tool generates. Each entry corresponds to an output line type in the STAT file and specifies the comma-separated list of statistics to be computed for that line type. Use the column names from the tables listed below to specify the statistics. The entries correspond to the following output line types:

1. FHO for Forecast, Hit, Observation Rates (See :numref:`table_PS_format_info_FHO`)

2. CTC for Contingency Table Counts (See :numref:`table_PS_format_info_CTC`)

3. CTS for Contingency Table Statistics (See :numref:`table_PS_format_info_CTS`)

4. MCTC for Multi-Category Contingency Table Counts (See :numref:`table_PS_format_info_MCTC`)

5. MCTS for Multi-Category Contingency Table Statistics (See :numref:`table_PS_format_info_MCTS`)

6. CNT for Continuous Statistics (See :numref:`table_PS_format_info_CNT`)

7. SL1L2 for Scalar L1L2 Partial Sums (See :numref:`table_PS_format_info_SL1L2`)

8. SAL1L2 for Scalar Anomaly L1L2 Partial Sums when climatological data is supplied (See :numref:`table_PS_format_info_SAL1L2`)

9. PCT for Contingency Table Counts for Probabilistic forecasts (See :numref:`table_PS_format_info_PCT`)

10. PSTD for Contingency Table Statistics for Probabilistic forecasts (See :numref:`table_PS_format_info_PSTD`)

11. PJC for Joint and Conditional factorization for Probabilistic forecasts (See :numref:`table_PS_format_info_PJC`)

12. PRC for Receiver Operating Characteristic for Probabilistic forecasts (See :numref:`table_PS_format_info_PRC`)

13. GRAD for Gradient Statistics (See :numref:`table_GS_format_info_GRAD`)

.. note:: When the **-aggr** option is used, all partial sum and contingency table count columns are required to aggregate statistics across multiple runs. To facilitate this, the output_stats entries for the CTC, SL1L2, SAL1L2, PCT, and GRAD line types can be set to "ALL" to indicate that all available columns for those line types should be written.