API Reference

This page documents all routines provided by gridwxcomp. Tools for different routines are listed roughly in the order that they would normally be used.

Python functions, classes, and modules

prep_metadata

gridwxcomp.prep_metadata(station_path, config_path, grid_name, out_path='formatted_input.csv')

Read the list of climate stations in the metadata file and verify that all needed parameters exist. An output CSV file is saved in a standardized format containing the variables needed by the subsequent Earth Engine download and bias calculation modules.

Station time series files must be in the same directory as the main input to this function, i.e., the station_path metadata file.

Parameters:
  • station_path (str) – path to CSV file containing metadata of climate stations that will later be used to calculate bias ratios to the gridded dataset.

  • config_path (str) – path to config file containing projection info

  • grid_name (str) – name of the gridded dataset that is being used for comparison against observed data.

  • out_path (str) – path to save output CSV, default is to save as ‘formatted_input.csv’ to current working directory.

Returns:

None

Example

>>> from gridwxcomp import prep_metadata
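>>> # 'my_grid' is a placeholder for the name of your gridded dataset
>>> prep_metadata('example_metadata.txt', 'gridwxcomp_config.ini', 'my_grid', out_path='outfile.csv')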

outfile.csv will be created containing station and corresponding gridded data. This file is later used as input for gridwxcomp.ee_download and gridwxcomp.calc_bias_ratios.

Important

Make sure the following column headers exist in your input station metadata file (station_path) and are spelled exactly:

  • Latitude

  • Longitude

  • Station

  • Filename

Also, the “Filename” column should match the names of the climate time series files, which should be in the same directory as the station metadata file. For example, if one of the time series files is named “Bluebell_daily_data.csv” then the following are permissible entries for “Filename”: “Bluebell_daily_data” or “Bluebell_daily_data.csv”.

Raises:

ValueError – if one or more of the following mandatory columns are missing from the input CSV file (station_path parameter): ‘Longitude’, ‘Latitude’, ‘Station’, or ‘Filename’.
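A quick pre-check of the station metadata file can catch a missing or misspelled header before prep_metadata is called. The sketch below is not part of gridwxcomp: the helper name missing_metadata_columns is hypothetical, and it assumes only the four required headers listed above.

```python
import csv

# Required headers, taken from the list above (matched exactly, case included).
REQUIRED_COLUMNS = ("Latitude", "Longitude", "Station", "Filename")


def missing_metadata_columns(station_path):
    """Return the required headers that are absent from a station metadata CSV.

    Hypothetical helper for illustration; gridwxcomp performs its own check.
    """
    with open(station_path, newline="") as f:
        header = next(csv.reader(f), [])
    present = set(header)
    return [col for col in REQUIRED_COLUMNS if col not in present]
```

An empty list means all four headers are present; anything else names the headers that would likely trigger the ValueError described above.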

ee_download

This module has tools to download time series climate data from gridded climate data collections hosted on Google’s Earth Engine. It reads the formatted file prepared by gridwxcomp.prep_metadata and uses the coordinate information there, along with the variable names specified in the configuration .INI file, to determine which data to download and for which geographic locations. Each download location is paired with a station location.

gridwxcomp.ee_download.download_grid_data(metadata_path, config_path, local_folder=None, force_download=False)

Takes in the metadata file generated by gridwxcomp.prep_metadata and downloads the corresponding point data for all stations within. This function requires the dataset be accessible in the user’s Google Earth Engine account, and the image collection name and path should be specified in the configuration .INI file (i.e., in the config_path file).

The metadata file will be updated with the paths to which the gridded data files are downloaded.

Parameters:
  • metadata_path (str) – path to the metadata file generated by gridwxcomp.prep_metadata

  • config_path (str) – path to config file containing catalog info

  • local_folder (str) – folder to download point data to

  • force_download (bool) – will re-download all data even if local file already exists

Returns:

None

Note: You must authenticate with Google Earth Engine before using this function.

calc_bias_ratios

Calculate monthly bias ratios of variables from climate station to overlapping gridded dataset cells.

Input file for this module must first be created by running gridwxcomp.prep_metadata followed by gridwxcomp.ee_download.

Note: The module relies on values within the specified config file in order to interpret the station and gridded data values successfully. If you are experiencing errors or bad values, please check that the config file is set up correctly.

gridwxcomp.calc_bias_ratios.calc_bias_ratios(input_path, config_path, out_dir, method='long_term_mean', grid_id_name='GRID_ID', comparison_var='etr', grid_id=None, day_limit=10, years='all', comp=True)

Read the input metadata CSV file and config file and use them to calculate mean monthly bias ratios between each station and its corresponding grid cell for all station and grid pairs. Optionally, calculate ratios for a single gridcell.

Parameters:
  • input_path (str) – path to input CSV file with matching station climate and grid metadata. This file is created by running gridwxcomp.prep_metadata followed by gridwxcomp.ee_download.

  • config_path (str) – path to the configuration file that has the parameters used to interpret the station and gridded data files.

  • out_dir (str) – path to directory to save CSV files with monthly bias ratios of etr.

  • method (str) – default ‘long_term_mean’. How to calculate mean station to grid ratios; there are currently two options. ‘long_term_mean’ takes the ratio of the mean of all dates for the station variable that fall in a time period, e.g. the month of January, to the mean of all paired January dates in the gridded product. ‘mean_of_annual’ calculates, for each time period and each year in the record with enough paired days, the ratio of sums, and then takes the mean of the annual ratios. The ‘mean_of_annual’ method is always used to calculate the standard deviation and coefficient of variation of ratios, which describe interannual variation of ratios.

  • grid_id_name (str) – default ‘GRID_ID’. Name of column containing index/cell identifiers for gridded dataset.

  • comparison_var (str) – default ‘etr’. Grid climate variable to calculate bias ratios. This value must be found within the module parameter VAR_LIST.

  • grid_id (int or str or None) – default None. Grid ID (int cell identifier) to only calculate bias ratios for a single gridcell.

  • day_limit (int) – default 10. Minimum number of days of data required in a month; months with fewer days are excluded from calculations. Ignored when method='long_term_mean'.

  • years (int or str) – default ‘all’. Years to use for calculations e.g. 2000-2005 or 2011.

  • comp (bool) – default True. Flag to save a “comprehensive” summary output CSV file that contains station metadata and statistics in addition to the mean monthly ratios.

Returns:

None
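The two method options can be easier to compare side by side. The following is a minimal sketch in plain Python, assuming a simple list of (year, station_value, grid_value) tuples for one time period; the function names are hypothetical and this is not the code gridwxcomp itself runs.

```python
from collections import defaultdict
from statistics import mean


def long_term_mean_ratio(records):
    """Ratio of the station mean to the grid mean over all paired days.

    records: iterable of (year, station_value, grid_value) tuples that all
    fall in the same period, e.g. every paired January day on record.
    """
    records = list(records)
    station_mean = mean(r[1] for r in records)
    grid_mean = mean(r[2] for r in records)
    return station_mean / grid_mean


def mean_of_annual_ratio(records, day_limit=10):
    """Mean of per-year ratios of sums, skipping years with too few days."""
    by_year = defaultdict(list)
    for year, station_value, grid_value in records:
        by_year[year].append((station_value, grid_value))
    annual = []
    for pairs in by_year.values():
        if len(pairs) < day_limit:
            continue  # not enough paired days in this year
        annual.append(sum(s for s, _ in pairs) / sum(g for _, g in pairs))
    # NaN signals that no year had enough paired days
    return mean(annual) if annual else float("nan")
```

The two approaches generally give different answers when years contribute unequal numbers of paired days, because ‘long_term_mean’ weights every day equally while ‘mean_of_annual’ weights every qualifying year equally.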

Example

To use within Python for observed ETo

>>> from gridwxcomp import calc_bias_ratios
>>> input_file = 'prepped_metadata.csv'
>>> config_file = 'gridwxcomp_config.ini'
>>> out_directory = 'monthly_ratios'
>>> grid_variable = 'eto'
>>> calc_bias_ratios(input_file, config_file, out_directory, comparison_var=grid_variable, comp=True)

This results in two CSV files in out_directory named “eto_summary_all_yrs.csv” and “eto_summary_comp_all_yrs.csv”.

Raises:
  • FileNotFoundError – if input file or config file are invalid or not found.

  • KeyError – if the input file does not contain file paths to the climate station and grid time series files. This occurs if, for example, the gridwxcomp.prep_metadata and/or gridwxcomp.ee_download scripts have not been run first. Also raised if the values specified in the config file are not found within the station and gridded data files.

  • ValueError – if the method kwarg is invalid.

Note

Growing season and summer periods over which ratios are calculated are defined as April through October and June through August respectively.
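As a small illustration of these definitions, the sketch below maps a calendar month to the multi-month periods that include it. The period names are borrowed from the ‘grow’, ‘summer’, and ‘annual’ layer names used elsewhere on this page; the helper itself is hypothetical.

```python
GROWING_SEASON_MONTHS = range(4, 11)  # April through October
SUMMER_MONTHS = range(6, 9)           # June through August


def periods_for_month(month):
    """Return the multi-month periods that include `month` (1-12)."""
    periods = ["annual"]
    if month in GROWING_SEASON_MONTHS:
        periods.append("grow")
    if month in SUMMER_MONTHS:
        periods.append("summer")
    return periods
```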

Note

If an existing summary file contains a climate station that is being reprocessed its monthly bias ratios and other data will be overwritten. Also, to proceed with spatial analysis scripts, the comprehensive summary file must be produced using this function first.

spatial

Perform multiple workflows needed to estimate the spatial surface (and other related outputs) of monthly and annual station-to-grid bias ratios for meteorological variables. The input file is first created by gridwxcomp.calc_bias_ratios.

gridwxcomp.spatial.PT_ATTRS

all attributes expected to be in point shapefile created for stations except station and Grid IDs.

Type:

tuple

Note

All spatial files, i.e. vector and raster files, utilize the ESRI Shapefile or GeoTiff format.

gridwxcomp.spatial.make_points_file(in_path, config_path, grid_id_name='GRID_ID')

Create vector shapefile of points with monthly mean bias ratios for climate stations using all stations found in a comprehensive CSV file created by gridwxcomp.calc_bias_ratios.

Parameters:
  • in_path (str) – path to [var]_summary_comp.CSV file containing monthly bias ratios, lat, long, and other data. Shapefile “[var]_summary_pts.shp” is saved to parent directory of in_path under “spatial” subdirectory.

  • config_path (str) – path to the configuration file that has the parameters used to interpret the station and gridded data files.

  • grid_id_name (str) – name of the column containing grid ID’s

Returns:

None

Example

Create shapefile containing point data for all climate stations in input summary file created by gridwxcomp.calc_bias_ratios

>>> from gridwxcomp import spatial
>>> # path to comprehensive summary CSV
>>> summary_file = 'monthly_ratios/etr_mm_summary_comp_all_yrs.csv'
>>> config_file = 'gridwxcomp_config.ini'
>>> spatial.make_points_file(summary_file, config_file)

The result is the file “etr_mm_summary_pts_wgs84.shp” being saved to a subdirectory named “spatial” that is created in the directory containing in_path, i.e.:

"monthly_ratios/spatial/etr_mm_summary_pts_wgs84.shp".

This file has the points projected in the WGS 84 geographic coordinate system. Another point shapefile will also be made if the “interpolation_projection” listed in the user-provided configuration file is a coordinate reference system that differs from WGS84, i.e., a projected coordinate system. The user can provide an EPSG code or an ESRI code such as ESRI:102004, which refers to a coordinate reference system, and the other shapefile will then have the following path and suffix:

"monthly_ratios/spatial/etr_mm_summary_pts_ESRI_102004.shp".
Raises:

FileNotFoundError – if input summary CSV or configuration INI files are not found.

Note

make_points_file will overwrite any existing point shapefile of the same climate variable. If no “interpolation_projection” option is listed in the configuration file’s METADATA section the default will be ESRI:102004 which refers to the Lambert Conformal Conic projected coordinate system.
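The fallback behaviour described in this note can be sketched with the standard-library configparser. This is an illustration only: the function names are hypothetical, the EPSG value in the usage line is just an example, and the suffix rule is inferred from the example file name “etr_mm_summary_pts_ESRI_102004.shp” rather than taken from the gridwxcomp source.

```python
import configparser


def read_interpolation_projection(ini_text):
    """Return the interpolation_projection option, or the documented default."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback covers both a missing option and a missing METADATA section
    return parser.get(
        "METADATA", "interpolation_projection", fallback="ESRI:102004")


def shapefile_suffix(projection):
    """Turn 'ESRI:102004' into 'ESRI_102004', as seen in the example file name."""
    return projection.replace(":", "_")
```

For instance, read_interpolation_projection("[METADATA]\ninterpolation_projection = EPSG:5070\n") returns the string given in the file, while a METADATA section without that option yields the default.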

gridwxcomp.spatial.make_grid(in_path, config_path, overwrite=False, grid_id_name='GRID_ID')

Make fishnet grid (vector file of polygon geometry) for select gridcells based on bounding coordinates. Each cell in the grid will be assigned a unique numerical identifier (property specified by grid_id_name) and the grid will be in the WGS84 coordinate system.

Modified from the Python GDAL/OGR Cookbook.

Parameters:
  • in_path (str) – path to [var]_summary_comp_[years].csv file containing monthly bias ratios, lat, long, and other data. Created by gridwxcomp.calc_bias_ratios.

  • config_path (str) – path to the configuration file that has the parameters used to interpret the station and gridded data files.

  • overwrite (bool) – default False. If True, overwrite the grid shapefile if it already exists.

  • grid_id_name (str) – default “GRID_ID”. Name of gridcell identifier column

Returns:

None

Examples

Build a fishnet uniform grid that is defined by the bounds and resolution defined in the METADATA section of the configuration file. These parameters should be provided in decimal degrees.

>>> from gridwxcomp import spatial
>>> # assign input paths
>>> summary_file = 'monthly_ratios/etr_mm_summary_comp_all_yrs.csv'
>>> config_file = 'gridwxcomp_config.ini'
>>> # make fishnet of grid cells for interpolation
>>> spatial.make_grid(summary_file, config_file)

The file will be saved as “grid.shp” to a newly created subdirectory “spatial” in the same directory as the input summary CSV file, i.e.:

monthly_ratios/
├── etr_mm_summary_all_yrs.csv
├── etr_mm_summary_comp_all_yrs.csv
└── spatial/
    ├── grid.cpg
    ├── grid.dbf
    ├── grid.prj
    ├── grid.shp
    └── grid.shx

If the grid file already exists the default action is to not overwrite it. To overwrite an existing grid, for example when working with a new set of climate stations as input, set the overwrite keyword argument to True.

>>> spatial.make_grid(summary_file, config_file, overwrite=True)

Note

If the “grid_resolution” is not assigned in the configuration file a default resolution of 0.1 degrees will be used to make the fishnet.

Raises:

FileNotFoundError – if input summary CSV or configuration INI files are not found.
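To make the idea of a fishnet concrete, here is a minimal sketch that enumerates cell bounds from a bounding box and a resolution in decimal degrees. It is not the GDAL/OGR code used by make_grid; the function name and the row-by-row ID numbering starting at 0 are assumptions for illustration.

```python
def fishnet_cells(xmin, xmax, ymin, ymax, resolution=0.1):
    """Yield (grid_id, (left, bottom, right, top)) for a uniform grid.

    Bounds and resolution are in decimal degrees. Cell IDs simply count
    from 0, row by row starting at the top-left corner; gridwxcomp's own
    numbering may differ.
    """
    n_cols = round((xmax - xmin) / resolution)
    n_rows = round((ymax - ymin) / resolution)
    grid_id = 0
    for row in range(n_rows):
        top = ymax - row * resolution
        for col in range(n_cols):
            left = xmin + col * resolution
            yield grid_id, (left, top - resolution, left + resolution, top)
            grid_id += 1
```

Calling list(fishnet_cells(0, 2, 0, 1, 0.5)) produces eight cells, four per row, each carrying its own identifier in the way the grid_id_name property does in the shapefile.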

gridwxcomp.spatial.interpolate(in_path, config_path, layer='all', out=None, scale_factor=1, function='invdist', params=None, z_stats=True, res_plot=True, grid_id_name='GRID_ID', options=None)

Use GDAL_grid methods to interpolate a 2-dimensional surface of calculated bias ratios or other statistics for station/gridded data pairs found in input comprehensive summary CSV.

Options allow for down- or up-scaling the resolution of the resampling grid and for selecting from multiple interpolation methods. Interpolated surfaces are saved as GeoTIFF rasters. Zonal statistics using zonal_stats are also extracted to grid cells in the fishnet grid built first by make_grid.

Parameters:
  • in_path (str) – path to CSV file containing monthly bias ratios, lat, lon, and other data. Created by gridwxcomp.calc_bias_ratios.

  • config_path (str) – path to the configuration input file.

  • layer (str or list) – default ‘all’. Name of variable(s) in in_path to conduct 2-D interpolation, e.g. ‘annual_mean’.

  • out (str) – default None. Subdirectory to save GeoTIFF raster(s).

  • scale_factor (float, int) – default 1. Scaling factor to apply to original grid resolution to create resampling resolution. If scale_factor = 0.1, the interpolation resolution will be one tenth of the grid resolution listed in the configuration file.

  • function (str) – default ‘invdist’. Interpolation method; gdal methods include ‘invdist’, ‘invdistnn’, ‘linear’, ‘average’, and ‘nearest’. See GDAL grid.

  • params (dict, str, or None) – default None. Parameters for interpolation using gdal, see defaults in gridwxcomp.InterpGdal.

  • z_stats (bool) – default True. Calculate zonal means of interpolated surface to grid cells in fishnet and save to a CSV file. The CSV file will be saved to the same directory as the interpolated raster file(s).

  • res_plot (bool) – default True. Make bar plot for residual (error) between interpolated and station value for layer.

  • grid_id_name (str) – default “GRID_ID”. Name of gridcell identifier

  • options (str or None) – default None. Extra command line arguments for gdal interpolation.

Returns:

None
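The effect of scale_factor on the output resolution is simple arithmetic, sketched below with a hypothetical helper. It assumes the relationship stated in the parameter description: the interpolation resolution is the grid resolution multiplied by scale_factor.

```python
def interpolation_resolution(grid_resolution, scale_factor=1):
    """Resolution of the interpolated raster, in the units of grid_resolution."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return grid_resolution * scale_factor
```

With a 0.1 degree grid, a scale_factor of 0.1 gives a surface of roughly 0.01 degrees (ten raster cells across each grid cell), while a scale_factor of 2 gives a coarser 0.2 degree surface.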

Examples

Let’s say we wanted to interpolate the “annual_mean” bias ratio in an input CSV first created by gridwxcomp.calc_bias_ratios, and a fishnet grid was first created by make_grid. This example uses the “invdist” method (default) to interpolate from a 0.1 decimal degree grid scaled down to a 0.01 decimal degree surface. The result is a GeoTIFF raster whose extent is defined by the bounds in the configuration file. Additionally, point residuals of bias ratios are added to a CSV and to newly created point shapefiles, and zonal (grid cell) means are extracted and stored in a CSV.

>>> from gridwxcomp import spatial
>>> summary_file = 'monthly_ratios/etr_mm_summary_comp_all_yrs.csv'
>>> layer = 'annual_mean'
>>> params = {'power':1, 'smoothing':20}
>>> config_file = 'gridwxcomp_config.ini'
>>> out_dir = 's20_p1' # optional subdir name for saving rasters
>>> spatial.interpolate(summary_file, config_file, layer=layer, out=out_dir,
...     scale_factor=0.1, params=params)

The resulting file structure that is created by the above command is:

monthly_ratios/
├── etr_mm_summary_all_yrs.csv
├── etr_mm_summary_comp_all_yrs.csv
└── spatial/
    ├── etr_mm_invdist_400m/
    │ └── s20_p1/
    │     ├── annual_mean.tiff
    │     ├── etr_mm_summary_comp_all_yrs.csv
    │     ├── etr_mm_summary_pts_wgs84.cpg
    │     ├── etr_mm_summary_pts_wgs84.dbf
    │     ├── etr_mm_summary_pts_wgs84.prj
    │     ├── etr_mm_summary_pts_wgs84.shp
    │     ├── etr_mm_summary_pts_wgs84.shx
    │     ├── etr_mm_summary_pts_ESRI_102004.cpg
    │     ├── etr_mm_summary_pts_ESRI_102004.dbf
    │     ├── etr_mm_summary_pts_ESRI_102004.prj
    │     ├── etr_mm_summary_pts_ESRI_102004.shp
    │     ├── etr_mm_summary_pts_ESRI_102004.shx
    │     ├── zonal_stats.csv
    │     └── residual_plots
    │         └── annual_res.html
    ├── grid.cpg
    ├── grid.dbf
    ├── grid.prj
    ├── grid.shp
    └── grid.shx

Specifically, the interpolated raster is saved to:

'monthly_ratios/spatial/etr_mm_invdist_400m/s20_p1/annual_mean.tiff'

where the file name and directory is based on the variable being interpolated, methods, and the raster resolution. The out keyword argument lets us add any number of subdirectories to the final output directory, in this case the ‘s20_p1’ dir contains info on params.

The final result will be the creation of the CSV:

'monthly_ratios/spatial/etr_mm_invdist_400m/s20_p1/zonal_stats.csv'

In “zonal_stats.csv” the zonal mean for ratios of annual station to grid ETr will be stored along with grid IDs, e.g.

GRID_ID    annual_mean
515902     0.87439453125
514516     0.888170013427734
513130     0.90002197265625

To calculate zonal statistics of bias ratios that are not part of the default workflow, we can assign any numeric layer in the input summary CSV to be interpolated. For example, if we wanted to interpolate the coefficient of variation of the growing season bias ratio “grow_cv”, then we could estimate the surface of this variable straightforwardly:

>>> layer = 'grow_cv'
>>> func = 'invdistnn'
>>> # we can also 'upscale' the interpolation resolution
>>> spatial.interpolate(summary_file, config_file, layer=layer,
...     scale_factor=2, function=func)

This will create the GeoTIFF raster:

'monthly_ratios/spatial/etr_mm_invdistnn_400m/grow_cv.tiff'

And the zonal means will be added as a column named “grow_cv” to:

'monthly_ratios/spatial/etr_mm_invdistnn_400m/zonal_stats.csv'

As with other components of gridwxcomp, any other climatic variables that exist in the grid dataset can be used along with any corresponding station time series data from the user. The input (in_path) to all routines in gridwxcomp.spatial is the summary CSV created by gridwxcomp.calc_bias_ratios; the prefix of this file determines the climatic variable on which spatial analysis is conducted. See gridwxcomp.calc_bias_ratios for examples of how to use different climatic variables, e.g. TMax or ETo.

Raises:

FileNotFoundError – if the input summary CSV file or the configuration file does not exist or can’t be found. Also raised if the fishnet for extracting zonal statistics does not exist and z_stats is True. The fishnet should be in the subdirectory of in_path, i.e. “<in_path>/spatial/grid.shp”.

gridwxcomp.spatial.calc_pt_error(in_path, config_file, out_dir, layer, grid_var, grid_id_name='GRID_ID')

Calculate point ratio estimates from the interpolated raster, calculate residuals, and add both to the output summary CSV and point shapefile. Copies of the updated files are saved to the directory with the interpolated rasters.

The original point shapefiles and summary CSV files in the parent directory (inputs to gridwxcomp.spatial.interpolate) will not be updated with the point estimates and residuals. Because these values are specific to a set of interpolation parameters, the copies are made within the interpolation output directory.

The output summary CSV and point shapefile will have two sets of additional columns after running this function, each covering the monthly, seasonal, and annual bias results. One set holds the point estimates and has the suffix “_est”, e.g., “Jan_est”. The other holds the point residuals between the calculated point bias and the corresponding interpolated bias at the same location and has the suffix “_res”, e.g., “Jan_res”. The interpolated surface may differ from the point data used for interpolation because the smoothing used for interpolation can change the surface at the point locations. See GDAL grid for more background on this.
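To see why smoothing produces a nonzero residual at a station, consider the following minimal sketch of inverse distance to a power interpolation in plain Python. The weight formula is the form assumed here for GDAL's ‘invdist’ algorithm, the function name is hypothetical, and the station values are made up.

```python
def invdist_estimate(x, y, points, power=2.0, smoothing=0.0):
    """Inverse distance to a power estimate at (x, y).

    points: iterable of (px, py, value). Each point gets the weight
    1 / (dx**2 + dy**2 + smoothing**2) ** (power / 2).
    """
    num = den = 0.0
    for px, py, value in points:
        r2 = (x - px) ** 2 + (y - py) ** 2 + smoothing ** 2
        if r2 == 0.0:
            return value  # exactly on a point with no smoothing
        weight = 1.0 / r2 ** (power / 2.0)
        num += weight * value
        den += weight
    return num / den


# Two made-up stations with bias ratios 0.8 and 1.0
stations = [(0.0, 0.0, 0.8), (1.0, 0.0, 1.0)]
exact = invdist_estimate(0.0, 0.0, stations, smoothing=0.0)
smoothed = invdist_estimate(0.0, 0.0, stations, smoothing=1.0)
```

Without smoothing the estimate at the first station reproduces its value, but with smoothing the neighboring station pulls the estimate toward its own value, so the difference between the two is what would end up in a “_res” column.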

Parameters:
  • in_path (str) – path to comprehensive summary CSV created by gridwxcomp.calc_bias_ratios

  • config_file (str) – path to configuration input file

  • out_dir (str) – path to dir that contains interpolated raster

  • layer (str) – layer to calculate error e.g. “annual_mean”

  • grid_var (str) – name of grid variable e.g. “etr_mm”

  • grid_id_name (str) – default ‘GRID_ID’. Name of grid shapefile cell ID for computing zonal statistics and other uses.

Returns:

None

Note

This function should be run after make_points_file because it copies data from the shapefile it created.

gridwxcomp.spatial.zonal_stats(in_path, raster, grid_id_name='GRID_ID')

Calculate zonal means from interpolated surface of etr bias ratios created by interpolate using the fishnet grid created by make_grid. Save mean values for each gridcell to a CSV file joined to grid IDs.

Parameters:
  • in_path (str) – path to [var]_summary_comp_[years].csv file containing monthly bias ratios, lat, long, and other data. Created by gridwxcomp.calc_bias_ratios.

  • raster (str) – path to interpolated raster of bias ratios to be used for zonal stats. First created by interpolate.
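Conceptually, a zonal mean averages all raster cells that fall inside one fishnet cell. The sketch below shows this for the simplified case where each grid cell covers a square block of raster cells; it is a hypothetical illustration on nested lists, whereas the real function works from the polygon geometry in grid.shp.

```python
from statistics import mean


def zonal_means(raster, factor):
    """Average square blocks of `factor` x `factor` raster cells.

    raster: list of equal-length rows of numbers whose dimensions are
    multiples of `factor`. Each block stands in for one fishnet grid cell.
    Returns {(block_row, block_col): mean}.
    """
    means = {}
    for r0 in range(0, len(raster), factor):
        for c0 in range(0, len(raster[0]), factor):
            block = [raster[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            means[(r0 // factor, c0 // factor)] = mean(block)
    return means
```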

Example

It is preferred to use this function as part of interpolate or indirectly through the gridwxcomp.spatial command line usage. However, if the grid shapefile and spatial interpolated raster(s) have already been created without zonal stats, then:

>>> from gridwxcomp import spatial
>>> # assign input paths
>>> summary_file = 'monthly_ratios/etr_mm_summary_comp_[years].csv'
>>> raster_file = 'monthly_ratios/spatial/etr_mm_invdist_400m/Jan_mean.tiff'
>>> spatial.zonal_stats(summary_file, raster_file)

The final result will be the creation of:

'monthly_ratios/spatial/etr_mm_invdist_400m/zonal_stats.csv'

The resulting CSV contains the grid IDs and zonal means for each grid cell in the fishnet, which must exist at:

'monthly_ratios/spatial/grid.shp'

Also see interpolate.

Raises:

FileNotFoundError – if the input summary CSV file or the fishnet for extracting zonal statistics do not exist. The fishnet should be in the subdirectory of in_path at “/spatial/grid.shp”.

Note

If zonal statistics are estimated for the same variable on the same raster more than once, the contents of that column in the zonal_stats.csv file will be overwritten.

plot

Create interactive HTML comparison plots between paired station and gridded climatic variables or bar comparison plots between interpolated and station point data.

gridwxcomp.plot.daily_comparison(input_csv, config_path, dataset_name='gridded', out_dir=None, year_filter=None)

Compare daily weather station data from PyWeatherQAQC with gridded data for each month in year specified.

The daily_comparison function produces HTML files with time series and scatter plots of station versus gridded climate variables. It uses the bokeh module to create interactive plots, e.g. they can be zoomed in/out and panned. Separate plot files are created for each month of a single year.

The scatterplots for each month will allow you to visualize the overall correlation and relationship between the gridded and station variables.

The time series plots will show all daily observations of a single month together, which will highlight any months that differ from the same month in neighboring years, e.g. a year that had a significantly colder June than other Junes.

Parameters:
  • input_csv (str) – path to input CSV file containing paired station/ gridded metadata. This file is created by running gridwxcomp.prep_metadata followed by gridwxcomp.ee_download.

  • config_path (str) – path to the config file that has the parameters used to interpret the station and gridded data files

  • dataset_name (str) – Name of gridded dataset to be used in plots

  • out_dir (str or None) – default None. Directory to save comparison plots, if None save to “daily_comp_plots” in current directory.

  • year_filter (str or None) – default None. Single year YYYY or range YYYY-YYYY

Returns:

None
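The two accepted forms of year_filter, a single year YYYY or a range YYYY-YYYY, can be illustrated with a small parser. This helper is hypothetical and only shows one way such a string might be interpreted; it is not taken from the gridwxcomp source.

```python
def parse_year_filter(year_filter):
    """Return (start_year, end_year) from 'YYYY' or 'YYYY-YYYY', or None."""
    if year_filter is None:
        return None  # no filtering requested
    parts = str(year_filter).split("-")
    if len(parts) == 1:
        start = end = int(parts[0])
    elif len(parts) == 2:
        start, end = int(parts[0]), int(parts[1])
    else:
        raise ValueError(f"expected YYYY or YYYY-YYYY, got {year_filter!r}")
    if start > end:
        raise ValueError("start year must not exceed end year")
    return start, end
```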

Example

The daily_comparison function will generate HTML files with bokeh plots for all paired climate variables within the config file. Within Python:

>>> from gridwxcomp.plot import daily_comparison
>>> daily_comparison('merged_input.csv', 'config_file.ini', out_dir='comp_plots_2016', year_filter='2016')

This results in monthly HTML bokeh plots being saved to “comp_plots_2016/STATION_ID/” where “STATION_ID” is the station ID as found in the input CSV file. A file is saved for each month with the station ID, month, and year in the file name. If the out_dir keyword argument is not given the plots will be saved to a directory named “daily_comp_plots”.

Note

If there are fewer than five days of data in a month the plot for that month will not be created.

gridwxcomp.plot.monthly_comparison(input_csv, config_path, dataset_name='gridded', out_dir=None, day_limit=10)

Compare monthly average weather station data from PyWeatherQAQC with the gridded dataset.

The monthly_comparison function produces HTML files with time series and scatter plots of station versus gridded climate variables of monthly mean data. It uses the bokeh module to create interactive plots, e.g. they can be zoomed in/out and panned.

Parameters:
  • input_csv (str) – path to input CSV file containing paired station:gridded metadata. This file is created by running gridwxcomp.prep_metadata followed by gridwxcomp.ee_download.

  • config_path (str) – path to the config file that has the parameters used to interpret the station and gridded data files

  • dataset_name (str) – Name of gridded dataset to be used in plots

  • out_dir (str or None) – default None. Directory to save comparison plots, if None save to “monthly_comp_plots”.

  • day_limit (int) – default 10. Number of paired days per month that must exist for variable to be plotted.

Returns:

None
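The role of day_limit can be sketched as follows: daily pairs are grouped by year and month, and a month is kept only if it has at least day_limit paired days. The function name and the use of None for missing values are assumptions made for this illustration, not the library's internal representation.

```python
from collections import defaultdict
from statistics import mean


def monthly_means(daily, day_limit=10):
    """Average paired daily values by (year, month).

    daily: iterable of (datetime.date, station_value, grid_value); a value
    of None marks a missing observation. Months with fewer than
    `day_limit` paired days are left out.
    """
    groups = defaultdict(list)
    for day, station_value, grid_value in daily:
        if station_value is None or grid_value is None:
            continue  # not a paired day
        groups[(day.year, day.month)].append((station_value, grid_value))
    return {
        key: (mean(s for s, _ in pairs), mean(g for _, g in pairs))
        for key, pairs in sorted(groups.items())
        if len(pairs) >= day_limit
    }
```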

Example

The monthly_comparison function will generate HTML files with bokeh plots for paired climate variables, e.g. etr_mm, tmax_c:

>>> from gridwxcomp.plot import monthly_comparison
>>> monthly_comparison('merged_input.csv', 'config_file.ini', out_dir='monthly_plots')

This results in HTML bokeh plots being saved to “monthly_plots/”, which contains a plot file for each station found in the input CSV file. If the out_dir keyword argument is not given the plots will be saved to a directory named “monthly_comp_plots”.

Note

If there are fewer than two months of data the plot for that station will not be created.

gridwxcomp.plot.station_bar_plot(summary_csv, bar_plot_layer, out_dir=None, y_label=None, title=None, subtitle=None, year_subtitle=True)

Produce an interactive bar chart comparing multiple climate stations to each other for a particular variable, e.g. bias ratios or interpolated residuals.

This function may also be used for any numerical data in the summary CSV files that are created by gridwxcomp.spatial.interpolate in addition to those created by gridwxcomp.calc_bias_ratios. The main requirement is that summary_csv must contain the column ‘STATION_ID’ and a column named by the bar_plot_layer argument.

Parameters:
  • summary_csv (str, Path) – path to summary CSV produced by either gridwxcomp.calc_bias_ratios or by gridwxcomp.spatial.interpolate. Should contain bar_plot_layer data for plot.

  • bar_plot_layer (str) – name of variable to plot.

  • out_dir (str or None) – default None. Output directory path, default is ‘station_bar_plots’ in parent directory of summary_csv.

  • y_label (str or None) – default None. Label for y-axis, defaults to bar_plot_layer.

  • title (str or None) – default None. Title of plot.

  • subtitle (str, list, or None) – default None. Additional subtitle(s) for plot.

  • year_subtitle (bool) – default True. If true print subtitle on plot with the max year range used for station data, e.g. ‘years: 1995-2005’

Example

Let’s say we want to compare the mean growing season bias ratios of reference evapotranspiration (ETr) for the selection of stations we used to calculate bias ratios.

The summary CSV file containing the ratios should be first created using gridwxcomp.calc_bias_ratios.

>>> from gridwxcomp.plot import station_bar_plot
>>> # path to summary CSV with station data
>>> in_file = 'monthly_ratios/etr_mm_summary_all_yrs.csv'
>>> example_layer = 'grow_mean'
>>> station_bar_plot(in_file, example_layer)

The resulting file will be saved using the bar_plot_layer name as a file name:

'monthly_ratios/station_bar_plots/grow_mean.html'

The plot file will contain the mean growing season bias ratios of ETr for each station, sorted from smallest to largest values.

Raises:
  • FileNotFoundError – if summary_csv is not found.

  • KeyError – if bar_plot_layer does not exist as a column name in summary_csv.
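The data preparation behind such a bar chart can be sketched with the standard-library csv module: read the ‘STATION_ID’ column and the requested layer, then sort from smallest to largest. The helper below is hypothetical and mirrors the documented KeyError for a missing layer; it does not reproduce the plotting itself.

```python
import csv


def sorted_station_values(summary_csv, bar_plot_layer):
    """Return [(station_id, value), ...] sorted from smallest to largest."""
    with open(summary_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    if rows and bar_plot_layer not in rows[0]:
        raise KeyError(bar_plot_layer)  # layer is not a column in the CSV
    pairs = [(row["STATION_ID"], float(row[bar_plot_layer])) for row in rows]
    return sorted(pairs, key=lambda pair: pair[1])
```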

InterpGdal

class gridwxcomp.InterpGdal(summary_csv_path, config_path)

Bases: object

Usage of gdal command line tools within gridwxcomp, currently utilizes the GDAL grid command line tool.

Parameters:
  • summary_csv_path (str) – path to [var]_summary_comp CSV file created by gridwxcomp.calc_bias_ratios.calc_bias_ratios containing point bias ratios and statistics, station coordinates, etc.

  • config_path (str) – path to the input configuration file with control parameters such as spatial coordinate reference system, resolution, and the bounds for spatial interpolation.

summary_csv_path

absolute path object to input summary_csv_path.

Type:

pathlib.Path

config_path

absolute path to configuration input file.

Type:

pathlib.Path

config_dict

dictionary of items read from the configuration file.

Type:

dict

layers

list of layers in summary_csv_path to interpolate. When InterpGdal.gdal_grid is called with layer="all", this defaults to InterpGdal.default_layers.

Type:

list

grid_bounds

default None. Extent for interpolation raster in order (min long, max long, min lat, max lat).

Type:

tuple or None

interp_meth

default ‘invdist’. gdal_grid interpolation method.

Type:

str

interped_rasters

empty list that is appended with pathlib.Path objects to interpolated rasters after using InterpGdal.gdal_grid.

Type:

list

params

default None. After InterpGdal.gdal_grid params is updated with the last used interpolation parameters in the form of a dictionary with parameter names as keys.

Type:

dict or None

Example

The InterpGdal class is useful for experimenting with the multiple interpolation algorithms provided by gdal, which are optimized and often sensitive to multiple parameters. We can use the object to loop over a range of parameter combinations to test how the algorithms perform. We might pick a single layer to test, in this case the growing season bias ratios between station and gridded reference evapotranspiration (etr_mm). Below is a routine to conduct a basic sensitivity analysis of the power and smoothing parameters of the inverse distance to a power interpolation method:

>>> from gridwxcomp.interpgdal import InterpGdal
>>> import os
>>> # create instance variable from summary csv
>>> summary_file = 'PATH/TO/etr_mm_summary_comp.csv'
>>> config_file = 'PATH/TO/gridwxcomp_config.ini'
>>> # create a InterpGdal instance
>>> test = InterpGdal(summary_file, config_file)
>>> layer = 'grow_mean' # could be a list
>>> # run inverse distance interpolation with different params
>>> for p in range(1,10):
...     for s in [0, 0.5, 1, 1.5, 2]:
...         # build output directory based on parameter sets
...         out_dir = os.path.join('spatial', 'invdist',
...             'power={}_smooth={}'.format(p, s))
...         params = {'power': p, 'smoothing': s}
...         test.gdal_grid(out_dir=out_dir, layer=layer, params=params)

Note that we did not pass the interpolation method ‘invdist’ because it is the default. To use another interpolation method, pass the interp_meth keyword argument to InterpGdal.gdal_grid. Similarly, we could experiment with other parameters, all of which are listed in the class attribute InterpGdal.default_params. The instance attribute InterpGdal.params can also be used to save metadata on the parameters used for each run.
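The parameter sweep can also be laid out before any interpolation is run, which makes it easy to check how many runs will be made and where their outputs will go. The following is only a sketch: build_runs is a hypothetical helper that is not part of gridwxcomp, and it only prepares the out_dir and params arguments that would be passed to InterpGdal.gdal_grid. The 'smoothing' key follows the name used in InterpGdal.default_params.

```python
import itertools
import os

def build_runs(powers, smoothings, root=os.path.join('spatial', 'invdist')):
    """Return a list of (out_dir, params) pairs, one per parameter combination.

    Hypothetical helper for illustration, not part of gridwxcomp.
    """
    runs = []
    for p, s in itertools.product(powers, smoothings):
        # one output directory per parameter set
        out_dir = os.path.join(root, 'power={}_smooth={}'.format(p, s))
        runs.append((out_dir, {'power': p, 'smoothing': s}))
    return runs

runs = build_runs(range(1, 10), [0, 0.5, 1, 1.5, 2])
print(len(runs))  # 9 power values x 5 smoothing values
for out_dir, params in runs[:3]:
    print(out_dir, params)
```

Each pair could then be passed on as test.gdal_grid(out_dir=out_dir, layer=layer, params=params).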

interp_methods = ('average', 'invdist', 'invdistnn', 'linear', 'nearest')

gdal_grid interpolation algorithms.

Type:

interp_methods (tuple)

default_layers = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'grow', 'annual', 'summer')

Layers to interpolate, as created by gridwxcomp.calc_bias_ratios.calc_bias_ratios and then gridwxcomp.spatial.make_points_file, e.g. “Jan” in the shapefile, which corresponds to “Jan_mean” in summary_csv_path.

Type:

default_layers (tuple)

default_params = {'average': {'angle': 0, 'min_points': 0, 'nodata': -999, 'radius1': 0, 'radius2': 0}, 'invdist': {'angle': 0, 'max_points': 0, 'min_points': 0, 'nodata': -999, 'power': 2, 'radius1': 0, 'radius2': 0, 'smoothing': 0}, 'invdistnn': {'max_points': 12, 'min_points': 0, 'nodata': -999, 'power': 2, 'radius': 10, 'smoothing': 0}, 'linear': {'nodata': -999, 'radius': -1}, 'nearest': {'angle': 0, 'nodata': -999, 'radius1': 0, 'radius2': 0}}

Dictionary with default parameters for each interpolation algorithm, slightly modified from GDAL defaults. Keys are interpolation method names, values are dictionaries that map parameter names to their default values.

Type:

default_params (dict)
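The gdal_grid command line tool takes its algorithm and parameters as a single colon-separated string, for example invdist:power=2:smoothing=0. As a rough sketch of how a nested dictionary like default_params could be merged with user overrides and turned into such a string, consider the following. The function algorithm_string is hypothetical and is not claimed to be how gridwxcomp builds its command internally.

```python
def algorithm_string(method, defaults, overrides=None):
    """Format a gdal_grid algorithm string such as 'linear:nodata=-999:radius=-1'.

    Hypothetical helper for illustration, not part of gridwxcomp.
    """
    if method not in defaults:
        raise KeyError('{} is not a known interpolation method'.format(method))
    params = dict(defaults[method])  # copy so the defaults are not mutated
    params.update(overrides or {})   # user values replace the defaults
    parts = ['{}={}'.format(k, v) for k, v in sorted(params.items())]
    return ':'.join([method] + parts)

# subset of the default_params listing above
defaults = {
    'invdistnn': {'max_points': 12, 'min_points': 0, 'nodata': -999,
                  'power': 2, 'radius': 10, 'smoothing': 0},
    'linear': {'nodata': -999, 'radius': -1},
}

linear_default = algorithm_string('linear', defaults)
nn_custom = algorithm_string('invdistnn', defaults, {'power': 3})
print(linear_default)
print(nn_custom)
```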

gdal_grid(layer='all', out_dir='', interp_meth='invdist', params=None, scale_factor=1, z_stats=True, res_plot=True, grid_id_name='GRID_ID', options=None)[source]

Run gdal_grid command line tool to interpolate point ratios.

For further information on the interpolation algorithms including their function, parameters, and options see gdal_grid.

Parameters:
  • layer (str, list) – default ‘all’. Name of summary file column to interpolate, e.g. ‘Jan_mean’, or list of names. If ‘all’ use all variables in mutable instance attribute “layers”.

  • out_dir (str) – default ‘’. Output directory to save rasters and zonal stats CSV, always appended to the root dir of the input summary CSV parent path that contains point ratios.

  • interp_meth (str) – default ‘invdist’. gdal interpolation algorithm

  • params (dict, str, or None) – default None. Parameters for interpolation algorithm. See examples for format rules.

  • bounds (tuple) – default None. Tuple of bounding coordinates in the following order (min long, max long, min lat, max lat) which need to be in decimal degrees or meters.

  • scale_factor (float, int) – default 1. Scaling factor to apply to original grid resolution to create resampling resolution. If scale_factor = 0.1, the interpolation resolution will be one tenth of the grid resolution listed in the config file.

  • z_stats (bool) – default True. Calculate zonal means of interpolated surface to gridded cells in fishnet and save to a CSV file. The CSV file will be saved to the same directory as the interpolated raster file(s).

  • res_plot (bool) – default True. Make bar plot for residual (error) between interpolated and station value for layer.

  • options (str or None) – default None. Extra command line options for gdal_grid spatial interpolation.

Returns:

None
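As a rough illustration of the scale_factor parameter described above, the sketch below multiplies a grid resolution by the scale factor and estimates how many raster cells the resulting resolution implies for a given extent. Both functions and the example extent are hypothetical and only show the arithmetic; they are not part of gridwxcomp.

```python
import math

def interp_resolution(grid_resolution, scale_factor=1):
    """Resolution used for interpolation: grid resolution times scale factor."""
    return grid_resolution * scale_factor

def raster_shape(bounds, resolution):
    """Number of (columns, rows) needed to cover bounds at the given resolution.

    bounds are in the order (min x, max x, min y, max y).
    """
    xmin, xmax, ymin, ymax = bounds
    n_cols = math.ceil((xmax - xmin) / resolution)
    n_rows = math.ceil((ymax - ymin) / resolution)
    return n_cols, n_rows

# a 1000 m grid with scale_factor=0.1 is interpolated at one tenth of that
res = interp_resolution(1000, 0.1)
shape = raster_shape((0, 5000, 0, 2500), res)  # made-up extent in meters
print(res, shape)
```

A smaller scale_factor therefore increases the number of cells quadratically, which is worth keeping in mind for large extents.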

Examples

This example uses the default interpolation algorithm ‘invdist’ (inverse distance weighting to a power) to interpolate bias ratios in a summary CSV file that was first created by gridwxcomp.calc_bias_ratios. By default, all layers in InterpGdal.default_layers are interpolated and zonal statistics are calculated for each of them. The interpolation resolution and projection are specified by the user in the configuration file that was used to create the gridwxcomp.InterpGdal object. If they are not specified there, the fallback is a Lambert Conformal Conic projected coordinate reference system and 1000 m resolution. This example limits the interpolation to two layers, growing season and annual mean bias ratios,

>>> from gridwxcomp.interpgdal import InterpGdal
>>> summary_file = 'PATH/TO/[var]_summary_comp.csv'
>>> config_file = 'PATH/TO/gridwxcomp_config.ini'
>>> out_dir = 'default_params'
>>> layers = ['grow_mean', 'annual_mean']
>>>
>>> # create an InterpGdal instance
>>> test = InterpGdal(summary_file, config_file)
>>> # run inverse distance interpolation
>>> test.gdal_grid(out_dir=out_dir, layer=layers)

Note, zonal statistics to gridded cells and interpolated residual plots are computed by default. A gridded fishnet must have been previously created using gridwxcomp.spatial.make_grid.

After running the code above the following files will be created in the ‘default_params’ directory which will be built in the same location as the input summary CSV:

default_params/
├── annual_mean.tiff
├── annual_mean_ESRI_102004.tiff
├── etr_mm_summary_comp_all_yrs.csv
├── etr_mm_summary_pts_wgs84.cpg
├── etr_mm_summary_pts_wgs84.dbf
├── etr_mm_summary_pts_wgs84.prj
├── etr_mm_summary_pts_wgs84.shp
├── etr_mm_summary_pts_wgs84.shx
├── etr_mm_summary_pts_ESRI_102004.cpg
├── etr_mm_summary_pts_ESRI_102004.dbf
├── etr_mm_summary_pts_ESRI_102004.prj
├── etr_mm_summary_pts_ESRI_102004.shp
├── etr_mm_summary_pts_ESRI_102004.shx
├── zonal_stats.csv
├── grow_mean.tiff
├── grow_mean_ESRI_102004.tiff
└── residual_plots/
    ├── annual_res.html
    └── grow_res.html

GeoTIFF interpolated raster files are now created for the selected layers. The file “zonal_stats.csv” contains grid_id as an index and the zonal mean of each layer as columns. For example,

grid_id    grow_mean             annual_mean
511747     0.9650671287940088    0.9078723876243023
510361     0.9658465063428492    0.9097255715561022
508975     0.9667075970344162    0.9117676407214926
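The zonal means can be loaded back into Python for further work. The sketch below uses only the standard library and assumes that zonal_stats.csv is a comma-separated file with a header row, which is an assumption about the file layout. The rows are the example values shown above, and read_zonal_stats is a hypothetical helper, not part of gridwxcomp.

```python
import csv
import io

# example rows from above, assuming a comma-separated layout with a header
text = """grid_id,grow_mean,annual_mean
511747,0.9650671287940088,0.9078723876243023
510361,0.9658465063428492,0.9097255715561022
508975,0.9667075970344162,0.9117676407214926
"""

def read_zonal_stats(handle):
    """Map each grid_id (int) to a dict of layer name -> zonal mean (float)."""
    stats = {}
    for row in csv.DictReader(handle):
        grid_id = int(row.pop('grid_id'))
        stats[grid_id] = {layer: float(value) for layer, value in row.items()}
    return stats

# for a real file use: with open('PATH/TO/zonal_stats.csv', newline='') as f
stats = read_zonal_stats(io.StringIO(text))
print(stats[511747]['grow_mean'])
```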

There are several InterpGdal instance attributes that may be useful, for example to see the parameters that were used for the last call to InterpGdal.gdal_grid

>>> test.params
{'power': 2,
 'smoothing': 0,
 'radius1': 0,
 'radius2': 0,
 'angle': 0,
 'max_points': 0,
 'min_points': 0,
 'nodata': -999}

To find the paths to all interpolated raster files that have been created by the instance, use the “interped_rasters” instance attribute, which is a list of pathlib.Path objects holding the absolute paths of the raster files. To get them as strings,

>>> list(map(str, test.interped_rasters))
['PATH/TO/grow_mean.tiff',
 'PATH/TO/annual_mean.tiff']

Similarly, the raster extent that was used, and that will be used again for any subsequent calls of InterpGdal.gdal_grid, can be retrieved by

>>> test.grid_bounds
(-111.74583329966664,
 -108.74583330033335,
 38.21250000003333,
 40.462499999966674)
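
Because grid_bounds is ordered (min long, max long, min lat, max lat) rather than in the (xmin, ymin, xmax, ymax) order used by many other tools, it can help to unpack it into named values before reuse. The helper describe_bounds below is hypothetical and only illustrates the ordering, using the extent printed above.

```python
def describe_bounds(grid_bounds):
    """Unpack (min long, max long, min lat, max lat) into a labeled dict."""
    min_lon, max_lon, min_lat, max_lat = grid_bounds
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError(
            'bounds must be ordered (min long, max long, min lat, max lat)')
    return {
        'west': min_lon, 'east': max_lon,
        'south': min_lat, 'north': max_lat,
        'width': max_lon - min_lon, 'height': max_lat - min_lat,
    }

bounds = (-111.74583329966664, -108.74583330033335,
          38.21250000003333, 40.462499999966674)
info = describe_bounds(bounds)
# extent size in decimal degrees, rounded for display
print(round(info['width'], 6), round(info['height'], 6))
```
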
Raises:

KeyError – if interp_meth is not a valid gdal_grid interpolation algorithm name.

util

Utility functions or classes for gridwxcomp package

gridwxcomp.util.affine_transform(img)[source]

Get the affine transform of the image as an EE object

Parameters:

img – ee.Image object

Returns:

ee.List object

gridwxcomp.util.parse_yr_filter(dt_df, years, label)[source]

Parse string year filter and apply it to datetime-indexed DataFrame.

Parameters:
  • dt_df (pandas.DataFrame) – datetime-indexed DataFrame

  • years (str or int) – years to select, e.g. 2015 or 2000-2010

  • label (str) – identifier to print warning message if years filter partially overlaps with actual date index

Returns:

first element is input DataFrame dt_df indexed to years filter, second element is string of year range, e.g. ‘2001_2011’

Return type:

ret (tuple of (pandas.DataFrame, str))

Example

>>> import pandas as pd
>>> from gridwxcomp.util import parse_yr_filter
>>> df = pd.DataFrame(index=pd.date_range('2000', '2015'))
>>> df, yr_str = parse_yr_filter(df, '1998-2002', 'station1')
WARNING: data for station1 starts in 2000 but you gave 1998
Years used will only include 2000 to 2002

Now df will only contain indices with dates between 2000 and 2002 and

>>> yr_str
'1998_2002'
Raises:

ValueError – if years is invalid or not found in time series index of DataFrame.
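
The parsing logic described above can be sketched without pandas. The function parse_years below is a hypothetical stand-in, not the gridwxcomp implementation: it accepts a single year or a ‘YYYY-YYYY’ range, clips the range to the years present in the data, and returns a label built from the requested years, mirroring the example in which yr_str remains ‘1998_2002’. The label format for a single year is an assumption.

```python
import re

def parse_years(years, data_start, data_end):
    """Return (start, end, year_str) for a single year or 'YYYY-YYYY' range.

    start and end are clipped to the years present in the data, while
    year_str keeps the years that were requested.
    """
    text = str(years)
    if re.fullmatch(r'\d{4}', text):
        start = end = int(text)
        year_str = text  # assumed label format for a single year
    elif re.fullmatch(r'\d{4}-\d{4}', text):
        start, end = (int(y) for y in text.split('-'))
        year_str = '{}_{}'.format(start, end)
    else:
        raise ValueError('{} is not a valid years option'.format(years))
    if start > end or end < data_start or start > data_end:
        raise ValueError('{} not found in data years {}-{}'.format(
            years, data_start, data_end))
    return max(start, data_start), min(end, data_end), year_str

clipped = parse_years('1998-2002', 2000, 2015)
print(clipped)  # (2000, 2002, '1998_2002')
```
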

gridwxcomp.util.validate_file(file_path, expected_extensions)[source]

Checks to see if provided path is valid, while also checking to see if file is of expected type. Raises exceptions if either of those fail.

Parameters:
  • file_path – string of path to file

  • expected_extensions – list of strings of expected file types

Returns:

None
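
A check of this kind can be sketched with pathlib as shown below. This is not the gridwxcomp source: check_file and is_valid are hypothetical names, and the exception types raised here (FileNotFoundError and ValueError) are assumptions, since the documentation above does not state which exceptions validate_file raises.

```python
import tempfile
from pathlib import Path

def check_file(file_path, expected_extensions):
    """Raise if file_path is not an existing file of an expected type."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError('{} is not a file'.format(path))
    # accept extensions written with or without a leading dot
    allowed = {'.' + ext.lower().lstrip('.') for ext in expected_extensions}
    if path.suffix.lower() not in allowed:
        raise ValueError('{} must be one of: {}'.format(
            path.name, ', '.join(sorted(allowed))))

def is_valid(file_path, expected_extensions):
    """Convenience wrapper returning True/False instead of raising."""
    try:
        check_file(file_path, expected_extensions)
    except (FileNotFoundError, ValueError):
        return False
    return True

with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / 'gridwxcomp_config.ini'
    config.write_text('example\n')
    results = (
        is_valid(config, ['ini']),                     # existing, expected type
        is_valid(config, ['csv', 'txt']),              # existing, wrong type
        is_valid(Path(tmp) / 'missing.ini', ['ini']),  # does not exist
    )
print(results)
```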

gridwxcomp.util.read_config(config_file_path)[source]

Opens config file at provided path and stores all required values in a Python dictionary. This dictionary will be used both to import data and elsewhere in the code to refer to what type of data was passed in.

Parameters:

config_file_path – string of path to config file

Returns:

a dictionary of all required config file parameters

Return type:

config_dict
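
Reading an INI-style file into a dictionary of required values can be sketched with the standard library configparser module. The section name and option names below are purely illustrative assumptions and do not reflect the actual layout of a gridwxcomp configuration file. read_ini is a hypothetical helper.

```python
import configparser

# hypothetical config content: section and option names are illustrative only
example_ini = """
[METADATA]
grid_resolution = 0.1
interpolation_resolution = 1000
"""

def read_ini(text, section, required):
    """Return a dict of the required options, failing early if any is missing."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    missing = [key for key in required if not parser.has_option(section, key)]
    if missing:
        raise KeyError('missing config options: {}'.format(', '.join(missing)))
    # configparser returns strings; convert types where needed afterwards
    return {key: parser.get(section, key) for key in required}

config = read_ini(example_ini, 'METADATA',
                  ['grid_resolution', 'interpolation_resolution'])
print(config)
```

Checking for all required options up front means a misconfigured file fails immediately rather than partway through a later processing step.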

gridwxcomp.util.read_data(config_dictionary, version, filepath)[source]

Uses config_dict parameters to read in the data and rename it to standard parameters

Parameters:
  • config_dictionary – dictionary of everything contained within config file

  • version – a string that will be either ‘station’ or ‘gridded’

  • filepath – string of path to the data file to read

Returns:

a dataframe containing only the variable we want to plot, with a standardized naming convention

Return type:

filtered_df

gridwxcomp.util.convert_units(config_dictionary, version, df)[source]

Uses config_dict parameters to check what units provided variables are in and convert them if needed

Parameters:
  • config_dictionary – dictionary of everything contained within config file

  • version – a string that will be either ‘station’ or ‘gridded’

  • df – pandas dataframe of input data, at this point naming of dataframe columns has been standardized

Returns:

a dataframe containing data in the correct units

Return type:

converted_df
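
A table-driven conversion is one simple way to implement this kind of unit handling. The sketch below works on plain lists rather than a DataFrame, and the unit labels and the CONVERTERS table are hypothetical. The units gridwxcomp actually accepts are defined by its configuration file and are not listed here.

```python
# hypothetical unit labels, for illustration only
CONVERTERS = {
    ('f', 'c'): lambda v: (v - 32.0) * 5.0 / 9.0,  # Fahrenheit to Celsius
    ('k', 'c'): lambda v: v - 273.15,              # Kelvin to Celsius
    ('in', 'mm'): lambda v: v * 25.4,              # inches to millimeters
}

def convert(values, from_units, to_units):
    """Convert values between units, returning a copy if no change is needed."""
    if from_units == to_units:
        return list(values)
    try:
        func = CONVERTERS[(from_units, to_units)]
    except KeyError:
        raise ValueError('no conversion from {} to {}'.format(
            from_units, to_units))
    return [func(v) for v in values]

celsius = convert([32.0, 212.0], 'f', 'c')
millimeters = convert([1.0, 2.0], 'in', 'mm')
print(celsius, millimeters)
```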

gridwxcomp.util.reproject_crs_for_point(orig_lon, orig_lat, orig_crs, requested_crs)[source]

Uses the pyproj library to reproject point data from one CRS to another, e.g. to convert input coordinates to WGS84 for Earth Engine. Returns the original data without any reprojection if orig_crs and requested_crs are the same.

Parameters:
  • orig_lon – float of original longitude

  • orig_lat – float of original latitude

  • orig_crs – string of EPSG code for orig_lat and orig_lon

  • requested_crs – string of EPSG code to reproject into

Returns:

Reprojected latitude and longitude for point

gridwxcomp.util.reproject_crs_for_bounds(bounds, resolution, orig_crs, requested_crs, requested_decimals)[source]

Uses the pyproj library to reproject a dictionary of bounds for the interpolation extent. This is done with more than just two calls (e.g. for the NW and SE corners) because some projections may have curvature. Afterwards the coordinates are rounded to the requested decimals. If orig_crs and requested_crs are the same, the coordinates are only rounded, without reprojecting.

Parameters:
  • bounds – dictionary of bounds, containing the following keys: xmin, xmax, ymin, ymax

  • resolution – resolution used for interpolation, coordinates will be rounded in an attempt to snap to grid

  • orig_crs – string of EPSG code for original bounds

  • requested_crs – string of EPSG code to reproject bounds into

  • requested_decimals – int of number of decimals to round coords to

Returns:

Reprojected bounds into new CRS
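
To see why reprojecting only two corners can underestimate the extent, the sketch below samples points along all four edges of the box, transforms each one, and takes the overall minimum and maximum before rounding. It is an illustration under stated assumptions, not the gridwxcomp implementation: the transform argument is a plain callable standing in for a pyproj transformation, and bowed is a toy function that mimics projection curvature by bending the top edge upward.

```python
def reproject_bounds(bounds, transform, decimals, n_samples=11):
    """Transform points along every edge of bounds and return the new extent.

    bounds is a dict with keys xmin, xmax, ymin, ymax and transform is a
    callable (x, y) -> (x, y).
    """
    xmin, xmax, ymin, ymax = (bounds[k] for k in ('xmin', 'xmax', 'ymin', 'ymax'))
    steps = [i / (n_samples - 1) for i in range(n_samples)]
    xs = [xmin + t * (xmax - xmin) for t in steps]
    ys = [ymin + t * (ymax - ymin) for t in steps]
    # sample the south, north, west and east edges, corners included
    edge = ([(x, ymin) for x in xs] + [(x, ymax) for x in xs] +
            [(xmin, y) for y in ys] + [(xmax, y) for y in ys])
    moved = [transform(x, y) for x, y in edge]
    new_x = [p[0] for p in moved]
    new_y = [p[1] for p in moved]
    return {
        'xmin': round(min(new_x), decimals), 'xmax': round(max(new_x), decimals),
        'ymin': round(min(new_y), decimals), 'ymax': round(max(new_y), decimals),
    }

def bowed(x, y):
    """Toy transform: leaves the corners in place but bends the top edge up."""
    return x, y + 0.25 * (1 - x * x) * y

box = {'xmin': -1, 'xmax': 1, 'ymin': 0, 'ymax': 1}
result = reproject_bounds(box, bowed, 2)
corner_ymax = max(bowed(-1, 1)[1], bowed(1, 1)[1])
print(result['ymax'], corner_ymax)
```

With this toy transform the corners alone suggest a maximum y of 1.0, while the sampled edges give 1.25, so an extent built from corners only would cut off part of the area.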