API Reference
This page documents all routines provided by gridwxcomp. Tools for different routines are listed roughly in the order that they would normally be used.
Python functions, classes, and modules
prep_metadata
- gridwxcomp.prep_metadata(station_path, config_path, grid_name, out_path='formatted_input.csv')
Read the list of climate stations in the metadata file and verify that all required parameters exist. An output CSV file is saved, formatted in a standardized way with the variables needed by the subsequent Earth Engine download and bias calculation modules.
Station time series files must be in the same directory as the main input to this function, i.e., the station_path metadata file.
- Parameters:
station_path (str) – path to CSV file containing metadata of climate stations that will later be used to calculate bias ratios to the gridded dataset.
config_path (str) – path to config file containing projection info
grid_name (str) – name of the gridded dataset that is being used for comparison against observed data.
out_path (str) – path to save output CSV, default is to save as ‘formatted_input.csv’ to the current working directory.
- Returns:
None
Example
>>> from gridwxcomp import prep_metadata
>>> prep_metadata('example_metadata.txt', 'gridwxcomp_config.ini', 'gridded_dataset', out_path='outfile.csv')
outfile.csv will be created containing station and corresponding gridded data. This file is later used as input for gridwxcomp.ee_download and gridwxcomp.calc_bias_ratios.
Important
Make sure the following column headers exist in your input station metadata file (station_path) and are spelled exactly:
Latitude
Longitude
Station
Filename
Also, the “Filename” column should match the names of the climate time series files, which should be in the same directory as the station metadata file. For example, if one of the time series files is named “Bluebell_daily_data.csv” then the following are permissible entries for “Filename”: “Bluebell_daily_data” or “Bluebell_daily_data.csv”.
- Raises:
ValueError – if one or more of the following mandatory columns are missing from the input CSV file (station_path parameter): ‘Longitude’, ‘Latitude’, ‘Station’, or ‘Filename’.
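The column check and the two accepted “Filename” spellings described above can be illustrated with a minimal standard-library sketch. The helper names `check_station_metadata` and `normalize_filename` are hypothetical and are not part of gridwxcomp, and the coordinates are made-up sample values.

```python
import csv
import io

# Mandatory headers listed in the documentation above.
REQUIRED_COLUMNS = ('Latitude', 'Longitude', 'Station', 'Filename')


def check_station_metadata(csv_text):
    """Return metadata rows; raise ValueError if a mandatory column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError('missing mandatory columns: ' + ', '.join(missing))
    return list(reader)


def normalize_filename(entry):
    """Accept 'name' or 'name.csv' and return the name with a .csv suffix."""
    return entry if entry.lower().endswith('.csv') else entry + '.csv'


# Made-up sample metadata (hypothetical coordinates).
sample = (
    'Station,Latitude,Longitude,Filename\n'
    'Bluebell,40.37,-110.21,Bluebell_daily_data\n'
)
rows = check_station_metadata(sample)
time_series_file = normalize_filename(rows[0]['Filename'])
```

Header matching here is exact and case sensitive, which mirrors the requirement that the column names be “spelled exactly”.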
ee_download
This module has tools to download time series climate data from gridded climate
data collections that are hosted on Google’s Earth Engine. It reads the
formatted file that was prepared using the gridwxcomp.prep_metadata
module and uses the coordinate information there, along with the variable names
specified in the configuration .INI file, to determine which data to download
for the grid locations that are paired with the station locations.
- gridwxcomp.ee_download.download_grid_data(metadata_path, config_path, local_folder=None, force_download=False)
Takes in the metadata file generated by gridwxcomp.prep_metadata and downloads the corresponding point data for all stations within. This function requires that the dataset be accessible in the user’s Google Earth Engine account, and the image collection name and path should be specified in the configuration .INI file (i.e., in the config_path file).
The metadata file will be updated with the path that the gridded data files are downloaded to.
- Parameters:
metadata_path (str) – path to the metadata file generated by gridwxcomp.prep_metadata
config_path (str) – path to config file containing catalog info
local_folder (str) – folder to download point data to
force_download (bool) – will re-download all data even if local file already exists
- Returns:
None
- Note: You must authenticate with Google Earth Engine before using
this function.
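The effect of the force_download flag can be pictured with a small sketch. This is only an illustration of the documented behavior (re-download even if the local file exists); the helper `needs_download` is hypothetical, and how gridwxcomp actually decides is not shown on this page.

```python
import os
import tempfile


def needs_download(local_path, force_download=False):
    """Return True if the file should be fetched (missing locally, or forced)."""
    return force_download or not os.path.isfile(local_path)


# Demonstrate with a temporary folder standing in for local_folder.
with tempfile.TemporaryDirectory() as local_folder:
    target = os.path.join(local_folder, 'example_station.csv')  # hypothetical name
    before = needs_download(target)                # file not there yet
    with open(target, 'w') as handle:
        handle.write('date,value\n')
    after = needs_download(target)                 # file exists now
    forced = needs_download(target, force_download=True)
```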
calc_bias_ratios
Calculate monthly bias ratios of variables from climate station to overlapping gridded dataset cells.
Input file for this module must first be created by running
gridwxcomp.prep_metadata followed by gridwxcomp.ee_download.
- Note: The module is reliant on values within the specified config file
in order to interpret the station and gridded data values successfully. If you are experiencing errors or bad values please check it is set up correctly.
- gridwxcomp.calc_bias_ratios.calc_bias_ratios(input_path, config_path, out_dir, method='long_term_mean', grid_id_name='GRID_ID', comparison_var='etr', grid_id=None, day_limit=10, years='all', comp=True)
Read the input metadata CSV file and config file and use them to calculate mean monthly bias ratios between each station and its corresponding grid cell for all station and grid pairs. Optionally, calculate ratios for a single gridcell.
- Parameters:
input_path (str) – path to input CSV file with matching station climate and grid metadata. This file is created by running gridwxcomp.prep_metadata followed by gridwxcomp.ee_download.
config_path (str) – path to the configuration file that has the parameters used to interpret the station and gridded data files.
out_dir (str) – path to directory to save CSV files with monthly bias ratios of etr.
method (str) – default ‘long_term_mean’. How to calculate mean station-to-grid ratios; there are currently two options. ‘long_term_mean’ takes the ratio of the mean of all station values that fall in a time period, e.g. the month of January, to the mean of all paired January values in the gridded product. ‘mean_of_annual’ calculates, for each time period and each year in the record with enough paired days, the ratio of sums, and then takes the mean of the annual ratios. The ‘mean_of_annual’ method is always used to calculate the standard deviation and coefficient of variation of ratios, which describe the interannual variation of ratios.
grid_id_name (str) – default ‘GRID_ID’. Name of column containing index/cell identifiers for gridded dataset.
comparison_var (str) – default ‘etr’. Grid climate variable to calculate bias ratios. This value must be found within the module parameter VAR_LIST.
grid_id (int or str or None) – default None. Grid ID (int cell identifier) to only calculate bias ratios for a single gridcell.
day_limit (int) – default 10. Threshold number of days in month of missing data, if less, exclude month from calculations. Ignored when method='long_term_mean'.
years (int or str) – default ‘all’. Years to use for calculations e.g. 2000-2005 or 2011.
comp (bool) – default True. Flag to save a “comprehensive” summary output CSV file that contains station metadata and statistics in addition to the mean monthly ratios.
- Returns:
None
Example
To use within Python for observed ETo
>>> from gridwxcomp import calc_bias_ratios
>>> input_file = 'prepped_metadata.csv'
>>> config_file = 'gridwxcomp_config.ini'
>>> out_directory = 'monthly_ratios'
>>> grid_variable = 'eto'
>>> calc_bias_ratios(input_file, config_file, out_directory, comparison_var=grid_variable, comp=True)
This results in two CSV files in out_directory named “eto_summary_all_yrs.csv” and “eto_summary_comp_all_yrs.csv”.
- Raises:
FileNotFoundError – if input file or config file are invalid or not found.
KeyError – if the input file does not contain file paths to the climate station and grid time series files. This occurs if, for example, the gridwxcomp.prep_metadata and/or gridwxcomp.ee_download scripts have not been run first. Also raised if the values specified in the config file are not found within the station and gridded data files.
ValueError – if the method kwarg is invalid.
Note
Growing season and summer periods over which ratios are calculated are defined as April through October and June through August respectively.
Note
If an existing summary file contains a climate station that is being reprocessed its monthly bias ratios and other data will be overwritten. Also, to proceed with spatial analysis scripts, the comprehensive summary file must be produced using this function first.
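The difference between the two ratio methods can be seen on a toy record. The sketch below is a simplified reading of the parameter descriptions above, not the package's implementation: the function names, the `min_days` argument (standing in for the day_limit idea), and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Periods named in the note above.
GROWING_SEASON = set(range(4, 11))  # April through October
SUMMER = {6, 7, 8}                  # June through August

# Made-up paired daily records: (year, month, station_value, grid_value).
pairs = [
    (2001, 7, 2.0, 4.0),
    (2002, 7, 6.0, 4.0),
    (2002, 7, 6.0, 4.0),
    (2002, 7, 6.0, 4.0),
]


def long_term_mean_ratio(records, months):
    """Mean of all station values in the period over mean of paired grid values."""
    station = [s for (_, m, s, _) in records if m in months]
    grid = [g for (_, m, _, g) in records if m in months]
    return mean(station) / mean(grid)


def mean_of_annual_ratio(records, months, min_days=1):
    """Ratio of sums for each year with enough paired days, then mean of ratios."""
    by_year = defaultdict(list)
    for year, month, station, grid in records:
        if month in months:
            by_year[year].append((station, grid))
    annual = [
        sum(s for s, _ in vals) / sum(g for _, g in vals)
        for _, vals in sorted(by_year.items())
        if len(vals) >= min_days
    ]
    return mean(annual)


ltm = long_term_mean_ratio(pairs, SUMMER)   # 5.0 / 4.0
moa = mean_of_annual_ratio(pairs, SUMMER)   # mean of 0.5 and 1.5
```

With unequal numbers of days per year the two methods disagree: the long-term mean weights every day equally, while the mean of annual ratios weights every year equally.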
spatial
Perform multiple workflows needed to estimate the spatial surface (and
other related outputs) of monthly and annual station-to-grid bias ratios for
meteorological variables. The input file is first created by
gridwxcomp.calc_bias_ratios.
- gridwxcomp.spatial.PT_ATTRS
All attributes expected to be in the point shapefile created for stations, except station and grid IDs.
- Type:
Note
All spatial files, i.e. vector and raster files, utilize the ESRI Shapefile or GeoTiff format.
- gridwxcomp.spatial.make_points_file(in_path, config_path, grid_id_name='GRID_ID')
Create vector shapefile of points with monthly mean bias ratios for climate stations using all stations found in a comprehensive CSV file created by gridwxcomp.calc_bias_ratios.
- Parameters:
in_path (str) – path to [var]_summary_comp.CSV file containing monthly bias ratios, lat, long, and other data. Shapefile “[var]_summary_pts.shp” is saved to parent directory of in_path under “spatial” subdirectory.
config_path (str) – path to the configuration file that has the parameters used to interpret the station and gridded data files.
grid_id_name (str) – name of the column containing grid IDs
- Returns:
None
Example
Create shapefile containing point data for all climate stations in input summary file created by gridwxcomp.calc_bias_ratios
>>> from gridwxcomp import spatial
>>> # path to comprehensive summary CSV
>>> summary_file = 'monthly_ratios/etr_mm_summary_comp_all_yrs.csv'
>>> config_file = 'gridwxcomp_config.ini'
>>> spatial.make_points_file(summary_file, config_file)
The result is the file “etr_mm_summary_pts_wgs84.shp” being saved to a subdirectory named “spatial” that is created in the directory containing in_path, i.e.:
"monthly_ratios/spatial/etr_mm_summary_pts_wgs84.shp".
This file has the points projected in the WGS 84 geographic coordinate system. Another point shapefile will also be made if the “interpolation_projection” listed in the user-provided configuration file is a coordinate reference system that differs from WGS84, i.e., a projected coordinate system. The user can provide an EPSG code or an ESRI code such as ESRI:102004, which refers to a coordinate reference system, and the other shapefile will then have the following path and suffix:
"monthly_ratios/spatial/etr_mm_summary_pts_ESRI_102004.shp".
- Raises:
FileNotFoundError – if input summary CSV or configuration INI files are not found.
Note
make_points_file will overwrite any existing point shapefile of the same climate variable. If no “interpolation_projection” option is listed in the configuration file’s METADATA section the default will be ESRI:102004 which refers to the Lambert Conformal Conic projected coordinate system.
- gridwxcomp.spatial.make_grid(in_path, config_path, overwrite=False, grid_id_name='GRID_ID')
Make fishnet grid (vector file of polygon geometry) for select gridcells based on bounding coordinates. Each cell in the grid will be assigned a unique numerical identifier (property specified by grid_id_name) and the grid will be in the WGS84 coordinate system.
Modified from the Python GDAL/OGR Cookbook.
- Parameters:
in_path (str) – path to [var]_summary_comp_[years].csv file containing monthly bias ratios, lat, long, and other data. Created by gridwxcomp.calc_bias_ratios.
config_path (str) – path to the configuration file that has the parameters used to interpret the station and gridded data files.
overwrite (bool) – default False. If True, overwrite the grid shapefile at out_path if it already exists.
grid_id_name (str) – default “GRID_ID”. Name of gridcell identifier column
- Returns:
None
Examples
Build a fishnet uniform grid that is defined by the bounds and resolution defined in the METADATA section of the configuration file. These parameters should be provided in decimal degrees.
>>> from gridwxcomp import spatial
>>> # assign input paths
>>> summary_file = 'monthly_ratios/etr_mm_summary_comp_all_yrs.csv'
>>> config_file = 'gridwxcomp_config.ini'
>>> # make fishnet of grid cells for interpolation
>>> spatial.make_grid(summary_file, config_file)
The file will be saved as “grid.shp” to a newly created subdirectory “spatial” in the same directory as the input summary CSV file, i.e.:
monthly_ratios/
├── etr_mm_summary_all_yrs.csv
├── etr_mm_summary_comp_all_yrs.csv
└── spatial/
    ├── grid.cpg
    ├── grid.dbf
    ├── grid.prj
    ├── grid.shp
    └── grid.shx
If the grid file already exists the default action is to not overwrite. To overwrite an existing grid, for example if you are working with a new set of climate stations as input, set the overwrite keyword argument to True.
>>> spatial.make_grid(summary_file, config_file, overwrite=True)
Note
If the “grid_resolution” is not assigned in the configuration file a default resolution of 0.1 degrees will be used to make the fishnet.
- Raises:
FileNotFoundError – if input summary CSV or configuration INI files are not found.
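To make the idea of a fishnet concrete, here is a minimal pure-Python sketch that builds square cells from bounding coordinates and a resolution in decimal degrees. It does not use GDAL/OGR and is not the package's code; the function name `make_fishnet`, the row-by-row ID numbering, and the sample bounds are assumptions for illustration.

```python
def make_fishnet(xmin, xmax, ymin, ymax, resolution=0.1):
    """Build square cells covering the bounds, numbered row by row from the top left."""
    ncols = round((xmax - xmin) / resolution)
    nrows = round((ymax - ymin) / resolution)
    cells = []
    cell_id = 0  # assumed numbering scheme, one unique integer per cell
    for row in range(nrows):
        top = ymax - row * resolution
        bottom = top - resolution
        for col in range(ncols):
            left = xmin + col * resolution
            right = left + resolution
            # Closed ring: first and last vertices are identical.
            ring = [(left, top), (right, top), (right, bottom),
                    (left, bottom), (left, top)]
            cells.append({'GRID_ID': cell_id, 'ring': ring})
            cell_id += 1
    return cells


# Made-up bounds in decimal degrees: 3 columns by 2 rows at 0.1 degrees.
cells = make_fishnet(-111.0, -110.7, 40.0, 40.2, resolution=0.1)
```

Each dictionary corresponds to one polygon feature with its identifier stored under the default grid_id_name, “GRID_ID”.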
- gridwxcomp.spatial.interpolate(in_path, config_path, layer='all', out=None, scale_factor=1, function='invdist', params=None, z_stats=False, res_plot=True, grid_id_name='GRID_ID', options=None)
Use GDAL_grid methods to interpolate a 2-dimensional surface of calculated bias ratios or other statistics for station/gridded data pairs found in input comprehensive summary CSV.
Options allow for down- or up-scaling the resolution of the resampling grid and for selecting from multiple interpolation methods. Interpolated surfaces are saved as GeoTIFF rasters. Zonal statistics using zonal_stats are also extracted to grid cells in the fishnet grid built first by make_grid.
- Parameters:
in_path (str) – path to CSV file containing monthly bias ratios, lat, lon, and other data. Created by gridwxcomp.calc_bias_ratios.
config_path (str) – path to the configuration input file.
layer (str or list) – default ‘all’. Name of variable(s) in in_path to conduct 2-D interpolation. e.g. ‘Annual_mean’.
out (str) – default None. Subdirectory to save GeoTIFF raster(s).
scale_factor (float, int) – default 1. Scaling factor to apply to original grid resolution to create resampling resolution. If scale_factor = 0.1, the interpolation resolution will be one tenth of the grid resolution listed in the configuration file.
function (str) – default ‘invdist’. Interpolation method, gdal methods include: ‘invdist’, ‘invdistnn’, ‘linear’, ‘average’, and ‘nearest’ see GDAL grid.
params (dict, str, or None) – default None. Parameters for interpolation using gdal, see defaults in gridwxcomp.InterpGdal.
z_stats (bool) – default False. Calculate zonal means of interpolated surface to grid cells in fishnet and save to a CSV file. The CSV file will be saved to the same directory as the interpolated raster file(s).
res_plot (bool) – default True. Make bar plot for residual (error) between interpolated and station value for layer.
grid_id_name (str) – default “GRID_ID”. Name of gridcell identifier
options (str or None) – default None. Extra command line arguments for gdal interpolation.
- Returns:
None
Examples
Let’s say we wanted to interpolate the “Annual_mean” bias ratio in an input CSV first created by gridwxcomp.calc_bias_ratios and a fishnet grid was first created by make_grid. This example uses the “invdist” method (default) to interpolate to a 0.1 decimal degree grid scaled down to a 0.01 decimal degree surface. The result is a GeoTIFF raster that has an extent that is defined by the bounds in the configuration file. Additionally, point residuals of bias ratios are added to CSV and newly created point shapefiles, and zonal (grid cell) means are also extracted and stored in a CSV.
>>> from gridwxcomp import spatial
>>> summary_file = 'monthly_ratios/etr_mm_summary_comp_all_yrs.csv'
>>> layer = 'annual_mean'
>>> params = {'power':1, 'smooth':20}
>>> config_file = 'gridwxcomp_config.ini'
>>> out_dir = 's20_p1' # optional subdir name for saving rasters
>>> spatial.interpolate(summary_file, config_file, layer=layer, out=out_dir,
...     scale_factor=0.1, params=params, z_stats=True)
The resulting file structure that is created by the above command is:
monthly_ratios/
├── etr_mm_summary_all_yrs.csv
├── etr_mm_summary_comp_all_yrs.csv
└── spatial/
    ├── etr_mm_invdist_400m/
    │   └── s20_p1/
    │       ├── annual_mean.tiff
    │       ├── etr_mm_summary_comp_all_yrs.csv
    │       ├── etr_mm_summary_pts_wgs84.cpg
    │       ├── etr_mm_summary_pts_wgs84.dbf
    │       ├── etr_mm_summary_pts_wgs84.prj
    │       ├── etr_mm_summary_pts_wgs84.shp
    │       ├── etr_mm_summary_pts_wgs84.shx
    │       ├── etr_mm_summary_pts_ESRI_102004.cpg
    │       ├── etr_mm_summary_pts_ESRI_102004.dbf
    │       ├── etr_mm_summary_pts_ESRI_102004.prj
    │       ├── etr_mm_summary_pts_ESRI_102004.shp
    │       ├── etr_mm_summary_pts_ESRI_102004.shx
    │       ├── zonal_stats.csv
    │       └── residual_plots
    │           └── annual_res.html
    ├── grid.cpg
    ├── grid.dbf
    ├── grid.prj
    ├── grid.shp
    └── grid.shx
Specifically, the interpolated raster is saved to:
'monthly_ratios/spatial/etr_mm_invdist_400m/s20_p1/annual_mean.tiff'
where the file name and directory are based on the variable being interpolated, methods, and the raster resolution. The out keyword argument lets us add any number of subdirectories to the final output directory, in this case the ‘s20_p1’ dir contains info on params.
The final result will be the creation of the CSV:
'monthly_ratios/spatial/etr_mm_invdist_400m/s20_p1/zonal_stats.csv'
In “zonal_stats.csv” the zonal mean for ratios of annual station to grid ETr will be stored along with grid IDs, e.g.
GRID_ID    annual_mean
515902     0.87439453125
514516     0.888170013427734
513130     0.90002197265625
…          …
To calculate zonal statistics of bias ratios that are not part of the default workflow, we can assign any numeric layer in the input summary CSV to be interpolated. For example, if we wanted to interpolate the coefficient of variation of the growing season bias ratio “grow_cv”, we could estimate the surface of this variable as follows:
>>> layer = 'grow_cv'
>>> func = 'invdistnn'
>>> # we can also 'upscale' the interpolation resolution
>>> spatial.interpolate(summary_file, config_file, layer=layer,
...     scale_factor=2, function=func)
This will create the GeoTIFF raster:
'monthly_ratios/spatial/etr_mm_invdistnn_400m/grow_cv.tiff'
And the zonal means will be added as a column named “grow_cv” to:
'monthly_ratios/spatial/etr_mm_invdistnn_400m/zonal_stats.csv'
As with other components of gridwxcomp, any other climatic variables that exist in the grid dataset can be used along with any corresponding station time series data from the user. The input (in_path) to all routines in gridwxcomp.spatial is the summary CSV created by gridwxcomp.calc_bias_ratios; the prefix of this file determines the climatic variable on which spatial analysis is conducted. See gridwxcomp.calc_bias_ratios for examples of how to use different climatic variables, e.g. TMax or ETo.
- Raises:
FileNotFoundError – if the input summary CSV file or the configuration file do not exist or can’t be found. Also raised if the fishnet for extracting zonal statistics does not exist and z_stats==True. The fishnet should be in the subdirectory of in_path, i.e. “<in_path>/spatial/grid.shp”.
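For intuition about the ‘invdist’ option, the following is a simplified pure-Python sketch of inverse distance to a power weighting with a smoothing term. It is an illustration of the general technique under the assumption that smoothing is added to the squared distance; it is not GDAL's implementation, and it ignores search radii, point limits, and nodata handling. The station coordinates and ratios are made up.

```python
def idw(x, y, points, power=2.0, smoothing=0.0):
    """Inverse-distance-weighted estimate at (x, y) from (px, py, value) points."""
    numerator = 0.0
    denominator = 0.0
    for px, py, value in points:
        # Smoothing inflates every squared distance (assumed form).
        r2 = (x - px) ** 2 + (y - py) ** 2 + smoothing ** 2
        if r2 == 0.0:
            return value  # exactly on a point with no smoothing
        weight = 1.0 / r2 ** (power / 2.0)
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Two made-up stations with bias ratios 1.0 and 2.0.
stations = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0)]

midpoint = idw(0.5, 0.0, stations)                      # equal weights
at_station = idw(0.0, 0.0, stations)                    # reproduces the point
at_station_smooth = idw(0.0, 0.0, stations, smoothing=1.0)
```

With smoothing set to zero the surface passes through each station value, while a nonzero smoothing pulls the estimate at a station toward its neighbors, which is why interpolated values and station values can differ.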
- gridwxcomp.spatial.calc_pt_error(in_path, config_file, out_dir, layer, grid_var, grid_id_name='GRID_ID')
Calculate point ratio estimates from the interpolated raster, calculate residuals, and add both to the output summary CSV and point shapefile. Copies of the updated files are saved to the directory with the interpolated rasters.
The original point shapefiles and summary CSV files that are in the parent directory (inputs to gridwxcomp.spatial.interpolate) will not be updated with the point estimates and residuals because they are specific to an interpolation parameter set; instead, the copies are made within an interpolation output directory.
The output summary CSV and point shapefile will have two sets of additional columns added to them after running this function. One set holds the point estimates for each of the monthly, seasonal, and annual bias results, with the suffix “_est”, e.g., “Jan_est”. The other set holds the point residuals between the calculated point bias and the corresponding interpolated bias at the same location, with the suffix “_res”, e.g., “Jan_res”. The interpolated surface may differ from the point data that was used for interpolation because the smoothing used for interpolation can change the interpolated surface at the point locations. See GDAL grid for more background on this.
- Parameters:
in_path (str) – path to comprehensive summary CSV created by gridwxcomp.calc_bias_ratios
config_file (str) – path to configuration input file
out_dir (str) – path to dir that contains interpolated raster
layer (str) – layer to calculate error e.g. “annual_mean”
grid_var (str) – name of grid variable e.g. “etr_mm”
grid_id_name (str) – default ‘GRID_ID’. Name of grid shapefile cell ID for computing zonal statistics and other uses.
- Returns:
None
Note
This function should be run after make_points_file because it copies data from the shapefile it created.
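The “_est” and “_res” columns can be pictured with a short sketch. The helper `add_point_error` is hypothetical, the column prefix is assumed to be the layer name without its “_mean” suffix, and the residual is assumed to be the station ratio minus the interpolated estimate; the sign convention actually used by the package is not stated on this page.

```python
def add_point_error(row, layer, estimate):
    """Return a copy of a summary row with estimate and residual columns added."""
    suffix = '_mean'
    prefix = layer[:-len(suffix)] if layer.endswith(suffix) else layer
    updated = dict(row)  # leave the original row untouched, like the copied files
    updated[prefix + '_est'] = estimate
    # Assumed sign convention: station value minus interpolated value.
    updated[prefix + '_res'] = row[layer] - estimate
    return updated


# Made-up station row and interpolated value sampled at the station location.
station_row = {'STATION_ID': 'A', 'Jan_mean': 0.9}
with_error = add_point_error(station_row, 'Jan_mean', 0.85)
```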
- gridwxcomp.spatial.zonal_stats(in_path, raster, grid_id_name='GRID_ID')
Calculate zonal means from interpolated surface of etr bias ratios created by interpolate using the fishnet grid created by make_grid. Save mean values for each gridcell to a CSV file joined to grid IDs.
- Parameters:
in_path (str) – path to [var]_summary_comp_[years].csv file containing monthly bias ratios, lat, long, and other data. Created by gridwxcomp.calc_bias_ratios.
raster (str) – path to interpolated raster of bias ratios to be used for zonal stats. First created by interpolate.
Example
It is preferred to use this function as part of interpolate or indirectly using the gridwxcomp.spatial command line usage. However, if the grid shapefile and spatial interpolated raster(s) have already been created without zonal stats then,
>>> from gridwxcomp import spatial
>>> # assign input paths
>>> summary_file = 'monthly_ratios/etr_mm_summary_comp_[years].csv'
>>> raster_file = 'monthly_ratios/spatial/etr_mm_invdist_400m/Jan_mean.tiff'
>>> spatial.zonal_stats(summary_file, raster_file)
The final result will be the creation of:
'monthly_ratios/spatial/etr_mm_invdist_400m/zonal_stats.csv'
The resulting CSV contains the grid IDs and zonal means for each grid cell in the fishnet which must exist at:
'monthly_ratios/spatial/grid.shp'
also see interpolate
- Raises:
FileNotFoundError – if the input summary CSV file or the fishnet for extracting zonal statistics do not exist. The fishnet should be in the subdirectory of in_path at “/spatial/grid.shp”.
Note
If zonal statistics are estimated for the same variable on the same raster more than once, the contents of that column in the zonal_stats.csv file will be overwritten.
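A zonal mean is the average of the raster pixels that fall inside each fishnet cell. The sketch below shows the idea with plain lists; it assigns pixels by their center coordinates and skips a nodata value of -999. The function `zonal_means`, the pixel-center rule, and the sample raster are assumptions for illustration rather than the package's method.

```python
NODATA = -999


def zonal_means(raster, x_origin, y_origin, pixel, cells):
    """Average pixel values per cell; cells maps ID -> (xmin, ymin, xmax, ymax)."""
    totals = {}
    for r, line in enumerate(raster):
        cy = y_origin - (r + 0.5) * pixel      # pixel center, rows run north to south
        for c, value in enumerate(line):
            if value == NODATA:
                continue
            cx = x_origin + (c + 0.5) * pixel  # pixel center, columns run west to east
            for grid_id, (xmin, ymin, xmax, ymax) in cells.items():
                if xmin <= cx < xmax and ymin <= cy < ymax:
                    total, count = totals.get(grid_id, (0.0, 0))
                    totals[grid_id] = (total + value, count + 1)
                    break
    return {gid: total / count for gid, (total, count) in totals.items()}


# Made-up 2 x 4 raster with 0.5-unit pixels and two 1 x 1 fishnet cells.
raster = [
    [1.0, 1.0, 2.0, NODATA],
    [1.0, 1.0, 2.0, 4.0],
]
cells = {0: (0.0, 0.0, 1.0, 1.0), 1: (1.0, 0.0, 2.0, 1.0)}
means = zonal_means(raster, x_origin=0.0, y_origin=1.0, pixel=0.5, cells=cells)
```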
plot
Create interactive HTML comparison plots between paired station and gridded climatic variables or bar comparison plots between interpolated and station point data.
- gridwxcomp.plot.daily_comparison(input_csv, config_path, dataset_name='gridded', out_dir=None, year_filter=None)
Compare daily weather station data from PyWeatherQAQC with gridded data for each month in year specified.
The daily_comparison function produces HTML files with time series and scatter plots of station versus gridded climate variables. It uses the bokeh module to create interactive plots, e.g. they can be zoomed in/out and panned. Separate plot files are created for each month of a single year.
The scatterplots for each month will allow you to visualize the overall correlation and relationship between the gridded and station variables.
The timeseries plots will show all daily observations of a single month together, which will highlight any months that differ from the same month in neighboring years, e.g. a year that had a significantly colder June than other Junes.
- Parameters:
input_csv (str) – path to input CSV file containing paired station/gridded metadata. This file is created by running gridwxcomp.prep_metadata followed by gridwxcomp.ee_download.
config_path (str) – path to the config file that has the parameters used to interpret the station and gridded data files
dataset_name (str) – Name of gridded dataset to be used in plots
out_dir (str or None) – default None. Directory to save comparison plots, if None save to “daily_comp_plots” in current directory.
year_filter (str or None) – default None. Single year YYYY or range YYYY-YYYY
- Returns:
None
Example
The daily_comparison function will generate HTML files with bokeh plots for all paired climate variables within the config file. Within Python,
>>> from gridwxcomp.plot import daily_comparison
>>> daily_comparison('merged_input.csv', 'config_file.ini', out_dir='comp_plots_2016', year_filter='2016')
This results in monthly HTML bokeh plots being saved to “comp_plots_2016/STATION_ID/” where “STATION_ID” is the station ID as found in the input CSV file. A file is saved for each month with the station ID, month, and year in the file name. If the out_dir keyword argument is not given the plots will be saved to a directory named “daily_comp_plots”.
Note
If there are fewer than five days of data in a month the plot for that month will not be created.
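The two accepted year_filter forms, a single year “YYYY” or a range “YYYY-YYYY”, can be parsed as in the sketch below. The function `parse_year_filter` is a hypothetical helper written for this illustration and is not part of gridwxcomp; treating the range as inclusive is an assumption.

```python
def parse_year_filter(year_filter):
    """Turn 'YYYY' or 'YYYY-YYYY' into an inclusive (start, end) tuple, or None."""
    if year_filter is None:
        return None  # no filtering requested
    parts = str(year_filter).split('-')
    if len(parts) == 1:
        start = end = int(parts[0])
    elif len(parts) == 2:
        start, end = int(parts[0]), int(parts[1])
    else:
        raise ValueError('expected YYYY or YYYY-YYYY, got %r' % (year_filter,))
    if start > end:
        raise ValueError('start year is after end year')
    return start, end


single = parse_year_filter('2016')
span = parse_year_filter('2010-2012')
```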
- gridwxcomp.plot.monthly_comparison(input_csv, config_path, dataset_name='gridded', out_dir=None, day_limit=10)
Compare monthly average weather station data from PyWeatherQAQC with the gridded dataset.
The monthly_comparison function produces HTML files with time series and scatter plots of station versus gridded climate variables of monthly mean data. It uses the bokeh module to create interactive plots, e.g. they can be zoomed in/out and panned.
- Parameters:
input_csv (str) – path to input CSV file containing paired station:gridded metadata. This file is created by running gridwxcomp.prep_metadata followed by gridwxcomp.ee_download.
config_path (str) – path to the config file that has the parameters used to interpret the station and gridded data files
dataset_name (str) – Name of gridded dataset to be used in plots
out_dir (str) – default None. Directory to save comparison plots.
day_limit (int) – default 10. Number of paired days per month that must exist for variable to be plotted.
- Returns:
None
Example
The monthly_comparison function will generate HTML files with bokeh plots for paired climate variables, e.g. etr_mm, tmax_c
>>> from gridwxcomp.plot import monthly_comparison
>>> monthly_comparison('merged_input.csv', 'config_file.ini', out_dir='monthly_plots')
This results in monthly HTML bokeh plots being saved to “monthly_plots/” which contains a plot file for each station as found in the input CSV file. If the out_dir keyword argument is not given the plots will be saved to a directory named “monthly_comp_plots”.
Note
If there are fewer than 2 months of data the plot for that station will not be created.
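The role of day_limit when building monthly means can be sketched as follows: only months with at least day_limit paired days (both station and grid values present) contribute a monthly mean. The function `monthly_means` and the sample data are assumptions made for illustration, not the package's code.

```python
from collections import defaultdict
from statistics import mean


def monthly_means(daily, day_limit=10):
    """daily: iterable of (year, month, station, grid); None marks a missing value."""
    groups = defaultdict(list)
    for year, month, station, grid in daily:
        if station is not None and grid is not None:  # keep paired days only
            groups[(year, month)].append((station, grid))
    result = {}
    for key, values in sorted(groups.items()):
        if len(values) >= day_limit:
            result[key] = (mean(s for s, _ in values), mean(g for _, g in values))
    return result


# Made-up record: January has 12 paired days, February only 3.
daily = [(2016, 1, 2.0, 4.0)] * 12 + [(2016, 2, 3.0, 3.0)] * 3
daily.append((2016, 2, None, 3.0))  # unpaired day, ignored
means_by_month = monthly_means(daily, day_limit=10)
```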
- gridwxcomp.plot.station_bar_plot(summary_csv, bar_plot_layer, out_dir=None, y_label=None, title=None, subtitle=None, year_subtitle=True)
Produce an interactive bar chart comparing multiple climate stations to each other for a particular variable, e.g. bias ratios or interpolated residuals.
This function may also be used for any numerical data in the summary CSV files that are created by gridwxcomp.interpolate in addition to those created by gridwxcomp.calc_bias_ratios. The main requirement is that summary_csv must contain the column ‘STATION_ID’ and the column named by the bar_plot_layer argument.
- Parameters:
summary_csv (str, Path) – path to summary CSV produced by either gridwxcomp.calc_bias_ratios or by gridwxcomp.interpolate. Should contain bar_plot_layer data for plot.
bar_plot_layer (str) – name of variable to plot.
out_dir (str or None) – default None. Output directory path, default is ‘station_bar_plots’ in parent directory of summary_csv.
y_label (str or None) – default None. Label for y-axis, defaults to bar_plot_layer.
title (str or None) – default None. Title of plot.
subtitle (str, list, or None) – default None. Additional subtitle(s) for plot.
year_subtitle (bool) – default True. If True, print subtitle on plot with the max year range used for station data, e.g. ‘years: 1995-2005’
Example
Let’s say we want to compare the mean growing season bias ratios of reference evapotranspiration (ETr) for the selection of stations we used to calculate bias ratios.
The summary CSV file containing the ratios should be first created using gridwxcomp.calc_bias_ratios.
>>> from gridwxcomp.plot import station_bar_plot
>>> # path to summary CSV with station data
>>> in_file = 'monthly_ratios/etr_mm_summary_all_yrs.csv'
>>> example_layer = 'grow_mean'
>>> station_bar_plot(in_file, example_layer)
The resulting file will be saved using the bar_plot_layer name as a file name:
'monthly_ratios/station_bar_plots/grow_mean.html'
The plot file will contain the mean growing season bias ratios of ETr for each station, sorted from smallest to largest values.
- Raises:
FileNotFoundError – if summary_csv is not found.
KeyError – if bar_plot_layer does not exist as a column name in summary_csv.
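The data preparation behind such a bar chart (check the columns, then order stations from smallest to largest value) can be sketched with the standard library. The helper `sorted_layer` and the sample values are hypothetical; the actual plotting with bokeh is left out.

```python
import csv
import io


def sorted_layer(csv_text, bar_plot_layer):
    """Return (station, value) pairs sorted ascending; KeyError if a column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    for column in ('STATION_ID', bar_plot_layer):
        if column not in fields:
            raise KeyError(column)
    pairs = [(row['STATION_ID'], float(row[bar_plot_layer])) for row in reader]
    return sorted(pairs, key=lambda pair: pair[1])


# Made-up summary CSV content.
summary_text = (
    'STATION_ID,grow_mean\n'
    'Castle_Valley,0.91\n'
    'Bluebell,0.84\n'
    'Loa,1.02\n'
)
bars = sorted_layer(summary_text, 'grow_mean')
```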
InterpGdal
- class gridwxcomp.InterpGdal(summary_csv_path, config_path)
Bases: object
Usage of gdal command line tools within gridwxcomp, currently utilizes the GDAL grid command line tool.
- Parameters:
summary_csv_path (str) – path to [var]_summary_comp CSV file created by gridwxcomp.calc_bias_ratios.calc_bias_ratios containing point bias ratios and statistics, station coordinates, etc.
config_path (str) – path to the input configuration file with control parameters such as spatial coordinate reference system, resolution, and the bounds for spatial interpolation.
- summary_csv_path
absolute path object to input summary_csv_path.
- Type:
- config_path
absolute path to configuration input file.
- Type:
- layers
list of layers in summary_csv_path to interpolate e.g. when using InterpGdal.gdal_grid with layer="all" defaults to InterpGdal.default_layers.
- Type:
- grid_bounds
default None. Extent for interpolation raster in order (min long, max long, min lat, max lat).
- Type:
tuple or None
- interped_rasters
empty list that is appended with pathlib.Path objects to interpolated rasters after using InterpGdal.gdal_grid.
- Type:
- params
default None. After InterpGdal.gdal_grid, params is updated with the last used interpolation parameters in the form of a dictionary with parameter names as keys.
- Type:
dict or None
Example
The InterpGdal class is useful for experimenting with multiple interpolation algorithms provided by gdal, which are optimized and often sensitive to multiple parameters. We can use the object to loop over a range of parameter combinations to test how algorithms perform. We might pick a single layer to test, in this case the growing season bias ratios between station and gridded reference evapotranspiration (etr_mm). Below is a routine to conduct a basic sensitivity analysis of the power and smooth parameters of the inverse distance to a power interpolation method
>>> from gridwxcomp.interpgdal import InterpGdal
>>> import os
>>> # create instance variable from summary csv
>>> summary_file = 'PATH/TO/etr_mm_summary_comp.csv'
>>> config_file = 'PATH/TO/gridwxcomp_config.ini'
>>> # create a InterpGdal instance
>>> test = InterpGdal(summary_file, config_file)
>>> layer = 'grow_mean' # could be a list
>>> # run inverse distance interpolation with different params
>>> for p in range(1,10):
...     for s in [0, 0.5, 1, 1.5, 2]:
...         # build output directory based on parameter sets
...         out_dir = os.path.join('spatial', 'invdist',
...             'power={}_smooth={}'.format(p, s))
...         params = {'power': p, 'smooth': s}
...         test.gdal_grid(out_dir=out_dir, layer=layer, params=params)
Note, we did not assign the interpolation method ‘invdist’ because it is the default. To use another interpolation method we would assign the interp_meth kwarg to InterpGdal.gdal_grid. Similarly, we could experiment with other parameters which all can be found in the class attribute InterpGdal.default_params. The instance variable InterpGdal.params can also be used to save metadata on parameters used for each run.
- interp_methods = ('average', 'invdist', 'invdistnn', 'linear', 'nearest')
gdal_grid interpolation algorithms.
- Type:
interp_methods (tuple)
- default_layers = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'grow', 'annual', 'summer')
Layers to interpolate created by gridwxcomp.calc_bias_ratios.calc_bias_ratios and then gridwxcomp.spatial.make_points_file, e.g. “Jan” in the shapefile which is “Jan_mean” found in summary_csv_path.
- Type:
default_layers (tuple)
- default_params = {'average': {'angle': 0, 'min_points': 0, 'nodata': -999, 'radius1': 0, 'radius2': 0}, 'invdist': {'angle': 0, 'max_points': 0, 'min_points': 0, 'nodata': -999, 'power': 2, 'radius1': 0, 'radius2': 0, 'smoothing': 0}, 'invdistnn': {'max_points': 12, 'min_points': 0, 'nodata': -999, 'power': 2, 'radius': 10, 'smoothing': 0}, 'linear': {'nodata': -999, 'radius': -1}, 'nearest': {'angle': 0, 'nodata': -999, 'radius1': 0, 'radius2': 0}}¶
Dictionary with default parameters for each interpolation algorithm, slightly modified from GDAL defaults. Keys are interpolation method names; values are dictionaries that map parameter names to their default values.
- Type:
default_params (dict)
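As a rough illustration of how per-method defaults and user overrides could be combined, the sketch below merges a params dictionary into the ‘invdist’ defaults listed above and formats the result in the name:key=value form accepted by the gdal_grid -a option. The helper name build_algorithm_string is hypothetical and is not part of gridwxcomp; how InterpGdal merges parameters internally is an assumption here.

```python
# Defaults for 'invdist', copied from InterpGdal.default_params above.
INVDIST_DEFAULTS = {
    'angle': 0, 'max_points': 0, 'min_points': 0, 'nodata': -999,
    'power': 2, 'radius1': 0, 'radius2': 0, 'smoothing': 0,
}


def build_algorithm_string(method, defaults, overrides=None):
    """Hypothetical helper (not a gridwxcomp function).

    Merge user overrides into the defaults and format them as
    'method:key=value:...', the form used by the gdal_grid -a option.
    """
    overrides = overrides or {}
    # reject misspelled names early, e.g. 'smooth' instead of 'smoothing'
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError('unknown parameter(s): {}'.format(sorted(unknown)))
    params = {**defaults, **overrides}
    parts = ['{}={}'.format(key, value) for key, value in sorted(params.items())]
    return ':'.join([method] + parts)


algorithm = build_algorithm_string(
    'invdist', INVDIST_DEFAULTS, {'power': 3, 'smoothing': 0.5})
print(algorithm)
```

Rejecting unknown keys up front is one possible design choice; it catches a misspelled parameter name before any interpolation is run.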
- gdal_grid(layer='all', out_dir='', interp_meth='invdist', params=None, scale_factor=1, z_stats=True, res_plot=True, grid_id_name='GRID_ID', options=None)[source]¶
Run gdal_grid command line tool to interpolate point ratios.
For further information on the interpolation algorithms, including their function, parameters, and options, see gdal_grid.
- Parameters:
layer (str, list) – default ‘all’. Name of summary file column to interpolate, e.g. ‘Jan_mean’, or list of names. If ‘all’ use all variables in mutable instance attribute “layers”.
out_dir (str) – default ‘’. Output directory to save rasters and zonal stats CSV, always appended to the root dir of the input summary CSV parent path that contains point ratios.
interp_meth (str) – default ‘invdist’. gdal interpolation algorithm
params (dict, str, or None) – default None. Parameters for interpolation algorithm. See examples for format rules.
bounds (tuple) – default None. Tuple of bounding coordinates in the following order (min long, max long, min lat, max lat) which need to be in decimal degrees or meters.
scale_factor (float, int) – default 1. Scaling factor to apply to original grid resolution to create resampling resolution. If scale_factor = 0.1, the interpolation resolution will be one tenth of the grid resolution listed in the config file.
z_stats (bool) – default True. Calculate zonal means of interpolated surface to gridded cells in fishnet and save to a CSV file. The CSV file will be saved to the same directory as the interpolated raster file(s).
res_plot (bool) – default True. Make bar plot for residual (error) between interpolated and station value for layer.
options (str or None) – default None. Extra command line options for gdal_grid spatial interpolation.
- Returns:
None
Examples
The default interpolation algorithm, ‘invdist’ (inverse distance weighting to a power), is used here to interpolate bias ratios in a summary CSV file that was first created by gridwxcomp.calc_bias_ratios. The default option will interpolate all layers in InterpGdal.default_layers and calculate zonal statistics for all layers. The interpolation resolution and projection are specified by the user in the configuration file which was used to create the gridwxcomp.InterpGdal object; if they are not specified there, the fallback is the Lambert Conformal Conic projected coordinate reference system and 1000 m resolution. This example limits the interpolation to two layers, growing season and annual mean bias ratios:

>>> from gridwxcomp.interpgdal import InterpGdal
>>> summary_file = 'PATH/TO/[var]_summary_comp.csv'
>>> out_dir = 'default_params'
>>> layers = ['grow_mean', 'annual_mean']
>>>
>>> # create an InterpGdal instance
>>> test = InterpGdal(summary_file)
>>> # run inverse distance interpolation
>>> test.gdal_grid(out_dir=out_dir, layer=layers)

Note, zonal statistics to gridded cells and interpolated residual plots are computed by default. A gridded fishnet must have been previously created using gridwxcomp.spatial.make_grid.
After running the code above the following files will be created in the ‘default_params’ directory, which will be built in the same location as the input summary CSV:
default_params/
├── annual_mean.tiff
├── annual_mean_ESRI_102004.tiff
├── etr_mm_summary_comp_all_yrs.csv
├── etr_mm_summary_pts_wgs84.cpg
├── etr_mm_summary_pts_wgs84.dbf
├── etr_mm_summary_pts_wgs84.prj
├── etr_mm_summary_pts_wgs84.shp
├── etr_mm_summary_pts_wgs84.shx
├── etr_mm_summary_pts_ESRI_102004.cpg
├── etr_mm_summary_pts_ESRI_102004.dbf
├── etr_mm_summary_pts_ESRI_102004.prj
├── etr_mm_summary_pts_ESRI_102004.shp
├── etr_mm_summary_pts_ESRI_102004.shx
├── zonal_stats.csv
├── grow_mean.tiff
├── grow_mean_ESRI_102004.tiff
└── residual_plots/
    ├── annual_res.html
    └── grow_res.html

GeoTIFF interpolated raster files are now created for the selected layers. The file “zonal_stats.csv” contains grid_id as an index and each layer zonal mean as columns. For example,
grid_id   grow_mean            annual_mean
511747    0.9650671287940088   0.9078723876243023
510361    0.9658465063428492   0.9097255715561022
508975    0.9667075970344162   0.9117676407214926
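A minimal sketch of reading such a file back into Python follows. It uses only the standard library and an in-memory copy of the rows shown above; that the real “zonal_stats.csv” is comma-delimited with exactly these column names is an assumption, and load_zonal_means is a hypothetical helper rather than a gridwxcomp function.

```python
import csv
import io

# Stand-in for the contents of zonal_stats.csv, using the rows shown above.
# Assumption: the real file is comma-delimited with these column names.
ZONAL_STATS = """grid_id,grow_mean,annual_mean
511747,0.9650671287940088,0.9078723876243023
510361,0.9658465063428492,0.9097255715561022
508975,0.9667075970344162,0.9117676407214926
"""


def load_zonal_means(handle):
    """Return {grid_id: {layer: zonal mean}} from a zonal stats CSV."""
    means = {}
    for row in csv.DictReader(handle):
        grid_id = int(row.pop('grid_id'))
        means[grid_id] = {layer: float(value) for layer, value in row.items()}
    return means


zonal_means = load_zonal_means(io.StringIO(ZONAL_STATS))
print(zonal_means[511747]['grow_mean'])
```

To read a file on disk, pass an open file handle instead of the io.StringIO object.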
There are several InterpGdal instance attributes that may be useful, for example to see the parameters that were used for the last call to InterpGdal.gdal_grid:

>>> test.params
{'power': 2, 'smoothing': 0, 'radius1': 0, 'radius2': 0, 'angle': 0, 'max_points': 0, 'min_points': 0, 'nodata': -999}

Or, to find the paths to all interpolated raster files that have been created by the instance, use the “interped_rasters” instance attribute, which is a list of pathlib.Path objects holding the absolute paths of the raster files. To get them as strings,

>>> list(map(str, test.interped_rasters))
['PATH/TO/grow_mean.tiff', 'PATH/TO/annual_mean.tiff']

Similarly, the raster extent that was used, and that will be used again for any subsequent calls of InterpGdal.gdal_grid, can be retrieved by

>>> test.grid_bounds
(-111.74583329966664, -108.74583330033335, 38.21250000003333, 40.462499999966674)

- Raises:
KeyError – if interp_meth is not a valid gdal_grid interpolation algorithm name.
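The check behind this error can be pictured as a membership test against the interp_methods tuple. The sketch below is only an illustration of that idea; check_interp_meth is a hypothetical name and the actual validation code inside InterpGdal.gdal_grid may differ.

```python
# Values copied from the InterpGdal.interp_methods class attribute.
INTERP_METHODS = ('average', 'invdist', 'invdistnn', 'linear', 'nearest')


def check_interp_meth(interp_meth):
    """Hypothetical check: raise KeyError for an unknown algorithm name."""
    if interp_meth not in INTERP_METHODS:
        raise KeyError(
            '{} is not a valid interpolation method, choose from: {}'.format(
                interp_meth, ', '.join(INTERP_METHODS)))
    return interp_meth


# A valid name passes through unchanged, a misspelled one is rejected.
accepted = check_interp_meth('linear')
try:
    check_interp_meth('kriging')
    rejected = False
except KeyError:
    rejected = True
print(accepted, rejected)
```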
util¶
Utility functions or classes for gridwxcomp package
- gridwxcomp.util.affine_transform(img)[source]¶
Get the affine transform of the image as an EE object
- Parameters:
img – ee.Image object
- Returns:
ee.List object
- gridwxcomp.util.parse_yr_filter(dt_df, years, label)[source]¶
Parse string year filter and apply it to datetime-indexed DataFrame.
- Parameters:
dt_df (pandas.DataFrame) – datetime-indexed DataFrame
years (str or int) – years to select, e.g. 2015 or 2000-2010
label (str) – identifier to print in the warning message if the years filter partially overlaps with the actual date index
- Returns:
first element is the input DataFrame dt_df indexed to the years filter, second element is a string of the year range, e.g. ‘2001_2011’
- Return type:
ret (tuple of (pandas.DataFrame, str))
Example
>>> import pandas as pd
>>> from gridwxcomp.util import parse_yr_filter
>>> df = pd.DataFrame(index=pd.date_range('2000', '2015'))
>>> df, yr_str = parse_yr_filter(df, '1998-2002', 'station1')
WARNING: data for station1 starts in 2000 but you gave 1998
Years used will only include 2000 to 2002

Now df will only contain indices with dates between 2000 and 2002 and

>>> yr_str
'1998_2002'

- Raises:
ValueError – if years is invalid or not found in time series index of DataFrame.
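To make the filter format concrete, here is a minimal standard-library sketch of splitting a years value into a start year, an end year, and a label string, then clipping the range to the years actually present in the data. Both function names are hypothetical, the label returned for a single year is an assumption, and this is not the implementation used by gridwxcomp.util.parse_yr_filter.

```python
def parse_years(years):
    """Hypothetical parser for a filter such as 2015 or '2000-2010'.

    Returns (start, end, year_str). The single-year label format is an
    assumption made for this sketch.
    """
    text = str(years).strip()
    start, sep, end = text.partition('-')
    if not sep:
        end = start
    if not (start.isdigit() and end.isdigit()) or int(start) > int(end):
        raise ValueError(
            '{} is not a valid years option, use a single year or a '
            'range, e.g. 2015 or 2000-2010'.format(years))
    year_str = '{}_{}'.format(start, end) if sep else start
    return int(start), int(end), year_str


def clip_to_data(start, end, data_start, data_end):
    """Clip the requested range to the years available in the data."""
    if end < data_start or start > data_end:
        raise ValueError('requested years not found in the time series')
    return max(start, data_start), min(end, data_end)


start, end, yr_str = parse_years('1998-2002')
used = clip_to_data(start, end, 2000, 2015)
print(yr_str, used)
```

In this sketch the label keeps the requested years while the clipped range reflects the data, which mirrors the example above where the label is ‘1998_2002’ but only 2000 to 2002 are used.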
- gridwxcomp.util.validate_file(file_path, expected_extensions)[source]¶
Checks that the provided path is valid and that the file is of an expected type. Raises an exception if either check fails.
- Parameters:
file_path – string of path to file
expected_extensions – list of strings of expected file types
- Returns:
None
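The two checks can be sketched with pathlib as shown below. The function name check_file and the exception types it raises are assumptions for illustration; the documentation above does not state which exceptions gridwxcomp.util.validate_file raises.

```python
import tempfile
from pathlib import Path


def check_file(file_path, expected_extensions):
    """Hypothetical stand-in for a path and file type check.

    Assumption: a missing file raises FileNotFoundError and an
    unexpected extension raises ValueError.
    """
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError('{} is not a file'.format(path))
    # compare extensions without the leading dot and ignoring case
    allowed = {ext.lower().lstrip('.') for ext in expected_extensions}
    if path.suffix.lower().lstrip('.') not in allowed:
        raise ValueError(
            '{} must be one of: {}'.format(path.name, sorted(allowed)))


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / 'gridwxcomp_config.ini'
    config_path.write_text('[EXAMPLE]\n')
    check_file(config_path, ['ini'])  # passes silently
    try:
        check_file(config_path, ['csv', 'txt'])
        wrong_type_rejected = False
    except ValueError:
        wrong_type_rejected = True
print(wrong_type_rejected)
```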
- gridwxcomp.util.read_config(config_file_path)[source]¶
Opens the config file at the provided path and stores all required values in a Python dictionary. This dictionary is used both to import data and elsewhere in the code to refer to what type of data was passed in.
- Parameters:
config_file_path – string of path to config file
- Returns:
a dictionary of all required config file parameters
- Return type:
config_dict
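As a rough picture of this step, the sketch below reads an INI-style config with the standard library configparser module and collects a set of required values into a dictionary. The section name, key names, and the load_required helper are all hypothetical; they are not the actual layout of a gridwxcomp config file.

```python
import configparser

# Hypothetical config text; the real gridwxcomp section and key names
# are not shown in this document and may differ.
EXAMPLE_CONFIG = """
[EXAMPLE]
input_crs = EPSG:4326
grid_resolution = 0.1
"""


def load_required(config_text, required):
    """Collect required (section, key) pairs into a flat dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    config_dict = {}
    for section, key in required:
        if not parser.has_option(section, key):
            raise KeyError('missing [{}] {}'.format(section, key))
        # configparser returns every value as a string
        config_dict[key] = parser.get(section, key)
    return config_dict


config_dict = load_required(
    EXAMPLE_CONFIG,
    [('EXAMPLE', 'input_crs'), ('EXAMPLE', 'grid_resolution')])
print(config_dict)
```

Values come back as strings, so numeric settings such as a resolution would need an explicit conversion before use.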
- gridwxcomp.util.read_data(config_dictionary, version, filepath)[source]¶
Uses config_dict parameters to read in the data and rename it to standard parameters
- Parameters:
config_dictionary – dictionary of everything contained within config file
version – a string that will be either ‘station’ or ‘gridded’
- Returns:
a dataframe containing only the variable we want to plot, with a standardized naming convention
- Return type:
filtered_df
- gridwxcomp.util.convert_units(config_dictionary, version, df)[source]¶
Uses config_dict parameters to check what units provided variables are in and convert them if needed
- Parameters:
config_dictionary – dictionary of everything contained within config file
version – a string that will be either ‘station’ or ‘gridded’
df – pandas dataframe of input data, at this point naming of dataframe columns has been standardized
- Returns:
a dataframe containing data in the correct units
- Return type:
converted_df
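One way to picture a unit check and conversion step is a lookup table of conversion functions keyed by the source and target units. The sketch below works on plain lists rather than a DataFrame, and the unit labels, table, and convert function are hypothetical; the units and conversions actually supported by gridwxcomp.util.convert_units are not listed here.

```python
# Hypothetical conversion table keyed by (from_units, to_units).
CONVERTERS = {
    ('f', 'c'): lambda value: (value - 32.0) * 5.0 / 9.0,
    ('k', 'c'): lambda value: value - 273.15,
    ('in', 'mm'): lambda value: value * 25.4,
}


def convert(values, from_units, to_units):
    """Convert a sequence of values, or copy it if units already match."""
    if from_units == to_units:
        return list(values)
    try:
        func = CONVERTERS[(from_units, to_units)]
    except KeyError:
        raise ValueError(
            'no conversion from {} to {}'.format(from_units, to_units))
    return [func(value) for value in values]


temps_c = convert([32.0, 212.0], 'f', 'c')
precip_mm = convert([1.0], 'in', 'mm')
print(temps_c, precip_mm)
```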
- gridwxcomp.util.reproject_crs_for_point(orig_lon, orig_lat, orig_crs, requested_crs)[source]¶
- Uses the pyproj library to reproject point data from one CRS to another, e.g. to convert input coordinates to WGS84 for Earth Engine
- Will return the original data without any reprojection if orig_crs and requested_crs are the same
- Parameters:
orig_lon – float of original longitude
orig_lat – float of original latitude
orig_crs – string of EPSG code for orig_lat and orig_lon
requested_crs – string of EPSG code to reproject into
- Returns:
Reprojected latitude and longitude for point
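A minimal sketch of this behavior with pyproj follows. The function name reproject_point and the (lon, lat) return order are assumptions for illustration, and pyproj is imported only when a reprojection is actually needed so that the same-CRS shortcut works without it.

```python
def reproject_point(lon, lat, orig_crs, requested_crs):
    """Hypothetical sketch, not the gridwxcomp implementation.

    Returns (lon, lat) in requested_crs; the return order is an
    assumption made for this example.
    """
    # nothing to do when both CRS strings match
    if orig_crs == requested_crs:
        return lon, lat
    # imported lazily: only required when the CRS actually differ
    from pyproj import Transformer
    # always_xy=True keeps the (x, y) = (lon, lat) axis order
    transformer = Transformer.from_crs(
        orig_crs, requested_crs, always_xy=True)
    return transformer.transform(lon, lat)


unchanged = reproject_point(-111.5, 40.0, 'EPSG:4326', 'EPSG:4326')
print(unchanged)
```

Calling the function with two different codes, for example a projected CRS as orig_crs and 'EPSG:4326' as requested_crs, would route the point through the pyproj transformer instead.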
- gridwxcomp.util.reproject_crs_for_bounds(bounds, resolution, orig_crs, requested_crs, requested_decimals)[source]¶
- Uses the pyproj library to reproject the dictionary of bounds for the interpolation extent. This is done in more than just two calls (e.g. NW and SE corners) because some projections may have curvature. Afterwards it rounds the coordinates to the requested decimals
- If orig_crs and requested_crs are the same it will just round the coords without reprojecting
- Parameters:
bounds – dictionary of bounds, containing the following keys: xmin, xmax, ymin, ymax
resolution – resolution used for interpolation, coordinates will be rounded in an attempt to snap to grid
orig_crs – string of EPSG code for original bounds
requested_crs – string of EPSG code to reproject the bounds into
requested_decimals – int of number of decimals to round coords to
- Returns:
Reprojected bounds into new CRS
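The reason for transforming more than the two corners can be sketched without pyproj: sample points along every edge of the box, transform each one, and take the extremes. In the sketch below the helper names are hypothetical and the transform is a made-up curved mapping used only to show the effect; in practice a pyproj transformer's transform method could be passed instead. Snapping to the grid resolution is not covered.

```python
def densify_edges(bounds, n=10):
    """Sample n + 1 points along each edge of the bounding box."""
    xs = [bounds['xmin'] + (bounds['xmax'] - bounds['xmin']) * i / n
          for i in range(n + 1)]
    ys = [bounds['ymin'] + (bounds['ymax'] - bounds['ymin']) * i / n
          for i in range(n + 1)]
    points = []
    for x in xs:  # bottom and top edges
        points.extend([(x, bounds['ymin']), (x, bounds['ymax'])])
    for y in ys:  # left and right edges
        points.extend([(bounds['xmin'], y), (bounds['xmax'], y)])
    return points


def reproject_bounds(bounds, transform, decimals, n=10):
    """Transform edge samples and return the rounded enclosing bounds."""
    transformed = [transform(x, y) for x, y in densify_edges(bounds, n)]
    xs, ys = zip(*transformed)
    return {
        'xmin': round(min(xs), decimals), 'xmax': round(max(xs), decimals),
        'ymin': round(min(ys), decimals), 'ymax': round(max(ys), decimals),
    }


def curved(x, y):
    """Made-up transform that bows the box edges upward (illustration only)."""
    return x, y + 0.1 * (1 - x * x)


box = {'xmin': -1.0, 'xmax': 1.0, 'ymin': 0.0, 'ymax': 1.0}
new_bounds = reproject_bounds(box, curved, decimals=3)
print(new_bounds)
```

With this toy transform the four corners keep their original y values, so a corners-only approach would miss the bulge along the top edge that the edge samples capture.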