hagelslag.evaluation package¶
Submodules¶
hagelslag.evaluation.ContingencyTable module¶
- class hagelslag.evaluation.ContingencyTable.ContingencyTable(a, b, c, d)¶
Bases: object
Initializes a binary contingency table and generates many skill scores.
- Parameters:
a – true positives
b – false positives
c – false negatives
d – true negatives
- table¶
contingency table
- Type:
numpy.ndarray
- N¶
total number of items in table
- accuracy()¶
Finley’s measure, fraction correct, or accuracy. Formula: (a+d)/N
- bias()¶
Frequency Bias. Formula: (a+b)/(a+c)
- csi()¶
Gilbert’s Score, Threat Score, or Critical Success Index (CSI). Formula: a/(a+b+c)
- css()¶
Clayton Skill Score (CSS). Formula: (ad - bc)/((a+b)(c+d))
- dfr()¶
Returns Detection Failure Ratio (DFR). Formula: c/(c+d)
- ets()¶
Equitable Threat Score, Gilbert Skill Score, v, (a - R)/(a + b + c - R), R=(a+b)(a+c)/N
- far()¶
False Alarm Ratio (FAR). Formula: b/(a+b)
- focn()¶
Returns Frequency of Correct Null (FOCN). Formula: d/(c+d)
- foh()¶
Frequency of Hits (FOH) or Success Ratio. Formula: a/(a+b)
- fom()¶
Frequency of Misses (FOM). Formula: c/(a+c).
- hss()¶
Doolittle (Heidke) Skill Score. Formula: 2(ad-bc)/((a+b)(b+d) + (a+c)(c+d))
- pod()¶
Probability of Detection (POD) or Hit Rate. Formula: a/(a+c)
- pofd()¶
Probability of False Detection (POFD). Formula: b/(b+d)
- pon()¶
Returns Probability of Null (PON). Formula: d/(b+d)
- pss()¶
Peirce (Hanssen-Kuipers, True) Skill Score. Formula: (ad - bc)/((a+c)(b+d))
- update(a, b, c, d)¶
Update contingency table with new values without creating a new object.
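The formulas listed for this class can be checked by hand with a few lines of plain Python. The sketch below is not the library's implementation: the function name `contingency_scores` and the sample counts are assumptions made for illustration, and it does not guard against zero denominators.

```python
def contingency_scores(a, b, c, d):
    """Compute several binary verification scores from table counts.

    a: true positives, b: false positives,
    c: false negatives, d: true negatives.
    """
    n = a + b + c + d
    # Number of hits expected by random chance, used by the ETS.
    r = (a + b) * (a + c) / n
    return {
        "pod": a / (a + c),
        "pofd": b / (b + d),
        "far": b / (a + b),
        "bias": (a + b) / (a + c),
        "csi": a / (a + b + c),
        "ets": (a - r) / (a + b + c - r),
        "pss": (a * d - b * c) / ((a + c) * (b + d)),
        "hss": 2 * (a * d - b * c) / ((a + b) * (b + d) + (a + c) * (c + d)),
    }


# Hypothetical counts: 50 hits, 25 false alarms, 10 misses, 915 correct nulls.
scores = contingency_scores(a=50, b=25, c=10, d=915)
print(scores["pod"], scores["far"], scores["csi"])
```

A useful sanity check is that the Peirce Skill Score equals POD minus POFD, which follows directly from the two formulas.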
hagelslag.evaluation.GridEvaluator module¶
- class hagelslag.evaluation.GridEvaluator.GridEvaluator(run_date, ensemble_name, ensemble_member, model_names, size_thresholds, start_hour, end_hour, window_size, time_skip, forecast_path, mrms_path, mrms_variable, obs_mask=True, mask_variable='RadarQualityIndex_00.00')¶
Bases: object
An evaluation system for gridded forecasts.
GridEvaluator loads in a set of machine learning model forecasts from a single model run, loads in corresponding observations, and then generates verification statistics from the matching of forecasts and observations. Forecasts can be aggregated in time with flexible window sizes.
- Parameters:
run_date (datetime.datetime object) – The date of an ensemble run.
ensemble_name (str) – Name of the ensemble. Should be consistent with the name used in the data processing and forecasting.
ensemble_member (str) – Name of the ensemble member being loaded.
model_names (list of strings) – Names of the machine learning models being evaluated.
size_thresholds (list or numpy.ndarray of ints) – Intensity thresholds at which probability forecasts are made.
start_hour (int) – Forecast hour at which evaluation begins.
end_hour (int) – Forecast hour at which evaluation ends, inclusive.
window_size (int) – Number of hours to include within a forecast window.
time_skip (int) – Number of hours to skip between window starts
forecast_path (str) – Path to where gridded forecasts are located.
mrms_path (str) – Path to the MRMS gridded observations.
mrms_variable (str) – Name of the variable being used for verification.
obs_mask (bool, optional (default=True)) – Whether or not a masking grid is used to determine which grid points are evaluated
mask_variable (str, optional (default="RadarQualityIndex_00.00")) – Name of the MRMS variable used for masking.
- dilate_obs(dilation_radius)¶
Use a dilation filter to grow positive observation areas by a specified number of grid points
- Parameters:
dilation_radius – Number of times to dilate the grid.
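To make the idea of growing positive observation areas concrete, here is a minimal pure-Python sketch of repeated binary dilation. It is not the code used by `dilate_obs`: the function name `dilate`, the plus-shaped (4-connected) neighborhood, and the nested-list grid are all assumptions chosen for illustration.

```python
def dilate(grid, iterations=1):
    """Grow the positive (1) areas of a binary grid by one cell per iteration.

    Uses a plus-shaped (4-connected) neighborhood, which is an assumption.
    The input grid is left unmodified.
    """
    rows, cols = len(grid), len(grid[0])
    for _ in range(iterations):
        grown = [row[:] for row in grid]
        for i in range(rows):
            for j in range(cols):
                if grid[i][j]:
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < rows and 0 <= nj < cols:
                            grown[ni][nj] = 1
        grid = grown
    return grid


# A single positive observation in the middle of a 5x5 grid.
obs = [[0] * 5 for _ in range(5)]
obs[2][2] = 1
once = dilate(obs, iterations=1)
twice = dilate(obs, iterations=2)
```

Each iteration extends every positive area by one grid point in the four cardinal directions, so a larger radius tolerates larger displacement errors between forecast and observed events.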
- get_window_forecasts()¶
Aggregate the forecasts within the specified time windows.
- load_forecasts()¶
Load the forecast files into memory.
- load_obs(mask_threshold=0.5)¶
Loads observations and masking grid (if needed).
- Parameters:
mask_threshold – Values greater than the threshold are kept, others are masked.
- reliability_curves(prob_thresholds)¶
Output reliability curves for each machine learning model, size threshold, and time window.
- Parameters:
prob_thresholds – Probability thresholds for the reliability curves.
- roc_curves(prob_thresholds)¶
Generate ROC Curve objects for each machine learning model, size threshold, and time window.
- Parameters:
prob_thresholds – Probability thresholds for the ROC Curve
dilation_radius – Number of times to dilate the observation grid.
- Returns:
a dictionary of DistributedROC objects.
hagelslag.evaluation.MetricPlotter module¶
- hagelslag.evaluation.MetricPlotter.attributes_diagram(rel_objs, obj_labels, colors, markers, filename, figsize=(8, 8), xlabel='Forecast Probability', ylabel='Observed Relative Frequency', ticks=<numpy.ndarray, default not rendered>, dpi=300, title='Attributes Diagram', legend_params=None, inset_params=None, inset_position=(0.12, 0.72, 0.25, 0.25), bootstrap_sets=None, ci=(2.5, 97.5))¶
Plot reliability curves against a 1:1 diagonal to determine if probability forecasts are consistent with their observed relative frequency. Also adds gray areas to show where the climatological probabilities lie and what areas result in a positive Brier Skill Score.
- Parameters:
rel_objs (list) – List of DistributedReliability objects.
obj_labels (list) – List of labels describing the forecast model associated with each curve.
colors (list) – List of colors for each line
markers (list) – List of line markers
filename (str) – Where to save the figure.
figsize (tuple) – (Width, height) of the figure in inches.
xlabel (str) – X-axis label
ylabel (str) – Y-axis label
ticks (numpy.ndarray) – Tick value labels for the x and y axes.
dpi (int) – resolution of the saved figure in dots per inch.
title (str) – Title of figure
legend_params (dict) – Keyword arguments for the plot legend.
inset_params (dict) – Keyword arguments for the inset axis.
inset_position (tuple) – Position of the inset axis in normalized axes coordinates (left, bottom, width, height)
bootstrap_sets (list) – A list of arrays of bootstrapped DistributedReliability objects. If not None, confidence regions will be plotted.
ci (tuple) – tuple of bootstrap confidence interval percentiles
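The region of positive Brier Skill Score on an attributes diagram follows from the Brier score decomposition: a probability bin adds skill when its resolution term exceeds its reliability term. The boundary is the "no skill" line halfway between the 1:1 diagonal and the climatology line. The sketch below illustrates that geometry only; the function names are hypothetical and nothing here is taken from the plotting code.

```python
def no_skill_line(forecast_prob, climo):
    """Observed frequency halfway between perfect reliability and climatology."""
    return 0.5 * (forecast_prob + climo)


def adds_skill(forecast_prob, observed_freq, climo):
    """True when a bin's resolution term exceeds its reliability term."""
    resolution = (observed_freq - climo) ** 2
    reliability = (forecast_prob - observed_freq) ** 2
    return resolution > reliability


climo = 0.2  # hypothetical sample climatology
print(no_skill_line(0.8, climo))
print(adds_skill(0.8, 0.7, climo))  # close to the diagonal
print(adds_skill(0.8, 0.3, climo))  # close to climatology
```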
- hagelslag.evaluation.MetricPlotter.performance_diagram(roc_objs, obj_labels, colors, markers, filename, figsize=(8, 8), xlabel='Success Ratio (1-FAR)', ylabel='Probability of Detection', ticks=<numpy.ndarray, default not rendered>, dpi=300, csi_cmap='Blues', csi_label='Critical Success Index', title='Performance Diagram', legend_params=None, bootstrap_sets=None, ci=(2.5, 97.5), label_fontsize=14, title_fontsize=16, tick_fontsize=12)¶
Draws a performance diagram from a set of DistributedROC objects.
A performance diagram is a variation on the ROC curve in which the Probability of False Detection on the x-axis has been replaced with the Success Ratio (1-False Alarm Ratio or Precision). The diagram also shows the Critical Success Index (CSI or Threat Score) as a series of curved contours, and the frequency bias as angled diagonal lines. Points along the 1:1 diagonal are unbiased, and better performing models should appear in the upper right corner. The performance diagram is particularly useful for displaying verification for severe weather warnings as it displays all three commonly used statistics (POD, FAR, and CSI) simultaneously on the same chart.
- Parameters:
roc_objs (list) – DistributedROC objects being plotted.
obj_labels (list) – List or array of labels describing the forecast associated with each DistributedROC object.
colors (list) – List of matplotlib-readable colors (names or hex-values) for each curve.
markers (list) – Matplotlib marker (e.g. *, o, v, etc.) for each curve.
filename (str) – Name of figure file being saved.
figsize (tuple) – (Width, height) of the figure in inches.
xlabel (str) – Label for the x-axis.
ylabel (str) – Label for the y-axis.
title (str) – The title of the figure.
ticks (numpy.ndarray) – Values shown on the x and y axes.
dpi (int) – Figure resolution in dots per inch.
csi_cmap (str) – Matplotlib colormap used to fill CSI contours.
csi_label (str) – Label for CSI colormap.
legend_params (None or dict) – Keyword arguments for the formatting of the figure legend.
bootstrap_sets (list) – A list of arrays of bootstrapped DistributedROC objects. If not None, confidence regions will be plotted.
ci (tuple) – tuple of bootstrap confidence interval percentiles.
label_fontsize (int) – Font size of the x and y axis labels.
title_fontsize (int) – Font size of the title.
tick_fontsize (int) – Font size of the x and y tick labels.
Examples
>>> from hagelslag.evaluation.MetricPlotter import performance_diagram
>>> from hagelslag.evaluation.ProbabilityMetrics import DistributedROC
>>> import numpy as np
>>> forecasts = np.random.random(1000)
>>> obs = np.random.randint(0, 2, 1000)
>>> roc = DistributedROC()
>>> roc.update(forecasts, obs)
>>> performance_diagram([roc], ["Random"], ["orange"], ["o"], "random_performance.png")
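The CSI contours and bias lines on a performance diagram are functions of the two axes alone, which is why all of these statistics fit on one chart. The following sketch shows the standard identities relating them; the function names are hypothetical and this is not the contouring code used by `performance_diagram`.

```python
def csi_from_pod_sr(pod, sr):
    """Critical Success Index from Probability of Detection and Success Ratio."""
    return 1.0 / (1.0 / sr + 1.0 / pod - 1.0)


def bias_from_pod_sr(pod, sr):
    """Frequency bias; equals 1 along the 1:1 diagonal."""
    return pod / sr


# Hypothetical counts: a=50 hits, b=25 false alarms, c=10 misses.
pod = 50 / (50 + 10)
sr = 50 / (50 + 25)
print(csi_from_pod_sr(pod, sr))   # matches a/(a+b+c)
print(bias_from_pod_sr(pod, sr))  # matches (a+b)/(a+c)
```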
- hagelslag.evaluation.MetricPlotter.reliability_diagram(rel_objs, obj_labels, colors, markers, filename, figsize=(8, 8), xlabel='Forecast Probability', ylabel='Observed Relative Frequency', ticks=<numpy.ndarray, default not rendered>, dpi=300, inset_size=1.5, title='Reliability Diagram', legend_params=None, bootstrap_sets=None, ci=(2.5, 97.5))¶
Plot reliability curves against a 1:1 diagonal to determine if probability forecasts are consistent with their observed relative frequency.
- Parameters:
rel_objs (list) – List of DistributedReliability objects.
obj_labels (list) – List of labels describing the forecast model associated with each curve.
colors (list) – List of colors for each line
markers (list) – List of line markers
filename (str) – Where to save the figure.
figsize (tuple) – (Width, height) of the figure in inches.
xlabel (str) – X-axis label
ylabel (str) – Y-axis label
ticks (array) – Tick value labels for the x and y axes.
dpi (int) – resolution of the saved figure in dots per inch.
inset_size (float) – Size of inset
title (str) – Title of figure
legend_params (dict) – Keyword arguments for the plot legend.
bootstrap_sets (list) – A list of arrays of bootstrapped DistributedReliability objects. If not None, confidence regions will be plotted.
ci (tuple) – tuple of bootstrap confidence interval percentiles
- hagelslag.evaluation.MetricPlotter.roc_curve(roc_objs, obj_labels, colors, markers, filename, figsize=(8, 8), xlabel='Probability of False Detection', ylabel='Probability of Detection', title='ROC Curve', ticks=<numpy.ndarray, default not rendered>, dpi=300, legend_params=None, bootstrap_sets=None, ci=(2.5, 97.5), label_fontsize=14, title_fontsize=16, tick_fontsize=12)¶
Plots a set of receiver/relative operating characteristic (ROC) curves from DistributedROC objects.
The ROC curve shows how well a forecast discriminates between two outcomes over a series of thresholds. It features Probability of Detection (True Positive Rate) on the y-axis and Probability of False Detection (False Alarm Rate) on the x-axis. This plotting function allows one to customize the colors and markers of the ROC curves as well as the parameters of the legend and the title.
- Parameters:
roc_objs (list) – DistributedROC objects being plotted.
obj_labels (list) – Label describing the forecast associated with a DistributedROC object.
colors (list) – List of matplotlib-readable colors (names or hex-values) for each curve.
markers (list) – Matplotlib marker (e.g. *, o, v, etc.) for each curve.
filename (str) – Name of figure file being saved.
figsize (tuple) – (Width, height) of the figure in inches.
xlabel (str) – Label for the x-axis.
ylabel (str) – Label for the y-axis.
title (str) – The title of the figure.
ticks (numpy.ndarray) – Values shown on the x and y axes.
dpi (int) – Figure resolution in dots per inch.
legend_params (None, dict) – Keyword arguments for the formatting of the figure legend.
bootstrap_sets (list) – List of lists of DistributedROC objects that were bootstrap resampled for each model.
ci (tuple of 2 floats) – Quantiles of the edges of the bootstrap confidence intervals ranging from 0 to 100.
label_fontsize (int) – Font size of the x and y axis labels.
title_fontsize (int) – Font size of the title.
tick_fontsize (int) – Font size of the x and y tick labels.
Examples
>>> from hagelslag.evaluation.MetricPlotter import roc_curve
>>> from hagelslag.evaluation.ProbabilityMetrics import DistributedROC
>>> import numpy as np
>>> forecasts = np.random.random(1000)
>>> obs = np.random.randint(0, 2, 1000)
>>> roc = DistributedROC()
>>> roc.update(forecasts, obs)
>>> roc_curve([roc], ["Random"], ["orange"], ["o"], "random_roc.png")
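A ROC curve is often summarized by the area beneath it. As a rough illustration, the sketch below integrates POD over POFD with the trapezoidal rule. The function name `trapezoid_auc` is hypothetical, and the library's own `auc()` method may use a different integration scheme.

```python
def trapezoid_auc(pofd, pod):
    """Integrate POD over POFD with the trapezoidal rule."""
    points = sorted(zip(pofd, pod))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area


# A forecast with no discrimination lies on the diagonal.
no_skill = trapezoid_auc([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])
# A perfect forecast reaches POD = 1 while POFD is still 0.
perfect = trapezoid_auc([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])
```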
hagelslag.evaluation.MulticlassContingencyTable module¶
- class hagelslag.evaluation.MulticlassContingencyTable.MulticlassContingencyTable(table=None, n_classes=2, class_names=('1', '0'))¶
Bases: object
This class is a container for a contingency table containing more than 2 classes. The contingency table is stored in table as a numpy array with the rows corresponding to forecast categories, and the columns corresponding to observation categories.
- gerrity_score()¶
Gerrity Score, which weights each cell in the contingency table by its observed relative frequency.
- heidke_skill_score()¶
- peirce_skill_score()¶
Multiclass Peirce Skill Score (also Hanssen and Kuipers score, True Skill Score)
- hagelslag.evaluation.MulticlassContingencyTable.main()¶
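The multiclass Heidke and Peirce skill scores can be written directly in terms of the table layout described above (rows are forecast categories, columns are observed categories). The sketch below uses the standard textbook definitions with nested lists; the function name `multiclass_scores` is an assumption, and the class methods may differ in detail.

```python
def multiclass_scores(table):
    """Return (Heidke, Peirce) skill scores for a square contingency table.

    Rows are forecast categories and columns are observed categories.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    forecast_freq = [sum(table[i]) / n for i in range(k)]
    observed_freq = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Fraction of cases on the diagonal (correct forecasts).
    correct = sum(table[i][i] for i in range(k)) / n
    # Fraction expected to be correct by chance given the marginals.
    chance = sum(f * o for f, o in zip(forecast_freq, observed_freq))
    heidke = (correct - chance) / (1.0 - chance)
    peirce = (correct - chance) / (1.0 - sum(o * o for o in observed_freq))
    return heidke, peirce


# With two classes the scores reduce to the binary HSS and PSS formulas.
hss, pss = multiclass_scores([[50, 25], [10, 915]])
perfect = multiclass_scores([[5, 0, 0], [0, 3, 0], [0, 0, 2]])
```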
hagelslag.evaluation.NeighborEvaluator module¶
- class hagelslag.evaluation.NeighborEvaluator.NeighborEvaluator(run_date, start_hour, end_hour, ensemble_name, model_name, forecast_variable, mrms_variable, neighbor_radii, smoothing_radii, obs_thresholds, size_thresholds, probability_levels, obs_mask, mask_variable, forecast_path, mrms_path, coordinate_file=None, lon_bounds=None, lat_bounds=None)¶
Bases: object
A framework for statistically evaluating neighborhood probability forecasts.
- run_date¶
Date of the beginning of the model run
- Type:
datetime.datetime object
- start_hour¶
First forecast hour evaluated
- Type:
int
- end_hour¶
Last forecast hour evaluated
- Type:
int
- ensemble_name¶
Name of the ensemble system being evaluated
- Type:
str
- model_name¶
Name of the physical or machine learning model being evaluated
- Type:
str
- forecast_variable¶
Name of the forecast variable being evaluated.
- Type:
str
- mrms_variable¶
Name of the NSSL MRMS product being used for gridded observations
- Type:
str
- neighbor_radii¶
neighborhood radii in number of grid points
- Type:
list or array
- smoothing_radii¶
radius of Gaussian filter used by the forecast
- Type:
list or array
- obs_thresholds¶
Observed intensity threshold that corresponds with each element of size_thresholds
- Type:
list or array
- size_thresholds¶
Intensity threshold for neighborhood probabilities
- Type:
list or array
- obs_mask¶
Whether or not another MRMS product is used to mask invalid grid points
- Type:
bool
- mask_variable¶
MRMS variable used for masking invalid grid points
- Type:
str
- forecast_path¶
Path to forecast files
- Type:
str
- mrms_path¶
Path to MRMS data
- Type:
str
- evaluate_hourly_forecasts()¶
Calculates ROC curves and Reliability scores for each forecast hour.
- Returns:
A pandas DataFrame containing forecast metadata as well as DistributedROC and Reliability objects.
- evaluate_period_forecasts()¶
Evaluates ROC and Reliability scores for forecasts over the full period from start hour to end hour
- Returns:
A pandas DataFrame with full-period metadata and verification statistics
- load_coordinates()¶
Loads lat-lon coordinates from a netCDF file.
- load_forecasts()¶
Load neighborhood probability forecasts.
- load_obs(mask_threshold=0.5)¶
Loads observations and masking grid (if needed).
- Parameters:
mask_threshold – Values greater than the threshold are kept, others are masked.
hagelslag.evaluation.ObjectEvaluator module¶
- class hagelslag.evaluation.ObjectEvaluator.ObjectEvaluator(run_date, ensemble_name, ensemble_member, model_names, model_types, forecast_bins, dist_thresholds, forecast_json_path, track_data_csv_path)¶
Bases: object
ObjectEvaluator performs a statistical evaluation of object-based severe weather forecasts.
ObjectEvaluator loads forecast and observation files for a particular ensemble member and model run and then matches the forecasts with their assigned observations. Verification statistics can be calculated on the full dataset or on subsets selected based on filter queries.
- run_date¶
The date marking the start of the model run.
- Type:
datetime.datetime
- ensemble_name¶
The name of the ensemble or NWP model being used.
- Type:
str
- ensemble_member¶
The name of the ensemble member being evaluated.
- Type:
str
- model_names¶
The names of the machine learning models being evaluated
- Type:
list
- model_types¶
The types of machine learning models being evaluated.
- Type:
list
- forecast_bins¶
For machine learning models forecasting a discrete pdf, this specifies the bin labels used.
- Type:
dict of str and numpy.ndarray pairs
- dist_thresholds¶
Thresholds used to discretize probability distribution forecasts.
- Type:
array
- forecast_json_path¶
Full path to the directory containing all json files with the forecast values.
- Type:
str
- track_data_csv_path¶
Full path to the directory containing the csv data files used for training.
- Type:
str
- metadata_columns¶
Columns pulled from track data csv files.
- Type:
list
- type_cols¶
Map between forecast type used in json files and observation column in csv files
- Type:
dict
- forecasts¶
Dictionary of DataFrames containing forecast information from csv files
- Type:
dict
- matched_forecasts¶
Forecasts merged with observation information.
- Type:
dict
- crps(model_type, model_name, condition_model_name, condition_threshold, query=None)¶
Calculates the continuous ranked probability score (CRPS) on the forecast data.
- Parameters:
model_type – model type being evaluated.
model_name – machine learning model being evaluated.
condition_model_name – Name of the hail/no-hail model being evaluated
condition_threshold – Threshold for using hail size CDF
query – pandas query string to filter the forecasts based on the metadata
- Returns:
a DistributedCRPS object
- load_forecasts()¶
Loads the forecast files and gathers the forecast information into pandas DataFrames.
- load_obs()¶
Loads the track total and step files and merges the information into a single data frame.
- max_hail_sample_crps(forecast_max_hail, obs_max_hail)¶
- merge_obs()¶
Match forecasts and observations.
- reliability(model_type, model_name, intensity_threshold, prob_thresholds, query=None)¶
Calculate reliability statistics based on the probability of exceeding a specified threshold.
- Parameters:
model_type – type of model being evaluated.
model_name – Name of the machine learning model being evaluated.
intensity_threshold – forecast bin used as the split point for evaluation.
prob_thresholds – Array of probability thresholds being evaluated.
query – str to filter forecasts based on values of forecasts, obs, and metadata.
- Returns:
A DistributedReliability object.
- roc(model_type, model_name, intensity_threshold, prob_thresholds, query=None)¶
Calculates a ROC curve at a specified intensity threshold.
- Parameters:
model_type – type of model being evaluated (e.g. size).
model_name – machine learning model being evaluated
intensity_threshold – forecast bin used as the split point for evaluation
prob_thresholds – Array of probability thresholds being evaluated.
query – str to filter forecasts based on values of forecasts, obs, and metadata.
- Returns:
A DistributedROC object
- sample_forecast_max_hail(dist_model_name, condition_model_name, num_samples, condition_threshold=0.5, query=None)¶
Samples every forecast hail object and returns an empirical distribution of possible maximum hail sizes.
Hail sizes are sampled from each predicted gamma distribution. The total number of samples equals num_samples * area of the hail object. To get the maximum hail size for each realization, the maximum value within each area sample is used.
- Parameters:
dist_model_name – Name of the distribution machine learning model being evaluated
condition_model_name – Name of the hail/no-hail model being evaluated
num_samples – Number of maximum hail samples to draw
condition_threshold – Threshold for drawing hail samples
query – A str that selects a subset of the data for evaluation
- Returns:
A numpy array containing maximum hail samples for each forecast object.
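The sampling procedure described for `sample_forecast_max_hail` can be sketched with the standard library alone. This is an illustration under stated assumptions, not the method's actual code: the function name `sample_max_hail`, the gamma parameters (`shape`, `loc`, `scale`), and the use of an integer `area` as the number of grid cells are all hypothetical.

```python
import random


def sample_max_hail(shape, loc, scale, area, num_samples, seed=0):
    """Draw num_samples realizations of the maximum hail size in one object.

    Each realization draws one gamma-distributed size per grid cell
    (area draws) and keeps the largest, so the total number of draws
    is num_samples * area.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(num_samples):
        sizes = [loc + rng.gammavariate(shape, scale) for _ in range(area)]
        maxima.append(max(sizes))
    return maxima


# Hypothetical gamma parameters for one forecast hail object covering 20 cells.
max_sizes = sample_max_hail(shape=2.0, loc=5.0, scale=8.0, area=20, num_samples=100)
```

Because the maximum is taken over every cell in the object, larger objects tend to produce larger sampled maxima even when the underlying size distribution is the same.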
- sample_obs_max_hail(dist_model_name, num_samples, query=None)¶
- hagelslag.evaluation.ObjectEvaluator.gamma_sf(x, a, loc, b)¶
hagelslag.evaluation.ProbabilityMetrics module¶
- class hagelslag.evaluation.ProbabilityMetrics.DistributedCRPS(thresholds=<numpy.ndarray, default not rendered>, input_str=None)¶
Bases: object
A container for the data used to calculate the Continuous Ranked Probability Score.
- thresholds¶
Array of the intensity threshold bins
- Type:
numpy.ndarray
- input_str¶
String containing the information for initializing the object
- Type:
str
- crps()¶
Calculates the continuous ranked probability score.
- crps_climo()¶
Calculate the climatological CRPS.
- crpss()¶
Calculate the continuous ranked probability skill score from existing data.
- from_str(in_str)¶
- merge(other_crps)¶
- update(forecasts, observations)¶
Update the statistics with forecasts and observations.
- Parameters:
forecasts – The discrete Cumulative Distribution Functions of
observations –
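For forecasts expressed as discrete CDFs over a set of thresholds, one common discretization of the CRPS averages the squared difference between the forecast CDF and the step-function CDF of the observation. The sketch below illustrates that idea only. The function name `discrete_crps` is hypothetical, and averaging over thresholds (rather than weighting by bin width) is an assumption that may not match how `DistributedCRPS` accumulates its statistics.

```python
def discrete_crps(forecast_cdfs, observations, thresholds):
    """Mean squared difference between forecast CDFs and observed step CDFs."""
    total = 0.0
    for cdf, obs in zip(forecast_cdfs, observations):
        for prob, threshold in zip(cdf, thresholds):
            # The observation's CDF jumps from 0 to 1 at the observed value.
            obs_cdf = 1.0 if obs <= threshold else 0.0
            total += (prob - obs_cdf) ** 2
    return total / (len(forecast_cdfs) * len(thresholds))


thresholds = [0, 1, 2, 3]
sharp = discrete_crps([[0.0, 0.0, 1.0, 1.0]], [2], thresholds)
vague = discrete_crps([[0.5, 0.5, 0.5, 0.5]], [2], thresholds)
```

Lower values are better: the sharp forecast that places all probability at the observed value scores zero, while the uninformative forecast is penalized at every threshold.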
- class hagelslag.evaluation.ProbabilityMetrics.DistributedROC(thresholds=<numpy.ndarray, default not rendered>, obs_threshold=1.0, input_str=None)¶
Bases: object
ROC sparse representation that can be aggregated and can generate ROC curves and performance diagrams.
A DistributedROC object is given a specified set of thresholds (could be probability or real-valued) and then stores a pandas DataFrame of contingency tables for each threshold. The contingency tables are updated with a set of forecasts and observations, but the original forecast and observation values are not kept. DistributedROC objects can be combined by adding them together or by storing them in an iterable and summing the contents of the iterable together. This is especially useful when verifying large numbers of cases in parallel.
- thresholds¶
List of probability thresholds in increasing order.
- Type:
numpy.ndarray
- obs_threshold¶
Observation values >= obs_threshold are positive events.
- Type:
float
- contingency_tables¶
Stores contingency table counts for each probability threshold
- Type:
pandas.DataFrame
Examples
>>> import numpy as np
>>> from hagelslag.evaluation.ProbabilityMetrics import DistributedROC
>>> forecasts = np.random.random(size=1000)
>>> obs = np.random.randint(0, 2, size=1000)
>>> roc = DistributedROC(thresholds=np.arange(0, 1.1, 0.1), obs_threshold=1)
>>> roc.update(forecasts, obs)
>>> print(roc.auc())
- auc()¶
Calculate the Area Under the ROC Curve (AUC).
- clear()¶
- from_str(in_str)¶
Read the DistributedROC string and parse the contingency table values from it.
- Parameters:
in_str (str) – The string output from the __str__ method
- get_contingency_tables()¶
Create an Array of ContingencyTable objects for each probability threshold.
- Returns:
Array of ContingencyTable objects
- max_csi()¶
Calculate the maximum Critical Success Index across all probability thresholds
- Returns:
The maximum CSI as a float
- max_threshold_score(score='ets')¶
- merge(other_roc)¶
Ingest the values of another DistributedROC object into this one and update the statistics inplace.
- Parameters:
other_roc – another DistributedROC object.
- performance_curve()¶
Calculate the Probability of Detection and False Alarm Ratio in order to output a performance diagram.
- Returns:
pandas.DataFrame containing POD, FAR, and probability thresholds.
- roc_curve()¶
Generate a ROC curve from the contingency table by calculating the probability of detection (TP/(TP+FN)) and the probability of false detection (FP/(FP+TN)).
- Returns:
A pandas.DataFrame containing the POD, POFD, and the corresponding probability thresholds.
- update(forecasts, observations)¶
Update the ROC curve with a set of forecasts and observations
- Parameters:
forecasts – 1D array of forecast values
observations – 1D array of observation values.
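The reason DistributedROC objects can be combined is that contingency table counts are additive: counting two batches separately and summing gives the same tables as counting them together. The class below is a hypothetical stand-in written to show that property. `ThresholdCounts` is not part of hagelslag, it uses plain lists instead of a pandas DataFrame, and the `>=` comparison against each forecast threshold is an assumption.

```python
class ThresholdCounts:
    """Hypothetical stand-in for a store of per-threshold contingency tables."""

    def __init__(self, thresholds, obs_threshold=1.0):
        self.thresholds = list(thresholds)
        self.obs_threshold = obs_threshold
        # One [TP, FP, FN, TN] row per threshold; raw values are never stored.
        self.tables = [[0, 0, 0, 0] for _ in self.thresholds]

    def update(self, forecasts, observations):
        for row, threshold in zip(self.tables, self.thresholds):
            for forecast, obs in zip(forecasts, observations):
                forecast_yes = forecast >= threshold  # assumed comparison
                observed_yes = obs >= self.obs_threshold
                if forecast_yes:
                    row[0 if observed_yes else 1] += 1
                else:
                    row[2 if observed_yes else 3] += 1

    def __add__(self, other):
        merged = ThresholdCounts(self.thresholds, self.obs_threshold)
        merged.tables = [[x + y for x, y in zip(mine, theirs)]
                         for mine, theirs in zip(self.tables, other.tables)]
        return merged


thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
first = ThresholdCounts(thresholds)
first.update([0.9, 0.2], [1, 0])
second = ThresholdCounts(thresholds)
second.update([0.6, 0.4], [0, 1])
combined = first + second

# Counting all four cases at once gives the same tables.
together = ThresholdCounts(thresholds)
together.update([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1])
```

This additivity is what makes parallel verification practical: each worker keeps only a small table of counts, and the results are merged at the end.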
- class hagelslag.evaluation.ProbabilityMetrics.DistributedReliability(thresholds=<numpy.ndarray, default not rendered>, obs_threshold=1.0, input_str=None)¶
Bases: object
A container for the statistics required to generate reliability diagrams and calculate the Brier Score.
DistributedReliability objects accept binary probabilistic forecasts and associated observations. The forecasts are then discretized into the different probability bins. The total frequency and the frequency of positive events for each probability bin are tracked. The Brier Score, Brier Skill Score, and Brier score components can all be derived from this information. Like the DistributedROC object, DistributedReliability objects can be summed together, and their contents can be output as a string.
- thresholds¶
Array of probability thresholds
- Type:
numpy.ndarray
- obs_threshold¶
Split value (>=) for determining positive observation events
- Type:
float
- frequencies¶
Stores the total and positive frequencies for each bin
- Type:
pandas.DataFrame
Examples
>>> import numpy as np
>>> from hagelslag.evaluation.ProbabilityMetrics import DistributedReliability
>>> forecasts = np.random.random(1000)
>>> obs = np.random.randint(0, 2, 1000)
>>> rel = DistributedReliability()
>>> rel.update(forecasts, obs)
>>> print(rel.brier_score())
- brier_score()¶
Calculate the Brier Score
- brier_score_components()¶
Calculate the components of the Brier score decomposition: reliability, resolution, and uncertainty.
- brier_skill_score()¶
Calculate the Brier Skill Score
- clear()¶
- climatology()¶
Calculates the sample climatological relative frequency of the event being forecast.
- from_str(in_str)¶
Updates the object attributes with the information contained in the input string
- Parameters:
in_str (str) – String output by the __str__ method containing all of the attribute values
- merge(other_rel)¶
Ingest another DistributedReliability and add its contents to the current object.
- Parameters:
other_rel – a DistributedReliability object.
- reliability_curve()¶
Calculates the reliability diagram statistics. The key columns are Bin_Start and Positive_Relative_Freq
- Returns:
pandas.DataFrame
- update(forecasts, observations)¶
Update the statistics with a set of forecasts and observations.
- Parameters:
forecasts (numpy.ndarray) – Array of forecast probability values
observations (numpy.ndarray) – Array of observation values
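Because only the total and positive counts per probability bin are tracked, the Brier score components can be recovered from those counts alone. The sketch below applies the standard decomposition (reliability, resolution, uncertainty) to plain lists. The function name `brier_components` and the choice of representative bin probabilities are assumptions and are not taken from the class.

```python
def brier_components(bin_probs, totals, positives):
    """Return (reliability, resolution, uncertainty) from binned counts."""
    n = sum(totals)
    climo = sum(positives) / n
    reliability = 0.0
    resolution = 0.0
    for prob, total, pos in zip(bin_probs, totals, positives):
        if total == 0:
            continue  # empty bins contribute nothing
        observed = pos / total
        reliability += total * (prob - observed) ** 2
        resolution += total * (observed - climo) ** 2
    uncertainty = climo * (1.0 - climo)
    return reliability / n, resolution / n, uncertainty


# Hypothetical, perfectly reliable forecasts in three bins.
rel, res, unc = brier_components([0.0, 0.5, 1.0], [10, 10, 10], [0, 5, 10])
brier_score = rel - res + unc
brier_skill_score = (res - rel) / unc
```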
- class hagelslag.evaluation.ProbabilityMetrics.ROC(forecasts, observations, thresholds, obs_threshold)¶
Bases: object
- auc()¶
- calc_roc()¶
- class hagelslag.evaluation.ProbabilityMetrics.Reliability(forecasts, observations, thresholds, obs_threshold)¶
Bases: object
- brier_score()¶
- brier_score_components()¶
- brier_skill_score()¶
- calc_reliability_curve()¶
- hagelslag.evaluation.ProbabilityMetrics.bootstrap(score_objs, n_boot=1000)¶
Given a set of DistributedROC or DistributedReliability objects, this function performs a bootstrap resampling of the objects and returns n_boot aggregations of them.
- Parameters:
score_objs – A list of DistributedROC or DistributedReliability objects. Objects must have an __add__ method
n_boot (int) – Number of bootstrap samples
- Returns:
An array of DistributedROC or DistributedReliability
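The resampling idea can be sketched without the library: draw the score objects with replacement, then add each draw together into one aggregate. Everything named below is hypothetical. `Counts` is a toy class that merely supports `__add__`, and `bootstrap_sum` is an illustrative function, not the library's `bootstrap`.

```python
import random


class Counts:
    """Toy score object that supports addition, as the real objects must."""

    def __init__(self, hits=0, total=0):
        self.hits = hits
        self.total = total

    def __add__(self, other):
        return Counts(self.hits + other.hits, self.total + other.total)


def bootstrap_sum(score_objs, n_boot=1000, seed=0):
    """Return n_boot aggregates, each summed from a resample with replacement."""
    rng = random.Random(seed)
    aggregates = []
    for _ in range(n_boot):
        sample = rng.choices(score_objs, k=len(score_objs))
        aggregate = sample[0]
        for obj in sample[1:]:
            aggregate = aggregate + obj
        aggregates.append(aggregate)
    return aggregates


# Five hypothetical cases with ten forecasts each and varying hit counts.
cases = [Counts(hits, 10) for hits in (2, 4, 5, 7, 9)]
boot = bootstrap_sum(cases, n_boot=200)
hit_rates = sorted(agg.hits / agg.total for agg in boot)
```

Percentiles of a statistic computed on each aggregate (for example the 2.5th and 97.5th) then give an empirical confidence interval.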