hagelslag.processing package¶

Submodules¶

hagelslag.processing.EnhancedWatershedSegmenter module¶

@author: David John Gagne (djgagne@ou.edu)

class hagelslag.processing.EnhancedWatershedSegmenter.EnhancedWatershed(min_intensity, data_increment, max_intensity, size_threshold_pixels, delta)¶

Bases: object

The enhanced watershed performs image segmentation using a modified version of the traditional watershed technique. It includes a size criteria and creates foothills around each object to keep them distinct. The object is used to store the quantization and size parameters. It can be used to watershed multiple grids.

min_intensity¶

minimum pixel value for pixel to be part of a region

Type:	int

data_increment¶

quantization interval. Use 1 if you don’t want to quantize

Type:	int

max_intensity¶

values greater than maxThresh are treated as the maximum threshold

Type:	int

size_threshold_pixels¶

clusters smaller than this threshold are ignored.

Type:	int

delta¶

maximum number of data increments the cluster is allowed to range over. Larger d results in clusters over larger scales.

Type:	int

find_local_maxima(pixels, q_data)¶

Finds the local maxima in the inputGrid and perform region growing to identify objects.

Parameters:	pixels – dictionary of quantized pixel values q_data – 2D array representation of quantized input data
Returns:	array with labeled objects.

grow_centers(centers, q_data)¶

Once

Parameters:	centers – q_data –

Returns:

static is_closest(point, center, centers, bin_num)¶

static is_valid(point, shape)¶

label(input_grid, only_objects=True)¶

Labels input grid using enhanced watershed algorithm.

Parameters:	input_grid (numpy.ndarray) – Grid to be labeled. only_objects (bool) – Only return object pixel values on final grid
Returns:	Array of labeled pixels

quantize(input_grid)¶

Quantize a grid into discrete steps based on input parameters.

Parameters:	input_grid – 2-d array of values
Returns:	Dictionary of value pointing to pixel locations, and quantized 2-d array of data

remove_foothills(q_data, marked, bin_num, bin_lower, centers, foothills)¶

Mark points determined to be foothills as globbed, so that they are not included in future searches. Also searches neighboring points to foothill points to determine if they should also be considered foothills.

Parameters:	q_data – Quantized data marked – Marked bin_num – Current bin being searched bin_lower – Next bin being searched centers – dictionary of local maxima considered to be object centers foothills – List of foothill points being removed.

set_maximum(q_data, marked, center, bin_lower, foothills, capture_index)¶

Grow a region at a certain bin level and check if the region has reached the maximum size.

Parameters:	q_data – Quantized data array marked – Array marking points that are objects center – Coordinates of the center pixel of the region being grown bin_lower – Intensity level of lower bin being evaluated foothills – List of points that are associated with a center but fall outside the the size or intensity criteria capture_index –
Returns:	True if the object is finished growing and False if the object should be grown again at the next threshold level.

static size_filter(labeled_grid, min_size)¶

Removes labeled objects that are smaller than min_size, and relabels the remaining objects.

Parameters:	labeled_grid – Grid that has been labeled min_size – Minimium object size.
Returns:	Labeled array with re-numbered objects to account for those that have been removed

hagelslag.processing.EnhancedWatershedSegmenter.rescale_data(data, data_min, data_max, out_min=0.0, out_max=100.0)¶

Rescale your input data so that is ranges over integer values, which will perform better in the watershed.

Parameters:	data – 2D or 3D ndarray being rescaled data_min – minimum value of input data for scaling purposes data_max – maximum value of input data for scaling purposes out_min – minimum value of scaled data out_max – maximum value of scaled data
Returns:	Linearly scaled ndarray

hagelslag.processing.EnsembleProducts module¶

hagelslag.processing.Hysteresis module¶

class hagelslag.processing.Hysteresis.Hysteresis(min_intensity, max_intensity)¶

Bases: object

Object segmentation method that identifies objects as contiguous areas with all pixels above a low threshold and contain at least one pixel above a high threshold.

min_intensity¶: lower threshold value

max_intensity¶: higher threshold value

label(input_grid)¶

Label input grid with hysteresis method.

Parameters:	input_grid – 2D array of values.
Returns:	Labeled output grid.

static size_filter(labeled_grid, min_size)¶

Remove labeled objects that do not meet size threshold criteria.

Parameters:	labeled_grid – 2D output from label method. min_size – minimum size of object in pixels.
Returns:	labeled grid with smaller objects removed.

hagelslag.processing.ObjectMatcher module¶

class hagelslag.processing.ObjectMatcher.ObjectMatcher(cost_function_components, weights, max_values)¶

Bases: object

ObjectMatcher calculates distances between two sets of objects and determines the optimal object assignments based on the Hungarian object matching algorithm. ObjectMatcher supports the use of the weighted average of multiple cost functions to determine the distance between objects. Upper limits to each distance component are used to exclude the matching of objects that are too far apart.

cost_function_components¶: List of distance functions for matching

weights¶: List of weights for each distance function

max_values¶: List of the maximum allowable distance for each distance function component.

cost_matrix(set_a, set_b, time_a, time_b)¶

Calculates the costs (distances) between the items in set a and set b at the specified times.

Parameters:	set_a – List of STObjects set_b – List of STObjects time_a – time at which objects in set_a are evaluated time_b – time at whcih object in set_b are evaluated
Returns:	A numpy array with shape [len(set_a), len(set_b)] containing the cost matrix between the items in set a and the items in set b.

match_objects(set_a, set_b, time_a, time_b)¶

Match two sets of objects at particular times.

Parameters:	set_a – list of STObjects set_b – list of STObjects time_a – time at which set_a is being evaluated for matching time_b – time at which set_b is being evaluated for matching
Returns:	List of tuples containing (set_a index, set_b index) for each match

total_cost_function(item_a, item_b, time_a, time_b)¶

Calculate total cost function between two items.

Parameters:	item_a – STObject item_b – STObject time_a – Timestep in item_a at which cost function is evaluated time_b – Timestep in item_b at which cost function is evaluated
Returns:	The total weighted distance between item_a and item_b

class hagelslag.processing.ObjectMatcher.TrackMatcher(cost_function_components, weights, max_values)¶

Bases: object

Find the optimal pairings among two sets of STObject tracks.

cost_function_components¶: Array of cost function objects

weights¶: Array of weights for each cost function. All should sum to 1.

max_values¶: Array of distance values that correspond to the upper limit distance that should be considered.

match_tracks(set_a, set_b, closest_matches=False)¶

Find the optimal set of matching assignments between set a and set b. This function supports optimal 1:1 matching using the Munkres method and matching from every object in set a to the closest object in set b. In this situation set b accepts multiple matches from set a.

Parameters:	set_a – set_b – closest_matches –

Returns:

neighbor_matches(set_a, set_b)¶

raw_cost_matrix(set_a, set_b)¶

track_cost_function(item_a, item_b)¶

track_cost_matrix(set_a, set_b)¶

class hagelslag.processing.ObjectMatcher.TrackStepMatcher(cost_function_components, max_values)¶

Bases: object

Determine if each step in a track is in close proximity to steps from another set of tracks

cost(track_a, time_a, track_b, time_b)¶

cost_matrix(set_a, set_b)¶

match(set_a, set_b)¶

For each step in each track from set_a, identify all steps in all tracks from set_b that meet all cost function criteria

Parameters:	set_a – List of STObjects set_b – List of STObjects
Returns:	pandas.DataFrame
Return type:	track_pairings

hagelslag.processing.ObjectMatcher.area_difference(item_a, time_a, item_b, time_b, max_value)¶

RMS Difference in object areas.

Parameters:	item_a – STObject from the first set in ObjectMatcher time_a – Time integer being evaluated item_b – STObject from the second set in ObjectMatcher time_b – Time integer being evaluated max_value – Maximum distance value used as scaling value and upper constraint.
Returns:	Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.centroid_distance(item_a, time_a, item_b, time_b, max_value)¶

Euclidean distance between the centroids of item_a and item_b.

Parameters:	item_a – STObject from the first set in ObjectMatcher time_a – Time integer being evaluated item_b – STObject from the second set in ObjectMatcher time_b – Time integer being evaluated max_value – Maximum distance value used as scaling value and upper constraint.
Returns:	Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.closest_distance(item_a, time_a, item_b, time_b, max_value)¶

Euclidean distance between the pixels in item_a and item_b closest to each other.

Parameters:	item_a – STObject from the first set in ObjectMatcher time_a – Time integer being evaluated item_b – STObject from the second set in ObjectMatcher time_b – Time integer being evaluated max_value – Maximum distance value used as scaling value and upper constraint.
Returns:	Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.duration_distance(item_a, item_b, max_value)¶

Absolute difference in the duration of two items

Parameters:	item_a – STObject from the first set in TrackMatcher item_b – STObject from the second set in TrackMatcher max_value – Maximum distance value used as scaling value and upper constraint.
Returns:	Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.ellipse_distance(item_a, time_a, item_b, time_b, max_value)¶

Calculate differences in the properties of ellipses fitted to each object.

Parameters:	item_a – STObject from the first set in ObjectMatcher time_a – Time integer being evaluated item_b – STObject from the second set in ObjectMatcher time_b – Time integer being evaluated max_value – Maximum distance value used as scaling value and upper constraint.
Returns:	Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.max_intensity(item_a, time_a, item_b, time_b, max_value)¶

RMS difference in maximum intensity

Parameters:	item_a – STObject from the first set in ObjectMatcher time_a – Time integer being evaluated item_b – STObject from the second set in ObjectMatcher time_b – Time integer being evaluated max_value – Maximum distance value used as scaling value and upper constraint.
Returns:	Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.mean_area_distance(item_a, item_b, max_value)¶

Absolute difference in the means of the areas of each track over time.

Parameters:	item_a – STObject from the first set in TrackMatcher item_b – STObject from the second set in TrackMatcher max_value – Maximum distance value used as scaling value and upper constraint.
Returns:	Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.mean_min_time_distance(item_a, item_b, max_value)¶

Calculate the mean time difference among the time steps in each object.

Parameters:	item_a – STObject from the first set in TrackMatcher item_b – STObject from the second set in TrackMatcher max_value – Maximum distance value used as scaling value and upper constraint.
Returns:	Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.mean_minimum_centroid_distance(item_a, item_b, max_value)¶

RMS difference in the minimum distances from the centroids of one track to the centroids of another track

Parameters:	item_a – STObject from the first set in TrackMatcher item_b – STObject from the second set in TrackMatcher max_value – Maximum distance value used as scaling value and upper constraint.
Returns:	Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.nonoverlap(item_a, time_a, item_b, time_b, max_value)¶

Percentage of pixels in each object that do not overlap with the other object

Parameters:	item_a – STObject from the first set in ObjectMatcher time_a – Time integer being evaluated item_b – STObject from the second set in ObjectMatcher time_b – Time integer being evaluated max_value – Maximum distance value used as scaling value and upper constraint.
Returns:	Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.shifted_centroid_distance(item_a, time_a, item_b, time_b, max_value)¶

Centroid distance with motion corrections.

Parameters:	item_a – STObject from the first set in ObjectMatcher time_a – Time integer being evaluated item_b – STObject from the second set in ObjectMatcher time_b – Time integer being evaluated max_value – Maximum distance value used as scaling value and upper constraint.
Returns:	Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.start_centroid_distance(item_a, item_b, max_value)¶

Distance between the centroids of the first step in each object.

Parameters:	item_a – STObject from the first set in TrackMatcher item_b – STObject from the second set in TrackMatcher max_value – Maximum distance value used as scaling value and upper constraint.
Returns:	Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.start_time_distance(item_a, item_b, max_value)¶

Absolute difference between the starting times of each item.

Parameters:	item_a – STObject from the first set in TrackMatcher item_b – STObject from the second set in TrackMatcher max_value – Maximum distance value used as scaling value and upper constraint.
Returns:	Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.time_distance(item_a, time_a, item_b, time_b, max_value)¶

hagelslag.processing.STObject module¶

class hagelslag.processing.STObject.STObject(grid, mask, x, y, i, j, start_time, end_time, step=1, dx=3000, u=None, v=None)¶

Bases: object

The STObject stores data and location information for objects extracted from the ensemble grids.

grid¶

All of the data values. Supports a 2D array of values, a list of 2D arrays, or a 3D array.

Type:	ndarray

mask¶

Grid of 1’s and 0’s in which 1’s indicate the location of the object.

Type:	ndarray

x¶

Array of x-coordinate values in meters. Longitudes can also be placed here.

Type:	ndarray

y¶

Array of y-coordinate values in meters. Latitudes can also be placed here.

Type:	ndarray

i¶

Array of row indices from the full model domain.

Type:	ndarray

j¶

Array of column indices from the full model domain.

Type:	ndarray

start_time¶: The first time of the object existence.

end_time¶: The last time of the object existence.

step¶: number of hours between timesteps

dx¶: grid spacing

u¶: storm motion in x-direction

v¶: storm motion in y-direction

boundary_contour(time)¶

Calculate the contour around the edge of the binary mask for the object. For objects with interior holes or multiple connections, binary dilation, hole filling, and erosion are used to generate a single edge contour instead of multiple contours.

Parameters:	time –
Returns:	array of shape (2, number of contour points) containing the x and y coordinates of the object edge.

boundary_polygon(time)¶: Get coordinates of object boundary in counter-clockwise order based on the convex hull of the object. For non-convex objects, the convex hull will not be representative of the object shape and boundary_contour should be used instead.

calc_attribute_statistic(attribute, statistic, time)¶

Calculate statistics based on the values of an attribute. The following statistics are supported: mean, max, min, std, ptp (range), median, skew (mean - median), and percentile_(percentile value).

Parameters:	attribute – Attribute extracted from model grid statistic – Name of statistic being used. time – timestep of the object being investigated
Returns:	The value of the statistic

calc_attribute_statistics(statistic_name)¶

Calculates summary statistics over the domains of each attribute.

Parameters:	statistic_name (string) – numpy statistic, such as mean, std, max, min
Returns:	dict of statistics from each attribute grid.

calc_shape_statistics(stat_names)¶

Calculate shape statistics using regionprops applied to the object mask.

Parameters:	stat_names – List of statistics to be extracted from those calculated by regionprops.
Returns:	Dictionary of shape statistics

calc_shape_step(stat_names, time)¶

Calculate shape statistics for a single time step

Parameters:	stat_names – List of shape statistics calculated from region props time – Time being investigated
Returns:	List of shape statistics

calc_timestep_statistic(statistic, time)¶

Calculate statistics from the primary attribute of the StObject.

Parameters:	statistic – statistic being calculated time – Timestep being investigated
Returns:	Value of the statistic

center_of_mass(time)¶

Calculate the center of mass at a given timestep.

Parameters:	time – Time at which the center of mass calculation is performed
Returns:	The x- and y-coordinates of the center of mass.

center_of_mass_ij(time)¶

Calculate the center of mass in terms of row and column coordinates at a given timestep.

Parameters:	time – Time at which the center of mass calculation is performed
Returns:	The x- and y-coordinates of the center of mass.

closest_distance(time, other_object, other_time)¶

The shortest distance between two objects at specified times.

Parameters:	time (int or datetime) – Valid time for this STObject other_object – Another STObject being compared other_time – The time within the other STObject being evaluated.
Returns:	Distance in units of the x-y coordinates

count_overlap(time, other_object, other_time)¶: Counts the number of points that overlap between this STObject and another STObject. Used for tracking.

estimate_motion(time, intensity_grid, max_u, max_v)¶

Estimate the motion of the object with cross-correlation on the intensity values from the previous time step.

Parameters:	time – time being evaluated. intensity_grid – 2D array of intensities used in cross correlation. max_u – Maximum x-component of motion. Used to limit search area. max_v – Maximum y-component of motion. Used to limit search area
Returns:	u, v, and the minimum error.

extend(step)¶

Adds the data from another STObject to this object.

Parameters:	step – another STObject being added after the current one in time.

extract_attribute_array(data_array, var_name)¶

Extracts data from a 2D array that has the same dimensions as the grid used to identify the object.

Parameters:	data_array – 2D numpy array

extract_attribute_grid(model_grid, potential=False, future=False)¶

Extracts the data from a ModelOutput or ModelGrid object within the bounding box region of the STObject.

Parameters:	model_grid – A ModelGrid or ModelOutput Object potential – Extracts from the time before instead of the same time as the object

extract_patch(patch_radius, full_x, full_y, full_i, full_j)¶

Extract patch of uniform radius from existing STObject. This is intended for extracting patches from STObjects that are built around the bounding box of the original object. Areas outside the object are padded with zeros.

Parameters:	patch_radius (int) – radius of patch in pixels full_x – full x grid encompassing the original field from which the objects have been extracted. full_y – full y grid full_i – full i (row) grid full_j – full j (row) grid
Returns:	new STObject containing the a patched slice of the original STObject.

extract_tendency_grid(model_grid)¶

Extracts the difference in model outputs

Parameters:	model_grid – ModelOutput or ModelGrid object.

get_corner(time)¶

Gets the corner array indices of the STObject at a given time that corresponds to the upper left corner of the bounding box for the STObject.

Parameters:	time – time at which the corner is being extracted.
Returns:	corner index.

max_intensity(time)¶: Calculate the maximum intensity found at a timestep.

max_size()¶

Gets the largest size of the object over all timesteps.

Returns:	Maximum size of the object in pixels

percentile_distance(time, other_object, other_time, percentile)¶

size(time)¶

Gets the size of the object at a given time.

Parameters:	time – Time value being queried.
Returns:	size of the object in pixels

to_geojson(filename, proj, metadata=None)¶

Output the data in the STObject to a geoJSON file.

Parameters:	filename – Name of the file proj – PyProj object for converting the x and y coordinates back to latitude and longitue values. metadata – Metadata describing the object to be included in the top-level properties.

to_geojson_feature(proj, output_grids=False)¶

Output the data in the STObject to a geoJSON file.

Parameters:	proj – PyProj object for converting the x and y coordinates back to latitude and longitude values. output_grids – Whether or not to output the primary gridded fields to the geojson file.

trajectory()¶

Calculates the center of mass for each time step and outputs an array

Returns:

hagelslag.processing.STObject.read_geojson(filename)¶

Reads a geojson file containing an STObject and initializes a new STObject from the information in the file.

Parameters:	filename – Name of the geojson file
Returns:	an STObject

hagelslag.processing.TrackModeler module¶

class hagelslag.processing.TrackModeler.TrackModeler(ensemble_name, train_data_path, forecast_data_path, member_files, start_dates, end_dates, weighting_function, map_file, group_col='Microphysics')¶

Bases: object

TrackModeler is designed to load and process data generated by TrackProcessing and then use that data to fit machine learning models to predict whether or not hail will occur, hail size, and translation errors in time and space.

calc_copulas(output_file, model_names=('start-time', 'translation-x', 'translation-y'), label_columns=('Start_Time_Error', 'Translation_Error_X', 'Translation_Error_Y'))¶

Calculate a copula multivariate normal distribution from the training data for each group of ensemble members. Distributions are written to a pickle file for later use.

Parameters:	output_file – Pickle file model_names – Names of the tracking models label_columns – Names of the data columns used for labeling

Returns:

fit_condition_models(model_names, model_objs, input_columns, output_column='Matched', output_threshold=0.0)¶

Fit machine learning models to predict whether or not hail will occur.

Parameters:

model_names – List of strings with the names for the particular machine learning models
model_objs – scikit-learn style machine learning model objects.
input_columns – list of the names of the columns used as input for the machine learning model
output_column – name of the column used for labeling whether or not the event occurs
output_threshold – splitting threshold to determine if event has occurred. Default 0.0

fit_condition_threshold_models(model_names, model_objs, input_columns, output_column='Matched', output_threshold=0.5, num_folds=5, threshold_score='ets')¶

Fit models to predict hail/no-hail and use cross-validation to determine the probaility threshold that maximizes a skill score.

Parameters:

model_names – List of machine learning model names
model_objs – List of Scikit-learn ML models
input_columns – List of input variables in the training data
output_column – Column used for prediction
output_threshold – Values exceeding this threshold are considered positive events; below are nulls
num_folds – Number of folds in the cross-validation procedure
threshold_score – Score available in ContingencyTable used for determining the best probability threshold

Returns:

None

fit_size_distribution_component_models(model_names, model_objs, input_columns, output_columns)¶

This calculates 2 principal components for the hail size distribution between the shape and scale parameters. Separate machine learning models are fit to predict each component.

Parameters:	model_names – List of machine learning model names model_objs – List of machine learning model objects. input_columns – List of input variables output_columns – Output columns, should contain Shape and Scale.

Returns:

fit_size_distribution_models(model_names, model_objs, input_columns, output_columns=None, calibrate=False)¶

Fits multitask machine learning models to predict the parameters of a size distribution

Parameters:	model_names – List of machine learning model names model_objs – scikit-learn style machine learning model objects input_columns – Training data columns used as input for ML model output_columns – Training data columns used for prediction calibrate – Whether or not to fit a log-linear regression to predictions from ML model

fit_size_models(model_names, model_objs, input_columns, output_column='Hail_Size', output_start=5, output_step=5, output_stop=100)¶: Fit size models to produce discrete pdfs of forecast hail sizes. :param model_names: List of model names :param model_objs: List of model objects :param input_columns: List of input variables :param output_column: Output variable name :param output_start: Hail size bin start :param output_step: hail size bin step :param output_stop: hail size bin stop

fit_track_models(model_names, model_objs, input_columns, output_columns, output_ranges)¶

Fit machine learning models to predict track error offsets.: model_names: model_objs: input_columns: output_columns: output_ranges:

load_data(mode='train', format='csv')¶

Load data from flat data files containing total track information and information about each timestep. The two sets are combined using merge operations on the Track IDs. Additional member information is gathered from the appropriate member file.

Parameters:	mode – “train” or “forecast” format – file format being used. Default is “csv”

load_models(model_path)¶

Load models from pickle files. Note that models should be loaded with the same version of sklearn that they were saved with.

Parameters:	model_path – Path to model pickle files.

output_forecasts_csv(forecasts, mode, csv_path, run_date_format='%Y%m%d-%H%M')¶

Output hail forecast values to csv files by run date and ensemble member.

Parameters:	forecasts (dict) – Dictionary of DataFrames with forecast values mode (str) – either “train” or “forecast” csv_path (str) – Path where csv forecast files are saved

output_forecasts_json(forecasts, condition_model_names, size_model_names, dist_model_names, track_model_names, json_data_path, out_path)¶

Output forecasts to geoJSON format

Parameters:

forecasts (dict) – Dictionary of DataFrames with condition and size distribution forecasts
condition_model_names (list) – Names of all the condition ML models
size_model_names (list) – Names of all the size ML models
dist_model_names (list) – Names of all the size distribution ML models
track_model_names (list) – Names of models that predict the track offset (no longer used)
json_data_path (str) – Path to geoJSON storm files
out_path (str) – Path to where geoJSON forecast files are output

Returns:

output_forecasts_json_parallel(forecasts, condition_model_names, dist_model_names, json_data_path, out_path, num_procs)¶

predict_condition_models(model_names, input_columns, metadata_cols, data_mode='forecast')¶

Apply condition models to forecast data.

Parameters:	model_names – List of names associated with each condition model used for prediction input_columns – List of columns in data used as input into the model metadata_cols – Columns from input data that should be included in the data frame with the predictions. data_mode – Which data subset to pull from for the predictions, “forecast” by default
Returns:	A dictionary of data frames containing probabilities of the event and specified metadata

predict_size_distribution_component_models(model_names, input_columns, output_columns, metadata_cols, data_mode='forecast', location=6)¶

Make predictions using fitted size distribution component models. PCA is used transform the shape and scale parameters so that their correlation is removed.

Parameters:	model_names – Name of the models for predictions input_columns – Data columns used for input into ML models output_columns – Names of output columns metadata_cols – Columns from input data that should be included in the data frame with the predictions. data_mode – Set of data used as input for prediction models location – Value of fixed location parameter
Returns:	Predictions in dictionary of data frames grouped by group type

predict_size_distribution_models(model_names, input_columns, metadata_cols, data_mode='forecast', location=6, calibrate=False)¶

Make predictions using fitted size distribution models. Each ML model predicts the normalized shape and scale parameters simultaneously using multitask learning. Only scikit learn models that support multi task learning can be used.

Parameters:

model_names – Name of the models for predictions
input_columns – Data columns used for input into ML models
metadata_cols – Columns from input data that should be included in the data frame with the predictions.
data_mode – Set of data used as input for prediction models
location – Value of fixed location parameter
calibrate – Whether or not to apply calibration model

Returns:

Predictions in dictionary of data frames grouped by group type

predict_size_models(model_names, input_columns, metadata_cols, data_mode='forecast')¶: Apply size models to forecast data. :param model_names: :param input_columns: :param metadata_cols: :param data_mode:

predict_track_models(model_names, input_columns, metadata_cols, data_mode='forecast')¶

Predict track offsets on forecast data. :param model_names: List of machine learning model names :param input_columns: List of input columns :param metadata_cols: List of metadata columns to include with predictions :param data_mode: train or forecast

Returns:

save_models(model_path)¶: Save machine learning models to pickle files.

hagelslag.processing.TrackModeler.output_forecast(step_forecasts, run_date, ensemble_name, member, track_num, json_data_path, out_path)¶

hagelslag.processing.TrackProcessing module¶

hagelslag.processing.TrackSampler module¶

class hagelslag.processing.TrackSampler.TrackSampler(member, group, run_date, model_names, start_hour, end_hour, grid_shape, dx, track_path, num_samples, copula_file=None)¶

Bases: object

Monte Carlo sampler of forecast storm tracks.

generate_copula_ranks()¶

load_track_forecasts()¶

output_track_probs(track_probs, path)¶

sample_condition()¶

sample_size(size_values=<Mock name='mock()' id='140392962338000'>)¶

sample_start_time(start_time_values=<Mock name='mock()' id='140392959241616'>)¶

sample_tracks(size_ranges, track_ranges, thresholds=<Mock name='mock()' id='140392957459408'>, dilation=13)¶

sample_translation_x(translation_x_values=<Mock name='mock()' id='140392987803984'>)¶

sample_translation_y(translation_y_values=<Mock name='mock()' id='140392959370768'>)¶

hagelslag.processing.TrackSampler.load_grid_info(grid_file)¶

hagelslag.processing.TrackSampler.main()¶

hagelslag.processing.TrackSampler.sample_member_run_tracks(member, group, run_date, model_names, start_hour, end_hour, grid_shape, dx, track_path, num_samples, thresholds, copula_file, out_path, size_ranges, track_ranges)¶

hagelslag.processing.tracker module¶

hagelslag.processing.tracker.extract_storm_objects(label_grid, data, x_grid, y_grid, times, dx=1, dt=1, buffer_radius=0)¶

After storms are labeled, this method extracts the storm objects from the grid and places them into STObjects. The STObjects contain intensity, location, and shape information about each storm at each timestep.

Parameters:	label_grid – 2D or 3D array output by label_storm_objects. data – 2D or 3D array used as input to label_storm_objects. x_grid – 2D array of x-coordinate data, preferably on a uniform spatial grid with units of length. y_grid – 2D array of y-coordinate data. times – List or array of time values, preferably as integers dx – grid spacing in same units as x_grid and y_grid. dt – period elapsed between times buffer_radius – number of extra pixels beyond bounding box of object to store in each STObject
Returns:	list of lists containing STObjects identified at each time.
Return type:	storm_objects

hagelslag.processing.tracker.extract_storm_patches(label_grid, data, x_grid, y_grid, times, dx=1, dt=1, patch_radius=16)¶

After storms are labeled, this method extracts boxes of equal size centered on each storm from the grid and places them into STObjects. The STObjects contain intensity, location, and shape information about each storm at each timestep.

Parameters:	label_grid – 2D or 3D array output by label_storm_objects. data – 2D or 3D array used as input to label_storm_objects. x_grid – 2D array of x-coordinate data, preferably on a uniform spatial grid with units of length. y_grid – 2D array of y-coordinate data. times – List or array of time values, preferably as integers dx – grid spacing in same units as x_grid and y_grid. dt – period elapsed between times patch_radius – Number of grid points from center of mass to extract
Returns:	list of lists containing STObjects identified at each time.
Return type:	storm_objects

hagelslag.processing.tracker.label_storm_objects(data, method, min_intensity, max_intensity, min_area=1, max_area=100, max_range=1, increment=1, gaussian_sd=0)¶

From a 2D grid or time series of 2D grids, this method labels storm objects with either the Enhanced Watershed, Watershed, or Hysteresis methods.

Parameters:	data – the gridded data to be labeled. Should be a 2D numpy array in (y, x) coordinate order or a 3D numpy array in (time, y, x) coordinate order method – “ew” for Enhanced Watershed, “ws” for regular watershed, and “hyst” for hysteresis min_intensity – Minimum intensity threshold for gridpoints contained within any objects max_intensity – For watershed, any points above max_intensity are considered as the same value as max intensity. For hysteresis, all objects have to contain at least 1 pixel that equals or exceeds this value min_area – (default 1) The minimum area of any object in pixels. max_area – (default 100) The area threshold in pixels at which the enhanced watershed ends growth. Object area may exceed this threshold if the pixels at the last watershed level exceed the object area. max_range – Maximum difference in bins for search before growth is stopped. increment – Discretization increment for the enhanced watershed gaussian_sd – Standard deviation of Gaussian filter applied to data
Returns:	an ndarray with the same shape as data in which each pixel is labeled with a positive integer value.
Return type:	label_grid

hagelslag.processing.tracker.track_storms(storm_objects, times, distance_components, distance_maxima, distance_weights, tracked_objects=None)¶

Given the output of extract_storm_objects, this method tracks storms through time and merges individual STObjects into a set of tracks.

Parameters:	storm_objects – list of list of STObjects that have not been tracked. times – List of times associated with each set of STObjects distance_components – list of function objects that make up components of distance function distance_maxima – array of maximum values for each distance for normalization purposes distance_weights – weight given to each component of the distance function. Should add to 1. tracked_objects – List of STObjects that have already been tracked.
Returns:
Return type:	tracked_objects

hagelslag.processing package¶

Submodules¶

hagelslag.processing.EnhancedWatershedSegmenter module¶

hagelslag.processing.EnsembleProducts module¶

hagelslag.processing.Hysteresis module¶

hagelslag.processing.ObjectMatcher module¶

hagelslag.processing.STObject module¶

hagelslag.processing.TrackModeler module¶

hagelslag.processing.TrackProcessing module¶

hagelslag.processing.TrackSampler module¶

hagelslag.processing.tracker module¶

Module contents¶

Table of Contents

Previous topic

Next topic

This Page