hagelslag.processing package

Submodules

hagelslag.processing.EnhancedWatershedSegmenter module

@author: David John Gagne (djgagne@ou.edu)

class hagelslag.processing.EnhancedWatershedSegmenter.EnhancedWatershed(min_intensity, data_increment, max_intensity, size_threshold_pixels, delta)

Bases: object

The enhanced watershed performs image segmentation using a modified version of the traditional watershed technique. It includes a size criteria and creates foothills around each object to keep them distinct. The object is used to store the quantization and size parameters. It can be used to watershed multiple grids.

min_intensity

minimum pixel value for pixel to be part of a region

Type:

int

data_increment

quantization interval. Use 1 if you don’t want to quantize

Type:

int

max_intensity

values greater than maxThresh are treated as the maximum threshold

Type:

int

size_threshold_pixels

clusters smaller than this threshold are ignored.

Type:

int

delta

maximum number of data increments the cluster is allowed to range over. Larger d results in clusters over larger scales.

Type:

int

find_local_maxima(pixels, q_data)

Finds the local maxima in the inputGrid and perform region growing to identify objects.

Parameters:
  • pixels – dictionary of quantized pixel values

  • q_data – 2D array representation of quantized input data

Returns:

array with labeled objects.

grow_centers(centers, q_data)

Once

Parameters:
  • centers

  • q_data

Returns:

static is_closest(point, center, centers, bin_num)
static is_valid(point, shape)
label(input_grid, only_objects=True)

Labels input grid using enhanced watershed algorithm.

Parameters:
  • input_grid (numpy.ndarray) – Grid to be labeled.

  • only_objects (bool) – Only return object pixel values on final grid

Returns:

Array of labeled pixels

quantize(input_grid)

Quantize a grid into discrete steps based on input parameters.

Parameters:

input_grid – 2-d array of values

Returns:

Dictionary of value pointing to pixel locations, and quantized 2-d array of data

remove_foothills(q_data, marked, bin_num, bin_lower, centers, foothills)

Mark points determined to be foothills as globbed, so that they are not included in future searches. Also searches neighboring points to foothill points to determine if they should also be considered foothills.

Parameters:
  • q_data – Quantized data

  • marked – Marked

  • bin_num – Current bin being searched

  • bin_lower – Next bin being searched

  • centers – dictionary of local maxima considered to be object centers

  • foothills – List of foothill points being removed.

set_maximum(q_data, marked, center, bin_lower, foothills, capture_index)

Grow a region at a certain bin level and check if the region has reached the maximum size.

Parameters:
  • q_data – Quantized data array

  • marked – Array marking points that are objects

  • center – Coordinates of the center pixel of the region being grown

  • bin_lower – Intensity level of lower bin being evaluated

  • foothills – List of points that are associated with a center but fall outside the the size or intensity criteria

  • capture_index

Returns:

True if the object is finished growing and False if the object should be grown again at the next threshold level.

static size_filter(labeled_grid, min_size)

Removes labeled objects that are smaller than min_size, and relabels the remaining objects.

Parameters:
  • labeled_grid – Grid that has been labeled

  • min_size – Minimium object size.

Returns:

Labeled array with re-numbered objects to account for those that have been removed

hagelslag.processing.EnhancedWatershedSegmenter.rescale_data(data, data_min, data_max, out_min=0.0, out_max=100.0)

Rescale your input data so that is ranges over integer values, which will perform better in the watershed.

Parameters:
  • data – 2D or 3D ndarray being rescaled

  • data_min – minimum value of input data for scaling purposes

  • data_max – maximum value of input data for scaling purposes

  • out_min – minimum value of scaled data

  • out_max – maximum value of scaled data

Returns:

Linearly scaled ndarray

hagelslag.processing.EnsembleProducts module

hagelslag.processing.Hysteresis module

hagelslag.processing.ObjectMatcher module

class hagelslag.processing.ObjectMatcher.ObjectMatcher(cost_function_components, weights, max_values)

Bases: object

ObjectMatcher calculates distances between two sets of objects and determines the optimal object assignments based on the Hungarian object matching algorithm. ObjectMatcher supports the use of the weighted average of multiple cost functions to determine the distance between objects. Upper limits to each distance component are used to exclude the matching of objects that are too far apart.

cost_function_components

List of distance functions for matching

weights

List of weights for each distance function

max_values

List of the maximum allowable distance for each distance function component.

cost_matrix(set_a, set_b, time_a, time_b)

Calculates the costs (distances) between the items in set a and set b at the specified times.

Parameters:
  • set_a – List of STObjects

  • set_b – List of STObjects

  • time_a – time at which objects in set_a are evaluated

  • time_b – time at whcih object in set_b are evaluated

Returns:

A numpy array with shape [len(set_a), len(set_b)] containing the cost matrix between the items in set a and the items in set b.

match_objects(set_a, set_b, time_a, time_b)

Match two sets of objects at particular times.

Parameters:
  • set_a – list of STObjects

  • set_b – list of STObjects

  • time_a – time at which set_a is being evaluated for matching

  • time_b – time at which set_b is being evaluated for matching

Returns:

List of tuples containing (set_a index, set_b index) for each match

total_cost_function(item_a, item_b, time_a, time_b)

Calculate total cost function between two items.

Parameters:
  • item_a – STObject

  • item_b – STObject

  • time_a – Timestep in item_a at which cost function is evaluated

  • time_b – Timestep in item_b at which cost function is evaluated

Returns:

The total weighted distance between item_a and item_b

class hagelslag.processing.ObjectMatcher.TrackMatcher(cost_function_components, weights, max_values)

Bases: object

Find the optimal pairings among two sets of STObject tracks.

cost_function_components

Array of cost function objects

weights

Array of weights for each cost function. All should sum to 1.

max_values

Array of distance values that correspond to the upper limit distance that should be considered.

match_tracks(set_a, set_b, closest_matches=False)

Find the optimal set of matching assignments between set a and set b. This function supports optimal 1:1 matching using the Munkres method and matching from every object in set a to the closest object in set b. In this situation set b accepts multiple matches from set a.

Parameters:
  • set_a

  • set_b

  • closest_matches

Returns:

neighbor_matches(set_a, set_b)
raw_cost_matrix(set_a, set_b)
track_cost_function(item_a, item_b)
track_cost_matrix(set_a, set_b)
class hagelslag.processing.ObjectMatcher.TrackStepMatcher(cost_function_components, max_values)

Bases: object

Determine if each step in a track is in close proximity to steps from another set of tracks

cost(track_a, time_a, track_b, time_b)
cost_matrix(set_a, set_b)
match(set_a, set_b)

For each step in each track from set_a, identify all steps in all tracks from set_b that meet all cost function criteria

Parameters:
  • set_a – List of STObjects

  • set_b – List of STObjects

Returns:

pandas.DataFrame

Return type:

track_pairings

hagelslag.processing.ObjectMatcher.area_difference(item_a, time_a, item_b, time_b, max_value)

RMS Difference in object areas.

Parameters:
  • item_a – STObject from the first set in ObjectMatcher

  • time_a – Time integer being evaluated

  • item_b – STObject from the second set in ObjectMatcher

  • time_b – Time integer being evaluated

  • max_value – Maximum distance value used as scaling value and upper constraint.

Returns:

Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.centroid_distance(item_a, time_a, item_b, time_b, max_value)

Euclidean distance between the centroids of item_a and item_b.

Parameters:
  • item_a – STObject from the first set in ObjectMatcher

  • time_a – Time integer being evaluated

  • item_b – STObject from the second set in ObjectMatcher

  • time_b – Time integer being evaluated

  • max_value – Maximum distance value used as scaling value and upper constraint.

Returns:

Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.closest_distance(item_a, time_a, item_b, time_b, max_value)

Euclidean distance between the pixels in item_a and item_b closest to each other.

Parameters:
  • item_a – STObject from the first set in ObjectMatcher

  • time_a – Time integer being evaluated

  • item_b – STObject from the second set in ObjectMatcher

  • time_b – Time integer being evaluated

  • max_value – Maximum distance value used as scaling value and upper constraint.

Returns:

Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.duration_distance(item_a, item_b, max_value)

Absolute difference in the duration of two items

Parameters:
  • item_a – STObject from the first set in TrackMatcher

  • item_b – STObject from the second set in TrackMatcher

  • max_value – Maximum distance value used as scaling value and upper constraint.

Returns:

Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.ellipse_distance(item_a, time_a, item_b, time_b, max_value)

Calculate differences in the properties of ellipses fitted to each object.

Parameters:
  • item_a – STObject from the first set in ObjectMatcher

  • time_a – Time integer being evaluated

  • item_b – STObject from the second set in ObjectMatcher

  • time_b – Time integer being evaluated

  • max_value – Maximum distance value used as scaling value and upper constraint.

Returns:

Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.max_intensity(item_a, time_a, item_b, time_b, max_value)

RMS difference in maximum intensity

Parameters:
  • item_a – STObject from the first set in ObjectMatcher

  • time_a – Time integer being evaluated

  • item_b – STObject from the second set in ObjectMatcher

  • time_b – Time integer being evaluated

  • max_value – Maximum distance value used as scaling value and upper constraint.

Returns:

Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.mean_area_distance(item_a, item_b, max_value)

Absolute difference in the means of the areas of each track over time.

Parameters:
  • item_a – STObject from the first set in TrackMatcher

  • item_b – STObject from the second set in TrackMatcher

  • max_value – Maximum distance value used as scaling value and upper constraint.

Returns:

Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.mean_min_time_distance(item_a, item_b, max_value)

Calculate the mean time difference among the time steps in each object.

Parameters:
  • item_a – STObject from the first set in TrackMatcher

  • item_b – STObject from the second set in TrackMatcher

  • max_value – Maximum distance value used as scaling value and upper constraint.

Returns:

Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.mean_minimum_centroid_distance(item_a, item_b, max_value)

RMS difference in the minimum distances from the centroids of one track to the centroids of another track

Parameters:
  • item_a – STObject from the first set in TrackMatcher

  • item_b – STObject from the second set in TrackMatcher

  • max_value – Maximum distance value used as scaling value and upper constraint.

Returns:

Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.nonoverlap(item_a, time_a, item_b, time_b, max_value)

Percentage of pixels in each object that do not overlap with the other object

Parameters:
  • item_a – STObject from the first set in ObjectMatcher

  • time_a – Time integer being evaluated

  • item_b – STObject from the second set in ObjectMatcher

  • time_b – Time integer being evaluated

  • max_value – Maximum distance value used as scaling value and upper constraint.

Returns:

Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.shifted_centroid_distance(item_a, time_a, item_b, time_b, max_value)

Centroid distance with motion corrections.

Parameters:
  • item_a – STObject from the first set in ObjectMatcher

  • time_a – Time integer being evaluated

  • item_b – STObject from the second set in ObjectMatcher

  • time_b – Time integer being evaluated

  • max_value – Maximum distance value used as scaling value and upper constraint.

Returns:

Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.start_centroid_distance(item_a, item_b, max_value)

Distance between the centroids of the first step in each object.

Parameters:
  • item_a – STObject from the first set in TrackMatcher

  • item_b – STObject from the second set in TrackMatcher

  • max_value – Maximum distance value used as scaling value and upper constraint.

Returns:

Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.start_time_distance(item_a, item_b, max_value)

Absolute difference between the starting times of each item.

Parameters:
  • item_a – STObject from the first set in TrackMatcher

  • item_b – STObject from the second set in TrackMatcher

  • max_value – Maximum distance value used as scaling value and upper constraint.

Returns:

Distance value between 0 and 1.

hagelslag.processing.ObjectMatcher.time_distance(item_a, time_a, item_b, time_b, max_value)

hagelslag.processing.STObject module

class hagelslag.processing.STObject.STObject(grid, mask, x, y, i, j, start_time, end_time, step=1, dx=3000, u=None, v=None)

Bases: object

The STObject stores data and location information for objects extracted from the ensemble grids.

grid

All of the data values. Supports a 2D array of values, a list of 2D arrays, or a 3D array.

Type:

ndarray

mask

Grid of 1’s and 0’s in which 1’s indicate the location of the object.

Type:

ndarray

x

Array of x-coordinate values in meters. Longitudes can also be placed here.

Type:

ndarray

y

Array of y-coordinate values in meters. Latitudes can also be placed here.

Type:

ndarray

i

Array of row indices from the full model domain.

Type:

ndarray

j

Array of column indices from the full model domain.

Type:

ndarray

start_time

The first time of the object existence.

end_time

The last time of the object existence.

step

number of hours between timesteps

dx

grid spacing

u

storm motion in x-direction

v

storm motion in y-direction

boundary_contour(time)

Calculate the contour around the edge of the binary mask for the object. For objects with interior holes or multiple connections, binary dilation, hole filling, and erosion are used to generate a single edge contour instead of multiple contours.

Parameters:

time

Returns:

array of shape (2, number of contour points) containing the x and y coordinates of the object edge.

boundary_polygon(time)

Get coordinates of object boundary in counter-clockwise order based on the convex hull of the object. For non-convex objects, the convex hull will not be representative of the object shape and boundary_contour should be used instead.

calc_attribute_statistic(attribute, statistic, time)

Calculate statistics based on the values of an attribute. The following statistics are supported: mean, max, min, std, ptp (range), median, skew (mean - median), and percentile_(percentile value).

Parameters:
  • attribute – Attribute extracted from model grid

  • statistic – Name of statistic being used.

  • time – timestep of the object being investigated

Returns:

The value of the statistic

calc_attribute_statistics(statistic_name)

Calculates summary statistics over the domains of each attribute.

Parameters:

statistic_name (string) – numpy statistic, such as mean, std, max, min

Returns:

dict of statistics from each attribute grid.

calc_shape_statistics(stat_names)

Calculate shape statistics using regionprops applied to the object mask.

Parameters:

stat_names – List of statistics to be extracted from those calculated by regionprops.

Returns:

Dictionary of shape statistics

calc_shape_step(stat_names, time)

Calculate shape statistics for a single time step

Parameters:
  • stat_names – List of shape statistics calculated from region props

  • time – Time being investigated

Returns:

List of shape statistics

calc_timestep_statistic(statistic, time)

Calculate statistics from the primary attribute of the StObject.

Parameters:
  • statistic – statistic being calculated

  • time – Timestep being investigated

Returns:

Value of the statistic

center_of_mass(time)

Calculate the center of mass at a given timestep.

Parameters:

time – Time at which the center of mass calculation is performed

Returns:

The x- and y-coordinates of the center of mass.

center_of_mass_ij(time)

Calculate the center of mass in terms of row and column coordinates at a given timestep.

Parameters:

time – Time at which the center of mass calculation is performed

Returns:

The x- and y-coordinates of the center of mass.

closest_distance(time, other_object, other_time)

The shortest distance between two objects at specified times.

Parameters:
  • time (int or datetime) – Valid time for this STObject

  • other_object – Another STObject being compared

  • other_time – The time within the other STObject being evaluated.

Returns:

Distance in units of the x-y coordinates

count_overlap(time, other_object, other_time)

Counts the number of points that overlap between this STObject and another STObject. Used for tracking.

estimate_motion(time, intensity_grid, max_u, max_v)

Estimate the motion of the object with cross-correlation on the intensity values from the previous time step.

Parameters:
  • time – time being evaluated.

  • intensity_grid – 2D array of intensities used in cross correlation.

  • max_u – Maximum x-component of motion. Used to limit search area.

  • max_v – Maximum y-component of motion. Used to limit search area

Returns:

u, v, and the minimum error.

extend(step)

Adds the data from another STObject to this object.

Parameters:

step – another STObject being added after the current one in time.

extract_attribute_array(data_array, var_name)

Extracts data from a 2D array that has the same dimensions as the grid used to identify the object.

Parameters:

data_array – 2D numpy array

extract_attribute_grid(model_grid, potential=False, future=False)

Extracts the data from a ModelOutput or ModelGrid object within the bounding box region of the STObject.

Parameters:
  • model_grid – A ModelGrid or ModelOutput Object

  • potential – Extracts from the time before instead of the same time as the object

extract_patch(patch_radius, full_x, full_y, full_i, full_j)

Extract patch of uniform radius from existing STObject. This is intended for extracting patches from STObjects that are built around the bounding box of the original object. Areas outside the object are padded with zeros.

Parameters:
  • patch_radius (int) – radius of patch in pixels

  • full_x – full x grid encompassing the original field from which the objects have been extracted.

  • full_y – full y grid

  • full_i – full i (row) grid

  • full_j – full j (row) grid

Returns:

new STObject containing the a patched slice of the original STObject.

extract_tendency_grid(model_grid)

Extracts the difference in model outputs

Parameters:

model_grid – ModelOutput or ModelGrid object.

get_corner(time)

Gets the corner array indices of the STObject at a given time that corresponds to the upper left corner of the bounding box for the STObject.

Parameters:

time – time at which the corner is being extracted.

Returns:

corner index.

max_intensity(time)

Calculate the maximum intensity found at a timestep.

max_size()

Gets the largest size of the object over all timesteps.

Returns:

Maximum size of the object in pixels

percentile_distance(time, other_object, other_time, percentile)
size(time)

Gets the size of the object at a given time.

Parameters:

time – Time value being queried.

Returns:

size of the object in pixels

to_geojson(filename, proj, metadata=None)

Output the data in the STObject to a geoJSON file.

Parameters:
  • filename – Name of the file

  • proj – PyProj object for converting the x and y coordinates back to latitude and longitue values.

  • metadata – Metadata describing the object to be included in the top-level properties.

to_geojson_feature(proj, output_grids=False)

Output the data in the STObject to a geoJSON file.

Parameters:
  • proj – PyProj object for converting the x and y coordinates back to latitude and longitude values.

  • output_grids – Whether or not to output the primary gridded fields to the geojson file.

trajectory()

Calculates the center of mass for each time step and outputs an array

Returns:

hagelslag.processing.STObject.read_geojson(filename)

Reads a geojson file containing an STObject and initializes a new STObject from the information in the file.

Parameters:

filename – Name of the geojson file

Returns:

an STObject

hagelslag.processing.TrackModeler module

class hagelslag.processing.TrackModeler.TrackModeler(ensemble_name, train_data_path, forecast_data_path, member_files, start_dates, end_dates, weighting_function, map_file, group_col='Microphysics')

Bases: object

TrackModeler is designed to load and process data generated by TrackProcessing and then use that data to fit machine learning models to predict whether or not hail will occur, hail size, and translation errors in time and space.

calc_copulas(output_file, model_names=('start-time', 'translation-x', 'translation-y'), label_columns=('Start_Time_Error', 'Translation_Error_X', 'Translation_Error_Y'))

Calculate a copula multivariate normal distribution from the training data for each group of ensemble members. Distributions are written to a pickle file for later use.

Parameters:
  • output_file – Pickle file

  • model_names – Names of the tracking models

  • label_columns – Names of the data columns used for labeling

Returns:

fit_condition_models(model_names, model_objs, input_columns, output_column='Matched', output_threshold=0.0)

Fit machine learning models to predict whether or not hail will occur.

Parameters:
  • model_names – List of strings with the names for the particular machine learning models

  • model_objs – scikit-learn style machine learning model objects.

  • input_columns – list of the names of the columns used as input for the machine learning model

  • output_column – name of the column used for labeling whether or not the event occurs

  • output_threshold – splitting threshold to determine if event has occurred. Default 0.0

fit_condition_threshold_models(model_names, model_objs, input_columns, output_column='Matched', output_threshold=0.5, num_folds=5, threshold_score='ets')

Fit models to predict hail/no-hail and use cross-validation to determine the probaility threshold that maximizes a skill score.

Parameters:
  • model_names – List of machine learning model names

  • model_objs – List of Scikit-learn ML models

  • input_columns – List of input variables in the training data

  • output_column – Column used for prediction

  • output_threshold – Values exceeding this threshold are considered positive events; below are nulls

  • num_folds – Number of folds in the cross-validation procedure

  • threshold_score – Score available in ContingencyTable used for determining the best probability threshold

Returns:

None

fit_size_distribution_component_models(model_names, model_objs, input_columns, output_columns)

This calculates 2 principal components for the hail size distribution between the shape and scale parameters. Separate machine learning models are fit to predict each component.

Parameters:
  • model_names – List of machine learning model names

  • model_objs – List of machine learning model objects.

  • input_columns – List of input variables

  • output_columns – Output columns, should contain Shape and Scale.

Returns:

fit_size_distribution_models(model_names, model_objs, input_columns, output_columns=None, calibrate=False)

Fits multitask machine learning models to predict the parameters of a size distribution

Parameters:
  • model_names – List of machine learning model names

  • model_objs – scikit-learn style machine learning model objects

  • input_columns – Training data columns used as input for ML model

  • output_columns – Training data columns used for prediction

  • calibrate – Whether or not to fit a log-linear regression to predictions from ML model

fit_size_models(model_names, model_objs, input_columns, output_column='Hail_Size', output_start=5, output_step=5, output_stop=100)

Fit size models to produce discrete pdfs of forecast hail sizes. :param model_names: List of model names :param model_objs: List of model objects :param input_columns: List of input variables :param output_column: Output variable name :param output_start: Hail size bin start :param output_step: hail size bin step :param output_stop: hail size bin stop

fit_track_models(model_names, model_objs, input_columns, output_columns, output_ranges)
Fit machine learning models to predict track error offsets.

model_names: model_objs: input_columns: output_columns: output_ranges:

load_data(mode='train', format='csv')

Load data from flat data files containing total track information and information about each timestep. The two sets are combined using merge operations on the Track IDs. Additional member information is gathered from the appropriate member file.

Parameters:
  • mode – “train” or “forecast”

  • format – file format being used. Default is “csv”

load_models(model_path)

Load models from pickle files. Note that models should be loaded with the same version of sklearn that they were saved with.

Parameters:

model_path – Path to model pickle files.

output_forecasts_csv(forecasts, mode, csv_path, run_date_format='%Y%m%d-%H%M')

Output hail forecast values to csv files by run date and ensemble member.

Parameters:
  • forecasts (dict) – Dictionary of DataFrames with forecast values

  • mode (str) – either “train” or “forecast”

  • csv_path (str) – Path where csv forecast files are saved

output_forecasts_json(forecasts, condition_model_names, size_model_names, dist_model_names, track_model_names, json_data_path, out_path)

Output forecasts to geoJSON format

Parameters:
  • forecasts (dict) – Dictionary of DataFrames with condition and size distribution forecasts

  • condition_model_names (list) – Names of all the condition ML models

  • size_model_names (list) – Names of all the size ML models

  • dist_model_names (list) – Names of all the size distribution ML models

  • track_model_names (list) – Names of models that predict the track offset (no longer used)

  • json_data_path (str) – Path to geoJSON storm files

  • out_path (str) – Path to where geoJSON forecast files are output

Returns:

output_forecasts_json_parallel(forecasts, condition_model_names, dist_model_names, json_data_path, out_path, num_procs)
predict_condition_models(model_names, input_columns, metadata_cols, data_mode='forecast')

Apply condition models to forecast data.

Parameters:
  • model_names – List of names associated with each condition model used for prediction

  • input_columns – List of columns in data used as input into the model

  • metadata_cols – Columns from input data that should be included in the data frame with the predictions.

  • data_mode – Which data subset to pull from for the predictions, “forecast” by default

Returns:

A dictionary of data frames containing probabilities of the event and specified metadata

predict_size_distribution_component_models(model_names, input_columns, output_columns, metadata_cols, data_mode='forecast', location=6)

Make predictions using fitted size distribution component models. PCA is used transform the shape and scale parameters so that their correlation is removed.

Parameters:
  • model_names – Name of the models for predictions

  • input_columns – Data columns used for input into ML models

  • output_columns – Names of output columns

  • metadata_cols – Columns from input data that should be included in the data frame with the predictions.

  • data_mode – Set of data used as input for prediction models

  • location – Value of fixed location parameter

Returns:

Predictions in dictionary of data frames grouped by group type

predict_size_distribution_models(model_names, input_columns, metadata_cols, data_mode='forecast', location=6, calibrate=False)

Make predictions using fitted size distribution models. Each ML model predicts the normalized shape and scale parameters simultaneously using multitask learning. Only scikit learn models that support multi task learning can be used.

Parameters:
  • model_names – Name of the models for predictions

  • input_columns – Data columns used for input into ML models

  • metadata_cols – Columns from input data that should be included in the data frame with the predictions.

  • data_mode – Set of data used as input for prediction models

  • location – Value of fixed location parameter

  • calibrate – Whether or not to apply calibration model

Returns:

Predictions in dictionary of data frames grouped by group type

predict_size_models(model_names, input_columns, metadata_cols, data_mode='forecast')

Apply size models to forecast data. :param model_names: :param input_columns: :param metadata_cols: :param data_mode:

predict_track_models(model_names, input_columns, metadata_cols, data_mode='forecast')

Predict track offsets on forecast data. :param model_names: List of machine learning model names :param input_columns: List of input columns :param metadata_cols: List of metadata columns to include with predictions :param data_mode: train or forecast

Returns:

save_models(model_path)

Save machine learning models to pickle files.

hagelslag.processing.TrackModeler.output_forecast(step_forecasts, run_date, ensemble_name, member, track_num, json_data_path, out_path)

hagelslag.processing.TrackProcessing module

hagelslag.processing.TrackSampler module

class hagelslag.processing.TrackSampler.TrackSampler(member, group, run_date, model_names, start_hour, end_hour, grid_shape, dx, track_path, num_samples, copula_file=None)

Bases: object

Monte Carlo sampler of forecast storm tracks.

generate_copula_ranks()
load_track_forecasts()
output_track_probs(track_probs, path)
sample_condition()
sample_size(size_values=<Mock name='mock()' id='140188329796960'>)
sample_start_time(start_time_values=<Mock name='mock()' id='140188329797152'>)
sample_tracks(size_ranges, track_ranges, thresholds=<Mock name='mock()' id='140188329796720'>, dilation=13)
sample_translation_x(translation_x_values=<Mock name='mock()' id='140188329797344'>)
sample_translation_y(translation_y_values=<Mock name='mock()' id='140188329797536'>)
hagelslag.processing.TrackSampler.load_grid_info(grid_file)
hagelslag.processing.TrackSampler.main()
hagelslag.processing.TrackSampler.sample_member_run_tracks(member, group, run_date, model_names, start_hour, end_hour, grid_shape, dx, track_path, num_samples, thresholds, copula_file, out_path, size_ranges, track_ranges)

hagelslag.processing.tracker module

Module contents