hagelslag.processing package¶
Submodules¶
hagelslag.processing.EnhancedWatershedSegmenter module¶
@author: David John Gagne (djgagne@ou.edu)
- class hagelslag.processing.EnhancedWatershedSegmenter.EnhancedWatershed(min_intensity, data_increment, max_intensity, size_threshold_pixels, delta)¶
Bases:
objectThe enhanced watershed performs image segmentation using a modified version of the traditional watershed technique. It includes a size criteria and creates foothills around each object to keep them distinct. The object is used to store the quantization and size parameters. It can be used to watershed multiple grids.
- min_intensity¶
minimum pixel value for pixel to be part of a region
- Type:
int
- data_increment¶
quantization interval. Use 1 if you don’t want to quantize
- Type:
int
- max_intensity¶
values greater than maxThresh are treated as the maximum threshold
- Type:
int
- size_threshold_pixels¶
clusters smaller than this threshold are ignored.
- Type:
int
- delta¶
maximum number of data increments the cluster is allowed to range over. Larger d results in clusters over larger scales.
- Type:
int
- find_local_maxima(pixels, q_data)¶
Finds the local maxima in the inputGrid and perform region growing to identify objects.
- Parameters:
pixels – dictionary of quantized pixel values
q_data – 2D array representation of quantized input data
- Returns:
array with labeled objects.
- grow_centers(centers, q_data)¶
Once
- Parameters:
centers –
q_data –
Returns:
- static is_closest(point, center, centers, bin_num)¶
- static is_valid(point, shape)¶
- label(input_grid, only_objects=True)¶
Labels input grid using enhanced watershed algorithm.
- Parameters:
input_grid (numpy.ndarray) – Grid to be labeled.
only_objects (bool) – Only return object pixel values on final grid
- Returns:
Array of labeled pixels
- quantize(input_grid)¶
Quantize a grid into discrete steps based on input parameters.
- Parameters:
input_grid – 2-d array of values
- Returns:
Dictionary of value pointing to pixel locations, and quantized 2-d array of data
- remove_foothills(q_data, marked, bin_num, bin_lower, centers, foothills)¶
Mark points determined to be foothills as globbed, so that they are not included in future searches. Also searches neighboring points to foothill points to determine if they should also be considered foothills.
- Parameters:
q_data – Quantized data
marked – Marked
bin_num – Current bin being searched
bin_lower – Next bin being searched
centers – dictionary of local maxima considered to be object centers
foothills – List of foothill points being removed.
- set_maximum(q_data, marked, center, bin_lower, foothills, capture_index)¶
Grow a region at a certain bin level and check if the region has reached the maximum size.
- Parameters:
q_data – Quantized data array
marked – Array marking points that are objects
center – Coordinates of the center pixel of the region being grown
bin_lower – Intensity level of lower bin being evaluated
foothills – List of points that are associated with a center but fall outside the the size or intensity criteria
capture_index –
- Returns:
True if the object is finished growing and False if the object should be grown again at the next threshold level.
- static size_filter(labeled_grid, min_size)¶
Removes labeled objects that are smaller than min_size, and relabels the remaining objects.
- Parameters:
labeled_grid – Grid that has been labeled
min_size – Minimium object size.
- Returns:
Labeled array with re-numbered objects to account for those that have been removed
- hagelslag.processing.EnhancedWatershedSegmenter.rescale_data(data, data_min, data_max, out_min=0.0, out_max=100.0)¶
Rescale your input data so that is ranges over integer values, which will perform better in the watershed.
- Parameters:
data – 2D or 3D ndarray being rescaled
data_min – minimum value of input data for scaling purposes
data_max – maximum value of input data for scaling purposes
out_min – minimum value of scaled data
out_max – maximum value of scaled data
- Returns:
Linearly scaled ndarray
hagelslag.processing.EnsembleProducts module¶
hagelslag.processing.Hysteresis module¶
hagelslag.processing.ObjectMatcher module¶
- class hagelslag.processing.ObjectMatcher.ObjectMatcher(cost_function_components, weights, max_values)¶
Bases:
objectObjectMatcher calculates distances between two sets of objects and determines the optimal object assignments based on the Hungarian object matching algorithm. ObjectMatcher supports the use of the weighted average of multiple cost functions to determine the distance between objects. Upper limits to each distance component are used to exclude the matching of objects that are too far apart.
- cost_function_components¶
List of distance functions for matching
- weights¶
List of weights for each distance function
- max_values¶
List of the maximum allowable distance for each distance function component.
- cost_matrix(set_a, set_b, time_a, time_b)¶
Calculates the costs (distances) between the items in set a and set b at the specified times.
- Parameters:
set_a – List of STObjects
set_b – List of STObjects
time_a – time at which objects in set_a are evaluated
time_b – time at whcih object in set_b are evaluated
- Returns:
A numpy array with shape [len(set_a), len(set_b)] containing the cost matrix between the items in set a and the items in set b.
- match_objects(set_a, set_b, time_a, time_b)¶
Match two sets of objects at particular times.
- Parameters:
set_a – list of STObjects
set_b – list of STObjects
time_a – time at which set_a is being evaluated for matching
time_b – time at which set_b is being evaluated for matching
- Returns:
List of tuples containing (set_a index, set_b index) for each match
- total_cost_function(item_a, item_b, time_a, time_b)¶
Calculate total cost function between two items.
- Parameters:
item_a – STObject
item_b – STObject
time_a – Timestep in item_a at which cost function is evaluated
time_b – Timestep in item_b at which cost function is evaluated
- Returns:
The total weighted distance between item_a and item_b
- class hagelslag.processing.ObjectMatcher.TrackMatcher(cost_function_components, weights, max_values)¶
Bases:
objectFind the optimal pairings among two sets of STObject tracks.
- cost_function_components¶
Array of cost function objects
- weights¶
Array of weights for each cost function. All should sum to 1.
- max_values¶
Array of distance values that correspond to the upper limit distance that should be considered.
- match_tracks(set_a, set_b, closest_matches=False)¶
Find the optimal set of matching assignments between set a and set b. This function supports optimal 1:1 matching using the Munkres method and matching from every object in set a to the closest object in set b. In this situation set b accepts multiple matches from set a.
- Parameters:
set_a –
set_b –
closest_matches –
Returns:
- neighbor_matches(set_a, set_b)¶
- raw_cost_matrix(set_a, set_b)¶
- track_cost_function(item_a, item_b)¶
- track_cost_matrix(set_a, set_b)¶
- class hagelslag.processing.ObjectMatcher.TrackStepMatcher(cost_function_components, max_values)¶
Bases:
objectDetermine if each step in a track is in close proximity to steps from another set of tracks
- cost(track_a, time_a, track_b, time_b)¶
- cost_matrix(set_a, set_b)¶
- match(set_a, set_b)¶
For each step in each track from set_a, identify all steps in all tracks from set_b that meet all cost function criteria
- Parameters:
set_a – List of STObjects
set_b – List of STObjects
- Returns:
pandas.DataFrame
- Return type:
track_pairings
- hagelslag.processing.ObjectMatcher.area_difference(item_a, time_a, item_b, time_b, max_value)¶
RMS Difference in object areas.
- Parameters:
item_a – STObject from the first set in ObjectMatcher
time_a – Time integer being evaluated
item_b – STObject from the second set in ObjectMatcher
time_b – Time integer being evaluated
max_value – Maximum distance value used as scaling value and upper constraint.
- Returns:
Distance value between 0 and 1.
- hagelslag.processing.ObjectMatcher.centroid_distance(item_a, time_a, item_b, time_b, max_value)¶
Euclidean distance between the centroids of item_a and item_b.
- Parameters:
item_a – STObject from the first set in ObjectMatcher
time_a – Time integer being evaluated
item_b – STObject from the second set in ObjectMatcher
time_b – Time integer being evaluated
max_value – Maximum distance value used as scaling value and upper constraint.
- Returns:
Distance value between 0 and 1.
- hagelslag.processing.ObjectMatcher.closest_distance(item_a, time_a, item_b, time_b, max_value)¶
Euclidean distance between the pixels in item_a and item_b closest to each other.
- Parameters:
item_a – STObject from the first set in ObjectMatcher
time_a – Time integer being evaluated
item_b – STObject from the second set in ObjectMatcher
time_b – Time integer being evaluated
max_value – Maximum distance value used as scaling value and upper constraint.
- Returns:
Distance value between 0 and 1.
- hagelslag.processing.ObjectMatcher.duration_distance(item_a, item_b, max_value)¶
Absolute difference in the duration of two items
- Parameters:
item_a – STObject from the first set in TrackMatcher
item_b – STObject from the second set in TrackMatcher
max_value – Maximum distance value used as scaling value and upper constraint.
- Returns:
Distance value between 0 and 1.
- hagelslag.processing.ObjectMatcher.ellipse_distance(item_a, time_a, item_b, time_b, max_value)¶
Calculate differences in the properties of ellipses fitted to each object.
- Parameters:
item_a – STObject from the first set in ObjectMatcher
time_a – Time integer being evaluated
item_b – STObject from the second set in ObjectMatcher
time_b – Time integer being evaluated
max_value – Maximum distance value used as scaling value and upper constraint.
- Returns:
Distance value between 0 and 1.
- hagelslag.processing.ObjectMatcher.max_intensity(item_a, time_a, item_b, time_b, max_value)¶
RMS difference in maximum intensity
- Parameters:
item_a – STObject from the first set in ObjectMatcher
time_a – Time integer being evaluated
item_b – STObject from the second set in ObjectMatcher
time_b – Time integer being evaluated
max_value – Maximum distance value used as scaling value and upper constraint.
- Returns:
Distance value between 0 and 1.
- hagelslag.processing.ObjectMatcher.mean_area_distance(item_a, item_b, max_value)¶
Absolute difference in the means of the areas of each track over time.
- Parameters:
item_a – STObject from the first set in TrackMatcher
item_b – STObject from the second set in TrackMatcher
max_value – Maximum distance value used as scaling value and upper constraint.
- Returns:
Distance value between 0 and 1.
- hagelslag.processing.ObjectMatcher.mean_min_time_distance(item_a, item_b, max_value)¶
Calculate the mean time difference among the time steps in each object.
- Parameters:
item_a – STObject from the first set in TrackMatcher
item_b – STObject from the second set in TrackMatcher
max_value – Maximum distance value used as scaling value and upper constraint.
- Returns:
Distance value between 0 and 1.
- hagelslag.processing.ObjectMatcher.mean_minimum_centroid_distance(item_a, item_b, max_value)¶
RMS difference in the minimum distances from the centroids of one track to the centroids of another track
- Parameters:
item_a – STObject from the first set in TrackMatcher
item_b – STObject from the second set in TrackMatcher
max_value – Maximum distance value used as scaling value and upper constraint.
- Returns:
Distance value between 0 and 1.
- hagelslag.processing.ObjectMatcher.nonoverlap(item_a, time_a, item_b, time_b, max_value)¶
Percentage of pixels in each object that do not overlap with the other object
- Parameters:
item_a – STObject from the first set in ObjectMatcher
time_a – Time integer being evaluated
item_b – STObject from the second set in ObjectMatcher
time_b – Time integer being evaluated
max_value – Maximum distance value used as scaling value and upper constraint.
- Returns:
Distance value between 0 and 1.
- hagelslag.processing.ObjectMatcher.shifted_centroid_distance(item_a, time_a, item_b, time_b, max_value)¶
Centroid distance with motion corrections.
- Parameters:
item_a – STObject from the first set in ObjectMatcher
time_a – Time integer being evaluated
item_b – STObject from the second set in ObjectMatcher
time_b – Time integer being evaluated
max_value – Maximum distance value used as scaling value and upper constraint.
- Returns:
Distance value between 0 and 1.
- hagelslag.processing.ObjectMatcher.start_centroid_distance(item_a, item_b, max_value)¶
Distance between the centroids of the first step in each object.
- Parameters:
item_a – STObject from the first set in TrackMatcher
item_b – STObject from the second set in TrackMatcher
max_value – Maximum distance value used as scaling value and upper constraint.
- Returns:
Distance value between 0 and 1.
- hagelslag.processing.ObjectMatcher.start_time_distance(item_a, item_b, max_value)¶
Absolute difference between the starting times of each item.
- Parameters:
item_a – STObject from the first set in TrackMatcher
item_b – STObject from the second set in TrackMatcher
max_value – Maximum distance value used as scaling value and upper constraint.
- Returns:
Distance value between 0 and 1.
- hagelslag.processing.ObjectMatcher.time_distance(item_a, time_a, item_b, time_b, max_value)¶
hagelslag.processing.STObject module¶
- class hagelslag.processing.STObject.STObject(grid, mask, x, y, i, j, start_time, end_time, step=1, dx=3000, u=None, v=None)¶
Bases:
objectThe STObject stores data and location information for objects extracted from the ensemble grids.
- grid¶
All of the data values. Supports a 2D array of values, a list of 2D arrays, or a 3D array.
- Type:
ndarray
- mask¶
Grid of 1’s and 0’s in which 1’s indicate the location of the object.
- Type:
ndarray
- x¶
Array of x-coordinate values in meters. Longitudes can also be placed here.
- Type:
ndarray
- y¶
Array of y-coordinate values in meters. Latitudes can also be placed here.
- Type:
ndarray
- i¶
Array of row indices from the full model domain.
- Type:
ndarray
- j¶
Array of column indices from the full model domain.
- Type:
ndarray
- start_time¶
The first time of the object existence.
- end_time¶
The last time of the object existence.
- step¶
number of hours between timesteps
- dx¶
grid spacing
- u¶
storm motion in x-direction
- v¶
storm motion in y-direction
- boundary_contour(time)¶
Calculate the contour around the edge of the binary mask for the object. For objects with interior holes or multiple connections, binary dilation, hole filling, and erosion are used to generate a single edge contour instead of multiple contours.
- Parameters:
time –
- Returns:
array of shape (2, number of contour points) containing the x and y coordinates of the object edge.
- boundary_polygon(time)¶
Get coordinates of object boundary in counter-clockwise order based on the convex hull of the object. For non-convex objects, the convex hull will not be representative of the object shape and boundary_contour should be used instead.
- calc_attribute_statistic(attribute, statistic, time)¶
Calculate statistics based on the values of an attribute. The following statistics are supported: mean, max, min, std, ptp (range), median, skew (mean - median), and percentile_(percentile value).
- Parameters:
attribute – Attribute extracted from model grid
statistic – Name of statistic being used.
time – timestep of the object being investigated
- Returns:
The value of the statistic
- calc_attribute_statistics(statistic_name)¶
Calculates summary statistics over the domains of each attribute.
- Parameters:
statistic_name (string) – numpy statistic, such as mean, std, max, min
- Returns:
dict of statistics from each attribute grid.
- calc_shape_statistics(stat_names)¶
Calculate shape statistics using regionprops applied to the object mask.
- Parameters:
stat_names – List of statistics to be extracted from those calculated by regionprops.
- Returns:
Dictionary of shape statistics
- calc_shape_step(stat_names, time)¶
Calculate shape statistics for a single time step
- Parameters:
stat_names – List of shape statistics calculated from region props
time – Time being investigated
- Returns:
List of shape statistics
- calc_timestep_statistic(statistic, time)¶
Calculate statistics from the primary attribute of the StObject.
- Parameters:
statistic – statistic being calculated
time – Timestep being investigated
- Returns:
Value of the statistic
- center_of_mass(time)¶
Calculate the center of mass at a given timestep.
- Parameters:
time – Time at which the center of mass calculation is performed
- Returns:
The x- and y-coordinates of the center of mass.
- center_of_mass_ij(time)¶
Calculate the center of mass in terms of row and column coordinates at a given timestep.
- Parameters:
time – Time at which the center of mass calculation is performed
- Returns:
The x- and y-coordinates of the center of mass.
- closest_distance(time, other_object, other_time)¶
The shortest distance between two objects at specified times.
- Parameters:
time (int or datetime) – Valid time for this STObject
other_object – Another STObject being compared
other_time – The time within the other STObject being evaluated.
- Returns:
Distance in units of the x-y coordinates
- count_overlap(time, other_object, other_time)¶
Counts the number of points that overlap between this STObject and another STObject. Used for tracking.
- estimate_motion(time, intensity_grid, max_u, max_v)¶
Estimate the motion of the object with cross-correlation on the intensity values from the previous time step.
- Parameters:
time – time being evaluated.
intensity_grid – 2D array of intensities used in cross correlation.
max_u – Maximum x-component of motion. Used to limit search area.
max_v – Maximum y-component of motion. Used to limit search area
- Returns:
u, v, and the minimum error.
- extend(step)¶
Adds the data from another STObject to this object.
- Parameters:
step – another STObject being added after the current one in time.
- extract_attribute_array(data_array, var_name)¶
Extracts data from a 2D array that has the same dimensions as the grid used to identify the object.
- Parameters:
data_array – 2D numpy array
- extract_attribute_grid(model_grid, potential=False, future=False)¶
Extracts the data from a ModelOutput or ModelGrid object within the bounding box region of the STObject.
- Parameters:
model_grid – A ModelGrid or ModelOutput Object
potential – Extracts from the time before instead of the same time as the object
- extract_patch(patch_radius, full_x, full_y, full_i, full_j)¶
Extract patch of uniform radius from existing STObject. This is intended for extracting patches from STObjects that are built around the bounding box of the original object. Areas outside the object are padded with zeros.
- Parameters:
patch_radius (int) – radius of patch in pixels
full_x – full x grid encompassing the original field from which the objects have been extracted.
full_y – full y grid
full_i – full i (row) grid
full_j – full j (row) grid
- Returns:
new STObject containing the a patched slice of the original STObject.
- extract_tendency_grid(model_grid)¶
Extracts the difference in model outputs
- Parameters:
model_grid – ModelOutput or ModelGrid object.
- get_corner(time)¶
Gets the corner array indices of the STObject at a given time that corresponds to the upper left corner of the bounding box for the STObject.
- Parameters:
time – time at which the corner is being extracted.
- Returns:
corner index.
- max_intensity(time)¶
Calculate the maximum intensity found at a timestep.
- max_size()¶
Gets the largest size of the object over all timesteps.
- Returns:
Maximum size of the object in pixels
- percentile_distance(time, other_object, other_time, percentile)¶
- size(time)¶
Gets the size of the object at a given time.
- Parameters:
time – Time value being queried.
- Returns:
size of the object in pixels
- to_geojson(filename, proj, metadata=None)¶
Output the data in the STObject to a geoJSON file.
- Parameters:
filename – Name of the file
proj – PyProj object for converting the x and y coordinates back to latitude and longitue values.
metadata – Metadata describing the object to be included in the top-level properties.
- to_geojson_feature(proj, output_grids=False)¶
Output the data in the STObject to a geoJSON file.
- Parameters:
proj – PyProj object for converting the x and y coordinates back to latitude and longitude values.
output_grids – Whether or not to output the primary gridded fields to the geojson file.
- trajectory()¶
Calculates the center of mass for each time step and outputs an array
Returns:
- hagelslag.processing.STObject.read_geojson(filename)¶
Reads a geojson file containing an STObject and initializes a new STObject from the information in the file.
- Parameters:
filename – Name of the geojson file
- Returns:
an STObject
hagelslag.processing.TrackModeler module¶
- class hagelslag.processing.TrackModeler.TrackModeler(ensemble_name, train_data_path, forecast_data_path, member_files, start_dates, end_dates, weighting_function, map_file, group_col='Microphysics')¶
Bases:
objectTrackModeler is designed to load and process data generated by TrackProcessing and then use that data to fit machine learning models to predict whether or not hail will occur, hail size, and translation errors in time and space.
- calc_copulas(output_file, model_names=('start-time', 'translation-x', 'translation-y'), label_columns=('Start_Time_Error', 'Translation_Error_X', 'Translation_Error_Y'))¶
Calculate a copula multivariate normal distribution from the training data for each group of ensemble members. Distributions are written to a pickle file for later use.
- Parameters:
output_file – Pickle file
model_names – Names of the tracking models
label_columns – Names of the data columns used for labeling
Returns:
- fit_condition_models(model_names, model_objs, input_columns, output_column='Matched', output_threshold=0.0)¶
Fit machine learning models to predict whether or not hail will occur.
- Parameters:
model_names – List of strings with the names for the particular machine learning models
model_objs – scikit-learn style machine learning model objects.
input_columns – list of the names of the columns used as input for the machine learning model
output_column – name of the column used for labeling whether or not the event occurs
output_threshold – splitting threshold to determine if event has occurred. Default 0.0
- fit_condition_threshold_models(model_names, model_objs, input_columns, output_column='Matched', output_threshold=0.5, num_folds=5, threshold_score='ets')¶
Fit models to predict hail/no-hail and use cross-validation to determine the probaility threshold that maximizes a skill score.
- Parameters:
model_names – List of machine learning model names
model_objs – List of Scikit-learn ML models
input_columns – List of input variables in the training data
output_column – Column used for prediction
output_threshold – Values exceeding this threshold are considered positive events; below are nulls
num_folds – Number of folds in the cross-validation procedure
threshold_score – Score available in ContingencyTable used for determining the best probability threshold
- Returns:
None
- fit_size_distribution_component_models(model_names, model_objs, input_columns, output_columns)¶
This calculates 2 principal components for the hail size distribution between the shape and scale parameters. Separate machine learning models are fit to predict each component.
- Parameters:
model_names – List of machine learning model names
model_objs – List of machine learning model objects.
input_columns – List of input variables
output_columns – Output columns, should contain Shape and Scale.
Returns:
- fit_size_distribution_models(model_names, model_objs, input_columns, output_columns=None, calibrate=False)¶
Fits multitask machine learning models to predict the parameters of a size distribution
- Parameters:
model_names – List of machine learning model names
model_objs – scikit-learn style machine learning model objects
input_columns – Training data columns used as input for ML model
output_columns – Training data columns used for prediction
calibrate – Whether or not to fit a log-linear regression to predictions from ML model
- fit_size_models(model_names, model_objs, input_columns, output_column='Hail_Size', output_start=5, output_step=5, output_stop=100)¶
Fit size models to produce discrete pdfs of forecast hail sizes. :param model_names: List of model names :param model_objs: List of model objects :param input_columns: List of input variables :param output_column: Output variable name :param output_start: Hail size bin start :param output_step: hail size bin step :param output_stop: hail size bin stop
- fit_track_models(model_names, model_objs, input_columns, output_columns, output_ranges)¶
- Fit machine learning models to predict track error offsets.
model_names: model_objs: input_columns: output_columns: output_ranges:
- load_data(mode='train', format='csv')¶
Load data from flat data files containing total track information and information about each timestep. The two sets are combined using merge operations on the Track IDs. Additional member information is gathered from the appropriate member file.
- Parameters:
mode – “train” or “forecast”
format – file format being used. Default is “csv”
- load_models(model_path)¶
Load models from pickle files. Note that models should be loaded with the same version of sklearn that they were saved with.
- Parameters:
model_path – Path to model pickle files.
- output_forecasts_csv(forecasts, mode, csv_path, run_date_format='%Y%m%d-%H%M')¶
Output hail forecast values to csv files by run date and ensemble member.
- Parameters:
forecasts (dict) – Dictionary of DataFrames with forecast values
mode (str) – either “train” or “forecast”
csv_path (str) – Path where csv forecast files are saved
- output_forecasts_json(forecasts, condition_model_names, size_model_names, dist_model_names, track_model_names, json_data_path, out_path)¶
Output forecasts to geoJSON format
- Parameters:
forecasts (dict) – Dictionary of DataFrames with condition and size distribution forecasts
condition_model_names (list) – Names of all the condition ML models
size_model_names (list) – Names of all the size ML models
dist_model_names (list) – Names of all the size distribution ML models
track_model_names (list) – Names of models that predict the track offset (no longer used)
json_data_path (str) – Path to geoJSON storm files
out_path (str) – Path to where geoJSON forecast files are output
Returns:
- output_forecasts_json_parallel(forecasts, condition_model_names, dist_model_names, json_data_path, out_path, num_procs)¶
- predict_condition_models(model_names, input_columns, metadata_cols, data_mode='forecast')¶
Apply condition models to forecast data.
- Parameters:
model_names – List of names associated with each condition model used for prediction
input_columns – List of columns in data used as input into the model
metadata_cols – Columns from input data that should be included in the data frame with the predictions.
data_mode – Which data subset to pull from for the predictions, “forecast” by default
- Returns:
A dictionary of data frames containing probabilities of the event and specified metadata
- predict_size_distribution_component_models(model_names, input_columns, output_columns, metadata_cols, data_mode='forecast', location=6)¶
Make predictions using fitted size distribution component models. PCA is used transform the shape and scale parameters so that their correlation is removed.
- Parameters:
model_names – Name of the models for predictions
input_columns – Data columns used for input into ML models
output_columns – Names of output columns
metadata_cols – Columns from input data that should be included in the data frame with the predictions.
data_mode – Set of data used as input for prediction models
location – Value of fixed location parameter
- Returns:
Predictions in dictionary of data frames grouped by group type
- predict_size_distribution_models(model_names, input_columns, metadata_cols, data_mode='forecast', location=6, calibrate=False)¶
Make predictions using fitted size distribution models. Each ML model predicts the normalized shape and scale parameters simultaneously using multitask learning. Only scikit learn models that support multi task learning can be used.
- Parameters:
model_names – Name of the models for predictions
input_columns – Data columns used for input into ML models
metadata_cols – Columns from input data that should be included in the data frame with the predictions.
data_mode – Set of data used as input for prediction models
location – Value of fixed location parameter
calibrate – Whether or not to apply calibration model
- Returns:
Predictions in dictionary of data frames grouped by group type
- predict_size_models(model_names, input_columns, metadata_cols, data_mode='forecast')¶
Apply size models to forecast data. :param model_names: :param input_columns: :param metadata_cols: :param data_mode:
- predict_track_models(model_names, input_columns, metadata_cols, data_mode='forecast')¶
Predict track offsets on forecast data. :param model_names: List of machine learning model names :param input_columns: List of input columns :param metadata_cols: List of metadata columns to include with predictions :param data_mode: train or forecast
Returns:
- save_models(model_path)¶
Save machine learning models to pickle files.
- hagelslag.processing.TrackModeler.output_forecast(step_forecasts, run_date, ensemble_name, member, track_num, json_data_path, out_path)¶
hagelslag.processing.TrackProcessing module¶
hagelslag.processing.TrackSampler module¶
- class hagelslag.processing.TrackSampler.TrackSampler(member, group, run_date, model_names, start_hour, end_hour, grid_shape, dx, track_path, num_samples, copula_file=None)¶
Bases:
objectMonte Carlo sampler of forecast storm tracks.
- generate_copula_ranks()¶
- load_track_forecasts()¶
- output_track_probs(track_probs, path)¶
- sample_condition()¶
- sample_size(size_values=<Mock name='mock()' id='140674156340624'>)¶
- sample_start_time(start_time_values=<Mock name='mock()' id='140674156340816'>)¶
- sample_tracks(size_ranges, track_ranges, thresholds=<Mock name='mock()' id='140674156340384'>, dilation=13)¶
- sample_translation_x(translation_x_values=<Mock name='mock()' id='140674156341008'>)¶
- sample_translation_y(translation_y_values=<Mock name='mock()' id='140674156341200'>)¶
- hagelslag.processing.TrackSampler.load_grid_info(grid_file)¶
- hagelslag.processing.TrackSampler.main()¶
- hagelslag.processing.TrackSampler.sample_member_run_tracks(member, group, run_date, model_names, start_hour, end_hour, grid_shape, dx, track_path, num_samples, thresholds, copula_file, out_path, size_ranges, track_ranges)¶