hagelslag.processing package
Submodules
hagelslag.processing.EnhancedWatershedSegmenter module
@author: David John Gagne (djgagne@ou.edu)
-
class
hagelslag.processing.EnhancedWatershedSegmenter.EnhancedWatershed(min_intensity, data_increment, max_intensity, size_threshold_pixels, delta)
Bases: object
The enhanced watershed performs image segmentation using a modified version of the traditional watershed technique. It includes a size criterion and creates foothills around each object to keep them distinct. The object stores the quantization and size parameters, so a single instance can be used to watershed multiple grids.
-
min_intensity Minimum pixel value for a pixel to be part of a region.
Type: int
-
data_increment Quantization interval. Use 1 if you do not want to quantize.
Type: int
-
max_intensity Upper threshold; values greater than max_intensity are treated as max_intensity.
Type: int
-
size_threshold_pixels Clusters smaller than this threshold are ignored.
Type: int
-
delta Maximum number of data increments the cluster is allowed to range over. Larger delta results in clusters over larger scales.
Type: int
-
find_local_maxima(pixels, q_data) Find the local maxima in the input grid and perform region growing to identify objects.
Parameters: - pixels – dictionary of quantized pixel values
- q_data – 2D array representation of quantized input data
Returns: array with labeled objects.
-
grow_centers(centers, q_data) Once
Parameters: - centers –
- q_data –
Returns:
-
static
is_closest(point, center, centers, bin_num)
-
static
is_valid(point, shape)
-
label(input_grid, only_objects=True) Label the input grid using the enhanced watershed algorithm.
Parameters: - input_grid (numpy.ndarray) – Grid to be labeled.
- only_objects (bool) – Only return object pixel values on the final grid
Returns: Array of labeled pixels
-
quantize(input_grid) Quantize a grid into discrete steps based on the input parameters.
Parameters: input_grid – 2D array of values
Returns: Dictionary mapping each quantized value to its pixel locations, and the quantized 2D array of data
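The quantization step can be pictured with the minimal sketch below. It assumes that values are clipped at max_intensity, binned relative to min_intensity in steps of data_increment, and that pixels below min_intensity are excluded from the pixel dictionary. The function name quantize_sketch and the -1 sentinel are hypothetical and not taken from hagelslag.

```python
import numpy as np


def quantize_sketch(input_grid, min_intensity, max_intensity, data_increment):
    """Hypothetical quantization: clip, bin, and index pixels by bin.

    Pixels below min_intensity get the sentinel -1 and are left out of
    the returned dictionary (an assumption, not hagelslag's exact rule).
    """
    grid = np.minimum(np.asarray(input_grid, dtype=float), max_intensity)
    q_data = np.floor((grid - min_intensity) / data_increment).astype(int)
    q_data[grid < min_intensity] = -1
    pixels = {}
    # Map each bin number to the (row, column) locations of its pixels
    for i, j in zip(*np.nonzero(q_data >= 0)):
        pixels.setdefault(int(q_data[i, j]), []).append((int(i), int(j)))
    return pixels, q_data


pixels, q_data = quantize_sketch([[0, 5, 12], [25, 30, 7]], 5, 20, 5)
```

Indexing pixels by bin lets a watershed pass visit bins from highest to lowest without rescanning the whole grid.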
-
remove_foothills(q_data, marked, bin_num, bin_lower, centers, foothills) Mark points determined to be foothills as globbed, so that they are not included in future searches. Also search the neighbors of foothill points to determine whether they should also be considered foothills.
Parameters: - q_data – Quantized data
- marked – Array marking points that are objects
- bin_num – Current bin being searched
- bin_lower – Next bin being searched
- centers – Dictionary of local maxima considered to be object centers
- foothills – List of foothill points being removed.
-
set_maximum(q_data, marked, center, bin_lower, foothills, capture_index) Grow a region at a certain bin level and check whether the region has reached the maximum size.
Parameters: - q_data – Quantized data array
- marked – Array marking points that are objects
- center – Coordinates of the center pixel of the region being grown
- bin_lower – Intensity level of lower bin being evaluated
- foothills – List of points that are associated with a center but fall outside the size or intensity criteria
- capture_index –
Returns: True if the object is finished growing and False if the object should be grown again at the next threshold level.
-
static
size_filter(labeled_grid, min_size) Removes labeled objects that are smaller than min_size, and relabels the remaining objects.
Parameters: - labeled_grid – Grid that has been labeled
- min_size – Minimum object size.
Returns: Labeled array with re-numbered objects to account for those that have been removed
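A minimal sketch of the remove-and-renumber step, assuming label 0 is background and that objects with at least min_size pixels are kept. The function name size_filter_sketch is hypothetical.

```python
import numpy as np


def size_filter_sketch(labeled_grid, min_size):
    """Drop labels covering fewer than min_size pixels, then renumber 1..N."""
    labeled_grid = np.asarray(labeled_grid)
    out = np.zeros_like(labeled_grid)
    new_label = 1
    for label in np.unique(labeled_grid):
        if label == 0:  # 0 is treated as background
            continue
        region = labeled_grid == label
        if region.sum() >= min_size:
            out[region] = new_label  # surviving objects get consecutive labels
            new_label += 1
    return out


filtered = size_filter_sketch([[1, 1, 0, 2], [1, 0, 0, 3], [0, 0, 3, 3]], 2)
```

Renumbering keeps the labels dense, so the number of surviving objects equals the largest label.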
-
-
hagelslag.processing.EnhancedWatershedSegmenter.rescale_data(data, data_min, data_max, out_min=0.0, out_max=100.0) Rescale the input data so that it ranges over integer values, which work better in the watershed.
Parameters: - data – 2D or 3D ndarray being rescaled
- data_min – minimum value of input data for scaling purposes
- data_max – maximum value of input data for scaling purposes
- out_min – minimum value of scaled data
- out_max – maximum value of scaled data
Returns: Linearly scaled ndarray
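The linear scaling described above can be sketched as follows. This is an illustrative stand-in named rescale_sketch, not the hagelslag source; it assumes no clipping, so inputs outside [data_min, data_max] map outside [out_min, out_max].

```python
import numpy as np


def rescale_sketch(data, data_min, data_max, out_min=0.0, out_max=100.0):
    """Map [data_min, data_max] linearly onto [out_min, out_max] (no clipping)."""
    data = np.asarray(data, dtype=float)
    scale = (out_max - out_min) / (data_max - data_min)
    return out_min + (data - data_min) * scale


scaled = rescale_sketch([0.0, 25.0, 50.0], 0.0, 50.0)
```

With the default output range of 0 to 100, a data_increment of 1 in EnhancedWatershed then corresponds to 1 % of the input range.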
hagelslag.processing.EnsembleProducts module
hagelslag.processing.Hysteresis module
-
class
hagelslag.processing.Hysteresis.Hysteresis(min_intensity, max_intensity)
Bases: object
Object segmentation method that identifies objects as contiguous areas in which all pixels are above a low threshold and at least one pixel is above a high threshold.
-
min_intensity Lower threshold value.
-
max_intensity Higher threshold value.
-
label(input_grid) Label the input grid with the hysteresis method.
Parameters: input_grid – 2D array of values.
Returns: Labeled output grid.
-
static
size_filter(labeled_grid, min_size) Remove labeled objects that do not meet the size threshold criterion.
Parameters: - labeled_grid – 2D output from the label method.
- min_size – Minimum size of object in pixels.
Returns: Labeled grid with smaller objects removed.
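The hysteresis rule can be illustrated with a self-contained flood fill. This sketch assumes 4-connectivity and strict inequalities; hagelslag may use a different connectivity or a library labeling routine, and hysteresis_sketch is a hypothetical name.

```python
from collections import deque

import numpy as np


def hysteresis_sketch(input_grid, min_intensity, max_intensity):
    """Label 4-connected regions whose pixels all exceed min_intensity and
    that contain at least one pixel exceeding max_intensity (0 = background)."""
    grid = np.asarray(input_grid, dtype=float)
    above_low = grid > min_intensity
    labels = np.zeros(grid.shape, dtype=int)
    next_label = 1
    for start in zip(*np.nonzero(above_low)):
        if labels[start] != 0:
            continue  # already visited as part of an earlier region
        # Breadth-first flood fill; -1 marks "visited but not yet accepted"
        labels[start] = -1
        region = [start]
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if (0 <= ni < grid.shape[0] and 0 <= nj < grid.shape[1]
                        and above_low[ni, nj] and labels[ni, nj] == 0):
                    labels[ni, nj] = -1
                    region.append((ni, nj))
                    queue.append((ni, nj))
        idx = tuple(np.array(region).T)
        # Accept the region only if its peak exceeds the high threshold
        if grid[idx].max() > max_intensity:
            labels[idx] = next_label
            next_label += 1
    labels[labels < 0] = 0  # rejected regions become background
    return labels
```

Only the left blob below survives, because the right blob never exceeds the high threshold of 8: `hysteresis_sketch([[0, 5, 5, 0, 3], [0, 9, 0, 0, 3]], 2, 8)`.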
-
hagelslag.processing.ObjectMatcher module
-
class
hagelslag.processing.ObjectMatcher.ObjectMatcher(cost_function_components, weights, max_values)
Bases: object
ObjectMatcher calculates distances between two sets of objects and determines the optimal object assignments with the Hungarian matching algorithm. ObjectMatcher supports using a weighted average of multiple cost functions as the distance between objects. Upper limits on each distance component prevent the matching of objects that are too far apart.
-
cost_function_components List of distance functions for matching.
-
weights List of weights for each distance function.
-
max_values List of the maximum allowable distance for each distance function component.
-
cost_matrix(set_a, set_b, time_a, time_b) Calculates the costs (distances) between the items in set a and set b at the specified times.
Parameters: - set_a – List of STObjects
- set_b – List of STObjects
- time_a – time at which objects in set_a are evaluated
- time_b – time at which objects in set_b are evaluated
Returns: A numpy array with shape [len(set_a), len(set_b)] containing the cost matrix between the items in set a and the items in set b.
-
match_objects(set_a, set_b, time_a, time_b) Match two sets of objects at particular times.
Parameters: - set_a – list of STObjects
- set_b – list of STObjects
- time_a – time at which set_a is being evaluated for matching
- time_b – time at which set_b is being evaluated for matching
Returns: List of tuples containing (set_a index, set_b index) for each match
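Given a cost matrix, the Hungarian step can be sketched with SciPy's `linear_sum_assignment`. The assumption here is that pairs whose cost reaches a cutoff (1.0, the saturated value of the scaled distances) are rejected; match_sketch and max_cost are hypothetical names, not the ObjectMatcher API.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_sketch(cost_matrix, max_cost=1.0):
    """Optimal 1:1 assignment that drops pairs whose cost reaches max_cost."""
    costs = np.asarray(cost_matrix, dtype=float)
    rows, cols = linear_sum_assignment(costs)  # minimizes the total cost
    return [(int(a), int(b)) for a, b in zip(rows, cols) if costs[a, b] < max_cost]


pairs = match_sketch([[0.1, 0.8], [0.3, 0.2], [1.0, 1.0]])
```

The assignment minimizes the summed cost over all pairs, so an object may lose its individually closest partner if that yields a lower total cost.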
-
total_cost_function(item_a, item_b, time_a, time_b) Calculate the total cost function between two items.
Parameters: - item_a – STObject
- item_b – STObject
- time_a – Timestep in item_a at which cost function is evaluated
- time_b – Timestep in item_b at which cost function is evaluated
Returns: The total weighted distance between item_a and item_b
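The weighted combination can be sketched as below, assuming each component distance is capped at its max_value and divided by it (yielding 0 to 1, as the distance functions in this module document) before being averaged with the weights. The raw component distances are passed in directly; total_cost_sketch and scaled_distance are hypothetical helpers.

```python
import numpy as np


def scaled_distance(raw_distance, max_value):
    """Scale a raw distance into [0, 1]; distances beyond max_value saturate at 1."""
    return np.minimum(raw_distance, max_value) / float(max_value)


def total_cost_sketch(raw_distances, weights, max_values):
    """Weighted average of the scaled component distances."""
    scaled = [scaled_distance(d, m) for d, m in zip(raw_distances, max_values)]
    return float(np.dot(weights, scaled) / np.sum(weights))


cost = total_cost_sketch([10, 50], weights=[0.5, 0.5], max_values=[40, 25])
```

Here the second component exceeds its limit and saturates at 1, so the total is (0.5 × 0.25 + 0.5 × 1.0) = 0.625.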
-
-
class
hagelslag.processing.ObjectMatcher.TrackMatcher(cost_function_components, weights, max_values)
Bases: object
Find the optimal pairings between two sets of STObject tracks.
-
cost_function_components Array of cost function objects.
-
weights Array of weights for each cost function. The weights should sum to 1.
-
max_values Array of distance values that correspond to the upper limit distance that should be considered.
-
match_tracks(set_a, set_b, closest_matches=False) Find the optimal set of matching assignments between set a and set b. This function supports optimal 1:1 matching using the Munkres method as well as matching every object in set a to the closest object in set b. In the latter case, set b can accept multiple matches from set a.
Parameters: - set_a –
- set_b –
- closest_matches –
Returns:
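The closest-match mode described above differs from 1:1 matching in that several tracks in set a may map to the same track in set b. A minimal sketch on a precomputed cost matrix, assuming a rejection cutoff of 1.0 (closest_matches_sketch is a hypothetical name):

```python
import numpy as np


def closest_matches_sketch(cost_matrix, max_cost=1.0):
    """Match every item in set a to its lowest-cost item in set b.

    Items in set b may receive several matches; pairs whose best cost
    reaches max_cost are dropped.
    """
    costs = np.asarray(cost_matrix, dtype=float)
    best_b = costs.argmin(axis=1)
    best_cost = costs[np.arange(costs.shape[0]), best_b]
    return [(int(a), int(b)) for a, (b, c) in enumerate(zip(best_b, best_cost))
            if c < max_cost]


pairs = closest_matches_sketch([[0.1, 0.5], [0.2, 0.9], [1.0, 1.0]])
```

Both of the first two rows pick column 0, which 1:1 Munkres matching would not allow.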
-
neighbor_matches(set_a, set_b)
-
raw_cost_matrix(set_a, set_b)
-
track_cost_function(item_a, item_b)
-
track_cost_matrix(set_a, set_b)
-
-
class
hagelslag.processing.ObjectMatcher.TrackStepMatcher(cost_function_components, max_values)
Bases: object
Determine whether each step in a track is in close proximity to steps from another set of tracks.
-
cost(track_a, time_a, track_b, time_b)
-
cost_matrix(set_a, set_b)
-
match(set_a, set_b) For each step in each track from set_a, identify all steps in all tracks from set_b that meet all cost function criteria.
Parameters: - set_a – List of STObjects
- set_b – List of STObjects
Returns: track_pairings
Return type: pandas.DataFrame
-
-
hagelslag.processing.ObjectMatcher.area_difference(item_a, time_a, item_b, time_b, max_value) RMS difference in object areas.
Parameters: - item_a – STObject from the first set in ObjectMatcher
- time_a – Time integer being evaluated
- item_b – STObject from the second set in ObjectMatcher
- time_b – Time integer being evaluated
- max_value – Maximum distance value used as scaling value and upper constraint.
Returns: Distance value between 0 and 1.
-
hagelslag.processing.ObjectMatcher.centroid_distance(item_a, time_a, item_b, time_b, max_value) Euclidean distance between the centroids of item_a and item_b.
Parameters: - item_a – STObject from the first set in ObjectMatcher
- time_a – Time integer being evaluated
- item_b – STObject from the second set in ObjectMatcher
- time_b – Time integer being evaluated
- max_value – Maximum distance value used as scaling value and upper constraint.
Returns: Distance value between 0 and 1.
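The scaled Euclidean distance can be sketched with plain (x, y) tuples standing in for the centroids that would be computed from each STObject. The capping at max_value is an assumption based on "used as scaling value and upper constraint"; centroid_distance_sketch is a hypothetical name.

```python
import math


def centroid_distance_sketch(centroid_a, centroid_b, max_value):
    """Euclidean distance between two (x, y) centroids, scaled to [0, 1]."""
    raw = math.hypot(centroid_a[0] - centroid_b[0], centroid_a[1] - centroid_b[1])
    return min(raw, max_value) / max_value


d = centroid_distance_sketch((0.0, 0.0), (3000.0, 4000.0), max_value=10000.0)
```

With x and y in meters, a 5 km separation and a 10 km limit give 0.5; any separation of 10 km or more gives 1.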
-
hagelslag.processing.ObjectMatcher.closest_distance(item_a, time_a, item_b, time_b, max_value) Euclidean distance between the pixels in item_a and item_b closest to each other.
Parameters: - item_a – STObject from the first set in ObjectMatcher
- time_a – Time integer being evaluated
- item_b – STObject from the second set in ObjectMatcher
- time_b – Time integer being evaluated
- max_value – Maximum distance value used as scaling value and upper constraint.
Returns: Distance value between 0 and 1.
-
hagelslag.processing.ObjectMatcher.duration_distance(item_a, item_b, max_value) Absolute difference in the duration of two items.
Parameters: - item_a – STObject from the first set in TrackMatcher
- item_b – STObject from the second set in TrackMatcher
- max_value – Maximum distance value used as scaling value and upper constraint.
Returns: Distance value between 0 and 1.
-
hagelslag.processing.ObjectMatcher.ellipse_distance(item_a, time_a, item_b, time_b, max_value) Calculate differences in the properties of ellipses fitted to each object.
Parameters: - item_a – STObject from the first set in ObjectMatcher
- time_a – Time integer being evaluated
- item_b – STObject from the second set in ObjectMatcher
- time_b – Time integer being evaluated
- max_value – Maximum distance value used as scaling value and upper constraint.
Returns: Distance value between 0 and 1.
-
hagelslag.processing.ObjectMatcher.max_intensity(item_a, time_a, item_b, time_b, max_value) RMS difference in maximum intensity.
Parameters: - item_a – STObject from the first set in ObjectMatcher
- time_a – Time integer being evaluated
- item_b – STObject from the second set in ObjectMatcher
- time_b – Time integer being evaluated
- max_value – Maximum distance value used as scaling value and upper constraint.
Returns: Distance value between 0 and 1.
-
hagelslag.processing.ObjectMatcher.mean_area_distance(item_a, item_b, max_value) Absolute difference in the means of the areas of each track over time.
Parameters: - item_a – STObject from the first set in TrackMatcher
- item_b – STObject from the second set in TrackMatcher
- max_value – Maximum distance value used as scaling value and upper constraint.
Returns: Distance value between 0 and 1.
-
hagelslag.processing.ObjectMatcher.mean_min_time_distance(item_a, item_b, max_value) Calculate the mean time difference among the time steps in each object.
Parameters: - item_a – STObject from the first set in TrackMatcher
- item_b – STObject from the second set in TrackMatcher
- max_value – Maximum distance value used as scaling value and upper constraint.
Returns: Distance value between 0 and 1.
-
hagelslag.processing.ObjectMatcher.mean_minimum_centroid_distance(item_a, item_b, max_value) RMS difference in the minimum distances from the centroids of one track to the centroids of another track.
Parameters: - item_a – STObject from the first set in TrackMatcher
- item_b – STObject from the second set in TrackMatcher
- max_value – Maximum distance value used as scaling value and upper constraint.
Returns: Distance value between 0 and 1.
-
hagelslag.processing.ObjectMatcher.nonoverlap(item_a, time_a, item_b, time_b, max_value) Percentage of pixels in each object that do not overlap with the other object.
Parameters: - item_a – STObject from the first set in ObjectMatcher
- time_a – Time integer being evaluated
- item_b – STObject from the second set in ObjectMatcher
- time_b – Time integer being evaluated
- max_value – Maximum distance value used as scaling value and upper constraint.
Returns: Distance value between 0 and 1.
-
hagelslag.processing.ObjectMatcher.shifted_centroid_distance(item_a, time_a, item_b, time_b, max_value) Centroid distance with motion corrections.
Parameters: - item_a – STObject from the first set in ObjectMatcher
- time_a – Time integer being evaluated
- item_b – STObject from the second set in ObjectMatcher
- time_b – Time integer being evaluated
- max_value – Maximum distance value used as scaling value and upper constraint.
Returns: Distance value between 0 and 1.
-
hagelslag.processing.ObjectMatcher.start_centroid_distance(item_a, item_b, max_value) Distance between the centroids of the first step in each object.
Parameters: - item_a – STObject from the first set in TrackMatcher
- item_b – STObject from the second set in TrackMatcher
- max_value – Maximum distance value used as scaling value and upper constraint.
Returns: Distance value between 0 and 1.
-
hagelslag.processing.ObjectMatcher.start_time_distance(item_a, item_b, max_value) Absolute difference between the starting times of each item.
Parameters: - item_a – STObject from the first set in TrackMatcher
- item_b – STObject from the second set in TrackMatcher
- max_value – Maximum distance value used as scaling value and upper constraint.
Returns: Distance value between 0 and 1.
-
hagelslag.processing.ObjectMatcher.time_distance(item_a, time_a, item_b, time_b, max_value)
hagelslag.processing.STObject module
-
class
hagelslag.processing.STObject.STObject(grid, mask, x, y, i, j, start_time, end_time, step=1, dx=3000, u=None, v=None)
Bases: object
The STObject stores data and location information for objects extracted from the ensemble grids.
-
grid All of the data values. Supports a 2D array of values, a list of 2D arrays, or a 3D array.
Type: ndarray
-
mask Grid of 1’s and 0’s in which 1’s indicate the location of the object.
Type: ndarray
-
x Array of x-coordinate values in meters. Longitudes can also be placed here.
Type: ndarray
-
y Array of y-coordinate values in meters. Latitudes can also be placed here.
Type: ndarray
-
i Array of row indices from the full model domain.
Type: ndarray
-
j Array of column indices from the full model domain.
Type: ndarray
-
start_time The first time at which the object exists.
-
end_time The last time at which the object exists.
-
step Number of hours between timesteps.
-
dx Grid spacing.
-
u Storm motion in the x-direction.
-
v Storm motion in the y-direction.
-
boundary_contour(time) Calculate the contour around the edge of the binary mask for the object. For objects with interior holes or multiple connections, binary dilation, hole filling, and erosion are used to generate a single edge contour instead of multiple contours.
Parameters: time –
Returns: array of shape (2, number of contour points) containing the x and y coordinates of the object edge.
-
boundary_polygon(time) Get the coordinates of the object boundary in counter-clockwise order based on the convex hull of the object. For non-convex objects, the convex hull will not be representative of the object shape, and boundary_contour should be used instead.
-
calc_attribute_statistic(attribute, statistic, time) Calculate statistics based on the values of an attribute. The following statistics are supported: mean, max, min, std, ptp (range), median, skew (mean - median), and percentile_(percentile value).
Parameters: - attribute – Attribute extracted from model grid
- statistic – Name of statistic being used.
- time – timestep of the object being investigated
Returns: The value of the statistic
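The list of supported statistic names maps naturally onto a small dispatch. The sketch below operates on a flat array of attribute values rather than on an STObject attribute at a given time, and attribute_statistic_sketch is a hypothetical name. Note that "skew" follows the documented definition (mean minus median), not the third standardized moment.

```python
import numpy as np


def attribute_statistic_sketch(values, statistic):
    """Compute one of the documented statistics over an object's attribute values."""
    values = np.asarray(values, dtype=float)
    simple = {"mean": np.mean, "max": np.max, "min": np.min, "std": np.std,
              "ptp": lambda v: np.max(v) - np.min(v), "median": np.median}
    if statistic in simple:
        return float(simple[statistic](values))
    if statistic == "skew":
        # Documented as mean minus median
        return float(np.mean(values) - np.median(values))
    if statistic.startswith("percentile_"):
        # e.g. "percentile_90" -> 90th percentile
        return float(np.percentile(values, float(statistic.split("_")[1])))
    raise ValueError(f"Unsupported statistic: {statistic}")


value = attribute_statistic_sketch([1, 2, 3, 10], "percentile_50")
```

Encoding the percentile in the name keeps the interface a single string, which fits column names in tabular output.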
-
calc_attribute_statistics(statistic_name) Calculates summary statistics over the domains of each attribute.
Parameters: statistic_name (string) – numpy statistic, such as mean, std, max, min
Returns: dict of statistics from each attribute grid.
-
calc_shape_statistics(stat_names) Calculate shape statistics using regionprops applied to the object mask.
Parameters: stat_names – List of statistics to be extracted from those calculated by regionprops.
Returns: Dictionary of shape statistics
-
calc_shape_step(stat_names, time) Calculate shape statistics for a single time step.
Parameters: - stat_names – List of shape statistics calculated from regionprops
- time – Time being investigated
Returns: List of shape statistics
-
calc_timestep_statistic(statistic, time) Calculate statistics from the primary attribute of the STObject.
Parameters: - statistic – statistic being calculated
- time – Timestep being investigated
Returns: Value of the statistic
-
center_of_mass(time) Calculate the center of mass at a given timestep.
Parameters: time – Time at which the center of mass calculation is performed
Returns: The x- and y-coordinates of the center of mass.
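A center of mass is typically an intensity-weighted mean position. The sketch assumes the grid values serve as weights and that only pixels with mask == 1 contribute; it takes plain arrays instead of an STObject, and center_of_mass_sketch is a hypothetical name.

```python
import numpy as np


def center_of_mass_sketch(x, y, intensity, mask):
    """Intensity-weighted mean x and y over the pixels where mask == 1."""
    weights = np.asarray(intensity, dtype=float) * (np.asarray(mask) == 1)
    total = weights.sum()
    com_x = (weights * np.asarray(x, dtype=float)).sum() / total
    com_y = (weights * np.asarray(y, dtype=float)).sum() / total
    return float(com_x), float(com_y)


com = center_of_mass_sketch(x=[[0, 1], [0, 1]], y=[[0, 0], [1, 1]],
                            intensity=[[1, 3], [5, 5]], mask=[[1, 1], [0, 0]])
```

The masked-out bottom row is ignored even though it has high intensity, so the result is pulled toward the brighter top-right pixel: (0.75, 0.0).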
-
center_of_mass_ij(time) Calculate the center of mass in terms of row and column coordinates at a given timestep.
Parameters: time – Time at which the center of mass calculation is performed
Returns: The row (i) and column (j) coordinates of the center of mass.
-
closest_distance(time, other_object, other_time) The shortest distance between two objects at specified times.
Parameters: - time (int or datetime) – Valid time for this STObject
- other_object – Another STObject being compared
- other_time – The time within the other STObject being evaluated.
Returns: Distance in units of the x-y coordinates
-
count_overlap(time, other_object, other_time) Counts the number of points that overlap between this STObject and another STObject. Used for tracking.
-
estimate_motion(time, intensity_grid, max_u, max_v) Estimate the motion of the object with cross-correlation on the intensity values from the previous time step.
Parameters: - time – time being evaluated.
- intensity_grid – 2D array of intensities used in cross-correlation.
- max_u – Maximum x-component of motion. Used to limit the search area.
- max_v – Maximum y-component of motion. Used to limit the search area.
Returns: u, v, and the minimum error.
-
extend(step) Adds the data from another STObject to this object.
Parameters: step – another STObject being added after the current one in time.
-
extract_attribute_array(data_array, var_name) Extracts data from a 2D array that has the same dimensions as the grid used to identify the object.
Parameters: data_array – 2D numpy array
-
extract_attribute_grid(model_grid, potential=False, future=False) Extracts the data from a ModelOutput or ModelGrid object within the bounding box region of the STObject.
Parameters: - model_grid – A ModelGrid or ModelOutput object
- potential – Extracts from the time before instead of the same time as the object
-
extract_patch(patch_radius, full_x, full_y, full_i, full_j) Extract a patch of uniform radius from an existing STObject. This is intended for extracting patches from STObjects that are built around the bounding box of the original object. Areas outside the object are padded with zeros.
Parameters: - patch_radius (int) – radius of patch in pixels
- full_x – full x grid encompassing the original field from which the objects have been extracted.
- full_y – full y grid
- full_i – full i (row) grid
- full_j – full j (column) grid
Returns: new STObject containing a patched slice of the original STObject.
-
extract_tendency_grid(model_grid) Extracts the difference in model outputs.
Parameters: model_grid – ModelOutput or ModelGrid object.
-
get_corner(time) Gets the array indices of the upper-left corner of the bounding box of the STObject at a given time.
Parameters: time – time at which the corner is being extracted.
Returns: corner index.
-
max_intensity(time) Calculate the maximum intensity found at a timestep.
-
max_size() Gets the largest size of the object over all timesteps.
Returns: Maximum size of the object in pixels
-
percentile_distance(time, other_object, other_time, percentile)
-
size(time) Gets the size of the object at a given time.
Parameters: time – Time value being queried.
Returns: size of the object in pixels
-
to_geojson(filename, proj, metadata=None) Output the data in the STObject to a geoJSON file.
Parameters: - filename – Name of the file
- proj – PyProj object for converting the x and y coordinates back to latitude and longitude values.
- metadata – Metadata describing the object to be included in the top-level properties.
-
to_geojson_feature(proj, output_grids=False) Output the data in the STObject as a geoJSON feature.
Parameters: - proj – PyProj object for converting the x and y coordinates back to latitude and longitude values.
- output_grids – Whether or not to output the primary gridded fields to the geojson file.
-
trajectory() Calculates the center of mass for each time step and outputs an array.
Returns:
-
-
hagelslag.processing.STObject.read_geojson(filename) Reads a geojson file containing an STObject and initializes a new STObject from the information in the file.
Parameters: filename – Name of the geojson file
Returns: an STObject
hagelslag.processing.TrackModeler module
-
class
hagelslag.processing.TrackModeler.TrackModeler(ensemble_name, train_data_path, forecast_data_path, member_files, start_dates, end_dates, weighting_function, map_file, group_col='Microphysics')
Bases: object
TrackModeler is designed to load and process data generated by TrackProcessing and then use that data to fit machine learning models that predict whether or not hail will occur, the hail size, and translation errors in time and space.
-
calc_copulas(output_file, model_names=('start-time', 'translation-x', 'translation-y'), label_columns=('Start_Time_Error', 'Translation_Error_X', 'Translation_Error_Y')) Calculate a multivariate normal (Gaussian) copula from the training data for each group of ensemble members. The distributions are written to a pickle file for later use.
Parameters: - output_file – Pickle file
- model_names – Names of the tracking models
- label_columns – Names of the data columns used for labeling
Returns:
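A Gaussian copula is commonly fit by converting each variable to ranks, mapping the ranks to standard normal scores, and taking the correlation matrix of those scores. The sketch below follows that common recipe with the standard library's `statistics.NormalDist`; it is an assumption about the procedure, not hagelslag's implementation, and it omits grouping by ensemble member and the pickle output. The function name is hypothetical.

```python
from statistics import NormalDist

import numpy as np


def gaussian_copula_correlation(samples):
    """Estimate the correlation matrix of a Gaussian copula.

    samples: (n, k) array with one column per variable, e.g. start-time
    error and x and y translation errors.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[0]
    # Ranks 1..n per column (ties broken by order), mapped into (0, 1)
    ranks = samples.argsort(axis=0).argsort(axis=0) + 1
    uniform = ranks / (n + 1.0)
    # Inverse standard normal CDF turns uniform ranks into normal scores
    normal_scores = np.vectorize(NormalDist().inv_cdf)(uniform)
    return np.corrcoef(normal_scores, rowvar=False)


corr = gaussian_copula_correlation([[1, 40], [2, 30], [3, 20], [4, 10]])
```

Because only ranks enter the fit, the copula captures the dependence between errors independently of each error's marginal distribution.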
-
fit_condition_models(model_names, model_objs, input_columns, output_column='Matched', output_threshold=0.0) Fit machine learning models to predict whether or not hail will occur.
Parameters: - model_names – List of strings with the names for the particular machine learning models
- model_objs – scikit-learn style machine learning model objects.
- input_columns – list of the names of the columns used as input for the machine learning model
- output_column – name of the column used for labeling whether or not the event occurs
- output_threshold – splitting threshold to determine if event has occurred. Default 0.0
-
fit_condition_threshold_models(model_names, model_objs, input_columns, output_column='Matched', output_threshold=0.5, num_folds=5, threshold_score='ets') Fit models to predict hail/no-hail and use cross-validation to determine the probability threshold that maximizes a skill score.
Parameters: - model_names – List of machine learning model names
- model_objs – List of scikit-learn ML models
- input_columns – List of input variables in the training data
- output_column – Column used for prediction
- output_threshold – Values exceeding this threshold are considered positive events; values at or below it are considered non-events
- num_folds – Number of folds in the cross-validation procedure
- threshold_score – Score available in ContingencyTable used for determining the best probability threshold
Returns: None
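Choosing a probability threshold by skill score can be sketched for the default 'ets' (equitable threat score) case. The ETS formula below is the standard contingency-table definition; the functions are standalone stand-ins for ContingencyTable, and the cross-validation loop over num_folds is omitted. Both function names are hypothetical.

```python
import numpy as np


def equitable_threat_score(forecast, observed):
    """ETS from boolean forecast and observation arrays."""
    forecast = np.asarray(forecast, dtype=bool)
    observed = np.asarray(observed, dtype=bool)
    hits = np.sum(forecast & observed)
    false_alarms = np.sum(forecast & ~observed)
    misses = np.sum(~forecast & observed)
    # Hits expected by chance given the forecast and observed frequencies
    random_hits = (hits + false_alarms) * (hits + misses) / forecast.size
    denom = hits + false_alarms + misses - random_hits
    return float((hits - random_hits) / denom) if denom != 0 else 0.0


def best_threshold(probabilities, observed, thresholds):
    """Return the probability threshold with the highest ETS."""
    probabilities = np.asarray(probabilities, dtype=float)
    scores = [equitable_threat_score(probabilities >= t, observed) for t in thresholds]
    return float(thresholds[int(np.argmax(scores))])


t = best_threshold([0.9, 0.8, 0.6, 0.3, 0.2, 0.1], [1, 1, 0, 0, 0, 0], [0.25, 0.5, 0.7])
```

Here only the 0.7 threshold separates the two observed events from the rest without false alarms, so it yields ETS = 1 and is selected.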
-
fit_size_distribution_component_models(model_names, model_objs, input_columns, output_columns) This calculates two principal components of the hail size distribution from the shape and scale parameters. Separate machine learning models are fit to predict each component.
Parameters: - model_names – List of machine learning model names
- model_objs – List of machine learning model objects.
- input_columns – List of input variables
- output_columns – Output columns, should contain Shape and Scale.
Returns:
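Projecting the shape and scale parameters onto principal components removes their correlation, so each component can be predicted by a separate model. The sketch standardizes both parameters before an SVD-based PCA; the standardization step and the function name pca_components are assumptions, not necessarily what hagelslag does.

```python
import numpy as np


def pca_components(shape, scale):
    """Project standardized (shape, scale) pairs onto their principal components."""
    data = np.column_stack([shape, scale]).astype(float)
    mean, std = data.mean(axis=0), data.std(axis=0)
    standardized = (data - mean) / std
    # Rows of vt are the principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(standardized, full_matrices=False)
    components = standardized @ vt.T
    return components, vt, mean, std


comps, vt, mean, std = pca_components([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.1, 9.8])
```

Predicted components map back to shape and scale with `comps @ vt * std + mean`, which is the inverse transform needed at prediction time.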
-
fit_size_distribution_models(model_names, model_objs, input_columns, output_columns=None, calibrate=False) Fits multitask machine learning models to predict the parameters of a size distribution.
Parameters: - model_names – List of machine learning model names
- model_objs – scikit-learn style machine learning model objects
- input_columns – Training data columns used as input for ML model
- output_columns – Training data columns used for prediction
- calibrate – Whether or not to fit a log-linear regression to predictions from ML model
-
fit_size_models(model_names, model_objs, input_columns, output_column='Hail_Size', output_start=5, output_step=5, output_stop=100) Fit size models to produce discrete pdfs of forecast hail sizes.
Parameters: - model_names – List of model names
- model_objs – List of model objects
- input_columns – List of input variables
- output_column – Output variable name
- output_start – Hail size bin start
- output_step – Hail size bin step
- output_stop – Hail size bin stop
-
fit_track_models(model_names, model_objs, input_columns, output_columns, output_ranges) Fit machine learning models to predict track error offsets.
Parameters: - model_names –
- model_objs –
- input_columns –
- output_columns –
- output_ranges –
-
load_data(mode='train', format='csv') Load data from flat data files containing total track information and information about each timestep. The two sets are combined using merge operations on the track IDs. Additional member information is gathered from the appropriate member file.
Parameters: - mode – “train” or “forecast”
- format – file format being used. Default is “csv”
-
load_models(model_path) Load models from pickle files. Note that models should be loaded with the same version of scikit-learn that they were saved with.
Parameters: model_path – Path to model pickle files.
-
output_forecasts_csv(forecasts, mode, csv_path, run_date_format='%Y%m%d-%H%M') Output hail forecast values to csv files by run date and ensemble member.
Parameters: - forecasts (dict) – Dictionary of DataFrames with forecast values
- mode (str) – either “train” or “forecast”
- csv_path (str) – Path where csv forecast files are saved
-
output_forecasts_json(forecasts, condition_model_names, size_model_names, dist_model_names, track_model_names, json_data_path, out_path) Output forecasts to geoJSON format.
Parameters: - forecasts (dict) – Dictionary of DataFrames with condition and size distribution forecasts
- condition_model_names (list) – Names of all the condition ML models
- size_model_names (list) – Names of all the size ML models
- dist_model_names (list) – Names of all the size distribution ML models
- track_model_names (list) – Names of models that predict the track offset (no longer used)
- json_data_path (str) – Path to geoJSON storm files
- out_path (str) – Path to where geoJSON forecast files are output
Returns:
-
output_forecasts_json_parallel(forecasts, condition_model_names, dist_model_names, json_data_path, out_path, num_procs)
-
predict_condition_models(model_names, input_columns, metadata_cols, data_mode='forecast') Apply condition models to forecast data.
Parameters: - model_names – List of names associated with each condition model used for prediction
- input_columns – List of columns in data used as input into the model
- metadata_cols – Columns from input data that should be included in the data frame with the predictions.
- data_mode – Which data subset to pull from for the predictions, “forecast” by default
Returns: A dictionary of data frames containing probabilities of the event and specified metadata
-
predict_size_distribution_component_models(model_names, input_columns, output_columns, metadata_cols, data_mode='forecast', location=6) Make predictions using fitted size distribution component models. PCA is used to transform the shape and scale parameters so that their correlation is removed.
Parameters: - model_names – Name of the models for predictions
- input_columns – Data columns used for input into ML models
- output_columns – Names of output columns
- metadata_cols – Columns from input data that should be included in the data frame with the predictions.
- data_mode – Set of data used as input for prediction models
- location – Value of fixed location parameter
Returns: Predictions in dictionary of data frames grouped by group type
-
predict_size_distribution_models(model_names, input_columns, metadata_cols, data_mode='forecast', location=6, calibrate=False) Make predictions using fitted size distribution models. Each ML model predicts the normalized shape and scale parameters simultaneously using multitask learning. Only scikit-learn models that support multitask learning can be used.
Parameters: - model_names – Name of the models for predictions
- input_columns – Data columns used for input into ML models
- metadata_cols – Columns from input data that should be included in the data frame with the predictions.
- data_mode – Set of data used as input for prediction models
- location – Value of fixed location parameter
- calibrate – Whether or not to apply calibration model
Returns: Predictions in dictionary of data frames grouped by group type
-
predict_size_models(model_names, input_columns, metadata_cols, data_mode='forecast') Apply size models to forecast data.
Parameters: - model_names –
- input_columns –
- metadata_cols –
- data_mode –
-
predict_track_models(model_names, input_columns, metadata_cols, data_mode='forecast') Predict track offsets on forecast data.
Parameters: - model_names – List of machine learning model names
- input_columns – List of input columns
- metadata_cols – List of metadata columns to include with predictions
- data_mode – train or forecast
Returns:
-
save_models(model_path) Save machine learning models to pickle files.
-
-
hagelslag.processing.TrackModeler.output_forecast(step_forecasts, run_date, ensemble_name, member, track_num, json_data_path, out_path)
hagelslag.processing.TrackProcessing module
hagelslag.processing.TrackSampler module
-
class
hagelslag.processing.TrackSampler.TrackSampler(member, group, run_date, model_names, start_hour, end_hour, grid_shape, dx, track_path, num_samples, copula_file=None)
Bases: object
Monte Carlo sampler of forecast storm tracks.
-
generate_copula_ranks()
-
load_track_forecasts()
-
output_track_probs(track_probs, path)
-
sample_condition()
-
sample_size(size_values=...)
-
sample_start_time(start_time_values=...)
-
sample_tracks(size_ranges, track_ranges, thresholds=..., dilation=13)
-
sample_translation_x(translation_x_values=...)
-
sample_translation_y(translation_y_values=...)
-
-
hagelslag.processing.TrackSampler.load_grid_info(grid_file)
-
hagelslag.processing.TrackSampler.main()
-
hagelslag.processing.TrackSampler.sample_member_run_tracks(member, group, run_date, model_names, start_hour, end_hour, grid_shape, dx, track_path, num_samples, thresholds, copula_file, out_path, size_ranges, track_ranges)
hagelslag.processing.tracker module
-
hagelslag.processing.tracker.extract_storm_objects(label_grid, data, x_grid, y_grid, times, dx=1, dt=1, buffer_radius=0) After storms are labeled, this method extracts the storm objects from the grid and places them into STObjects. The STObjects contain intensity, location, and shape information about each storm at each timestep.
Parameters: - label_grid – 2D or 3D array output by label_storm_objects.
- data – 2D or 3D array used as input to label_storm_objects.
- x_grid – 2D array of x-coordinate data, preferably on a uniform spatial grid with units of length.
- y_grid – 2D array of y-coordinate data.
- times – List or array of time values, preferably as integers
- dx – grid spacing in same units as x_grid and y_grid.
- dt – period elapsed between times
- buffer_radius – number of extra pixels beyond bounding box of object to store in each STObject
Returns: storm_objects – list of lists containing STObjects identified at each time.
Return type: list
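The core of the extraction step can be sketched in plain Python: group the pixels of each positive label, keep their intensities, and expand the object's bounding box by buffer_radius pixels. This is an illustrative sketch only. StormObjectSketch is a hypothetical stand-in for STObject, label 0 is assumed to mark background, and the x_grid, y_grid, dx, and dt handling of the real function is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class StormObjectSketch:
    """Hypothetical stand-in for an STObject at a single time step."""
    label: int
    time: int
    rows: list = field(default_factory=list)
    cols: list = field(default_factory=list)
    intensities: list = field(default_factory=list)
    bbox: tuple = ()  # (row_min, row_max, col_min, col_max), inclusive


def extract_objects_2d(label_grid, data, time, buffer_radius=0):
    """Group labeled pixels of one 2D grid into objects with a buffered bounding box."""
    n_rows, n_cols = len(label_grid), len(label_grid[0])
    objects = {}
    for r in range(n_rows):
        for c in range(n_cols):
            label = label_grid[r][c]
            if label <= 0:  # assumption: 0 marks background
                continue
            obj = objects.setdefault(label, StormObjectSketch(label, time))
            obj.rows.append(r)
            obj.cols.append(c)
            obj.intensities.append(data[r][c])
    for obj in objects.values():
        # Expand the tight bounding box by buffer_radius pixels, clipped to the grid.
        obj.bbox = (max(min(obj.rows) - buffer_radius, 0),
                    min(max(obj.rows) + buffer_radius, n_rows - 1),
                    max(min(obj.cols) - buffer_radius, 0),
                    min(max(obj.cols) + buffer_radius, n_cols - 1))
    return [objects[k] for k in sorted(objects)]


def extract_objects(label_grids, data_grids, times, buffer_radius=0):
    """Mirror the documented return shape: one list of objects per time."""
    return [extract_objects_2d(lg, dg, t, buffer_radius)
            for lg, dg, t in zip(label_grids, data_grids, times)]
```

The nested-list return value matches the documented "list of lists containing STObjects identified at each time"; linking objects across times is left to track_storms.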
-
hagelslag.processing.tracker.extract_storm_patches(label_grid, data, x_grid, y_grid, times, dx=1, dt=1, patch_radius=16)¶ After storms are labeled, this function extracts boxes of equal size centered on each storm from the grid and places them into STObjects. The STObjects contain intensity, location, and shape information about each storm at each timestep.
Parameters: - label_grid – 2D or 3D array output by label_storm_objects.
- data – 2D or 3D array used as input to label_storm_objects.
- x_grid – 2D array of x-coordinate data, preferably on a uniform spatial grid with units of length.
- y_grid – 2D array of y-coordinate data.
- times – List or array of time values, preferably as integers
- dx – grid spacing in same units as x_grid and y_grid.
- dt – period elapsed between times
- patch_radius – Number of grid points from center of mass to extract
Returns: storm_objects – list of lists containing STObjects identified at each time.
Return type: list
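A minimal sketch of the patch idea: compute an object's center of mass, then cut a fixed-size square of data around it. Several details are assumptions rather than documented behavior: the center is intensity-weighted, the patch spans 2 * patch_radius + 1 pixels so the center pixel sits in the middle, and a window that would cross the grid edge is shifted back inside the grid. Both function names are hypothetical.

```python
def center_of_mass(label_grid, data, label):
    """Intensity-weighted (row, col) center of one labeled object."""
    total = row_sum = col_sum = 0.0
    for r, row in enumerate(label_grid):
        for c, lab in enumerate(row):
            if lab == label:
                w = data[r][c]
                total += w
                row_sum += w * r
                col_sum += w * c
    return row_sum / total, col_sum / total


def extract_patch(label_grid, data, label, patch_radius):
    """Return a (2 * patch_radius + 1)-square patch of data centered on the object."""
    n_rows, n_cols = len(data), len(data[0])
    size = 2 * patch_radius + 1
    if size > n_rows or size > n_cols:
        raise ValueError("patch is larger than the grid")
    center_r, center_c = center_of_mass(label_grid, data, label)
    # Round to the nearest pixel, then shift the window so it stays inside the grid.
    r0 = min(max(round(center_r) - patch_radius, 0), n_rows - size)
    c0 = min(max(round(center_c) - patch_radius, 0), n_cols - size)
    return [row[c0:c0 + size] for row in data[r0:r0 + size]]
```

Because every patch has the same shape regardless of object size, the patches can be stacked directly into a fixed-size array, which is the usual motivation for this variant over extract_storm_objects.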
-
hagelslag.processing.tracker.label_storm_objects(data, method, min_intensity, max_intensity, min_area=1, max_area=100, max_range=1, increment=1, gaussian_sd=0)¶ From a 2D grid or time series of 2D grids, this function labels storm objects with the Enhanced Watershed, Watershed, or Hysteresis method.
Parameters: - data – the gridded data to be labeled. Should be a 2D numpy array in (y, x) coordinate order or a 3D numpy array in (time, y, x) coordinate order
- method – “ew” for Enhanced Watershed, “ws” for regular watershed, and “hyst” for hysteresis
- min_intensity – Minimum intensity threshold for gridpoints contained within any objects
- max_intensity – For watershed, any points above max_intensity are treated as equal to max_intensity. For hysteresis, every object must contain at least one pixel that equals or exceeds this value.
- min_area – (default 1) The minimum area of any object in pixels.
- max_area – (default 100) The area threshold in pixels at which the enhanced watershed ends growth. Object area may exceed this threshold if the pixels at the last watershed level exceed the object area.
- max_range – Maximum difference in bins for search before growth is stopped.
- increment – Discretization increment for the enhanced watershed
- gaussian_sd – Standard deviation of Gaussian filter applied to data
Returns: label_grid – an ndarray with the same shape as data in which each pixel is labeled with a positive integer value.
Return type: numpy.ndarray
Return type: label_grid
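The hysteresis option (method="hyst") can be illustrated with a short flood fill: pixels at or above min_intensity form candidate objects, and a candidate is kept only if it contains at least one pixel at or above max_intensity and covers at least min_area pixels. This pure-Python sketch for a single 2D grid is not hagelslag's implementation. The 4-connectivity, the inclusive comparisons, and the consecutive relabeling from 1 are assumptions.

```python
from collections import deque


def hysteresis_label(data, min_intensity, max_intensity, min_area=1):
    """Label 2D objects by hysteresis thresholding; background is 0."""
    n_rows, n_cols = len(data), len(data[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    next_label = 1
    for r in range(n_rows):
        for c in range(n_cols):
            # Skip visited pixels (nonzero) and pixels below the low threshold.
            if labels[r][c] or data[r][c] < min_intensity:
                continue
            # Flood-fill one candidate object over 4-connected pixels >= min_intensity.
            component = [(r, c)]
            labels[r][c] = -1  # mark as visited but not yet accepted
            queue = deque(component)
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and labels[nr][nc] == 0 and data[nr][nc] >= min_intensity):
                        labels[nr][nc] = -1
                        component.append((nr, nc))
                        queue.append((nr, nc))
            # Keep the object only if it reaches the high threshold and is large enough.
            peak = max(data[pr][pc] for pr, pc in component)
            if peak >= max_intensity and len(component) >= min_area:
                for pr, pc in component:
                    labels[pr][pc] = next_label
                next_label += 1
    # Rejected components were left at -1; return them to background.
    return [[max(v, 0) for v in row] for row in labels]
```

The two thresholds let weak edges attach to a strong core without letting isolated weak areas become objects on their own.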
-
hagelslag.processing.tracker.track_storms(storm_objects, times, distance_components, distance_maxima, distance_weights, tracked_objects=None)¶ Given the output of extract_storm_objects, this function tracks storms through time and merges individual STObjects into a set of tracks.
Parameters: - storm_objects – list of list of STObjects that have not been tracked.
- times – List of times associated with each set of STObjects
- distance_components – list of function objects that make up components of distance function
- distance_maxima – array of maximum values for each distance for normalization purposes
- distance_weights – weight given to each component of the distance function. Should sum to 1.
- tracked_objects – List of STObjects that have already been tracked.
Returns: tracked_objects – list of tracked STObjects.