hagelslag.util package¶
Submodules¶
hagelslag.util.Config module¶
-
class hagelslag.util.Config.Config(filename, required_attributes=())¶
Bases: object
Class that loads options from a config file and converts them into attributes.
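The pattern described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `SketchConfig`, the JSON file format, and the error raised for missing options are hypothetical, and the format that hagelslag's own `Config` reads is not specified on this page.

```python
import json
import os
import tempfile


class SketchConfig(object):
    """Hypothetical stand-in: load options from a JSON file as attributes."""

    def __init__(self, filename, required_attributes=()):
        with open(filename) as config_file:
            options = json.load(config_file)
        # Each top-level key becomes an attribute on the instance.
        for name, value in options.items():
            setattr(self, name, value)
        # Fail early if an expected option is absent (assumed behavior).
        missing = [a for a in required_attributes if not hasattr(self, a)]
        if missing:
            raise ValueError("Missing required options: " + ", ".join(missing))


# Write a small config file, then load it back.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"ensemble_name": "demo", "num_members": 10}, tmp)
config = SketchConfig(tmp.name, required_attributes=("ensemble_name",))
os.remove(tmp.name)
```

After loading, options are read as plain attributes, for example `config.ensemble_name`.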
hagelslag.util.convert_mrms_grids module¶
hagelslag.util.create_model_grid_us_mask module¶
-
hagelslag.util.create_model_grid_us_mask.create_map_grid(map_file)¶
-
hagelslag.util.create_model_grid_us_mask.create_mask_grid(mask_shape_file, mapping_data, proj_dict, grid_dict)¶
-
hagelslag.util.create_model_grid_us_mask.main()¶
-
hagelslag.util.create_model_grid_us_mask.output_netcdf_file(filename, mask_grid, proj_dict, grid_dict)¶
hagelslag.util.derived_vars module¶
-
hagelslag.util.derived_vars.melting_layer_height(height_surface, height_700mb, height_500mb, temperature_700mb, temperature_500mb)¶
-
hagelslag.util.derived_vars.relative_humidity_pressure_level(temperature, specific_humidity, pressure)¶
hagelslag.util.make_proj_grids module¶
-
hagelslag.util.make_proj_grids.get_proj_obj(proj_dict)¶
-
hagelslag.util.make_proj_grids.main()¶
-
hagelslag.util.make_proj_grids.make_proj_grids(proj_dict, grid_dict)¶
-
hagelslag.util.make_proj_grids.read_arps_map_file(map_filename)¶
-
hagelslag.util.make_proj_grids.read_ncar_map_file(map_filename)¶
hagelslag.util.merge_forecast_data module¶
-
hagelslag.util.merge_forecast_data.main()¶
-
hagelslag.util.merge_forecast_data.merge_input_csv_forecast_json(input_csv_file, forecast_json_path, condition_models, dist_models)¶ Reads forecasts from json files and merges them with the input data from the step csv files.
Parameters: - input_csv_file – Name of the input data csv file being processed
- forecast_json_path – Path to the top-level directory of the forecast json files
- condition_models – List of models used to forecast hail or no hail
- dist_models – List of models used to forecast the hail size distribution
Returns:
hagelslag.util.munkres module¶
Introduction¶
The Munkres module provides an implementation of the Munkres algorithm (also called the Hungarian algorithm or the Kuhn-Munkres algorithm), useful for solving the Assignment Problem.
Assignment Problem¶
Let C be an nxn matrix representing the costs of each of n workers to perform any of n jobs. The assignment problem is to assign jobs to workers in a way that minimizes the total cost. Since each worker can perform only one job and each job can be assigned to only one worker the assignments represent an independent set of the matrix C.
One way to generate the optimal set is to create all permutations of the indexes necessary to traverse the matrix so that no row and column are used more than once. For instance, given this matrix (expressed in Python):
matrix = [[5, 9, 1],
          [10, 3, 2],
          [8, 7, 4]]
You could use this code to generate the traversal indexes:
def permute(a, results):
    # Base case: a single element has exactly one ordering.
    if len(a) == 1:
        results.append(a)
    else:
        for i in range(len(a)):
            element = a[i]
            # Every element except the one chosen for this position.
            a_copy = [a[j] for j in range(len(a)) if j != i]
            subresults = []
            permute(a_copy, subresults)
            for subresult in subresults:
                results.append([element] + subresult)

results = []
permute(list(range(len(matrix))), results)  # [0, 1, 2] for a 3x3 matrix
After the call to permute(), the results matrix would look like this:
[[0, 1, 2],
[0, 2, 1],
[1, 0, 2],
[1, 2, 0],
[2, 0, 1],
[2, 1, 0]]
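The same index lists can also be produced with the standard library. A short sketch, where `index_lists` is a name chosen here for illustration:

```python
from itertools import permutations

n = 3  # size of the 3x3 example matrix
# permutations() yields tuples in lexicographic order for sorted input.
index_lists = [list(p) for p in permutations(range(n))]
```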
You could then use that index matrix to loop over the original cost matrix and calculate the smallest cost of the combinations:
import sys

minval = sys.maxsize
for indexes in results:
    # indexes[row] is the column assigned to that row.
    cost = 0
    for row, col in enumerate(indexes):
        cost += matrix[row][col]
    minval = min(cost, minval)

print(minval)
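For reference, the whole brute-force search fits in one self-contained function. This is a sketch of the exhaustive approach only, not of the Munkres algorithm, and `brute_force_assignment` is a hypothetical helper name:

```python
from itertools import permutations


def brute_force_assignment(matrix):
    """Return (lowest total cost, (row, column) pairs) by trying every permutation."""
    n = len(matrix)
    best_cost, best_columns = None, None
    for columns in permutations(range(n)):
        # columns[row] is the column assigned to that row.
        cost = sum(matrix[row][col] for row, col in enumerate(columns))
        if best_cost is None or cost < best_cost:
            best_cost, best_columns = cost, columns
    return best_cost, list(enumerate(best_columns))


matrix = [[5, 9, 1],
          [10, 3, 2],
          [8, 7, 4]]
best_cost, pairs = brute_force_assignment(matrix)
```

For this matrix two assignments tie at the lowest cost, and the function keeps the first one it encounters.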
While this approach works fine for small matrices, it does not scale. It executes in O(n!) time: Calculating the permutations for an nxn matrix requires n! operations. For a 12x12 matrix, that’s 479,001,600 traversals. Even if you could manage to perform each traversal in just one millisecond, it would still take more than 133 hours to perform the entire traversal. A 20x20 matrix would take 2,432,902,008,176,640,000 operations. At an optimistic millisecond per operation, that’s more than 77 million years.
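The figures above can be reproduced with a few lines of arithmetic. The sketch assumes one millisecond per traversal and a 365-day year:

```python
import math

MS_PER_HOUR = 1000 * 60 * 60
MS_PER_YEAR = MS_PER_HOUR * 24 * 365

traversals_12 = math.factorial(12)        # permutations of a 12x12 matrix
hours_12 = traversals_12 / MS_PER_HOUR    # run time in hours at 1 ms each

operations_20 = math.factorial(20)        # permutations of a 20x20 matrix
years_20 = operations_20 / MS_PER_YEAR    # run time in years at 1 ms each
```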
The Munkres algorithm runs in O(n^3) time, rather than O(n!). This package provides an implementation of that algorithm.
This version is based on http://www.public.iastate.edu/~ddoty/HungarianAlgorithm.html.
This version was written for Python by Brian Clapper from the (Ada) algorithm
at the above web site. (The Algorithm::Munkres Perl version, in CPAN, was
clearly adapted from the same web site.)
Usage¶
Construct a Munkres object:
from munkres import Munkres
m = Munkres()
Then use it to compute the lowest cost assignment from a cost matrix. Here’s a sample program:
from munkres import Munkres, print_matrix

matrix = [[5, 9, 1],
          [10, 3, 2],
          [8, 7, 4]]
m = Munkres()
indexes = m.compute(matrix)
print_matrix(matrix, msg='Lowest cost through this matrix:')
total = 0
for row, column in indexes:
    value = matrix[row][column]
    total += value
    print('(%d, %d) -> %d' % (row, column, value))
print('total cost=%d' % total)
Running that program produces:
Lowest cost through this matrix:
[5, 9, 1]
[10, 3, 2]
[8, 7, 4]
(0, 0) -> 5
(1, 1) -> 3
(2, 2) -> 4
total cost=12
The instantiated Munkres object can be used multiple times on different matrices.
Non-square Cost Matrices¶
The Munkres algorithm assumes that the cost matrix is square. However, it’s possible to use a rectangular matrix if you first pad it with 0 values to make it square. This module automatically pads rectangular cost matrices to make them square.
Notes:
- The module operates on a copy of the caller’s matrix, so any padding will not be seen by the caller.
- The cost matrix must be rectangular or square. An irregular matrix will not work.
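The padding step described above can be sketched in plain Python. This is an illustration of the idea, not the module's own code, and `pad_to_square` is a hypothetical name:

```python
def pad_to_square(matrix, pad_value=0):
    """Return a new square matrix; the input is left untouched."""
    rows = len(matrix)
    cols = max(len(row) for row in matrix) if matrix else 0
    size = max(rows, cols)
    # Copy each row and extend it to the target width.
    padded = [list(row) + [pad_value] * (size - len(row)) for row in matrix]
    # Append whole rows of padding if there are fewer rows than columns.
    padded += [[pad_value] * size for _ in range(size - rows)]
    return padded


wide = [[5, 9, 1],
        [10, 3, 2]]
square = pad_to_square(wide)
```

Because the function builds new row lists, the caller's matrix keeps its original shape, which matches the copy semantics noted in the list above.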
Calculating Profit, Rather than Cost¶
The cost matrix is just that: A cost matrix. The Munkres algorithm finds the combination of elements (one from each row and column) that results in the smallest cost. It’s also possible to use the algorithm to maximize profit. To do that, however, you have to convert your profit matrix to a cost matrix. The simplest way to do that is to subtract all elements from a large value. For example:
import sys

from munkres import Munkres, print_matrix

matrix = [[5, 9, 1],
          [10, 3, 2],
          [8, 7, 4]]
# Invert each profit into a cost by subtracting it from a large value.
cost_matrix = []
for row in matrix:
    cost_row = []
    for col in row:
        cost_row.append(sys.maxsize - col)
    cost_matrix.append(cost_row)
m = Munkres()
indexes = m.compute(cost_matrix)
print_matrix(matrix, msg='Highest profit through this matrix:')
total = 0
for row, column in indexes:
    value = matrix[row][column]
    total += value
    print('(%d, %d) -> %d' % (row, column, value))
print('total profit=%d' % total)
Running that program produces:
Highest profit through this matrix:
[5, 9, 1]
[10, 3, 2]
[8, 7, 4]
(0, 1) -> 9
(1, 0) -> 10
(2, 2) -> 4
total profit=23
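The result can be checked independently of the munkres module. The sketch below subtracts each profit from the largest entry (one possible "large value", assumed here in place of `sys.maxsize`) and finds the cheapest assignment by exhaustive search, which is practical only for small matrices:

```python
from itertools import permutations

profit = [[5, 9, 1],
          [10, 3, 2],
          [8, 7, 4]]

# Invert profits into costs by subtracting from the largest profit.
largest = max(max(row) for row in profit)
cost = [[largest - value for value in row] for row in profit]

# Exhaustive search for the cheapest assignment (fine for a 3x3 matrix).
best_columns = min(
    permutations(range(len(cost))),
    key=lambda cols: sum(cost[r][c] for r, c in enumerate(cols)),
)
pairs = list(enumerate(best_columns))
total_profit = sum(profit[r][c] for r, c in pairs)
```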
The munkres module provides a convenience method for creating a cost
matrix from a profit matrix. Since it doesn’t know whether the matrix contains
floating point numbers, decimals, or integers, you have to provide the
conversion function; but the convenience method takes care of the actual
creation of the cost matrix:
import sys

import munkres

cost_matrix = munkres.make_cost_matrix(matrix,
                                       lambda cost: sys.maxsize - cost)
So, the above profit-calculation program can be recast as:
import sys

from munkres import Munkres, print_matrix, make_cost_matrix

matrix = [[5, 9, 1],
          [10, 3, 2],
          [8, 7, 4]]
cost_matrix = make_cost_matrix(matrix, lambda cost: sys.maxsize - cost)
m = Munkres()
indexes = m.compute(cost_matrix)
print_matrix(matrix, msg='Highest profit through this matrix:')
total = 0
for row, column in indexes:
    value = matrix[row][column]
    total += value
    print('(%d, %d) -> %d' % (row, column, value))
print('total profit=%d' % total)
References
- http://www.public.iastate.edu/~ddoty/HungarianAlgorithm.html
- Harold W. Kuhn. The Hungarian Method for the assignment problem. Naval Research Logistics Quarterly, 2:83-97, 1955.
- Harold W. Kuhn. Variants of the Hungarian method for assignment problems. Naval Research Logistics Quarterly, 3: 253-258, 1956.
- James Munkres. Algorithms for the Assignment and Transportation Problems. Journal of the Society for Industrial and Applied Mathematics, 5(1):32-38, March 1957.
- http://en.wikipedia.org/wiki/Hungarian_algorithm
Copyright and License¶
This software is released under a BSD license, adapted from <http://opensource.org/licenses/bsd-license.php>
Copyright (c) 2008 Brian M. Clapper All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name “clapper.org” nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
class hagelslag.util.munkres.Munkres¶
Bases: object
Calculate the Munkres solution to the classical assignment problem. See the module documentation for usage.
-
compute(cost_matrix)¶ Compute the indexes for the lowest-cost pairings between rows and columns in the cost matrix. Returns a list of (row, column) tuples that can be used to traverse the matrix.
Parameters: - cost_matrix : list of lists
The cost matrix. If this cost matrix is not square, it will be padded with zeros, via a call to pad_matrix(). (This method does not modify the caller’s matrix. It operates on a copy of the matrix.)
WARNING: This code handles square and rectangular matrices. It does not handle irregular matrices.
Return type: list
Returns: A list of (row, column) tuples that describe the lowest cost path through the matrix
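Since irregular matrices are not handled, a caller may want to check the shape before calling compute(). A minimal sketch; `is_rectangular` is a hypothetical helper and not part of this module:

```python
def is_rectangular(matrix):
    """True if the matrix is non-empty and every row has the same length."""
    if not matrix:
        return False
    width = len(matrix[0])
    return all(len(row) == width for row in matrix)


square_ok = is_rectangular([[1, 2], [3, 4]])
wide_ok = is_rectangular([[1, 2, 3], [4, 5, 6]])
ragged_ok = is_rectangular([[1, 2], [3]])  # rows of different lengths
```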
-
static make_cost_matrix(profit_matrix, inversion_function)¶
DEPRECATED. Please use the module function make_cost_matrix().
-
pad_matrix(matrix, pad_value=0)¶ Pad a possibly non-square matrix to make it square.
Parameters: - matrix : list of lists
matrix to pad
- pad_value : int
value to use to pad the matrix
Return type: list of lists
Returns: a new, possibly padded, matrix
-
-
hagelslag.util.munkres.make_cost_matrix(profit_matrix, inversion_function)¶ Create a cost matrix from a profit matrix by calling ‘inversion_function’ to invert each value. The inversion function must take one numeric argument (of any type) and return another numeric value, which is presumed to be the cost inverse of the original profit.
This is a module-level function. Call it as make_cost_matrix(profit_matrix, inversion_function); for example, pass lambda cost: sys.maxsize - cost as the inversion function.
Parameters: - profit_matrix : list of lists
The matrix to convert from a profit to a cost matrix
- inversion_function : function
The function to use to invert each entry in the profit matrix
Return type: list of lists
Returns: The converted matrix
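The behavior described above amounts to applying the inversion function to every entry. A sketch under that assumption; `make_cost_matrix_sketch` and `ceiling` are names chosen here for illustration, and the module's actual implementation may differ:

```python
def make_cost_matrix_sketch(profit_matrix, inversion_function):
    """Apply inversion_function to every entry, returning a new matrix."""
    return [[inversion_function(value) for value in row] for row in profit_matrix]


profit_matrix = [[5, 9, 1],
                 [10, 3, 2],
                 [8, 7, 4]]
ceiling = 10  # any value >= the largest profit keeps the costs non-negative
cost_matrix = make_cost_matrix_sketch(profit_matrix, lambda profit: ceiling - profit)
```

Leaving the inversion to the caller means the same helper works for integers, floats, or decimals.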
hagelslag.util.output_tree_ensembles module¶
Read a scikit-learn tree ensemble object and output the object into a human-readable text format.
-
hagelslag.util.output_tree_ensembles.load_tree_object(filename)¶ Load scikit-learn decision tree ensemble object from file.
Parameters: filename (str) – Name of the pickle file containing the tree object.
Returns:
Return type: tree ensemble object
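Loading an object from a pickle file generally looks like the sketch below. The function name `load_pickled_object` is hypothetical, and a plain dict stands in for a tree ensemble so that the example runs without scikit-learn:

```python
import os
import pickle
import tempfile


def load_pickled_object(filename):
    """Read one pickled object from a binary file."""
    with open(filename, "rb") as pickle_file:
        return pickle.load(pickle_file)


# Round trip with a plain dict standing in for a tree ensemble.
stand_in = {"n_estimators": 3}
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
    pickle.dump(stand_in, tmp)
loaded = load_pickled_object(tmp.name)
os.remove(tmp.name)
```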
-
hagelslag.util.output_tree_ensembles.main()¶
-
hagelslag.util.output_tree_ensembles.output_tree_ensemble(tree_ensemble_obj, output_filename, attribute_names=None)¶ Write each decision tree in an ensemble to a file.
Parameters: - tree_ensemble_obj (sklearn.ensemble object) – Random Forest or Gradient Boosted Regression object
- output_filename (str) – File where trees are written
- attribute_names (list) – List of attribute names to be used in place of indices if available.
-
hagelslag.util.output_tree_ensembles.print_tree_recursive(tree_obj, node_index, attribute_names=None)¶ Recursively writes a string representation of a decision tree object.
Parameters: - tree_obj (sklearn.tree._tree.Tree object) – A base decision tree object
- node_index (int) – Index of the node being printed
- attribute_names (list) – List of attribute names
Returns: tree_str – String representation of decision tree in the same format as the parf library.
Return type: str
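The recursive traversal can be illustrated on a stand-in object that exposes the same array attributes as a scikit-learn tree (children_left, children_right, feature, threshold, value). The names `ToyTree` and `tree_to_text` and the output layout are assumptions made for this sketch; the layout is not the parf format mentioned above.

```python
from collections import namedtuple

# Stand-in with the array attributes found on sklearn.tree._tree.Tree.
ToyTree = namedtuple(
    "ToyTree", "children_left children_right feature threshold value"
)

LEAF = -1  # scikit-learn marks a missing child with -1


def tree_to_text(tree, node_index=0, attribute_names=None, depth=0):
    """Return an indented text rendering of the subtree rooted at node_index."""
    indent = "  " * depth
    left = tree.children_left[node_index]
    right = tree.children_right[node_index]
    if left == LEAF:
        return "{0}leaf: {1}\n".format(indent, tree.value[node_index])
    feature = tree.feature[node_index]
    # Use a readable name when one is supplied, else the feature index.
    name = attribute_names[feature] if attribute_names else "x[{0}]".format(feature)
    text = "{0}{1} <= {2}\n".format(indent, name, tree.threshold[node_index])
    text += tree_to_text(tree, left, attribute_names, depth + 1)
    text += tree_to_text(tree, right, attribute_names, depth + 1)
    return text


# One split node (index 0) with two leaves (indexes 1 and 2).
toy = ToyTree(
    children_left=[1, -1, -1],
    children_right=[2, -1, -1],
    feature=[0, -2, -2],
    threshold=[25.0, -2.0, -2.0],
    value=[0.5, 0.1, 0.9],
)
tree_text = tree_to_text(toy, attribute_names=["hail_size"])
```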
hagelslag.util.test_size_distributions module¶
-
hagelslag.util.test_size_distributions.main()¶
-
hagelslag.util.test_size_distributions.run_kstests(json_path, run_date, member)¶