impactlab_tools.utils package¶
Submodules¶
impactlab_tools.utils.binning module¶
- impactlab_tools.utils.binning.binned_statistic_1d(da, dim, bins=10, statistic='count', value_range=None)[source]¶
Bin a data array by values and summarize along a dimension
- Parameters:
da (xr.DataArray) – DataArray to be binned
dim (str) – Dimension along which to summarize the binned values
statistic (string or callable, optional) –
The statistic to compute (default is ‘count’). The following statistics are available:
- ‘mean’: compute the mean of values for points within each bin. Empty bins will be represented by NaN.
- ‘median’: compute the median of values for points within each bin. Empty bins will be represented by NaN.
- ‘count’: compute the count of points within each bin. This is identical to an unweighted histogram. The values array is not referenced.
- ‘sum’: compute the sum of values for points within each bin. This is identical to a weighted histogram.
- ‘min’: compute the minimum of values for points within each bin. Empty bins will be represented by NaN.
- ‘max’: compute the maximum of values for points within each bin. Empty bins will be represented by NaN.
- function: a user-defined function which takes a 1D array of values and outputs a single numerical statistic. This function will be called on the values in each bin. Empty bins will be represented by function([]), or NaN if this returns an error.
bins (int or sequence of scalars, optional) – If bins is an int, it defines the number of equal-width bins in the given range (10 by default). If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths. Values in x that are smaller than the lowest bin edge are assigned to bin number 0; values beyond the highest bin are assigned to bins[-1]. If the bin edges are specified, the number of bins will be nx = len(bins) - 1.
value_range ((float, float) or [(float, float)], optional) – The lower and upper range of the bins. If not provided, value_range is simply (x.min(), x.max()). Values outside the range are ignored.
- Returns:
binned – A data array with bins along the summary dimension
- Return type:
xr.DataArray
Examples
>>> import numpy as np
>>> import xarray as xr
>>> from impactlab_tools.utils.binning import binned_statistic_1d
>>> da = xr.DataArray(
...     np.arange(16).reshape(4,4),
...     dims=('a', 'b'),
...     coords={'a': list('abcd'), 'b': list('wxyz')})
...
>>> da
<xarray.DataArray (a: 4, b: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
  * a        (a) <U1 'a' 'b' 'c' 'd'
  * b        (b) <U1 'w' 'x' 'y' 'z'
>>> binned_statistic_1d(
...     da, 'b', [0, 2, 5, 20])
...
<xarray.DataArray (a: 4, groups: 3)>
array([[ 2.,  2.,  0.],
       [ 0.,  1.,  3.],
       [ 0.,  0.,  4.],
       [ 0.,  0.,  4.]])
Coordinates:
  * a        (a) <U1 'a' 'b' 'c' 'd'
  * groups   (groups) object '(0, 2]' '(2, 5]' '(5, 20]'
>>> binned_statistic_1d(da, 'a', statistic='sum')
<xarray.DataArray (groups: 10, b: 4)>
array([[  0.,   1.,   2.,   3.],
       [  0.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   0.],
       [  4.,   5.,   6.,   7.],
       [  0.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   0.],
       [  8.,   9.,  10.,  11.],
       [  0.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   0.],
       [ 12.,  13.,  14.,  15.]])
Coordinates:
  * groups   (groups) object '(0.0, 1.5]' '(1.5, 3.0]' '(3.0, 4.5]' ...
  * b        (b) <U1 'w' 'x' 'y' 'z'
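The counting case can be approximated with plain NumPy. This is only a sketch of the binning idea, not the code behind binned_statistic_1d. The helper name count_in_bins_along_axis is hypothetical, and it relies on np.histogram, whose bins are closed on the left (the last bin also includes its right edge).

```python
import numpy as np


def count_in_bins_along_axis(values, edges, axis):
    """Count entries per bin along one axis (hypothetical helper, not the library's)."""

    def count_one(slice_1d):
        # np.histogram returns (counts, edges); only the counts are needed here.
        counts, _ = np.histogram(slice_1d, bins=edges)
        return counts

    # The summarized axis is replaced by an axis of length len(edges) - 1.
    return np.apply_along_axis(count_one, axis, values)


values = np.arange(16).reshape(4, 4)
counts = count_in_bins_along_axis(values, [0, 2, 5, 20], axis=1)
print(counts)
```

Each row of the input is reduced to three counts, one per bin, so the summarized axis shrinks from length 4 to length 3.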
impactlab_tools.utils.files module¶
Utilities for path handling
Provides server-specific paths, configured in a server configuration file.
- impactlab_tools.utils.files.configpath(path)[source]¶
Return a configured absolute path. If the path is absolute, it will be left alone; otherwise, it is assumed to be a subpath of the configured shareddir.
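The rule described for configpath can be sketched with os.path. The function name resolve_config_path and the explicit shareddir argument are assumptions made for illustration; the real function reads shareddir from the server configuration.

```python
import os


def resolve_config_path(path, shareddir):
    """Leave absolute paths alone; treat anything else as a subpath of shareddir."""
    if os.path.isabs(path):
        return path
    return os.path.join(shareddir, path)


# Stand-in values: these directories are placeholders, not real project paths.
shared_root = os.path.abspath("shared")
absolute_input = os.path.abspath(os.path.join("elsewhere", "data.csv"))

print(resolve_config_path(absolute_input, shared_root))
print(resolve_config_path("social/baselines", shared_root))
```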
- impactlab_tools.utils.files.get_allargv_config()[source]¶
Load a configuration from the command line, merging all arguments into a single configuration dictionary.
Handles the following kinds of command-line arguments:
*.yml: a full configuration YAML file, merged into the result dictionary
--config=VALUE: a full configuration YAML file, as above
--KEY=VALUE: a single configuration value to be set
anything else: sets an entry in the configuration for that argument, with a value of True.
Later command-line arguments always override earlier ones.
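The merging order can be illustrated with a small stand-alone sketch. It is not the code behind get_allargv_config: merge_argv_config and fake_loader are hypothetical names, file loading is delegated to a caller-supplied function instead of a YAML parser, and --KEY=VALUE values are kept as plain strings.

```python
def merge_argv_config(args, load_file):
    """Merge command-line arguments left to right into one dictionary."""
    config = {}
    for arg in args:
        if arg.startswith("--config="):
            # A full configuration file named by an option.
            config.update(load_file(arg[len("--config="):]))
        elif arg.startswith("--") and "=" in arg:
            # A single key/value pair (kept as a string in this sketch).
            key, value = arg[2:].split("=", 1)
            config[key] = value
        elif arg.endswith(".yml"):
            # A full configuration file given as a bare argument.
            config.update(load_file(arg))
        else:
            # Anything else becomes a flag set to True.
            config[arg] = True
    return config


def fake_loader(path):
    # Placeholder for reading a YAML file; always returns the same content.
    return {"mode": "median", "seed": 1}


first = merge_argv_config(["base.yml", "--mode=montecarlo", "verbose"], fake_loader)
second = merge_argv_config(["--mode=montecarlo", "base.yml"], fake_loader)
print(first)
print(second)
```

Because arguments are applied in order, the option wins in the first call and the file wins in the second.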
- impactlab_tools.utils.files.get_argv_config(index=1)[source]¶
Load the configuration file named by the command-line argument at position index in argv.
In the future, this should also load specific configurable options from the command-line.
- impactlab_tools.utils.files.get_file_config(filepath)[source]¶
Load a configuration file from a given path.
Return a subpath of the configured shareddir.
shareddir is the path to the root directory containing the support/data files needed to run impact projections. shareddir is found first by looking for the IMPERICS_SHAREDDIR shell/environment variable. If this is not defined, it looks for a “shareddir” entry in a “../server.yml” file.
- Parameters:
subpath (str) – Subdirectory path joined onto shareddir.
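The lookup order for shareddir can be sketched as follows. The function names and the explicit environ and server_config arguments are assumptions made so the example runs without a real environment or server.yml file; the directory names are placeholders.

```python
import os


def find_shareddir(environ, server_config):
    """Prefer the IMPERICS_SHAREDDIR variable, then the server configuration."""
    shareddir = environ.get("IMPERICS_SHAREDDIR")
    if shareddir is None:
        shareddir = server_config["shareddir"]
    return shareddir


def shared_subpath(subpath, environ, server_config):
    """Join a subpath onto whichever shareddir was found."""
    return os.path.join(find_shareddir(environ, server_config), subpath)


server_config = {"shareddir": "/mnt/fallback"}  # stand-in for a parsed server.yml
from_env = shared_subpath(
    "regions/hierarchy.csv", {"IMPERICS_SHAREDDIR": "/mnt/shared"}, server_config)
from_file = shared_subpath("regions/hierarchy.csv", {}, server_config)
print(from_env)
print(from_file)
```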
impactlab_tools.utils.configdict module¶
Class for representing tool configuration files
- class impactlab_tools.utils.configdict.ConfigDict(*args, **kwargs)[source]¶
Bases: UserDict
Chain-able dictionary to hold projection configurations.
A ConfigDict is a dictionary-like interface to a chainmap/linked list. Nested dicts can be accessed like a traditional dictionary, but keys not found locally are searched for in parent dictionaries. All string keys are normalized by transforming all characters to lowercase and all underscores to hyphens.
- parent¶
Parent ConfigDict object to query for keys if not in self.data.
- Type:
ConfigDict or None
- key_access_stack¶
Dictionary with values giving the inspect.stack() from the most recent time a key was retrieved (via self.__getitem__()).
- Type:
See also
gather_configtree
Chains nested-dicts into a connected tree of ConfigDict(s)
Examples
>>> d = {'a': 1, 'b': {'a': 2}, 'c': 3, 'd-4': 4, 'e_5': 5, 'F': 6}
>>> cd = ConfigDict(d)
>>> cd['b']
{'a': 2}

'F' key is now lowercase.

>>> cd['f']
6

'_' is now '-'

>>> cd['e-5']
5

Keys that have been accessed.

>>> cd.key_access_stack.keys()
dict_keys(['b', 'f', 'e-5'])
- accessed_all_keys(search='local', parse_lists=False)[source]¶
Were all the keys used in the config tree?
- Parameters:
search ({'local', 'parents', 'children'}) –
What should the search cover? Options are:
"local"
Only check whether keys were used locally (in self).
"parents"
Recursively check keys in parents, moving up the tree, after checking local keys.
"children"
Recursively check keys in children, moving down the tree, after checking local keys.
parse_lists (bool, optional) – If True when search is “children”, check whether self or its children contain a list, and check any ConfigDicts in that list for whether they used their keys. This is slow. Note this only parses lists, strictly, not all Sequences.
- Return type:
Examples
>>> d = {'a': 1, 'b': {'a': 2}, 'c': 3, 'd-4': 4, 'e_5': 5, 'F': 6}
>>> root_config = gather_configtree(d)
>>> child_config = root_config['b']
>>> child_config['a']
2

We can check whether all the keys in `child_config` have been accessed.

>>> child_config.accessed_all_keys()
True

Same but also checking that all keys up the tree in parents have been used.

>>> child_config.accessed_all_keys('parents')
False

Several keys in root_config were not accessed, so False is returned.

Can also check key use locally and down the tree in nested, child ConfigDict instances.

>>> root_config.accessed_all_keys('children')
False

...which is still False in this case -- all keys in nested child_config have been used, but not all of the local keys in root_config have been used.
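The idea of tracking key use can be shown with a much simpler stand-in. The TrackingDict class below is hypothetical and covers only the “local” case: it records a set of retrieved keys, whereas ConfigDict stores an inspect.stack() per key and can also search parents and children.

```python
class TrackingDict(dict):
    """Dictionary that remembers which keys have been retrieved (illustrative only)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed = set()

    def __getitem__(self, key):
        # Look the key up first so failed lookups are not recorded.
        value = super().__getitem__(key)
        self.accessed.add(key)
        return value

    def accessed_all_keys(self):
        # True only when every stored key has been retrieved at least once.
        return set(self) <= self.accessed


config = TrackingDict({"a": 2, "c": 3})
config["a"]
partially_used = config.accessed_all_keys()
config["c"]
fully_used = config.accessed_all_keys()
print(partially_used, fully_used)
```

A check like this is typically run after a job finishes, to flag configuration entries that were supplied but never read.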
- merge(x, xparent=False)[source]¶
Merge, returning new copy
- Parameters:
x (ConfigDict or dict)
xparent (bool, optional) – Attach x.parent to out.parent? If False, attaches self.parent. Only works if x is ConfigDict.
- Returns:
out – Merged ConfigDict, using copied values from self.
- Return type:
- impactlab_tools.utils.configdict.gather_configtree(d, parse_lists=False)[source]¶
Chains nested-dicts into a connected tree of ConfigDict(s)
- Parameters:
d (dict or MutableMapping) – Cast to ConfigDict. Nested dicts within are also recursively cast and assigned parents, reflecting their nested structure.
parse_lists (bool, optional) – If True, and d or its children contain a list of dicts, convert these listed dicts to ConfigDicts and assign them parents. This is slow. Note this only parses lists, strictly, not all Sequences.
- Returns:
out
- Return type:
Examples
>>> nest = {'a': 1, 'b': {'a': 2}, 'c': 3, 'd-4': 4, 'e_5': 5, 'F': 6}
>>> tree = gather_configtree(nest)
>>> tree['b']['a']
2
Returns the value for “a” in the nested dictionary “b”. However, if we request a key that is not available in this nested “b” dictionary, it will search through all parents.
>>> tree['b']['d-4']
4
A KeyError is only thrown if the search has been exhausted with no matching keys found.
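The parent-search behaviour resembles the standard library's collections.ChainMap, which the sketch below uses as a stand-in. This is an analogy, not how ConfigDict is implemented; normalize_key is a hypothetical helper that mimics the documented key normalization.

```python
from collections import ChainMap


def normalize_key(key):
    """Lowercase string keys and turn underscores into hyphens."""
    if isinstance(key, str):
        return key.lower().replace("_", "-")
    return key


nest = {'a': 1, 'b': {'a': 2}, 'c': 3, 'd-4': 4, 'e_5': 5, 'F': 6}
root = {normalize_key(k): v for k, v in nest.items()}

# Lookups try the nested "b" mapping first, then fall back to the root.
child = ChainMap(root["b"], root)

print(child["a"])    # found locally in "b"
print(child["d-4"])  # not in "b", so taken from the parent
print(child["e-5"])  # the normalized form of "e_5"
try:
    child["missing"]
except KeyError:
    print("no match anywhere in the chain")
```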
impactlab_tools.utils.versions module¶
- impactlab_tools.utils.versions.check_version(input_list, check_git=False)[source]¶
Returns version information given a list of module dependencies
- Parameters:
- Returns:
A dictionary of the modules: keys are the module names, and each value is another dictionary containing:
“source”: how the module is installed (“pip”, “local”, “git”, or None):
source is “pip” if it’s an open-sourced python package installed through pip.
source is “pip-local” if it’s a self-made tool installed through pip.
source is “git” if it’s a git managed repo of scripts, not installed through pip.
source is None if the module cannot be found.
“version”: If it’s an open source module (source: pip), this is its version number.
“git_hash”: If it’s a local module (source: local, or git), this is its git hash.
- Return type:
Example
>>> input_list = [
...     "scipy", "numpy", "Cheetah", "computer",
...     "impact-calculations", "metacsv"]
...
>>> check_version(input_list, check_git=True)
{
    "scipy": {"source": "pip", "version": "0.19"},
    "numpy": {"source": "pip", "version": "1.12.1"},
    "Cheetah": {"source": "pip", "version": "2.4.4"},
    "computer": {
        "source": "git",
        "git_hash": "662870e0fa914b4fa958e78ebe02b858c31fe41d"},
    "impact-calculations": {
        "source": "git",
        "git_hash": "e7c1b53b1d9e6571c0555a560c919f9645693b45"},
    "metacsv": {"source": "pip", "version": "0.0.9"}
}
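For the installed-package case only, a similar report can be built with the standard library's importlib.metadata. This sketch is not how check_version works. The name pip_version_info is hypothetical, it labels every distribution it finds as “pip” even if another installer was used, and it does not inspect git repositories at all.

```python
from importlib import metadata


def pip_version_info(names):
    """Report the installed version of each named distribution, if any."""
    info = {}
    for name in names:
        try:
            info[name] = {"source": "pip", "version": metadata.version(name)}
        except metadata.PackageNotFoundError:
            # Mirrors the documented convention of source=None for missing modules.
            info[name] = {"source": None}
    return info


report = pip_version_info(["numpy", "surely-not-an-installed-package-xyz"])
print(report)
```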
impactlab_tools.utils.weighting module¶
- impactlab_tools.utils.weighting.weighted_quantile(values, quantiles, sample_weight=None, values_sorted=False, old_style=False, axis=None)[source]¶
Compute quantiles of a weighted distribution
Similar to weighted_quantile_1d(), but supports weighting along any (numbered) dimension.
Note
quantiles should be in [0, 1]!
- Parameters:
values (numpy.array) – numpy.array with data
quantiles (array-like) – quantiles of distribution to return
sample_weight (numpy.array) – weights array-like of the same length as array
values_sorted (bool) – if True, then will avoid sorting of initial array
old_style (bool) – if True, will correct output to be consistent with numpy.percentile.
- Returns:
computed quantiles from weighted distribution
- Return type:
numpy.array
- impactlab_tools.utils.weighting.weighted_quantile_1d(values, quantiles, sample_weight=None, values_sorted=False, old_style=False)[source]¶
Very close to numpy.percentile, but supports weights
Thanks to Alleo! http://stackoverflow.com/questions/21844024/weighted-percentile-using-numpy/29677616#29677616
Note
quantiles should be in [0, 1]!
- Parameters:
values (numpy.array) – numpy.array with data
quantiles (array-like) – quantiles of distribution to return
sample_weight (numpy.array) – weights array-like of the same length as array
values_sorted (bool) – if True, then will avoid sorting of initial array
old_style (bool) – if True, will correct output to be consistent with numpy.percentile.
- Returns:
computed quantiles from weighted distribution
- Return type:
numpy.array
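A minimal sketch of the interpolation approach from the Stack Overflow answer credited above is given below. It is not the library's code: weighted_quantile_sketch is a hypothetical name, and the values_sorted and old_style options are left out.

```python
import numpy as np


def weighted_quantile_sketch(values, quantiles, sample_weight=None):
    """Interpolate quantiles from the weighted cumulative distribution."""
    values = np.asarray(values, dtype=float)
    quantiles = np.asarray(quantiles, dtype=float)
    if sample_weight is None:
        sample_weight = np.ones(len(values))
    sample_weight = np.asarray(sample_weight, dtype=float)
    if np.any(quantiles < 0) or np.any(quantiles > 1):
        raise ValueError("quantiles should be in [0, 1]")

    # Sort the values and carry the weights along.
    order = np.argsort(values)
    values = values[order]
    sample_weight = sample_weight[order]

    # Place each point at the midpoint of its weight within the cumulative sum.
    positions = np.cumsum(sample_weight) - 0.5 * sample_weight
    positions /= np.sum(sample_weight)
    return np.interp(quantiles, positions, values)


unweighted = weighted_quantile_sketch([1, 2, 3, 4], [0.5])
weighted = weighted_quantile_sketch([1, 2, 3], [0.5], sample_weight=[1, 1, 2])
print(unweighted, weighted)
```

Giving the largest value twice the weight pulls the median of [1, 2, 3] above 2, which is the effect the sample_weight parameter is meant to capture.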
- impactlab_tools.utils.weighting.weighted_quantile_xr(data, quantiles, sample_weight, dim, values_sorted=False)[source]¶
Compute quantiles of a weighted distribution
Similar to weighted_quantile(), but operates on a named dimension of an xarray.DataArray.
Note
quantiles should be in [0, 1]!
- Parameters:
data (DataArray or Dataset) – xarray.DataArray or xarray.Dataset with data indexed by dim
quantiles (array-like) – quantiles of distribution to return
sample_weight (numpy.array) – weights array-like of the same length as array
values_sorted (bool) – if True, then will avoid sorting of initial array
dim (str) – Dimension along which to weight the data
- Returns:
computed quantiles from weighted distribution
- Return type:
xarray.DataArray