impactlab_tools.utils package

Submodules

impactlab_tools.utils.binning module

impactlab_tools.utils.binning.binned_statistic_1d(da, dim, bins=10, statistic='count', value_range=None)[source]

Bin a data array by values and summarize along a dimension

Parameters:
  • da (xr.DataArray) – DataArray to be binned

  • dim (str) – Dimension along which to summarize the binned values

  • statistic (string or callable, optional) –

    The statistic to compute (default is ‘count’). The following statistics are available:

    • ‘mean’: compute the mean of values for points within each bin. Empty bins will be represented by NaN.

    • ‘median’: compute the median of values for points within each bin. Empty bins will be represented by NaN.

    • ‘count’: compute the count of points within each bin. This is identical to an unweighted histogram. The values array is not referenced.

    • ‘sum’: compute the sum of values for points within each bin. This is identical to a weighted histogram.

    • ‘min’: compute the minimum of values for points within each bin. Empty bins will be represented by NaN.

    • ‘max’: compute the maximum of values for points within each bin. Empty bins will be represented by NaN.

    • function: a user-defined function which takes a 1D array of values and outputs a single numerical statistic. This function will be called on the values in each bin. Empty bins will be represented by function([]), or NaN if this returns an error.

  • bins (int or sequence of scalars, optional) – If bins is an int, it defines the number of equal-width bins in the given range (10 by default). If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths. Values in x that are smaller than the lowest bin edge are assigned to bin number 0; values beyond the highest bin are assigned to bins[-1]. If the bin edges are specified, the number of bins will be nx = len(bins) - 1.

  • value_range ((float, float) or [(float, float)], optional) – The lower and upper range of the bins. If not provided, value_range is simply (x.min(), x.max()). Values outside the range are ignored.

Returns:

binned – A data array with bins along the summary dimension

Return type:

xr.DataArray

Examples

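>>> import numpy as np
>>> import xarray as xr
>>> from impactlab_tools.utils.binning import binned_statistic_1d
>>> da = xr.DataArray(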
...     np.arange(16).reshape(4,4),
...     dims=('a', 'b'),
...     coords={'a': list('abcd'), 'b': list('wxyz')})
...
>>> da 
<xarray.DataArray (a: 4, b: 4)>
array([[  0,  1,  2,  3],
       [  4,  5,  6,  7],
       [  8,  9, 10, 11],
       [ 12, 13, 14, 15]])
Coordinates:
  * a        (a) <U1 'a' 'b' 'c' 'd'
  * b        (b) <U1 'w' 'x' 'y' 'z'

>>> binned_statistic_1d(
...     da, 'b', [0, 2, 5, 20])
...     
<xarray.DataArray (a: 4, groups: 3)>
array([[ 2., 2., 0.],
       [ 0., 1., 3.],
       [ 0., 0., 4.],
       [ 0., 0., 4.]])
Coordinates:
  * a        (a) <U1 'a' 'b' 'c' 'd'
  * groups   (groups) object '(0, 2]' '(2, 5]' '(5, 20]'

>>> binned_statistic_1d(da, 'a', statistic='sum') 
<xarray.DataArray (groups: 10, b: 4)>
array([[  0.,  1.,  2.,  3.],
       [  0.,  0.,  0.,  0.],
       [  0.,  0.,  0.,  0.],
       [  4.,  5.,  6.,  7.],
       [  0.,  0.,  0.,  0.],
       [  0.,  0.,  0.,  0.],
       [  8.,  9., 10., 11.],
       [  0.,  0.,  0.,  0.],
       [  0.,  0.,  0.,  0.],
       [ 12., 13., 14., 15.]])
Coordinates:
  * groups   (groups) object '(0.0, 1.5]' '(1.5, 3.0]' '(3.0, 4.5]' ...
  * b        (b) <U1 'w' 'x' 'y' 'z'
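
The binning step itself can be illustrated without xarray. The following is a minimal pure-Python sketch, not the library's implementation: the helper name bin_counts_and_sums is hypothetical, and it assumes bins are half-open on the right except for the last bin, which includes the rightmost edge. Under that assumption it reproduces the counts shown for row 'b' of the first example above.

```python
from bisect import bisect_right


def bin_counts_and_sums(values, edges):
    """Count and sum values per bin (hypothetical helper, not library code).

    Assumes `edges` is sorted and that every bin is [left, right) except
    the last one, which also includes the rightmost edge.
    """
    nbins = len(edges) - 1
    counts = [0] * nbins
    sums = [0.0] * nbins
    for v in values:
        if v == edges[-1]:
            idx = nbins - 1                    # rightmost edge goes in the last bin
        else:
            idx = bisect_right(edges, v) - 1   # index of the bin whose left edge is <= v
        if 0 <= idx < nbins:                   # values outside the edges are ignored
            counts[idx] += 1
            sums[idx] += v
    return counts, sums


counts, sums = bin_counts_and_sums([4, 5, 6, 7], [0, 2, 5, 20])
print(counts)  # [0, 1, 3]
print(sums)    # [0.0, 4.0, 18.0]
```

The counts correspond to statistic='count' and the sums to statistic='sum'; other statistics would collect the values per bin and reduce them at the end.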

impactlab_tools.utils.files module

Utilities for path handling

Provides server-specific paths, configured in a server configuration file.

impactlab_tools.utils.files.configpath(path)[source]

Return a configured absolute path. If the path is absolute, it will be left alone; otherwise, it is assumed to be a subpath of the configured shareddir.
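
The rule described here can be sketched in a few lines. This is an illustration only: resolve_config_path is a hypothetical name, and shareddir is passed in explicitly rather than read from the server configuration as the real function presumably does.

```python
import os.path


def resolve_config_path(path, shareddir):
    """Hypothetical stand-in for the rule described above (not library code)."""
    if os.path.isabs(path):
        return path                           # absolute paths are left alone
    return os.path.join(shareddir, path)      # relative paths live under shareddir


print(resolve_config_path("/tmp/results.csv", "/shares/data"))
print(resolve_config_path("social/baselines.csv", "/shares/data"))
```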

impactlab_tools.utils.files.get_allargv_config()[source]

Load a configuration from the command line, merging all arguments into a single configuration dictionary.

Handles the following kinds of command-line arguments:

  • *.yml: a full configuration YAML file, merged into the result dictionary

  • --config=VALUE: a full configuration YAML file, as above

  • --KEY=VALUE: a single configuration value to be set

  • anything else: sets an entry in the configuration, keyed by that argument, with a value of True.

Later command-line arguments always override earlier ones.
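
The merging rules above might look roughly like the following. This is a sketch under stated assumptions, not the library's code: merge_argv_config and load_yaml are hypothetical names, options are assumed to use a double-hyphen prefix, and single values are kept as plain strings because the text does not say how they are parsed.

```python
def merge_argv_config(argv, load_yaml):
    """Merge command-line arguments into one dict; later arguments win.

    `load_yaml` is a caller-supplied function (an assumption of this sketch)
    that takes a file path and returns a dict.
    """
    config = {}
    for arg in argv:
        if arg.startswith("--config="):
            # A full configuration file given as an option
            config.update(load_yaml(arg[len("--config="):]))
        elif arg.startswith("--") and "=" in arg:
            # A single KEY=VALUE setting; the value is kept as a string here
            key, value = arg[2:].split("=", 1)
            config[key] = value
        elif arg.endswith(".yml"):
            # A bare path to a full configuration file
            config.update(load_yaml(arg))
        else:
            # Anything else becomes a flag set to True
            config[arg] = True
    return config


# Stand-in for files on disk so the example runs without any YAML library.
fake_files = {"base.yml": {"a": 1, "b": 2}, "extra.yml": {"b": 3}}
print(merge_argv_config(
    ["base.yml", "--config=extra.yml", "--a=5", "verbose"],
    fake_files.__getitem__))
```

Because each argument updates the same dictionary in order, a key set by an earlier file is replaced by any later file or option that sets it again.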

impactlab_tools.utils.files.get_argv_config(index=1)[source]

Load a configuration file specified by the argv argument at position index.

In the future, this should also load specific configurable options from the command-line.

impactlab_tools.utils.files.get_file_config(filepath)[source]

Load a configuration file from a given path.

impactlab_tools.utils.files.sharedpath(subpath)[source]

Return a subpath of the configured shareddir

shareddir is the path to the root directory containing the support/data files needed to run impact projections. shareddir is found first by looking for the IMPERICS_SHAREDDIR shell/environment variable. If this is not defined, it looks for a “shareddir” entry in a “../server.yml” file.

Parameters:

subpath (str) – Subdirectory path joined onto shareddir.
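
The lookup order for shareddir can be sketched as follows. The function names are hypothetical, and the environment and the already-loaded server configuration are passed in as plain dictionaries so the example does not depend on a real “../server.yml” file.

```python
import os


def find_shareddir(environ, server_config):
    """Return shareddir from the environment, else from the server config.

    Both arguments are plain mappings here; this is an assumption of the
    sketch, not how the library obtains them.
    """
    if "IMPERICS_SHAREDDIR" in environ:
        return environ["IMPERICS_SHAREDDIR"]   # the environment variable takes priority
    return server_config["shareddir"]          # fall back to the configuration entry


def shared_subpath(subpath, environ, server_config):
    """Join a subpath onto the shareddir found by find_shareddir()."""
    return os.path.join(find_shareddir(environ, server_config), subpath)


print(shared_subpath("climate/tas.nc", {"IMPERICS_SHAREDDIR": "/env/share"}, {"shareddir": "/yml/share"}))
print(shared_subpath("climate/tas.nc", {}, {"shareddir": "/yml/share"}))
```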

impactlab_tools.utils.files.use_config(config)[source]

Use the given configuration for path functions.

impactlab_tools.utils.configdict module

Class for representing tool configuration files

class impactlab_tools.utils.configdict.ConfigDict(*args, **kwargs)[source]

Bases: UserDict

Chain-able dictionary to hold projection configurations.

A ConfigDict is a dictionary-like interface to a chainmap/linked list. Nested dicts can be accessed like a traditional dictionary, but keys not found locally are looked up in parent dictionaries. All string keys are normalized by converting all characters to lowercase and all underscores to hyphens.

parent

Parent ConfigDict object to query for keys if not in self.data.

Type:

ConfigDict or None

key_access_stack

Dictionary with values giving the inspect.stack() from the most recent time a key was retrieved (via self.__getitem__()).

Type:

dict

data

The ‘local’ dictionary, not in parents.

Type:

dict

See also

gather_configtree

Chains nested-dicts into a connected tree of ConfigDict(s)

Examples

>>> d = {'a': 1, 'b': {'a': 2}, 'c': 3, 'd-4': 4, 'e_5': 5, 'F': 6}
>>> cd = ConfigDict(d)
>>> cd['b']
{'a': 2}

'F' key is now lowercase.

>>> cd['f']
6

'_' is now '-'

>>> cd['e-5']
5

Keys that have been accessed.

>>> cd.key_access_stack.keys() 
dict_keys(['b', 'f', 'e-5'])

accessed_all_keys(search='local', parse_lists=False)[source]

Were all the keys used in the config tree?

Parameters:
  • search ({'local', 'parents', 'children'}) –

    What should the search cover? Options are:

    "local"

    Only check whether keys were used locally (in self).

    "parents"

    Recursively check keys in parents, moving up the tree, after checking local keys.

    "children"

    Recursively check keys in children, moving down the tree, after checking local keys.

  • parse_lists (bool, optional) – If True and search is “children”, check whether self or its children contain a list, and check any ConfigDicts in that list for whether they used their keys. This is slow. Note this only parses lists, strictly, not all Sequences.

Return type:

bool

Examples

>>> d = {'a': 1, 'b': {'a': 2}, 'c': 3, 'd-4': 4, 'e_5': 5, 'F': 6}
>>> root_config = gather_configtree(d)
>>> child_config = root_config['b']
>>> child_config['a']
2

We can check whether all the keys in `child_config` have been
accessed.

>>> child_config.accessed_all_keys()
True

Same but also checking that all keys up the tree in parents have
been used.

>>> child_config.accessed_all_keys('parents')
False

Several keys in root_config were not accessed, so False is
returned.

Can also check key use locally and down the tree in nested, child
ConfigDict instances.

>>> root_config.accessed_all_keys('children')
False

...which is still False in this case -- all keys in nested
child_config have been used, but not all of the local keys in
root_config have been used.

merge(x, xparent=False)[source]

Merge, returning new copy

Parameters:
  • x (ConfigDict or dict)

  • xparent (bool, optional) – Attach x.parent to out.parent? If False, attaches self.parent. Only works if x is ConfigDict.

Returns:

out – Merged ConfigDict, using copied values from self.

Return type:

ConfigDict

impactlab_tools.utils.configdict.gather_configtree(d, parse_lists=False)[source]

Chains nested-dicts into a connected tree of ConfigDict(s)

Parameters:
  • d (dict or MutableMapping) – Cast to ConfigDict. Nested dicts within are also recursively cast and assigned parents, reflecting their nested structure.

  • parse_lists (bool, optional) – If True and d or its children contain a list of dicts, convert these listed dicts to ConfigDicts and assign them parents. This is slow. Note this only parses lists, strictly, not all Sequences.

Returns:

out

Return type:

ConfigDict

Examples

>>> nest = {'a': 1, 'b': {'a': 2}, 'c': 3, 'd-4': 4, 'e_5': 5, 'F': 6}
>>> tree = gather_configtree(nest)
>>> tree['b']['a']
2

Returns the value for “a” in the nested dictionary “b”. However, if we request a key that is not available in this nested “b” dictionary, it will search through all parents.

>>> tree['b']['d-4']
4

A KeyError is only thrown if the search has been exhausted with no matching keys found.
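
The key normalization and parent lookup described in this module can be illustrated with a toy class. This is a sketch only: ChainedDict and normalize_key are hypothetical names, and the real ConfigDict also records key access and supports merging, which are omitted here.

```python
def normalize_key(key):
    """Lowercase string keys and turn underscores into hyphens."""
    if isinstance(key, str):
        return key.lower().replace("_", "-")
    return key


class ChainedDict:
    """Toy stand-in for ConfigDict: local data plus an optional parent."""

    def __init__(self, data, parent=None):
        self.parent = parent
        self.data = {}
        for key, value in data.items():
            if isinstance(value, dict):
                # Nested dicts become children that point back to this node
                value = ChainedDict(value, parent=self)
            self.data[normalize_key(key)] = value

    def __getitem__(self, key):
        key = normalize_key(key)
        if key in self.data:
            return self.data[key]              # found locally
        if self.parent is not None:
            return self.parent[key]            # otherwise search up the chain
        raise KeyError(key)                    # search exhausted


tree = ChainedDict({'a': 1, 'b': {'a': 2}, 'c': 3, 'd-4': 4, 'e_5': 5, 'F': 6})
print(tree['b']['a'])    # 2
print(tree['b']['d-4'])  # 4
print(tree['E_5'])       # 5
```

Normalizing keys on both storage and lookup is what lets 'F', 'f', 'e_5' and 'e-5' all resolve to the same entries.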

impactlab_tools.utils.versions module

impactlab_tools.utils.versions.check_version(input_list, check_git=False)[source]

Returns version information given a list of module dependencies

Parameters:
  • input_list (list) – list of strings, all module names

  • check_git (bool) – True if the caller also wants to check the git hash of a repo (input_list contains its name) that’s under the user’s home dir.

Returns:

A dictionary of the modules: keys are the module names, each key has value of another dictionary, containing:

  • “source”: how the module is installed (“pip”, “local”, “git”, or None):

    • source is “pip” if it’s an open-sourced python package installed through pip.

    • source is “pip-local” if it’s a self-made tool installed through pip.

    • source is “git” if it’s a git managed repo of scripts, not installed through pip.

    • source is None if the module cannot be found.

  • “version”: If it’s an open source module (source: pip), this is its version number.

  • “git_hash”: If it’s a local module (source: local, or git), this is its git hash.

Return type:

dict

Example

>>> input_list = [
...    "scipy", "numpy", "Cheetah", "computer",
...    "impact-calculations", "metacsv"]
...
>>> check_version(input_list, check_git=True) 
{
    "scipy": {"source": "pip", "version": "0.19"},
    "numpy": {"source": "pip", "version": "1.12.1"},
    "Cheetah": {"source": "pip", "version": "2.4.4"},
    "computer": {
        "source": "git",
        "git_hash": "662870e0fa914b4fa958e78ebe02b858c31fe41d"},
    "impact-calculations": {
        "source": "git",
        "git_hash": "e7c1b53b1d9e6571c0555a560c919f9645693b45"},
    "metacsv": {"source": "pip", "version": "0.0.9"}
}
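
For the pip-installed case only, a similar report can be assembled with the standard library. This is a minimal sketch and not how check_version is implemented: pip_versions is a hypothetical name, it relies on importlib.metadata (Python 3.8+), and it does not attempt the “pip-local” or “git” cases.

```python
from importlib import metadata


def pip_versions(names):
    """Report installed versions for the given distribution names.

    Hypothetical helper: anything without installed package metadata is
    reported with a source of None.
    """
    report = {}
    for name in names:
        try:
            report[name] = {"source": "pip", "version": metadata.version(name)}
        except metadata.PackageNotFoundError:
            report[name] = {"source": None}
    return report


print(pip_versions(["surely-not-installed-pkg-0xyz"]))
```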

impactlab_tools.utils.weighting module

impactlab_tools.utils.weighting.weighted_quantile(values, quantiles, sample_weight=None, values_sorted=False, old_style=False, axis=None)[source]

Compute quantiles of a weighted distribution

Similar to weighted_quantile_1d(), but supports weighting along any (numbered) dimension.

Note

quantiles should be in [0, 1]!

Parameters:
  • values (numpy.array) – numpy.array with data

  • quantiles (array-like) – quantiles of distribution to return

  • sample_weight (numpy.array) – array-like weights of the same length as values

  • values_sorted (bool) – if True, skip sorting the initial array

  • old_style (bool) – if True, correct the output to be consistent with numpy.percentile.

Returns:

computed quantiles from weighted distribution

Return type:

numpy.array

impactlab_tools.utils.weighting.weighted_quantile_1d(values, quantiles, sample_weight=None, values_sorted=False, old_style=False)[source]

Very close to numpy.percentile, but supports weights

Thanks to Alleo! http://stackoverflow.com/questions/21844024/weighted-percentile-using-numpy/29677616#29677616

Note

quantiles should be in [0, 1]!

Parameters:
  • values (numpy.array) – numpy.array with data

  • quantiles (array-like) – quantiles of distribution to return

  • sample_weight (numpy.array) – array-like weights of the same length as values

  • values_sorted (bool) – if True, skip sorting the initial array

  • old_style (bool) – if True, correct the output to be consistent with numpy.percentile.

Returns:

computed quantiles from weighted distribution

Return type:

numpy.array
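
The idea behind a weighted quantile can be shown in pure Python. The sketch below follows the general approach of the Stack Overflow answer credited above, as understood here; it is not the library's source, and weighted_quantile_sketch and _interp are hypothetical names. Each sorted value is placed at the midpoint of its weight along the cumulative weight, and quantiles are read off by linear interpolation.

```python
from itertools import accumulate


def _interp(q, positions, xs):
    """Linear interpolation of q over increasing positions, clamped at the ends."""
    if q <= positions[0]:
        return xs[0]
    if q >= positions[-1]:
        return xs[-1]
    for i in range(1, len(positions)):
        if q <= positions[i]:
            frac = (q - positions[i - 1]) / (positions[i] - positions[i - 1])
            return xs[i - 1] + frac * (xs[i] - xs[i - 1])


def weighted_quantile_sketch(values, quantiles, sample_weight=None, old_style=False):
    """Weighted quantiles of a 1D sequence; quantiles must be in [0, 1]."""
    if sample_weight is None:
        sample_weight = [1.0] * len(values)
    pairs = sorted(zip(values, sample_weight))          # sort values, keep weights aligned
    xs = [v for v, _ in pairs]
    ws = [w for _, w in pairs]
    # Position of each value: cumulative weight minus half of its own weight
    positions = [c - 0.5 * w for c, w in zip(accumulate(ws), ws)]
    if old_style:
        # Rescale so the first value sits at 0 and the last at 1
        # (needs at least two values)
        first = positions[0]
        positions = [p - first for p in positions]
        last = positions[-1]
        positions = [p / last for p in positions]
    else:
        total = sum(ws)
        positions = [p / total for p in positions]
    return [_interp(q, positions, xs) for q in quantiles]


print(weighted_quantile_sketch([1, 2, 3, 4], [0.5]))                   # [2.5]
print(weighted_quantile_sketch([1, 2, 3, 4], [0.25], old_style=True))  # [1.75]
```

With equal weights and old_style=True, the second call gives the same 25th percentile that linear interpolation over [1, 2, 3, 4] yields, which is the sense in which the flag aligns the output with numpy.percentile.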

impactlab_tools.utils.weighting.weighted_quantile_xr(data, quantiles, sample_weight, dim, values_sorted=False)[source]

Compute quantiles of a weighted distribution

Similar to weighted_quantile(), but operates on a named dimension of an xarray.DataArray.

Note

quantiles should be in [0, 1]!

Parameters:
  • data (DataArray or Dataset) – xarray.DataArray or xarray.Dataset with data indexed by dim

  • quantiles (array-like) – quantiles of distribution to return

  • sample_weight (numpy.array) – array-like weights of the same length as the data array

  • values_sorted (bool) – if True, skip sorting the initial array

  • dim (str) – Dimension along which to weight the data

Returns:

computed quantiles from weighted distribution

Return type:

xarray.DataArray

Module contents