Triple-Index Utilities

balsa.routines.best_intermediates.best_intermediate_subset_zones(pk_subset_cost: Series, pk_table: DataFrame, kq_table: DataFrame, cost_col: str, *, n_subset: int = 1, n_final: int = 1, add_pkq_cost: ndarray[Any, dtype[ScalarType]] = None, flag_array: ndarray[Any, dtype[ScalarType]] = None, maximize_subset: bool = True, maximize_final: bool = True, null_index: int = 0, other_columns: bool = True, intermediate_name: str = 'intermediate_zone', availability_name: str = 'available', n_threads: int = 1, squeeze: bool = True) → DataFrame | Dict[int, DataFrame]

Numba-accelerated.

Triple-index operation for two matrices, finding the highest- or lowest-cost intermediate zones from a subset. Takes a first leg matrix (“pk”) and a second leg matrix (“kq”) to produce a combined “pq” matrix with the best intermediate “k”. Also works to construct multiple “pq” matrices for the top _n_final_ intermediate “k” zones.

There is no restriction on the label dtypes, as long as the “pk” (leg 1) and “kq” (leg 2) tables share the same “k” index.

Both input matrices must be provided in “tall” format, i.e. indexed by a 2-level MultiIndex in the way a single matrix is stored as a Pandas Series. In effect, the “pk” (leg 1) and “kq” (leg 2) tables are DataFrames holding multiple tall matrices, one per column. The output table(s) are also returned in tall format.
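As an illustration of the tall format, the following sketch converts a small wide matrix into a Series with a 2-level MultiIndex and then assembles a “pk” table from it. The zone labels, values, and the column names `time` and `cost` are hypothetical; only standard pandas calls are used.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-zone wide matrix of first-leg travel times (rows = p, columns = k)
zones = pd.Index([10, 20, 30])
wide_time = pd.DataFrame(
    np.array([[0.0, 5.0, 9.0],
              [5.0, 0.0, 4.0],
              [9.0, 4.0, 0.0]]),
    index=zones.rename("p"),
    columns=zones.rename("k"),
)

# stack() moves the column labels into a second index level, giving a
# Series with a 2-level (p, k) MultiIndex: one matrix in tall format
tall_time = wide_time.stack()
tall_time.name = "time"

# Several tall matrices sharing the same (p, k) index form a "pk" table
pk_table = pd.DataFrame({"time": tall_time, "cost": tall_time * 0.5})
```

Each column of `pk_table` is then one matrix, and a “kq” table can be built the same way with index levels (k, q).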

When constructing the result tables, columns in the “pk” (leg 1) and “kq” (leg 2) tables are “carried forward” such that the result columns are the union of the columns in the input tables. Columns present in only one table are carried forward unmodified and retain their data type. Columns present in both tables are added together, and thus MUST be numeric.
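The carry-forward rule can be sketched with plain pandas. This is only an illustration of the rule as described, not the library's implementation; the two small frames stand in for leg rows that have already been aligned on the chosen intermediate zone, and all column names are hypothetical.

```python
import pandas as pd

# Hypothetical leg rows, already aligned on the chosen "k" for each (p, q) pair
pk_rows = pd.DataFrame({"time": [5.0, 4.0], "access_mode": ["walk", "bus"]})
kq_rows = pd.DataFrame({"time": [7.0, 2.0], "fare": [3.25, 2.00]})

shared = pk_rows.columns.intersection(kq_rows.columns)   # in both tables
pk_only = pk_rows.columns.difference(kq_rows.columns)    # leg 1 only
kq_only = kq_rows.columns.difference(pk_rows.columns)    # leg 2 only

# Shared columns are summed (so they must be numeric); the rest pass through
result = pd.concat(
    [pk_rows[shared] + kq_rows[shared], pk_rows[pk_only], kq_rows[kq_only]],
    axis=1,
)
```

Here `time` appears in both tables and is summed, while `access_mode` and `fare` are copied unchanged with their original dtypes.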

In the specified cost column, a value of -inf when maximizing (or +inf when minimizing) is respected as the sentinel value for “unavailable”. Any “pk” or “kq” interchange carrying this sentinel value will not be considered.
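The reason an infinite sentinel works can be seen in a small NumPy sketch for the maximizing case. The values are hypothetical, the intermediate zones are identified by position rather than label, and the fallback value 0 merely mimics the documented default of `null_index`.

```python
import numpy as np

# Hypothetical utilities for one (p, q) pair across four intermediate zones k
pk_cost = np.array([-3.0, -np.inf, -1.0, -2.0])   # k=1 unavailable on leg 1
kq_cost = np.array([-4.0, -1.0, -np.inf, -2.5])   # k=2 unavailable on leg 2

# -inf on either leg makes the whole two-leg path -inf
total = pk_cost + kq_cost

# A pair is available if at least one intermediate zone has a finite total
available = bool(np.isfinite(total).any())

# When maximizing, -inf paths can never win; fall back to 0 if nothing is usable
best_k = int(np.argmax(total)) if available else 0
```

When minimizing, the same argument holds with +inf as the sentinel and `np.argmin` in place of `np.argmax`.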

Parameters:
  • pk_subset_cost (pd.Series) – A Series with 2-level MultiIndex containing values to use for subsetting

  • pk_table (pd.DataFrame) – A DataFrame with 2-level MultiIndex of the shape ((p, k), A). Must include the specified cost column

  • kq_table (pd.DataFrame) – A DataFrame with 2-level MultiIndex of the shape ((k, q), E). Must include the specified cost column

  • cost_col (str) – Name of the column, present in both the access (“pk”) and egress (“kq”) tables, to use as the cost to minimize/maximize. Values of +/- inf are respected to indicate unavailable choices.

  • n_subset (int, optional) – Defaults to 1. The number of top-ranked intermediate zones to keep in the subset; the _n_final_ best intermediate zones are then selected from these _n_subset_ zones. If n_subset <= 0, it will be corrected to 1.

  • n_final (int, optional) – Defaults to 1. The number of ranks to return (e.g., find the _n_final_ best intermediate zones). If n_final <= 0, it will be corrected to 1.

  • add_pkq_cost (ndarray, optional) – Defaults to None. A 3-dimensional numpy array containing additional two-part path (“pkq”) costs to be included in the triple-index operation.

  • flag_array (ndarray, optional) – Defaults to None. An array of boolean flags indicating the “pq” zone pairs to evaluate.

  • maximize_subset (bool, optional) – Defaults to True. If True, this function maximizes the result when determining the “pk” selection. If False, it minimizes it.

  • maximize_final (bool, optional) – Defaults to True. If True, this function maximizes the result when determining the “pq” selection. If False, it minimizes it.

  • null_index (int, optional) – Defaults to 0. Fill value used if NO intermediate zone is available.

  • other_columns (bool, optional) – Defaults to True. If True, the result DataFrame will include all columns in the “pk” and “kq” tables. The result table will be of the shape ((p, q), A | E + 3)

  • intermediate_name (str, optional) – Defaults to 'intermediate_zone'. Name of the column in the result table containing the selected intermediate zone.

  • availability_name (str, optional) – Defaults to 'available'. Name of the column in the result table containing a flag indicating whether ANY intermediate zone was found to be available.

  • n_threads (int, optional) – Defaults to 1. Number of threads to use.

  • squeeze (bool, optional) – Defaults to True. If n_final == 1 and squeeze=True, a single DataFrame is returned. Otherwise, a dictionary of DataFrames will be returned.

Returns:

DataFrame: If n_final == 1 and squeeze=True. A DataFrame of the shape ((p, q), A | E + 3), containing the intermediate “k” zone selected, the associated max/min cost, and a flag indicating its availability. Additional columns from the “pk” (leg 1) and “kq” (leg 2) tables, indexed for the appropriately chosen intermediate zone, will also be included if other_columns=True.

Dict[int, DataFrame]: If n_final > 1 or squeeze=False. The keys represent the ranks, so result[1] is the best intermediate zone, result[2] is the second-best, etc. The value DataFrames are in the same format as if n_final == 1, just with different intermediate zones chosen.

Return type:

DataFrame | Dict[int, DataFrame]
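The two-stage selection can be sketched in plain pandas. This is a brute-force illustration under stated assumptions, not the Numba implementation: it assumes the subset is taken per origin “p” by ranking `pk_subset_cost`, and it minimizes at both stages (corresponding to maximize_subset=False and maximize_final=False). All labels and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: 1 origin p, 3 candidate intermediates k, 1 destination q
pk_index = pd.MultiIndex.from_product([[1], [101, 102, 103]], names=["p", "k"])
pk_subset_cost = pd.Series([2.0, 9.0, 5.0], index=pk_index)
pk_cost = pd.Series([4.0, 1.0, 3.0], index=pk_index, name="cost")

kq_index = pd.MultiIndex.from_product([[101, 102, 103], [7]], names=["k", "q"])
kq_cost = pd.Series([6.0, 1.0, 2.0], index=kq_index, name="cost")

n_subset = 2  # intermediates kept per origin

# Stage 1: per origin, keep the n_subset intermediates with the lowest subset cost
subset_keys = (
    pk_subset_cost.sort_values().groupby(level="p").head(n_subset).index
)

# Stage 2: join the surviving pk rows to kq on k and pick the cheapest full path
pk_kept = pk_cost.reindex(subset_keys).reset_index()
paths = pk_kept.merge(kq_cost.reset_index(), on="k", suffixes=("_pk", "_kq"))
paths["cost"] = paths["cost_pk"] + paths["cost_kq"]
best = paths.loc[paths.groupby(["p", "q"])["cost"].idxmin()]
```

Zone 102 has the cheapest full path (1 + 1 = 2) but is dropped in stage 1 because of its high subset cost, so the result is zone 103 with a combined cost of 5.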

balsa.routines.best_intermediates.best_intermediate_zones(pk_table: DataFrame, kq_table: DataFrame, cost_col: str, *, n: int = 1, add_pkq_cost: ndarray[Any, dtype[ScalarType]] = None, flag_array: ndarray[Any, dtype[ScalarType]] = None, maximize: bool = True, null_index: int = 0, other_columns: bool = True, intermediate_name: str = 'intermediate_zone', availability_name: str = 'available', n_threads: int = 1, squeeze: bool = True) → DataFrame | Dict[int, DataFrame]

Numba-accelerated.

Triple-index operation for two matrices, finding the highest- or lowest-cost intermediate zones. Takes a first leg matrix (“pk”) and a second leg matrix (“kq”) to produce a combined “pq” matrix with the best intermediate “k”. Also works to construct multiple “pq” matrices for the top _n_ intermediate “k” zones.

There is no restriction on the label dtypes, as long as the “pk” (leg 1) and “kq” (leg 2) tables share the same “k” index.

Both input matrices must be provided in “tall” format, i.e. indexed by a 2-level MultiIndex in the way a single matrix is stored as a Pandas Series. In effect, the “pk” (leg 1) and “kq” (leg 2) tables are DataFrames holding multiple tall matrices, one per column. The output table(s) are also returned in tall format.

When constructing the result tables, columns in the “pk” (leg 1) and “kq” (leg 2) tables are “carried forward” such that the result columns are the union of the columns in the input tables. Columns present in only one table are carried forward unmodified and retain their data type. Columns present in both tables are added together, and thus MUST be numeric.

In the specified cost column, a value of -inf when maximizing (or +inf when minimizing) is respected as the sentinel value for “unavailable”. Any “pk” or “kq” interchange carrying this sentinel value will not be considered.
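For intuition, the triple-index operation can be written as a dense NumPy reduction over the “k” axis. This is a reference sketch, not the library's Numba kernel: it works on positional arrays rather than labelled tall tables, maximizes, and assumes an axis order of (p, k, q) for the optional path-specific cost array, which the documentation does not state.

```python
import numpy as np

# Hypothetical dense utilities: 2 origins p, 3 intermediates k, 2 destinations q
pk = np.array([[-1.0, -2.0, -np.inf],
               [-3.0, -1.0, -2.0]])      # shape (p, k); -inf marks unavailable
kq = np.array([[-4.0, -1.0],
               [-1.0, -5.0],
               [-2.0, -1.0]])            # shape (k, q)

add_pkq = np.zeros((2, 3, 2))            # assumed axis order (p, k, q)
add_pkq[0, 1, 0] = -3.0                  # extra penalty on one specific p-k-q path

# Broadcast both legs to (p, k, q) and add the path-specific term
total = pk[:, :, None] + kq[None, :, :] + add_pkq

best_k = total.argmax(axis=1)            # shape (p, q): position of the best k
best_cost = total.max(axis=1)            # shape (p, q): combined cost via that k
```

The dense cube needs memory proportional to p × k × q, which is one reason a compiled loop over tall tables is preferable for large zone systems.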

Parameters:
  • pk_table (pd.DataFrame) – A DataFrame with 2-level MultiIndex of the shape ((p, k), A). Must include the specified cost column

  • kq_table (pd.DataFrame) – A DataFrame with 2-level MultiIndex of the shape ((k, q), E). Must include the specified cost column

  • cost_col (str) – Name of the column, present in both the access (“pk”) and egress (“kq”) tables, to use as the cost to minimize/maximize. Values of +/- inf are respected to indicate unavailable choices.

  • n (int, optional) – Defaults to 1. The number of ranks to return (e.g., find the _n_ best intermediate zones). If n <= 0, it will be corrected to 1.

  • add_pkq_cost (ndarray, optional) – Defaults to None. A 3-dimensional numpy array containing additional two-part path (“pkq”) costs to be included in the triple-index operation.

  • flag_array (ndarray, optional) – Defaults to None. An array of boolean flags indicating the “pq” zone pairs to evaluate.

  • maximize (bool, optional) – Defaults to True. If True, this function maximizes the result. If False, it minimizes it.

  • null_index (int, optional) – Defaults to 0. Fill value used if NO intermediate zone is available.

  • other_columns (bool, optional) – Defaults to True. If True, the result DataFrame will include all columns in the “pk” and “kq” tables. The result table will be of the shape ((p, q), A | E + 3)

  • intermediate_name (str, optional) – Defaults to 'intermediate_zone'. Name of the column in the result table containing the selected intermediate zone.

  • availability_name (str, optional) – Defaults to 'available'. Name of the column in the result table containing a flag indicating whether ANY intermediate zone was found to be available.

  • n_threads (int, optional) – Defaults to 1. Number of threads to use.

  • squeeze (bool, optional) – Defaults to True. If n == 1 and squeeze=True, a single DataFrame is returned. Otherwise, a dictionary of DataFrames will be returned.

Returns:

DataFrame: If n == 1 and squeeze=True. A DataFrame of the shape ((p, q), A | E + 3), containing the intermediate “k” zone selected, the associated max/min cost, and a flag indicating its availability. Additional columns from the “pk” (leg 1) and “kq” (leg 2) tables, indexed for the appropriately chosen intermediate zone, will also be included if other_columns=True.

Dict[int, DataFrame]: If n > 1 or squeeze=False. The keys represent the ranks, so result[1] is the best intermediate zone, result[2] is the second-best, etc. The value DataFrames are in the same format as if n == 1, just with different intermediate zones chosen.

Return type:

DataFrame | Dict[int, DataFrame]
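The rank-keyed dictionary returned for n > 1 can be imitated with a small pandas sketch. It starts from hypothetical, already-combined path costs, minimizes (as with maximize=False), and builds one tall table per rank. It illustrates the shape of the result only and does not call the library.

```python
import pandas as pd

# Hypothetical combined path costs for one (p, q) pair via three intermediates
paths = pd.DataFrame({
    "p": [1, 1, 1],
    "q": [7, 7, 7],
    "intermediate_zone": [101, 102, 103],
    "cost": [10.0, 2.0, 5.0],
})

n = 2  # number of ranks to keep

# Rank paths within each (p, q) pair; rank 1 is the lowest cost
paths["rank"] = (
    paths.groupby(["p", "q"])["cost"].rank(method="first", ascending=True).astype(int)
)

# One tall table per rank, keyed 1..n like the documented result dictionary
result = {
    r: paths.loc[paths["rank"] == r].set_index(["p", "q"]).drop(columns="rank")
    for r in range(1, n + 1)
}
```

`result[1]` then holds the best intermediate zone for each (p, q) pair and `result[2]` the second-best, mirroring the key convention described above.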