Modelling Utilities
- balsa.routines.modelling.distance_array(x0: ndarray | Series, y0: ndarray | Series, x1: ndarray | Series, y1: ndarray | Series, *, method: str = 'euclidean', **kwargs) → ndarray | Series
Fast method to compute the distance between two sets of (x, y) points, represented by 4 separate arrays, using the NumExpr package. Supports several equations for computing distances.
- Parameters:
x0 (numpy.ndarray | pandas.Series) – X or Lon coordinate of first point
y0 (numpy.ndarray | pandas.Series) – Y or Lat coordinate of first point
x1 (numpy.ndarray | pandas.Series) – X or Lon coordinate of second point
y1 (numpy.ndarray | pandas.Series) – Y or Lat coordinate of second point
method (str, optional) – Defaults to 'EUCLIDEAN'. Specifies the method by which to compute distance. Valid options are:
  - 'EUCLIDEAN': Computes straight-line, ‘as-the-crow-flies’ distance.
  - 'MANHATTAN': Computes the Manhattan distance.
  - 'HAVERSINE': Computes distance based on lon/lat.
**kwargs – Additional scalars to pass into the evaluation context.
- Kwargs:
- coord_unit (float):
Factor applied directly to the result, defaulting to 1.0 (no conversion). Useful when the coordinates are provided in one unit (e.g. m) and the desired result is in a different unit (e.g. km). Only used for Euclidean or Manhattan distance.
- earth_radius_factor (float):
Factor to convert from km to other units when using Haversine distance
- Returns:
Distance from the vectors of first points to the vectors of second points. A Series is returned when one or more coordinate arrays are given as a Series object
- Return type:
numpy.ndarray or pandas.Series
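The three distance methods can be illustrated with a plain-Python reference sketch. This is not balsa's NumExpr implementation: the helper names (`euclidean`, `manhattan`, `haversine`, `distance_pairs`) are hypothetical, the mean Earth radius of 6371 km is an assumption, and the way `coord_unit` and `earth_radius_factor` are applied follows only the descriptions above.

```python
import math

# Assumed mean Earth radius in km; the constant balsa uses may differ.
EARTH_RADIUS_KM = 6371.0


def euclidean(x0, y0, x1, y1, coord_unit=1.0):
    """Straight-line distance, scaled by coord_unit."""
    return math.hypot(x1 - x0, y1 - y0) * coord_unit


def manhattan(x0, y0, x1, y1, coord_unit=1.0):
    """Sum of absolute coordinate differences, scaled by coord_unit."""
    return (abs(x1 - x0) + abs(y1 - y0)) * coord_unit


def haversine(lon0, lat0, lon1, lat1, earth_radius_factor=1.0):
    """Great-circle distance from lon/lat given in degrees (km by default)."""
    lon0, lat0, lon1, lat1 = map(math.radians, (lon0, lat0, lon1, lat1))
    a = (math.sin((lat1 - lat0) / 2) ** 2
         + math.cos(lat0) * math.cos(lat1) * math.sin((lon1 - lon0) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * earth_radius_factor * math.asin(math.sqrt(a))


METHODS = {"EUCLIDEAN": euclidean, "MANHATTAN": manhattan, "HAVERSINE": haversine}


def distance_pairs(x0, y0, x1, y1, method="EUCLIDEAN", **kwargs):
    """Element-wise distance between paired points (hypothetical helper)."""
    func = METHODS[method.upper()]
    return [func(a, b, c, d, **kwargs) for a, b, c, d in zip(x0, y0, x1, y1)]


# Coordinates in metres, result in km via coord_unit=0.001.
km = distance_pairs([0, 0], [0, 0], [3000, 6000], [4000, 8000], coord_unit=0.001)

# Equator to North Pole along the prime meridian, in km.
pole = haversine(0.0, 0.0, 0.0, 90.0)
```

The sketch operates on one pair at a time for readability; the documented function evaluates whole arrays at once.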
- balsa.routines.modelling.distance_matrix(x0: ndarray | Series, y0: ndarray | Series, *, labels0: Iterable | Index = None, tall: bool = False, x1: ndarray | Series = None, y1: ndarray | Series = None, labels1: ndarray | Series = None, method: str = 'EUCLIDEAN', **kwargs) → Series | DataFrame | ndarray
Fastest method of computing a distance matrix from vectors of coordinates, using the NumExpr package. Supports several equations for computing distances.
Accepts two or four vectors of x-y coordinates. If only two vectors are provided (x0, y0), the result will be the 2D product of this vector with itself (vector0 * vector0). If all four are provided (x0, y0, x1, y1), the result will be the 2D product of the first and second vector (vector0 * vector1).
- Parameters:
x0 (numpy.ndarray | pandas.Series) – Vector of x-coordinates, of length N0. Can be a Series to specify labels.
y0 (numpy.ndarray | pandas.Series) – Vector of y-coordinates, of length N0. Can be a Series to specify labels.
labels0 (pandas.Index-like, optional) – Defaults to None. Override set of labels to use if x0 and y0 are both raw NumPy arrays.
x1 (numpy.ndarray | pandas.Series, optional) – Defaults to None. A second vector of x-coordinates, of length N1. Can be a Series to specify labels.
y1 (numpy.ndarray | pandas.Series, optional) – Defaults to None. A second vector of y-coordinates, of length N1. Can be a Series to specify labels.
labels1 (pandas.Index-like) – Override set of labels to use if x1 and y1 are both raw NumPy arrays.
tall (bool, optional) – Defaults to False. If True, returns a vector of length N0 x N1. Otherwise, returns a matrix whose shape is (N0, N1).
method (str, optional) – Defaults to 'EUCLIDEAN'. Specifies the method by which to compute distance. Valid options are:
  - 'EUCLIDEAN': Computes straight-line, ‘as-the-crow-flies’ distance.
  - 'MANHATTAN': Computes the Manhattan distance.
  - 'HAVERSINE': Computes distance based on lon/lat.
**kwargs – Additional scalars to pass into the evaluation context.
- Kwargs:
- coord_unit (float):
Factor applied directly to the result, defaulting to 1.0 (no conversion). Useful when the coordinates are provided in one unit (e.g. m) and the desired result is in a different unit (e.g. km). Only used for Euclidean or Manhattan distance.
- earth_radius_factor (float):
Factor to convert from km to other units when using Haversine distance
- Returns:
A Series will be returned when tall=True and labels can be inferred; it will always have a 2-level MultiIndex. A DataFrame will be returned when tall=False and labels can be inferred. An ndarray will be returned when labels cannot be inferred; if tall=True the array will be 1-dimensional, with shape (N x N,). Otherwise, it will be 2-dimensional with shape (N, N).
- Return type:
pandas.Series, pandas.DataFrame or numpy.ndarray
Note
The type of the returned object depends on whether labels can be inferred from the arguments. This is always true when the labels argument is specified, in which case the returned value will use the cross-product of the labels vector.
Otherwise, the function will try to infer the labels from the x and y objects, if one or both of them are provided as Series.
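The difference between the wide (N0, N1) layout and the tall layout can be sketched in plain Python. The function `distance_matrix_sketch` is hypothetical and only mimics the shapes described above: nested lists stand in for the DataFrame or 2-dimensional array, and a dict keyed by label pairs stands in for the Series with a 2-level MultiIndex. Only Euclidean distance is shown.

```python
import math


def distance_matrix_sketch(x0, y0, labels0, x1=None, y1=None, labels1=None, tall=False):
    """Illustrative stand-in for the wide and tall layouts (not balsa's code)."""
    if x1 is None:
        # Only two vectors given: pair the first set of points with itself.
        x1, y1, labels1 = x0, y0, labels0

    # Wide layout: one row per origin, one column per destination.
    wide = [[math.hypot(bx - ax, by - ay) for bx, by in zip(x1, y1)]
            for ax, ay in zip(x0, y0)]
    if not tall:
        return wide

    # Tall layout: one entry per (origin, destination) pair, N0 x N1 in total.
    return {(l0, l1): wide[i][j]
            for i, l0 in enumerate(labels0)
            for j, l1 in enumerate(labels1)}


zones = ["A", "B", "C"]
xs, ys = [0, 3, 6], [0, 4, 8]

wide = distance_matrix_sketch(xs, ys, zones)             # 3 x 3 nested lists
tall = distance_matrix_sketch(xs, ys, zones, tall=True)  # 9 labelled entries
```

In the tall result, `tall[("A", "C")]` holds the same value as `wide[0][2]`, which mirrors how a tall Series and a wide DataFrame carry the same data under different indexing.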
- balsa.routines.modelling.tlfd(values: ndarray | Series, *, bin_start: int = 0, bin_end: int = 200, bin_step: int = 2, weights: ndarray | Series = None, intrazonal: ndarray | Series = None, label_type: str = 'MULTI', include_top: bool = False) → Series
Generates a Trip Length Frequency Distribution (i.e. a histogram) from given data. Produces a “pretty” Pandas object suitable for charting.
- Parameters:
values (numpy.ndarray | pandas.Series) – A vector of trip lengths, with a length of “N”. Can be provided from a table of trips, or from a matrix (in “tall” format).
bin_start (int, optional) – Defaults to 0. The minimum bin value, in the same units as values.
bin_end (int, optional) – Defaults to 200. The maximum bin value, in the same units as values. Values over this limit are either ignored, or counted under a separate category (see include_top).
bin_step (int, optional) – Defaults to 2. The size of each bin, in the same units as values.
weights (numpy.ndarray | pandas.Series, optional) – Defaults to None. A vector of weights of length “N”, used to produce a weighted histogram.
intrazonal (numpy.ndarray | pandas.Series, optional) – Defaults to None. A boolean vector indicating which values are considered “intrazonal”. When specified, prepends an intrazonal category to the front of the histogram.
label_type (str, optional) – Defaults to 'MULTI'. The format of the returned index. Options are:
  - MULTI: The returned index will be a 2-level MultiIndex [‘from’, ‘to’];
  - TEXT: The returned index will be text-based: “0 to 2”;
  - BOTTOM: The returned index will be the bottom of each bin; and
  - TOP: The returned index will be the top of each bin.
include_top (bool, optional) – Defaults to False. If True, the function will count all values (and weights, if provided) above bin_end, and add them to the returned Series. This bin is described as going from bin_end to inf.
- Returns:
The weighted or unweighted histogram, depending on the options configured above.
- Return type:
pandas.Series
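The binning logic described above can be approximated with a short plain-Python sketch. The function `tlfd_sketch` is hypothetical and returns a dict keyed by (from, to) tuples, loosely resembling the 'MULTI' label type. Two details are assumptions not stated in the documentation: bins are treated as half-open intervals [from, to), and the intrazonal category is stored under the key "intrazonal".

```python
import math


def tlfd_sketch(values, bin_start=0, bin_end=200, bin_step=2,
                weights=None, intrazonal=None, include_top=False):
    """Weighted or unweighted trip length histogram (illustrative only)."""
    n = len(values)
    weights = [1.0] * n if weights is None else list(weights)
    flags = [False] * n if intrazonal is None else list(intrazonal)

    # Build the categories in display order: intrazonal, regular bins, top bin.
    hist = {}
    if intrazonal is not None:
        hist["intrazonal"] = 0.0  # assumed label for the prepended category
    for lower in range(bin_start, bin_end, bin_step):
        hist[(lower, lower + bin_step)] = 0.0
    if include_top:
        hist[(bin_end, math.inf)] = 0.0

    for value, weight, is_intra in zip(values, weights, flags):
        if is_intra:
            hist["intrazonal"] += weight
        elif bin_start <= value < bin_end:
            # Assumed half-open bins: a value equal to an edge goes to the upper bin.
            lower = bin_start + int((value - bin_start) // bin_step) * bin_step
            hist[(lower, lower + bin_step)] += weight
        elif value >= bin_end and include_top:
            hist[(bin_end, math.inf)] += weight
        # Anything else falls outside the configured range and is ignored.
    return hist


trip_lengths = [0.5, 1.9, 2.0, 5.5, 12.0]
hist = tlfd_sketch(trip_lengths, bin_end=6, bin_step=2, include_top=True)
hist_intra = tlfd_sketch(trip_lengths, bin_end=6, bin_step=2,
                         intrazonal=[True, False, False, False, False])
```

With `include_top=True` the 12.0 trip is counted in the (6, inf) bin; without it, as in `hist_intra`, that trip is dropped from the distribution.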