IO Utilities#
Routines for reading and writing demand modelling files, particularly matrices and TMGToolbox network packages.
Matrices#
Common#
- balsa.routines.io.common.coerce_matrix(matrix: ndarray | DataFrame | Series, *, allow_raw: bool = True, force_square: bool = True) ndarray #
Infers a NumPy array from given input
- Parameters:
matrix (numpy.ndarray | pandas.DataFrame | pandas.Series) – The input to coerce.
allow_raw (bool, optional) – Defaults to True.
force_square (bool, optional) – Defaults to True.
- Returns:
A 2D ndarray of type float32
- Return type:
numpy.ndarray
- balsa.routines.io.common.expand_array(a: ndarray, n: ndarray, *, axis: int = None) ndarray #
Expands an array across all dimensions by a set amount
- Parameters:
a (numpy.ndarray) – The array to expand
n (numpy.ndarray) – The (non-negative) number of items to expand by.
axis (int, optional) – Defaults to None. The axis to expand along, or None to expand along all axes.
- Returns:
The expanded array
- Return type:
numpy.ndarray
- balsa.routines.io.common.open_file(file_handle: str | Path | FileIO, **kwargs)#
Context manager for opening files provided as several different types. Supports a file handle given as a str, a pathlib.Path, or an already-opened handle.
- Parameters:
file_handle (str | Path | FileIO) – The item to be opened, or a handle that is already open.
**kwargs – Keyword arguments passed to open(). Usually mode='w'.
- Yields:
File – The opened file handler. Automatically closed once out of context.
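The pattern described above can be sketched with the standard library alone. This is a minimal illustration, not Balsa's implementation: open_path_or_handle is a hypothetical name, and the choice to leave an already-open handle open on exit is an assumption.

```python
import io
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def open_path_or_handle(file_handle, **kwargs):
    """Hypothetical stand-in for open_file; not Balsa's actual code."""
    if isinstance(file_handle, (str, Path)):
        # A path was given: open it here and close it when the block exits.
        with open(file_handle, **kwargs) as f:
            yield f
    else:
        # Assumption: an already-open handle is passed through unchanged
        # and left for the caller to close.
        yield file_handle


# The same calling code works for a path on disk...
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.txt"
    with open_path_or_handle(target, mode="w") as writer:
        writer.write("hello")
    path_result = target.read_text()

# ...and for an in-memory buffer that is already open.
buffer = io.StringIO()
with open_path_or_handle(buffer) as writer:
    writer.write("hello")
buffer_result = buffer.getvalue()
```

Accepting both kinds of input lets reader and writer functions share one code path whether the caller passes a file name or a stream.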
Inro (Emme) format#
- balsa.routines.io.inro.peek_mdf(file: str | FileIO | Path, *, as_index: bool = True) List[List[int]] | List[Index] #
Partially opens an MDF file to get the zone system of its rows and its columns.
- Parameters:
file (str | FileIO | Path) – The file to read.
as_index (bool, optional) – Defaults to True. Set to True to return a pandas.Index object rather than List[int].
- Returns:
One item for each dimension. If as_index=True, the items will be pandas.Index objects, otherwise they will be List[int].
- Return type:
List[List[int]] or List[pandas.Index]
- balsa.routines.io.inro.read_emx(file: str | FileIO | Path, *, zones: int | Iterable[int] | Index = None, tall: bool = False) ndarray | DataFrame | Series #
Reads an “internal” Emme matrix (found in <Emme Project>/Database/emmemat) with an ‘.emx’ extension. This data format does not contain information about zones. Its size is determined by the dimensions of the Emmebank (Emmebank.dimensions['centroids']), regardless of the number of zones actually used in all scenarios.
- Parameters:
file (str | FileIO | Path) – The file to read.
zones (int | Iterable[int] | pandas.Index, optional) – Defaults to None. An Index or Iterable will be interpreted as the zone labels for the matrix rows and columns, returning a DataFrame or Series (depending on tall). If an integer is provided, the returned ndarray will be truncated to this ‘number of zones’. Otherwise, the returned ndarray will be sized to the maximum number of zones the Emmebank is dimensioned for.
tall (bool, optional) – Defaults to False. If True, a 1D data structure will be returned. If zones is provided as an Index or Iterable, a Series will be returned, otherwise a 1D ndarray.
- Returns:
numpy.ndarray, pandas.DataFrame, or pandas.Series.
Examples
For a project with 20 zones:
>>> matrix = read_emx("Database/emmemat/mf1.emx")
>>> print(type(matrix), matrix.shape)
<class 'numpy.ndarray'> (20, 20)
>>> matrix = read_emx("Database/emmemat/mf1.emx", zones=10)
>>> print(type(matrix), matrix.shape)
<class 'numpy.ndarray'> (10, 10)
>>> matrix = read_emx("Database/emmemat/mf1.emx", zones=range(10))
>>> print(type(matrix), matrix.shape)
<class 'pandas.core.frame.DataFrame'> (10, 10)
>>> matrix = read_emx("Database/emmemat/mf1.emx", zones=range(10), tall=True)
>>> print(type(matrix), matrix.shape)
<class 'pandas.core.series.Series'> (100,)
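To make the sizing and truncation behaviour concrete, here is a standard-library sketch of reading a zone-less square matrix. It rests on assumptions not confirmed by this page: that the file is a flat, row-major sequence of little-endian 4-byte floats, and that an integer zones keeps the top-left block. read_flat_square is a hypothetical helper, not part of Balsa.

```python
import io
import math
import struct


def read_flat_square(stream, zones=None):
    """Hypothetical reader for a flat, row-major float32 square matrix."""
    raw = stream.read()
    if len(raw) % 4:
        raise ValueError("byte count is not a multiple of 4")
    count = len(raw) // 4
    # No zone information is stored, so the side length comes from the size.
    n = math.isqrt(count)
    if n * n != count:
        raise ValueError("value count is not a perfect square")
    values = struct.unpack(f"<{count}f", raw)  # assumed little-endian
    rows = [list(values[i * n:(i + 1) * n]) for i in range(n)]
    if zones is not None:
        # Keep only the top-left zones x zones block.
        rows = [row[:zones] for row in rows[:zones]]
    return rows


# A 4x4 "bank-sized" matrix held in memory, of which only 2 zones are used.
n_bank = 4
cells = [float(v) for v in range(n_bank * n_bank)]
payload = struct.pack(f"<{len(cells)}f", *cells)

full = read_flat_square(io.BytesIO(payload))
truncated = read_flat_square(io.BytesIO(payload), zones=2)
```

Here full has the bank's 4x4 shape, while truncated keeps only the first two rows and columns.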
- balsa.routines.io.inro.read_mdf(file: str | FileIO | Path, *, raw: bool = False, tall: bool = False) ndarray | DataFrame | Series #
Reads Emme’s official matrix “binary serialization” format, created using inro.emme.matrix.MatrixData.save(). There is no official extension for this type of file; ‘.mdf’ is recommended. ‘.emxd’ is also sometimes encountered.
- Parameters:
file (str | FileIO | Path) – The file to read.
raw (bool, optional) – Defaults to False. If True, returns an unlabelled ndarray. Otherwise, a DataFrame will be returned.
tall (bool, optional) – Defaults to False. If True, a 1D data structure will be returned. If raw=False, a Series will be returned, otherwise a 1D ndarray.
- Returns:
The matrix stored in the file.
- Return type:
numpy.ndarray, pandas.DataFrame, or pandas.Series
- balsa.routines.io.inro.to_emx(matrix: DataFrame | Series | ndarray, file: str | FileIO | Path, emmebank_zones: int)#
Writes an “internal” Emme matrix (found in <Emme Project>/Database/emmemat) with an ‘.emx’ extension. The number of zones that the Emmebank is dimensioned for must be known in order for the file to be written correctly.
- Parameters:
matrix (pandas.DataFrame | pandas.Series | numpy.ndarray) – The matrix to write to disk. If a Series is given, it MUST have a MultiIndex with exactly 2 levels to unstack.
file (str | FileIO | Path) – The path or file handler to write to.
emmebank_zones (int) – The number of zones the target Emmebank is dimensioned for.
- balsa.routines.io.inro.to_mdf(matrix: DataFrame | Series, file: str | FileIO | Path)#
Writes a matrix to Emme’s official “binary serialization” format, which can be loaded in Emme using inro.emme.matrix.MatrixData.load(). There is no official extension for this type of file; ‘.mdf’ is recommended.
- Parameters:
matrix (pandas.DataFrame | pandas.Series) – The matrix to write to disk. If a Series is given, it MUST have a MultiIndex with exactly 2 levels to unstack.
file (str | FileIO | Path) – The path or file handler to write to.
OMX format#
The Python library openmatrix already exists, but doesn’t provide a lot of interoperability with Pandas. Balsa provides wrapper functions that produce Pandas DataFrames and Series directly from OMX files.
- balsa.routines.io.omx.read_omx(src_fp: str | PathLike, *, tables: Iterable[str] = None, mapping: str = None, tall: bool = False, raw: bool = False, squeeze: bool = True) DataFrame | Series | ndarray | Dict[str, DataFrame | Series | ndarray] #
Reads Open Matrix (OMX) files. An OMX file can contain multiple matrices, so this function typically returns a Dict.
- Parameters:
src_fp (str | PathLike) – OMX file from which to read. Cannot be an open file handler.
tables (Iterable[str], optional) – List of matrices to read from the file. If None, all matrices will be read.
mapping (str, optional) – The zone number mapping to use, if known in advance.
tall (bool, optional) – If True, matrices will be returned in 1D format. Otherwise, a 2D object is returned.
raw (bool, optional) – If True, matrices will be returned as raw NumPy arrays. Otherwise, Pandas objects are returned.
squeeze (bool, optional) – If True, and the file contains exactly one matrix, return that matrix instead of a Dict.
- Returns:
The matrix, or matrices contained in the OMX file.
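The effect of squeeze on the return value can be illustrated with a small helper. This is a sketch of the documented behaviour only. squeeze_tables is a hypothetical name, and plain lists stand in for the matrices.

```python
def squeeze_tables(tables, squeeze=True):
    """Hypothetical helper mirroring the documented squeeze rule."""
    if squeeze and len(tables) == 1:
        # Exactly one matrix: hand it back directly instead of a dict.
        return next(iter(tables.values()))
    return tables


single = squeeze_tables({"auto_time": [[0.0, 1.0], [1.0, 0.0]]})
several = squeeze_tables({"auto_time": [[0.0]], "auto_cost": [[0.0]]})
kept_as_dict = squeeze_tables({"auto_time": [[0.0]]}, squeeze=False)
```

Code that must handle files with an unknown number of matrices may prefer squeeze=False, so the return type is always a dict.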
- balsa.routines.io.omx.to_omx(dst_fp: str | PathLike, tables: Dict[str, DataFrame | Series | ndarray], *, zone_index: Index = None, title: str = '', descriptions: Dict[str, str] = None, attrs: Dict[str, Dict] = None, mapping_name: str = 'zone_numbers')#
Creates a new (or overwrites an old) OMX file with a collection of matrices.
- Parameters:
dst_fp (str | PathLike) – OMX to write.
tables (Dict[str, pd.DataFrame | pd.Series | np.ndarray]) – Collection of matrices to write. MUST be a dict, to permit the encoding of matrix metadata, and must contain the same types: all NumPy arrays, all Series, or all DataFrames. Checking is done to ensure that all items have the same shape and labels.
zone_index (pd.Index, optional) – Override zone labels to use. Generally only useful if writing a dict of raw NumPy arrays.
title (str, optional) – The title saved in the OMX file.
descriptions (Dict[str, str], optional) – A dict of descriptions (one for each given matrix).
attrs (Dict[str, Dict], optional) – A dict of dicts (one for each given matrix).
mapping_name (str, optional) – Name of the mapping internal to the OMX file
Fortran format#
- balsa.routines.io.fortran.read_fortran_rectangle(file: str | FileIO | Path, n_columns: int, *, zones: int | Iterable[int] | Index = None, tall: bool = False, reindex_rows: bool = False, fill_value: int | float = None) ndarray | DataFrame | Series #
Reads a FORTRAN-friendly .bin file (a.k.a. ‘simple binary format’) which is known to NOT be square. Also works with square matrices.
This file format is an array of 4-byte values, where each row is prefaced by an integer giving the 1-based positional index that FORTRAN uses. The rest of the data are 4-byte floats. To read this, the number of columns present must be known, since the format does not self-specify.
- Parameters:
file (str | FileIO | Path) – The file to read.
n_columns (int) – The number of columns in the matrix.
zones (int | Iterable[int] | pandas.Index, optional) – Defaults to None. An Index or Iterable will be interpreted as the zone labels for the matrix rows and columns, returning a DataFrame or Series (depending on tall). If an integer is provided, the returned ndarray will be truncated to this ‘number of zones’.
tall (bool, optional) – Defaults to False. If True, a ‘tall’ version of the matrix will be returned.
reindex_rows (bool, optional) – Defaults to False. If True, and zones is an Index, the returned DataFrame will be reindexed to fill in any missing rows.
fill_value (int | float, optional) – Defaults to None. The value to pass to pandas.DataFrame.reindex().
- Returns:
numpy.ndarray, pandas.DataFrame or pandas.Series
- Raises:
AssertionError – if the shape is not valid.
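The record layout described above (a 4-byte row index followed by 4-byte floats) can be sketched with the struct module. The little-endian byte order and the signed 32-bit index are assumptions, and write_indexed_rows and read_indexed_rows are hypothetical helpers rather than Balsa functions.

```python
import io
import struct


def write_indexed_rows(stream, rows, min_index=1):
    """Write each row as a 1-based int32 index followed by float32 values."""
    for offset, row in enumerate(rows):
        stream.write(struct.pack("<i", min_index + offset))
        stream.write(struct.pack(f"<{len(row)}f", *row))


def read_indexed_rows(stream, n_columns):
    """Read records back; n_columns must be supplied by the caller."""
    record_size = 4 * (1 + n_columns)
    raw = stream.read()
    if len(raw) % record_size:
        raise ValueError("file size is not a whole number of rows")
    indices, rows = [], []
    for start in range(0, len(raw), record_size):
        # First 4 bytes of the record: the row's positional index.
        indices.append(struct.unpack_from("<i", raw, start)[0])
        # Remaining bytes: the row's float32 values.
        rows.append(list(struct.unpack_from(f"<{n_columns}f", raw, start + 4)))
    return indices, rows


# Round-trip a 2x3 (non-square) matrix through an in-memory buffer.
buffer = io.BytesIO()
write_indexed_rows(buffer, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
buffer.seek(0)
indices, rows = read_indexed_rows(buffer, n_columns=3)
```

Because nothing in the byte stream records the row width, passing the wrong n_columns either raises here or silently misaligns rows, which is why the reader requires it explicitly.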
- balsa.routines.io.fortran.read_fortran_square(file: str | FileIO | Path, *, zones: int | Iterable[int] | Index = None, tall: bool = False) ndarray | DataFrame | Series #
Reads a FORTRAN-friendly .bin file (a.k.a. ‘simple binary format’) which is known to be square.
This file format is an array of 4-byte values, where each row is prefaced by an integer giving the 1-based positional index that FORTRAN uses. The rest of the data are 4-byte floats. To read this, the number of columns present must be known, since the format does not self-specify. This method can infer the shape if it is square.
- Parameters:
file (str | FileIO | Path) – The file to read.
zones (int | pandas.Index | Iterable[int], optional) – Defaults to None. An Index or Iterable will be interpreted as the zone labels for the matrix rows and columns, returning a DataFrame or Series (depending on tall). If an integer is provided, the returned ndarray will be truncated to this ‘number of zones’. Otherwise, the returned ndarray will be sized to the maximum number of zones the Emmebank is dimensioned for.
tall (bool, optional) – Defaults to False. If True, a 1D data structure will be returned. If zones is provided as an Index or Iterable, a Series will be returned, otherwise a 1D ndarray.
- Returns:
numpy.ndarray, pandas.DataFrame, or pandas.Series
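One way the shape of a square file could be inferred is from its size alone. With N rows of one index plus N floats, the file holds N × (N + 1) four-byte values, so N follows from solving that quadratic. This is a sketch of the arithmetic, not necessarily how Balsa does it, and infer_square_size is a hypothetical name.

```python
import math


def infer_square_size(n_bytes):
    """Return N for a file of N records, each 1 index + N floats (4 bytes each)."""
    if n_bytes % 4:
        raise ValueError("byte count is not a multiple of 4")
    n_values = n_bytes // 4
    # Solve n * (n + 1) == n_values, i.e. (2n + 1)^2 == 1 + 4 * n_values.
    n = (math.isqrt(1 + 4 * n_values) - 1) // 2
    if n * (n + 1) != n_values:
        raise ValueError("size does not match a square indexed matrix")
    return n


# A 3500x3500 matrix occupies 3500 rows of 3501 four-byte values.
size = infer_square_size(4 * 3500 * 3501)
```

A file whose size does not fit N × (N + 1) values is rejected, which catches rectangular files passed to a square reader.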
- balsa.routines.io.fortran.to_fortran(matrix: ndarray | DataFrame | Series, file: str | FileIO | Path, *, n_columns: int = None, min_index: int = 1, force_square: bool = True)#
Writes a FORTRAN-friendly .bin file (a.k.a. ‘simple binary format’), in a square format.
- Parameters:
matrix (pandas.DataFrame | pandas.Series | numpy.ndarray) – The matrix to write to disk. If a Series is given, it MUST have a MultiIndex with exactly 2 levels to unstack.
file (str | FileIO | Path) – The path or file handler to write to.
n_columns (int, optional) – Defaults to None. Specifies a desired “width” of the matrix file. For example, n_columns=4000 on a 3500x3500 matrix will pad the width with 500 extra columns containing 0. If None is provided or the value is <= the width of the given matrix, no padding will be performed.
min_index (int, optional) – Defaults to 1. The lowest numbered row. Used when slicing matrices.
force_square (bool, optional) – Defaults to True.
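The n_columns padding rule can be shown on a small matrix. This sketch works on plain lists and only mirrors the rule stated above. pad_rows is a hypothetical helper, not Balsa's code.

```python
def pad_rows(rows, n_columns=None):
    """Pad each row with zeros up to n_columns; otherwise return a copy."""
    width = len(rows[0])
    if n_columns is None or n_columns <= width:
        # No target width, or the matrix is already wide enough: no padding.
        return [list(row) for row in rows]
    extra = n_columns - width
    return [list(row) + [0.0] * extra for row in rows]


matrix = [[1.0, 2.0], [3.0, 4.0]]
padded = pad_rows(matrix, n_columns=4)     # two zero columns appended
unchanged = pad_rows(matrix, n_columns=2)  # value <= width, so no padding
```

By the same rule, the 3500x3500 example with n_columns=4000 gains 500 zero columns in every row.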
Network Packages (NWP)#
For more information on the TMGToolbox Network Package format, please visit https://tmg.utoronto.ca/doc/1.6/tmgtoolbox/input_output/ExportNetworkPackage.html
- balsa.routines.io.nwp.parse_tmg_ncs_line_id(s: Series) Tuple[Series, Series] #
A function to parse line IDs based on TMG Network Coding Standard conventions. Returns pandas Series objects corresponding to the parsed operator and route IDs.
- balsa.routines.io.nwp.process_emme_eng_notation_series(s: Series, *, to_dtype=float) Series #
A function to convert Pandas Series containing values in Emme’s engineering notation.
- balsa.routines.io.nwp.read_nwp_base_network(nwp_fp: str | PathLike) Tuple[DataFrame, DataFrame] #
A function to read the base network from a Network Package file (exported from Emme using the TMG Toolbox) into DataFrames.
- Parameters:
nwp_fp (str | PathLike) – File path to the network package.
- Returns:
A tuple of DataFrames containing the nodes and links
- Return type:
Tuple[pd.DataFrame, pd.DataFrame]
- balsa.routines.io.nwp.read_nwp_exatts_list(nwp_fp: str | PathLike, **kwargs) DataFrame #
A function to read the extra attributes present in a Network Package file (exported from Emme using the TMG Toolbox).
- Parameters:
nwp_fp (str | PathLike) – File path to the network package.
**kwargs – Any valid keyword arguments used by pandas.read_csv().
- Returns:
pd.DataFrame
- balsa.routines.io.nwp.read_nwp_link_attributes(nwp_fp: str | PathLike, *, attributes: str | List[str] = None, **kwargs) DataFrame #
A function to read link attributes from a Network Package file (exported from Emme using the TMG Toolbox).
- Parameters:
nwp_fp (str | PathLike) – File path to the network package.
attributes (str | List[str], optional) – Defaults to None. Names of link attributes to extract. Note that 'inode' and 'jnode' will be included by default.
**kwargs – Any valid keyword arguments used by pandas.read_csv().
- Returns:
pd.DataFrame
- balsa.routines.io.nwp.read_nwp_node_attributes(nwp_fp: str | PathLike, *, attributes: str | List[str] = None, **kwargs) DataFrame #
A function to read node attributes from a Network Package file (exported from Emme using the TMG Toolbox).
- Parameters:
nwp_fp (str | PathLike) – File path to the network package.
attributes (str | List[str], optional) – Defaults to None. Names of node attributes to extract. Note that 'inode' will be included by default.
**kwargs – Any valid keyword arguments used by pandas.read_csv().
- Returns:
pd.DataFrame
- balsa.routines.io.nwp.read_nwp_traffic_results(nwp_fp: str | PathLike) DataFrame #
A function to read the traffic assignment results from a Network Package file (exported from Emme using the TMG Toolbox).
- Parameters:
nwp_fp (str | PathLike) – File path to the network package.
- Returns:
pd.DataFrame
- balsa.routines.io.nwp.read_nwp_traffic_results_at_countpost(nwp_fp: str | PathLike, countpost_att: str) DataFrame #
A function to read the traffic assignment results at countposts from a Network Package file (exported from Emme using the TMG Toolbox).
- Parameters:
nwp_fp (str | PathLike) – File path to the network package.
countpost_att (str) – The name of the extra link attribute containing countpost identifiers. Results will be filtered using this attribute.
- Returns:
pd.DataFrame
- balsa.routines.io.nwp.read_nwp_transit_line_attributes(nwp_fp: str | PathLike, *, attributes: str | List[str] = None, **kwargs) DataFrame #
A function to read transit line attributes from a Network Package file (exported from Emme using the TMG Toolbox).
- Parameters:
nwp_fp (str | PathLike) – File path to the network package.
attributes (str | List[str], optional) – Defaults to None. Names of transit line attributes to extract. Note that 'line' will be included by default.
**kwargs – Any valid keyword arguments used by pandas.read_csv().
- Returns:
pd.DataFrame
- balsa.routines.io.nwp.read_nwp_transit_network(nwp_fp: str | PathLike) Tuple[DataFrame, DataFrame] #
A function to read the transit network from a Network Package file (exported from Emme using the TMG Toolbox) into DataFrames.
- Parameters:
nwp_fp (str | PathLike) – File path to the network package.
- Returns:
A tuple of DataFrames containing the transit lines and segments.
- Return type:
Tuple[pd.DataFrame, pd.DataFrame]
- balsa.routines.io.nwp.read_nwp_transit_result_summary(nwp_fp: str | PathLike, *, parse_line_id: bool = True) DataFrame #
A function to read and summarize the transit assignment boardings and max volumes from a Network Package file (exported from Emme using the TMG Toolbox) by operator and route.
- Parameters:
nwp_fp (str | PathLike) – File path to the network package.
parse_line_id (bool, optional) – Defaults to True. Option to parse operator and route IDs from line IDs. Please note that transit line IDs must adhere to the TMG NCS16 for this option to work properly.
- Returns:
pd.DataFrame
- balsa.routines.io.nwp.read_nwp_transit_segment_results(nwp_fp: str | PathLike) DataFrame #
A function to read and summarize the transit segment boardings, alightings, and volumes from a Network Package file (exported from Emme using the TMG Toolbox).
- Parameters:
nwp_fp (str | PathLike) – File path to the network package.
- Returns:
pd.DataFrame
- balsa.routines.io.nwp.read_nwp_transit_station_results(nwp_fp: str | PathLike, station_line_nodes: List[int]) DataFrame #
A function to read and summarize the transit boardings (ons) and alightings (offs) at stations from a Network Package file (exported from Emme using the TMG Toolbox).
Note
Ensure that station nodes being specified are on the transit line itself and are not station centroids.
- Parameters:
nwp_fp (str | PathLike) – File path to the network package.
station_line_nodes (List[int]) – List of transit line nodes representing transit stops/stations
- Returns:
pd.DataFrame
- balsa.routines.io.nwp.read_nwp_transit_vehicles(nwp_fp: str | PathLike) DataFrame #
A function to read the transit vehicles from a Network Package file (exported from Emme using the TMG Toolbox) into a DataFrame.
- Parameters:
nwp_fp (str | PathLike) – File path to the network package.
- Returns:
DataFrame containing the transit vehicles.
- Return type:
pd.DataFrame