IO Utilities#

Routines for reading and writing demand modelling files, particularly matrices and TMGToolbox network packages.

Matrices#

Common#

balsa.routines.io.common.coerce_matrix(matrix: ndarray | DataFrame | Series, *, allow_raw: bool = True, force_square: bool = True) ndarray#

Infers a NumPy array from the given input.

Parameters:
  • matrix (numpy.ndarray | pandas.DataFrame | pandas.Series) –

  • allow_raw (bool, optional) – Defaults to True.

  • force_square (bool, optional) – Defaults to True.

Returns:

A 2D ndarray of type float32

Return type:

numpy.ndarray

balsa.routines.io.common.expand_array(a: ndarray, n: ndarray, *, axis: int = None) ndarray#

Expands an array by a set amount, either along a single axis or along all axes.

Parameters:
  • a (numpy.ndarray) – The array to expand

  • n (numpy.ndarray) – The (non-negative) number of items to expand by.

  • axis (int, optional) – Defaults to None. The axis to expand along, or None to expand along all axes.

Returns:

The expanded array

Return type:

numpy.ndarray

balsa.routines.io.common.open_file(file_handle: str | Path | FileIO, **kwargs)#

Context manager for opening files that may be supplied in several forms: a path given as a str or pathlib.Path, or an already-opened file handler.

Parameters:
  • file_handle (str | Path | FileIO) – The path to open, or a file handler that is already open.

  • **kwargs – Keyword args passed to open(). Usually mode=’w’.

Yields:

File – The opened file handler. Automatically closed once out of context.
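
The pattern described above can be sketched with the standard library alone. This is a minimal illustration, not Balsa's implementation: the name open_path_or_handle is hypothetical, and leaving an already-open handle open on exit is an assumption made for this sketch.

```python
import io
import os
from contextlib import contextmanager


@contextmanager
def open_path_or_handle(file_handle, **kwargs):
    """Yield an open file for a path-like input, or pass an open handle through.

    Hypothetical helper for illustration only; not part of Balsa.
    """
    if isinstance(file_handle, (str, os.PathLike)):
        # Path-like input: open it here and close it when the block exits.
        f = open(file_handle, **kwargs)
        try:
            yield f
        finally:
            f.close()
    else:
        # Assumption: an already-open handle stays under the caller's control.
        yield file_handle


# Passing an in-memory handle; a str or pathlib.Path would be opened instead.
buffer = io.StringIO()
with open_path_or_handle(buffer) as f:
    f.write("hello")
```

Accepting both forms lets the same reader or writer routine work on paths and on handles the caller has already opened.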

Inro (Emme) format#

balsa.routines.io.inro.peek_mdf(file: str | FileIO | Path, *, as_index: bool = True) List[List[int]] | List[Index]#

Partially opens an MDF file to get the zone system of its rows and its columns.

Parameters:
  • file (str | FileIO | Path) – The file to read.

  • as_index (bool, optional) – Defaults to True. Set to True to return a pandas.Index object rather than List[int]

Returns:

One item for each dimension. If as_index=True, the items will be pandas.Index objects, otherwise they will be List[int]

Return type:

List[List[int]] or List[pandas.Index]

balsa.routines.io.inro.read_emx(file: str | FileIO | Path, *, zones: int | Iterable[int] | Index = None, tall: bool = False) ndarray | DataFrame | Series#

Reads an “internal” Emme matrix (found in <Emme Project>/Database/emmemat) with an ‘.emx’ extension. This data format does not contain information about zones. Its size is determined by the dimensions of the Emmebank (Emmebank.dimensions['centroids']), regardless of the number of zones actually used in all scenarios.

Parameters:
  • file (str | FileIO | Path) – The file to read.

  • zones (int | Iterable[int] | pandas.Index, optional) – Defaults to None. An Index or Iterable will be interpreted as the zone labels for the matrix rows and columns, and a DataFrame or Series will be returned (depending on tall). If an integer is provided, the returned ndarray will be truncated to this ‘number of zones’. Otherwise, the returned ndarray will be sized to the maximum number of zones dimensioned by the Emmebank.

  • tall (bool, optional) – Defaults to False. If True, a 1D data structure will be returned. If zones is an Index or Iterable, a Series will be returned, otherwise a 1D ndarray.

Returns:

numpy.ndarray, pandas.DataFrame, or pandas.Series.
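
The integer form of zones can be pictured as keeping the top-left block of a square matrix. The sketch below works on an in-memory buffer only. It assumes the data is a flat, row-major sequence of 32-bit floats, which this page does not state, and truncate_square is a hypothetical helper rather than a Balsa function.

```python
from array import array


def truncate_square(flat, n_dimensioned, n_zones):
    """Return the top-left n_zones x n_zones block of a flat row-major square matrix."""
    if len(flat) != n_dimensioned * n_dimensioned:
        raise ValueError("buffer length does not match the dimensioned size")
    if not 0 <= n_zones <= n_dimensioned:
        raise ValueError("n_zones must be between 0 and n_dimensioned")
    # Each row starts at r * n_dimensioned; keep only its first n_zones values.
    return [
        list(flat[r * n_dimensioned : r * n_dimensioned + n_zones])
        for r in range(n_zones)
    ]


# A matrix dimensioned for 4 zones (assumed float32), of which only 2 are used.
flat = array("f", range(16))
block = truncate_square(flat, n_dimensioned=4, n_zones=2)
```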

Examples

For a project with 20 zones:

>>> from balsa.routines.io.inro import read_emx
>>> matrix = read_emx("Database/emmemat/mf1.emx")
>>> print(type(matrix), matrix.shape)
<class 'numpy.ndarray'> (20, 20)
>>> matrix = read_emx("Database/emmemat/mf1.emx", zones=10)
>>> print(type(matrix), matrix.shape)
<class 'numpy.ndarray'> (10, 10)
>>> matrix = read_emx("Database/emmemat/mf1.emx", zones=range(10))
>>> print(type(matrix), matrix.shape)
<class 'pandas.core.frame.DataFrame'> (10, 10)
>>> matrix = read_emx("Database/emmemat/mf1.emx", zones=range(10), tall=True)
>>> print(type(matrix), matrix.shape)
<class 'pandas.core.series.Series'> (100,)

balsa.routines.io.inro.read_mdf(file: str | FileIO | Path, *, raw: bool = False, tall: bool = False) ndarray | DataFrame | Series#

Reads Emme’s official matrix “binary serialization” format, created using inro.emme.matrix.MatrixData.save(). There is no official extension for this type of file; ‘.mdf’ is recommended. ‘.emxd’ is also sometimes encountered.

Parameters:
  • file (str | FileIO | Path) – The file to read.

  • raw (bool, optional) – Defaults to False. If True, returns an unlabelled ndarray. Otherwise, a DataFrame will be returned.

  • tall (bool, optional) – Defaults to False. If True, a 1D data structure will be returned. If raw=False, a Series will be returned, otherwise a 1D ndarray.

Returns:

The matrix stored in the file.

Return type:

numpy.ndarray, pandas.DataFrame, or pandas.Series
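
The relationship between the 2D and the “tall” (1D) layouts can be sketched without pandas. This assumes that tall means one value per (row zone, column zone) pair in row-major order, which is what stacking a DataFrame produces. The helpers to_tall and to_wide are hypothetical and use plain dicts and lists in place of Series and DataFrames.

```python
def to_tall(matrix, zones):
    """Flatten a square 2D matrix into a dict keyed by (origin, destination)."""
    return {
        (o, d): matrix[i][j]
        for i, o in enumerate(zones)
        for j, d in enumerate(zones)
    }


def to_wide(tall, zones):
    """Rebuild the 2D matrix from a dict keyed by (origin, destination)."""
    return [[tall[(o, d)] for d in zones] for o in zones]


zones = [101, 102, 103]
wide = [
    [0.0, 1.5, 2.0],
    [1.5, 0.0, 3.0],
    [2.0, 3.0, 0.0],
]
tall = to_tall(wide, zones)  # 9 entries, one per origin-destination pair
```

The two-part key plays the role of the 2-level MultiIndex that the writer functions require of a Series.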

balsa.routines.io.inro.to_emx(matrix: DataFrame | Series | ndarray, file: str | FileIO | Path, emmebank_zones: int)#

Writes an “internal” Emme matrix (found in <Emme Project>/Database/emmemat) with an ‘.emx’ extension. The number of zones that the Emmebank is dimensioned for must be known in order for the file to be written correctly.

Parameters:
  • matrix (pandas.DataFrame | pandas.Series | numpy.ndarray) – The matrix to write to disk. If a Series is given, it MUST have a MultiIndex with exactly 2 levels to unstack.

  • file (str | FileIO | Path) – The path or file handler to write to.

  • emmebank_zones (int) – The number of zones the target Emmebank is dimensioned for.

balsa.routines.io.inro.to_mdf(matrix: DataFrame | Series, file: str | FileIO | Path)#

Writes a matrix to Emme’s official “binary serialization” format, which can be loaded in Emme using inro.emme.matrix.MatrixData.load(). There is no official extension for this type of file; ‘.mdf’ is recommended.

Parameters:
  • matrix (pandas.DataFrame | pandas.Series) – The matrix to write to disk. If a Series is given, it MUST have a MultiIndex with exactly 2 levels to unstack.

  • file (str | FileIO | Path) – The path or file handler to write to.

OMX format#

The Python library openmatrix already exists, but doesn’t provide a lot of interoperability with Pandas. Balsa provides wrapper functions that produce Pandas DataFrames and Series directly from OMX files.

balsa.routines.io.omx.read_omx(src_fp: str | PathLike, *, tables: Iterable[str] = None, mapping: str = None, tall: bool = False, raw: bool = False, squeeze: bool = True) DataFrame | Series | ndarray | Dict[str, DataFrame | Series | ndarray]#

Reads Open Matrix (OMX) files. An OMX file can contain multiple matrices, so this function typically returns a Dict.

Parameters:
  • src_fp (str | PathLike) – OMX file from which to read. Cannot be an open file handler.

  • tables (Iterable[str], optional) – List of matrices to read from the file. If None, all matrices will be read.

  • mapping (str, optional) – The zone number mapping to use, if known in advance.

  • tall (bool, optional) – If True, matrices will be returned in 1D format. Otherwise, a 2D object is returned.

  • raw (bool, optional) – If True, matrices will be returned as raw NumPy arrays. Otherwise, Pandas objects are returned.

  • squeeze (bool, optional) – If True, and the file contains exactly one matrix, return that matrix instead of a Dict.

Returns:

The matrix, or matrices contained in the OMX file.
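
The effect of squeeze can be sketched as follows. The helper squeeze_tables is hypothetical and stands in for the final step of a reader; it is not Balsa's code, and the matrices here are plain nested lists.

```python
def squeeze_tables(tables, squeeze=True):
    """Return the only matrix when exactly one was read, otherwise the whole dict."""
    if squeeze and len(tables) == 1:
        return next(iter(tables.values()))
    return tables


single = squeeze_tables({"auto_time": [[0.0, 4.2], [4.2, 0.0]]})
several = squeeze_tables({"auto_time": [[0.0]], "auto_cost": [[1.0]]})
kept = squeeze_tables({"auto_time": [[0.0]]}, squeeze=False)
```

Callers that need a predictable return type regardless of file contents may prefer squeeze=False.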

balsa.routines.io.omx.to_omx(dst_fp: str | PathLike, tables: Dict[str, DataFrame | Series | ndarray], *, zone_index: Index = None, title: str = '', descriptions: Dict[str, str] = None, attrs: Dict[str, Dict] = None, mapping_name: str = 'zone_numbers')#

Creates a new (or overwrites an old) OMX file with a collection of matrices.

Parameters:
  • dst_fp (str | PathLike) – OMX to write.

  • tables (Dict[str, pd.DataFrame | pd.Series | np.ndarray]) – Collection of matrices to write. MUST be a dict, to permit the encoding of matrix metadata, and must contain the same types: all Numpy arrays, all Series, or all DataFrames. Checking is done to ensure that all items have the same shape and labels.

  • zone_index (pd.Index, optional) – Override zone labels to use. Generally only useful if writing a dict of raw Numpy arrays.

  • title (str, optional) – The title saved in the OMX file.

  • descriptions (Dict[str, str], optional) – A dict of descriptions (one for each given matrix).

  • attrs (Dict[str, Dict], optional) – A dict of dicts (one for each given matrix).

  • mapping_name (str, optional) – Name of the mapping internal to the OMX file.

Fortran format#

balsa.routines.io.fortran.read_fortran_rectangle(file: str | FileIO | Path, n_columns: int, *, zones: int | Iterable[int] | Index = None, tall: bool = False, reindex_rows: bool = False, fill_value: int | float = None) ndarray | DataFrame | Series#

Reads a FORTRAN-friendly .bin file (a.k.a. ‘simple binary format’) which is known to NOT be square. Also works with square matrices.

This file format is an array of 4-byte values, where each row is prefaced by an integer referring to the 1-based positional index that FORTRAN uses. The rest of the data are 4-byte floats. To read this, the number of columns present must be known, since the format does not self-specify.

Parameters:
  • file (str | FileIO | Path) – The file to read.

  • n_columns (int) – The number of columns in the matrix.

  • zones (int | Iterable[int] | pandas.Index, optional) – Defaults to None. An Index or Iterable will be interpreted as the zone labels for the matrix rows and columns, and a DataFrame or Series will be returned (depending on tall). If an integer is provided, the returned ndarray will be truncated to this ‘number of zones’.

  • tall (bool, optional) – Defaults to False. If true, a ‘tall’ version of the matrix will be returned.

  • reindex_rows (bool, optional) – Defaults to False. If true, and zones is an Index, the returned DataFrame will be reindexed to fill-in any missing rows.

  • fill_value (int | float, optional) – Defaults to None. The fill value to pass to pandas.DataFrame.reindex().

Returns:

numpy.ndarray, pandas.DataFrame or pandas.Series

Raises:

AssertionError – if the shape is not valid.
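
The record layout described above (a 4-byte row index followed by 4-byte floats) can be sketched with the struct module. This is an illustration rather than Balsa's reader: little-endian byte order and a 32-bit signed index are assumptions, and pack_rows and unpack_rows are hypothetical names.

```python
import struct


def pack_rows(rows, min_index=1):
    """Pack each row as [int32 row index][float32 values...] (little-endian assumed)."""
    out = bytearray()
    for offset, row in enumerate(rows):
        out += struct.pack("<i", min_index + offset)  # 1-based positional index
        out += struct.pack(f"<{len(row)}f", *row)  # the row's 4-byte floats
    return bytes(out)


def unpack_rows(data, n_columns):
    """Split the bytes into (row index, values) records of a known width."""
    record_size = 4 * (n_columns + 1)  # one index word plus n_columns float words
    if len(data) % record_size != 0:
        raise ValueError("data length is not a whole number of records")
    records = []
    for start in range(0, len(data), record_size):
        index, *values = struct.unpack_from(f"<i{n_columns}f", data, start)
        records.append((index, values))
    return records


payload = pack_rows([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
records = unpack_rows(payload, n_columns=3)
```

Because nothing in the bytes records the row width, unpack_rows needs n_columns from the caller, which mirrors the n_columns argument above.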

balsa.routines.io.fortran.read_fortran_square(file: str | FileIO | Path, *, zones: int | Iterable[int] | Index = None, tall: bool = False) ndarray | DataFrame | Series#

Reads a FORTRAN-friendly .bin file (a.k.a. ‘simple binary format’) which is known to be square.

This file format is an array of 4-byte values, where each row is prefaced by an integer referring to the 1-based positional index that FORTRAN uses. The rest of the data are 4-byte floats. To read this, the number of columns present must be known, since the format does not self-specify. This method can infer the shape if it is square.

Parameters:
  • file (str | FileIO | Path) – The file to read.

  • zones (int | pandas.Index | Iterable[int], optional) – Defaults to None. An Index or Iterable will be interpreted as the zone labels for the matrix rows and columns, and a DataFrame or Series will be returned (depending on tall). If an integer is provided, the returned ndarray will be truncated to this ‘number of zones’. Otherwise, the returned ndarray will be sized to the maximum number of zones dimensioned by the Emmebank.

  • tall (bool, optional) – Defaults to False. If True, a 1D data structure will be returned. If zones is an Index or Iterable, a Series will be returned, otherwise a 1D ndarray.

Returns:

numpy.ndarray, pandas.DataFrame, or pandas.Series
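
Shape inference for a square file follows from the layout: n rows, each holding one index word and n float words, occupy n(n + 1) four-byte words. Since n² ≤ n(n + 1) < (n + 1)², n is the integer square root of the word count. A minimal sketch of that arithmetic, with the hypothetical name infer_square_size (not Balsa's code):

```python
import math


def infer_square_size(n_bytes):
    """Recover n for a square file made of n records of (1 + n) four-byte words."""
    if n_bytes % 4 != 0:
        raise ValueError("size is not a multiple of 4 bytes")
    words = n_bytes // 4
    n = math.isqrt(words)  # floor(sqrt(n * (n + 1))) equals n
    if n * (n + 1) != words:
        raise ValueError("size is not consistent with a square matrix")
    return n


# A 3500 x 3500 matrix occupies 4 * 3500 * 3501 bytes in this layout.
size = infer_square_size(4 * 3500 * 3501)
```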

balsa.routines.io.fortran.to_fortran(matrix: ndarray | DataFrame | Series, file: str | FileIO | Path, *, n_columns: int = None, min_index: int = 1, force_square: bool = True)#

Writes a FORTRAN-friendly .bin file (a.k.a. ‘simple binary format’), in a square format.

Parameters:
  • matrix (pandas.DataFrame | pandas.Series | numpy.ndarray) – The matrix to write to disk. If a Series is given, it MUST have a MultiIndex with exactly 2 levels to unstack.

  • file (str | FileIO | Path) – The path or file handler to write to.

  • n_columns (int, optional) – Defaults to None. Specifies a desired “width” of the matrix file. For example, n_columns=4000 on a 3500x3500 matrix will pad the width with 500 extra columns containing 0. If None is provided or the value is <= the width of the given matrix, no padding will be performed.

  • min_index (int, optional) – Defaults to 1. The lowest numbered row. Used when slicing matrices.

  • force_square (bool, optional) – Defaults to True.
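
The n_columns padding rule can be sketched on nested lists. The helper pad_columns is hypothetical and only mirrors the behaviour described for n_columns above; it is not Balsa's implementation.

```python
def pad_columns(rows, n_columns=None, fill=0.0):
    """Right-pad every row with zeros up to n_columns; never shrink a row."""
    width = len(rows[0]) if rows else 0
    if n_columns is None or n_columns <= width:
        # No padding requested, or the matrix is already at least that wide.
        return [list(row) for row in rows]
    return [list(row) + [fill] * (n_columns - width) for row in rows]


matrix = [[1.0, 2.0], [3.0, 4.0]]
padded = pad_columns(matrix, n_columns=4)
unchanged = pad_columns(matrix, n_columns=1)
```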

Network Packages (NWP)#

For more information on the TMGToolbox Network Package format, please visit https://tmg.utoronto.ca/doc/1.6/tmgtoolbox/input_output/ExportNetworkPackage.html

balsa.routines.io.nwp.parse_tmg_ncs_line_id(s: Series) Tuple[Series, Series]#

A function to parse line IDs based on TMG Network Coding Standard conventions. Returns pandas Series objects corresponding to the parsed operator and route IDs.

balsa.routines.io.nwp.process_emme_eng_notation_series(s: Series, *, to_dtype=float) Series#

A function to convert a Pandas Series containing values in Emme’s engineering notation to the dtype given by to_dtype (float by default).

balsa.routines.io.nwp.read_nwp_base_network(nwp_fp: str | PathLike) Tuple[DataFrame, DataFrame]#

A function to read the base network from a Network Package file (exported from Emme using the TMG Toolbox) into DataFrames.

Parameters:

nwp_fp (str | PathLike) – File path to the network package.

Returns:

A tuple of DataFrames containing the nodes and links

Return type:

Tuple[pd.DataFrame, pd.DataFrame]

balsa.routines.io.nwp.read_nwp_exatts_list(nwp_fp: str | PathLike, **kwargs) DataFrame#

A function to read the extra attributes present in a Network Package file (exported from Emme using the TMG Toolbox).

Parameters:
  • nwp_fp (str | PathLike) – File path to the network package.

  • **kwargs – Any valid keyword arguments used by pandas.read_csv().

Returns:

pd.DataFrame

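balsa.routines.io.nwp.read_nwp_link_attributes(nwp_fp: str | PathLike, *, attributes: str | List[str] = None, **kwargs) DataFrame#

A function to read link attributes from a Network Package file (exported from Emme using the TMG Toolbox).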

Parameters:
  • nwp_fp (str | PathLike) – File path to the network package.

  • attributes (str | List[str], optional) – Defaults to None. Names of link attributes to extract. Note that 'inode' and 'jnode' will be included by default.

  • **kwargs – Any valid keyword arguments used by pandas.read_csv().

Returns:

pd.DataFrame

balsa.routines.io.nwp.read_nwp_node_attributes(nwp_fp: str | PathLike, *, attributes: str | List[str] = None, **kwargs) DataFrame#

A function to read node attributes from a Network Package file (exported from Emme using the TMG Toolbox).

Parameters:
  • nwp_fp (str | PathLike) – File path to the network package.

  • attributes (str | List[str], optional) – Defaults to None. Names of node attributes to extract. Note that 'inode' will be included by default.

  • **kwargs – Any valid keyword arguments used by pandas.read_csv().

Returns:

pd.DataFrame

balsa.routines.io.nwp.read_nwp_traffic_results(nwp_fp: str | PathLike) DataFrame#

A function to read the traffic assignment results from a Network Package file (exported from Emme using the TMG Toolbox).

Parameters:

nwp_fp (str | PathLike) – File path to the network package.

Returns:

pd.DataFrame

balsa.routines.io.nwp.read_nwp_traffic_results_at_countpost(nwp_fp: str | PathLike, countpost_att: str) DataFrame#

A function to read the traffic assignment results at countposts from a Network Package file (exported from Emme using the TMG Toolbox).

Parameters:
  • nwp_fp (str | PathLike) – File path to the network package.

  • countpost_att (str) – The name of the extra link attribute containing countpost identifiers. Results will be filtered using this attribute.

Returns:

pd.DataFrame

balsa.routines.io.nwp.read_nwp_transit_line_attributes(nwp_fp: str | PathLike, *, attributes: str | List[str] = None, **kwargs) DataFrame#

A function to read transit line attributes from a Network Package file (exported from Emme using the TMG Toolbox).

Parameters:
  • nwp_fp (str | PathLike) – File path to the network package.

  • attributes (str | List[str], optional) – Defaults to None. Names of transit line attributes to extract. Note that 'line' will be included by default.

  • **kwargs – Any valid keyword arguments used by pandas.read_csv().

Returns:

pd.DataFrame

balsa.routines.io.nwp.read_nwp_transit_network(nwp_fp: str | PathLike) Tuple[DataFrame, DataFrame]#

A function to read the transit network from a Network Package file (exported from Emme using the TMG Toolbox) into DataFrames.

Parameters:

nwp_fp (str | PathLike) – File path to the network package.

Returns:

A tuple of DataFrames containing the transit lines and segments.

Return type:

Tuple[pd.DataFrame, pd.DataFrame]

balsa.routines.io.nwp.read_nwp_transit_result_summary(nwp_fp: str | PathLike, *, parse_line_id: bool = True) DataFrame#

A function to read and summarize the transit assignment boardings and max volumes from a Network Package file (exported from Emme using the TMG Toolbox) by operator and route.

Parameters:
  • nwp_fp (str | PathLike) – File path to the network package.

  • parse_line_id (bool, optional) – Defaults to True. Option to parse operator and route IDs from line IDs. Please note that transit line IDs must adhere to the TMG NCS16 for this option to work properly.

Returns:

pd.DataFrame

balsa.routines.io.nwp.read_nwp_transit_segment_results(nwp_fp: str | PathLike) DataFrame#

A function to read and summarize the transit segment boardings, alightings, and volumes from a Network Package file (exported from Emme using the TMG Toolbox).

Parameters:

nwp_fp (str | PathLike) – File path to the network package.

Returns:

pd.DataFrame

balsa.routines.io.nwp.read_nwp_transit_station_results(nwp_fp: str | PathLike, station_line_nodes: List[int]) DataFrame#

A function to read and summarize the transit boardings (ons) and alightings (offs) at stations from a Network Package file (exported from Emme using the TMG Toolbox).

Note

Ensure that station nodes being specified are on the transit line itself and are not station centroids.

Parameters:
  • nwp_fp (str | PathLike) – File path to the network package.

  • station_line_nodes (List[int]) – List of transit line nodes representing transit stops/stations.

Returns:

pd.DataFrame

balsa.routines.io.nwp.read_nwp_transit_vehicles(nwp_fp: str | PathLike) DataFrame#

A function to read the transit vehicles from a Network Package file (exported from Emme using the TMG Toolbox) into a DataFrame.

Parameters:

nwp_fp (str | PathLike) – File path to the network package.

Returns:

DataFrame containing the transit vehicles.

Return type:

pd.DataFrame