General Utilities

balsa.routines.general.align_categories(iterable: Series | DataFrame)

Pre-processing step for pd.concat() that attempts to align every Categorical series in the sequence so that they all use the same set of categories. It makes two passes over the sequence: the first accumulates the complete set of categories used anywhere in the sequence, and the second modifies the sequence's contents to use this full set. The contents of the sequence are modified in-place.

Note

The resulting categories will be lex-sorted (using the sorted() builtin).

Parameters:

iterable (pandas.Series | pandas.DataFrame) – Any iterable of Series or DataFrame objects (anything that is acceptable to pandas.concat())
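
The two-pass approach described above can be sketched as follows. This is a minimal illustration, not balsa's implementation: `align_categories_sketch` is a hypothetical name, it handles only Series (not DataFrame columns), and it returns new Series rather than modifying the inputs in place.

```python
import pandas as pd


def align_categories_sketch(items):
    """Hypothetical stand-in illustrating the two-pass idea (Series only)."""
    items = list(items)

    # Pass 1: accumulate every category used by any categorical Series.
    all_categories = set()
    for s in items:
        if isinstance(s.dtype, pd.CategoricalDtype):
            all_categories.update(s.cat.categories)

    # Lex-sort the combined categories with the sorted() builtin.
    ordered = sorted(all_categories)

    # Pass 2: give every categorical Series the full, shared category set.
    # Assumption: we return new Series here instead of editing in place.
    aligned = []
    for s in items:
        if isinstance(s.dtype, pd.CategoricalDtype):
            s = s.cat.set_categories(ordered)
        aligned.append(s)
    return aligned


a = pd.Series(["b", "a"], dtype="category")
b = pd.Series(["c", "a"], dtype="category")

aligned = align_categories_sketch([a, b])
# Because both Series now share one category set, concat keeps the
# categorical dtype instead of falling back to object.
combined = pd.concat(aligned, ignore_index=True)
```

Without the alignment step, concatenating categorical Series whose category sets differ produces an object-dtype result, which is the situation this pre-processing step is meant to avoid.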

balsa.routines.general.is_identifier(name: str) → bool

Tests whether the name is a valid Python variable name that does not collide with a reserved keyword.

Parameters:

name (str) – Name to test

Returns:

True if the name is 'Pythonic', i.e. a valid Python identifier that is not a reserved keyword.

Return type:

bool
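
An equivalent check can be sketched with the standard library alone. `is_identifier_sketch` is a hypothetical name, and balsa's own implementation may differ in detail; this only illustrates the two conditions described above.

```python
import keyword


def is_identifier_sketch(name: str) -> bool:
    """Hypothetical stand-in: valid identifier syntax and not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_identifier_sketch("total_trips"))  # True
print(is_identifier_sketch("2fast"))        # False: starts with a digit
print(is_identifier_sketch("class"))        # False: reserved keyword
print(is_identifier_sketch("my var"))       # False: contains a space
```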

balsa.routines.general.sort_nicely(l: List[str]) → List[str]

Sort the given list of strings in the order that humans expect (natural sort order), rather than in strict character-by-character order.

Parameters:

l (List[str]) – List of strings to sort.

Returns:

The sorted list of strings.

Return type:

List[str]
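
To show what "the way that humans expect" usually means, here is a minimal sketch of a natural sort in which runs of digits are compared as numbers. `natural_sort_sketch` is a hypothetical name and the key function is an assumption; balsa's actual implementation may handle case or edge cases differently.

```python
import re
from typing import List


def natural_sort_sketch(items: List[str]) -> List[str]:
    """Hypothetical stand-in: sort so that digit runs compare numerically."""

    def key(text: str):
        # Splitting on a capturing group keeps the digit runs, so the parts
        # always alternate text, digits, text, ... and compare like with like.
        parts = re.split(r"(\d+)", text)
        return [int(p) if p.isdecimal() else p for p in parts]

    return sorted(items, key=key)


zones = ["zone10", "zone2", "zone1"]
print(sorted(zones))               # ['zone1', 'zone10', 'zone2']
print(natural_sort_sketch(zones))  # ['zone1', 'zone2', 'zone10']
```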

balsa.routines.general.sum_df_sequence(seq: Iterable[DataFrame], *, fill_value: int | float = 0) → DataFrame

Sums over a sequence of DataFrames, even if they have different indexes or columns, filling in 0 (or a value of your choice) for missing rows or columns. Useful when you have a sequence of DataFrames which are supposed to have the same indexes and columns but might be missing a few values.

Parameters:
  • seq (Iterable[pandas.DataFrame]) – Any iterable of DataFrame type, ordered or unordered.

  • fill_value (int | float, optional) – Defaults to 0. The value to use for missing cells.

Returns:

The sum over all items in seq.

Return type:

pandas.DataFrame
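
The behaviour described above can be sketched by reindexing every frame to the union of all indexes and columns before adding. This is an illustration under stated assumptions, not balsa's implementation: `sum_frames_sketch` is a hypothetical name, and it materialises the sequence as a list, which the real function may not do.

```python
import pandas as pd


def sum_frames_sketch(seq, *, fill_value=0):
    """Hypothetical stand-in: sum DataFrames over the union of their labels."""
    frames = list(seq)
    if not frames:
        return pd.DataFrame()

    # Collect the union of all row labels and all column labels.
    index = frames[0].index
    columns = frames[0].columns
    for df in frames[1:]:
        index = index.union(df.index)
        columns = columns.union(df.columns)

    # Expand each frame to the common shape, filling missing cells,
    # then accumulate the running total.
    total = None
    for df in frames:
        expanded = df.reindex(index=index, columns=columns, fill_value=fill_value)
        total = expanded if total is None else total + expanded
    return total


a = pd.DataFrame({"A": [1, 2]}, index=["x", "y"])
b = pd.DataFrame({"A": [10], "B": [5]}, index=["y"])

total = sum_frames_sketch([a, b])
# Row "x" and column "B" each exist in only one input, so their
# missing cells are treated as 0 before summing.
```

Reindexing first avoids the NaN values that a plain `a + b` would produce wherever a label is missing from one of the frames.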