General Utilities
- balsa.routines.general.align_categories(iterable: Series | DataFrame)
Pre-processing step for pd.concat() which attempts to align all Categorical Series in the sequence so that they use the same set of categories. It passes through the sequence twice: once to accumulate the complete set of categories used in the sequence, and a second time to modify the sequence’s contents to use this full set. The contents of the sequence are modified in place.
Note
The resulting categories will be lex-sorted (based on the sorted() builtin).
- Parameters:
iterable (pandas.Series | pandas.DataFrame) – Any iterable of Series or DataFrame objects (anything that is acceptable to pandas.concat()).
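The two-pass approach described above can be sketched in plain pandas. This is a minimal illustration, not balsa's implementation: the function name align_categorical_series is hypothetical, it handles only Series (not DataFrame columns), and it returns aligned copies instead of modifying the sequence in place.

```python
from typing import Iterable, List

import pandas as pd


def align_categorical_series(series_seq: Iterable[pd.Series]) -> List[pd.Series]:
    """Hypothetical sketch: returns aligned copies rather than editing in place."""
    items = list(series_seq)

    # Pass 1: accumulate every category used by any categorical Series.
    all_categories = set()
    for s in items:
        if isinstance(s.dtype, pd.CategoricalDtype):
            all_categories.update(s.cat.categories)

    # Lex-sort the union with the sorted() builtin, mirroring the note above.
    full_set = sorted(all_categories)

    # Pass 2: re-express each categorical Series with the full category set;
    # non-categorical Series are passed through untouched.
    return [
        s.cat.set_categories(full_set) if isinstance(s.dtype, pd.CategoricalDtype) else s
        for s in items
    ]


a = pd.Series(["b", "a"], dtype="category")
b = pd.Series(["c", "a"], dtype="category")

# After alignment both inputs share one category set, so concat keeps the
# categorical dtype instead of falling back to a non-categorical one.
combined = pd.concat(align_categorical_series([a, b]), ignore_index=True)
print(combined.dtype)                 # category
print(list(combined.cat.categories))  # ['a', 'b', 'c']
```

Aligning first matters because pd.concat() only keeps the categorical dtype when every input shares identical categories; otherwise the result's dtype depends on the underlying category values.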
- balsa.routines.general.is_identifier(name: str) → bool
Tests whether the name is a valid Python variable name that does not collide with a reserved keyword.
- Parameters:
name (str) – Name to test
- Returns:
True if the name is ‘Pythonic’ (a valid identifier that is not a reserved keyword), otherwise False.
- Return type:
bool
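A standard-library version of this check might look like the following. It is a sketch that assumes “reserved keywords” means the hard keywords listed in Python’s keyword module; the name is_identifier_sketch is hypothetical and balsa may implement the check differently.

```python
import keyword


def is_identifier_sketch(name: str) -> bool:
    # str.isidentifier() checks Python's identifier grammar (letters, digits and
    # underscores, not starting with a digit) but does NOT reject keywords.
    # keyword.iskeyword() catches reserved words such as "class" or "lambda".
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["trip_count", "2nd_zone", "class", "zone id"]:
    print(f"{candidate!r}: {is_identifier_sketch(candidate)}")
```

Both checks are needed: "class".isidentifier() is True, so the keyword test is what rejects it.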
- balsa.routines.general.sort_nicely(l: List[str]) → List[str]
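Sort the given list of strings in the way that humans expect (natural sort order), so that embedded numbers are compared numerically: for example, "zone2" sorts before "zone10".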
- Parameters:
l (List[str]) – List of strings to sort.
- Returns:
The sorted list of strings
- Return type:
List[str]
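“The way that humans expect” usually means natural sort order, where runs of digits are compared as numbers rather than character by character. A minimal sketch, assuming sort_nicely follows that convention (the helper names _natural_key and sort_nicely_sketch are hypothetical):

```python
import re
from typing import List, Union


def _natural_key(text: str) -> List[Union[str, int]]:
    # re.split with a capturing group keeps the digit runs at odd positions:
    # "zone10b" -> ["zone", "10", "b"]. Converting those runs to int makes
    # them compare numerically instead of character by character.
    parts = re.split(r"(\d+)", text)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


def sort_nicely_sketch(strings: List[str]) -> List[str]:
    # sorted() returns a new list, leaving the caller's list untouched.
    return sorted(strings, key=_natural_key)


print(sorted(["zone10", "zone2", "zone1"]))              # ['zone1', 'zone10', 'zone2']
print(sort_nicely_sketch(["zone10", "zone2", "zone1"]))  # ['zone1', 'zone2', 'zone10']
```

Because text runs and digit runs alternate in the split result, matching key positions always hold the same type, so the comparison never mixes int and str.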
- balsa.routines.general.sum_df_sequence(seq: Iterable[DataFrame], *, fill_value: int | float = 0) → DataFrame
Sums a sequence of DataFrames, even if they have different indexes or columns, filling in 0 (or fill_value, if given) for missing rows or columns. This is useful when the DataFrames in a sequence are supposed to share the same indexes and columns but some may be missing a few values.
- Parameters:
seq (Iterable[pandas.DataFrame]) – Any iterable of DataFrames, ordered or unordered.
fill_value (int | float, optional) – The value to use for missing cells. Defaults to 0.
- Returns:
The sum over all items in seq.
- Return type:
pandas.DataFrame
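The behaviour above can be approximated by reindexing every frame to the union of all row and column labels and then adding. This is a sketch under that assumption, not balsa's actual code; sum_df_sequence_sketch is a hypothetical name. With the default fill_value of 0 the padding leaves the totals unchanged; for a non-zero fill_value, this sketch adds it once for every frame in which a cell is missing.

```python
from typing import Iterable, Union

import pandas as pd


def sum_df_sequence_sketch(
    seq: Iterable[pd.DataFrame], *, fill_value: Union[int, float] = 0
) -> pd.DataFrame:
    # Materialize so a one-shot iterable (e.g. a generator) can be walked twice.
    frames = list(seq)
    if not frames:
        raise ValueError("seq must contain at least one DataFrame")

    # Pass 1: build the union of every row label and column label.
    index = frames[0].index
    columns = frames[0].columns
    for df in frames[1:]:
        index = index.union(df.index)
        columns = columns.union(df.columns)

    # Pass 2: pad each frame to the common shape with fill_value, then add.
    total = frames[0].reindex(index=index, columns=columns, fill_value=fill_value)
    for df in frames[1:]:
        total = total + df.reindex(index=index, columns=columns, fill_value=fill_value)
    return total


a = pd.DataFrame({"x": [1, 2]}, index=["r1", "r2"])
b = pd.DataFrame({"x": [10], "y": [5]}, index=["r2"])
result = sum_df_sequence_sketch([a, b])
print(result)
```

Plain a + b would instead leave NaN wherever a label is missing from either frame, which is the situation this helper exists to avoid.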