compact.utils

module with general-purpose helper functions shared across multiple modules

compact.utils.mcl_available(): check whether MCL tool is in PATH and marked as an executable
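
A check of this kind can be sketched with the standard library. The helper name `tool_available` is hypothetical and the code below is an illustration, not necessarily how compact implements the check.

```python
import shutil


def tool_available(name="mcl"):
    """Return True if an executable called `name` is found on PATH."""
    # shutil.which only returns a path when the file exists and is executable
    return shutil.which(name) is not None


if not tool_available("mcl"):
    print("MCL not found on PATH; clustering steps that need it would fail")
```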

compact.utils.download_sample_abuns(sample_id, out_fn, output_ids=['prot_ids'], prot_ids=[], id_type='prot_ids', url='https://www3.cmbi.umcn.nl/cedar/api/abundances')

fetch abundances of a sample from CEDAR

Args:

sample_id (int): CEDAR CRS number of a sample
out_fn (str): filepath of the output location
output_ids (list of str, optional): the types of protein ids to include in the output. Options: "prot_ids", "prot_names", "gene_names". Defaults to ['prot_ids'].
prot_ids (list, optional): if empty, fetches the complete complexome profile; otherwise, only fetches abundances for proteins matching the given ids. Defaults to [].
id_type (str, optional): type of identifier used when providing prot_ids. Defaults to 'prot_ids'.
url (str, optional): URL of the CEDAR fetch_abundances API endpoint. Defaults to 'https://www3.cmbi.umcn.nl/cedar/api/abundances'.

compact.utils.map_df_index(df, mapping)

rename df’s index using given mapping {index:new_id}

original id is used for ids that have no mapping

Args:

df (pd.DataFrame): table whose index is to be mapped
mapping (dict): identifier mapping to use

Returns:

pd.DataFrame: table with mapped index
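
The described behaviour (rename where a mapping exists, keep the original id otherwise) can be reproduced with plain pandas. This is a minimal sketch; `map_index_sketch` and the example ids are hypothetical and not taken from compact.

```python
import pandas as pd


def map_index_sketch(df, mapping):
    """Return a copy of df with index labels renamed via mapping."""
    # labels missing from the mapping fall back to the original label
    return df.rename(index=lambda idx: mapping.get(idx, idx))


df = pd.DataFrame({"frac1": [1.0, 2.0, 3.0]}, index=["P1", "P2", "P3"])
mapped = map_index_sketch(df, {"P1": "geneA", "P2": "geneB"})
print(list(mapped.index))  # ['geneA', 'geneB', 'P3']
```

The input table is left unchanged, because `DataFrame.rename` returns a new object by default.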

compact.utils.get_stripped_mapping(full_id_list, sep='::')

strip appendix from full ids

stripped ids are stored in dict mapping back to full ids

Args:

full_id_list (list): list of full ids to be stripped
sep (str, optional): separator between raw id and appendix. Defaults to '::'.

Returns:

dict: stripped ids as keys, mapping to their original full ids (with appendix)
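
As an illustration of the input and output shapes, the sketch below strips everything from the first separator onward. The function name `stripped_mapping_sketch` is hypothetical, and splitting on the first occurrence of `sep` is an assumption, not confirmed behaviour of compact.

```python
def stripped_mapping_sketch(full_id_list, sep="::"):
    """Map each stripped id back to the full id it came from."""
    # assumption: the raw id is everything before the first separator;
    # if two full ids share a raw id, the later one overwrites the earlier
    return {full_id.split(sep, 1)[0]: full_id for full_id in full_id_list}


stripped = stripped_mapping_sketch(["P12345::1", "Q67890::2", "O11111"])
print(stripped)
# {'P12345': 'P12345::1', 'Q67890': 'Q67890::2', 'O11111': 'O11111'}
```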

compact.utils.invert_mapping(mapping_dict): inverts given dict
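
Inverting a dict is a one-liner in Python. The sketch below uses the hypothetical name `invert_sketch` and assumes that values are hashable; how compact handles duplicate values is not stated in this reference.

```python
def invert_sketch(mapping_dict):
    """Swap keys and values of a dict."""
    # assumption: values are hashable; with duplicate values the last key wins
    return {value: key for key, value in mapping_dict.items()}


print(invert_sketch({"H1": "M1", "H2": "M2"}))  # {'M1': 'H1', 'M2': 'H2'}
```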

compact.utils.get_comp_mapping(left, right, nested_tags, mappings)

grab mapping for the current comparison of individual samples

Args:

left, right (str): sample-level tags in this comparison
nested_tags (dict): nested tag structure for profiles. Keys: collection-level tags; values: sample-level tags
mappings (dict): id mappings between collections. Keys: (query, subject) tuples of collection-level tags; values: dicts with id mappings from query to subject

Returns:

dict: mapping for given sample-level query,subject comparison
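
The lookup described above can be sketched as follows: resolve each sample-level tag to its collection, then use the collection pair as the key into `mappings`. The name `comp_mapping_sketch`, the example tags, and the choice to return None when no mapping is stored are all assumptions made for illustration.

```python
def comp_mapping_sketch(left, right, nested_tags, mappings):
    """Find the id mapping for a comparison of two sample-level tags."""
    # invert the nesting: sample-level tag -> collection-level tag
    sample_to_collection = {
        sample: collection
        for collection, samples in nested_tags.items()
        for sample in samples
    }
    key = (sample_to_collection[left], sample_to_collection[right])
    # assumption: None signals that no mapping exists for this pair
    return mappings.get(key)


nested_tags = {"human": ["human_r1", "human_r2"], "mouse": ["mouse_r1"]}
mappings = {("human", "mouse"): {"H1": "M1", "H2": "M2"}}
result = comp_mapping_sketch("human_r2", "mouse_r1", nested_tags, mappings)
print(result)  # {'H1': 'M1', 'H2': 'M2'}
```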

compact.utils.get_col_mapping(left, right, mappings)

grab mapping for comparison of 2 collections from mappings

Args:

left, right (str): collection identifiers
mappings (dict of dicts): dict with all available mappings

Returns:

dict: mapping between left and right collections. Keys: left collection ids; values: corresponding right collection ids

compact.utils.get_sample_tags(nested_tags)

get list of all replicate tags from nested_tags

Args:

nested_tags (dict of dicts): collection-replicate id structure

Returns:

list: all replicate-level ids
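
Flattening the nested tag structure can be sketched as below. The name `sample_tags_sketch` and the example tags are hypothetical; the inner values of each collection dict are placeholders, since their content is not described here.

```python
def sample_tags_sketch(nested_tags):
    """Collect all replicate-level tags from a collection -> replicate nesting."""
    # iterating over an inner dict yields its keys, i.e. the replicate tags
    return [tag for replicates in nested_tags.values() for tag in replicates]


nested_tags = {
    "human": {"human_r1": None, "human_r2": None},  # placeholder values
    "mouse": {"mouse_r1": None},
}
print(sample_tags_sketch(nested_tags))  # ['human_r1', 'human_r2', 'mouse_r1']
```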

compact.utils.get_comparison_matches(left, right, mapping=None)

determine id matches between compared samples

gets the id matches between the indexes of the samples being compared, optionally using a mapping

Args:

left, right (list-like): indexes of compared samples
mapping (dict or None, optional): dict with id mapping from left to right. If None, ids are compared directly. Defaults to None.

Returns:

list: ids that match between indexes, OR
dict: matching id pairs from left and right (key: left id, value: right id)
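
The two return shapes can be illustrated with a short sketch. It assumes that a list is returned when no mapping is given and a dict when one is; the name `comparison_matches_sketch` and the example ids are hypothetical.

```python
def comparison_matches_sketch(left, right, mapping=None):
    """Find ids shared by two indexes, directly or through a mapping."""
    right_ids = set(right)
    if mapping is None:
        # direct comparison: keep left ids that also occur in right
        return [left_id for left_id in left if left_id in right_ids]
    # mapped comparison: keep pairs whose mapped id occurs in right
    return {
        left_id: mapping[left_id]
        for left_id in left
        if left_id in mapping and mapping[left_id] in right_ids
    }


direct = comparison_matches_sketch(["A", "B", "C"], ["B", "C", "D"])
mapped = comparison_matches_sketch(
    ["H1", "H2", "H3"], ["M1", "M3"], mapping={"H1": "M1", "H2": "M2"}
)
print(direct)  # ['B', 'C']
print(mapped)  # {'H1': 'M1'}
```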

compact.utils.get_cluster_max_fraction(clusters, profile)

determine fraction in profile where cluster mean abundance is at max

Args:

clusters (dict): cluster ids and lists of members
profile (pd.DataFrame): table with protein abundances (rows) in a number of fractions (columns)

Returns:

dict: for each cluster, the fraction at which mean abundance is at its maximum
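
The computation can be sketched with pandas: average the member rows per fraction, then take the column label of the maximum. The name `cluster_max_fraction_sketch` and the example data are hypothetical, and the sketch assumes every cluster member is present in the profile index.

```python
import pandas as pd


def cluster_max_fraction_sketch(clusters, profile):
    """For each cluster, find the fraction with the highest mean abundance."""
    result = {}
    for cluster_id, members in clusters.items():
        # mean abundance of the cluster members in every fraction (column)
        mean_profile = profile.loc[members].mean(axis=0)
        # column label at which that mean is largest
        result[cluster_id] = mean_profile.idxmax()
    return result


profile = pd.DataFrame(
    {"frac1": [1.0, 2.0, 0.0], "frac2": [5.0, 3.0, 1.0], "frac3": [0.0, 1.0, 9.0]},
    index=["P1", "P2", "P3"],
)
clusters = {"c1": ["P1", "P2"], "c2": ["P3"]}
max_fractions = cluster_max_fraction_sketch(clusters, profile)
print(max_fractions)  # {'c1': 'frac2', 'c2': 'frac3'}
```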

compact.utils.correlate_samples(samples, method='pearson')

compute interaction matrices for given samples

Args:

samples (dict of pd.DataFrame): samples to correlate. Samples are dataframes of feature data: feature ids should be in the index and column values should contain feature data.
method (str, optional): correlation method to use. Valid options: 'pearson', 'kendall', 'spearman'. Defaults to 'pearson'.

Returns:

dict of pd.DataFrame: int_matrices, containing a symmetrical interaction matrix for each input sample
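
A feature-by-feature correlation matrix per sample can be sketched with `DataFrame.corr`. Because pandas correlates columns, the table is transposed first so that the features in the index are compared. The name `correlate_samples_sketch` and the example data are hypothetical, and this is an illustration rather than compact's implementation.

```python
import pandas as pd


def correlate_samples_sketch(samples, method="pearson"):
    """Compute a feature x feature correlation matrix for each sample."""
    # DataFrame.corr works column-wise, so transpose to correlate the rows
    return {name: df.T.corr(method=method) for name, df in samples.items()}


sample = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]],
    index=["P1", "P2", "P3"],
    columns=["frac1", "frac2", "frac3", "frac4"],
)
int_matrices = correlate_samples_sketch({"sample_a": sample})
print(int_matrices["sample_a"].round(2))
```

In this toy sample P1 and P2 co-vary perfectly, while P3 runs in the opposite direction, so the matrix contains values close to 1 and -1 respectively.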

compact.utils.eprint(*args, **kwargs): like normal print but writes to stderr
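
A helper like this is commonly written as a thin wrapper around `print`. The sketch below uses the hypothetical name `eprint_sketch` and captures stderr only to show where the text ends up.

```python
import contextlib
import io
import sys


def eprint_sketch(*args, **kwargs):
    """Print to stderr instead of stdout; other print() options pass through."""
    print(*args, file=sys.stderr, **kwargs)


# capture stderr to demonstrate that the message is written there
buffer = io.StringIO()
with contextlib.redirect_stderr(buffer):
    eprint_sketch("warning:", "MCL not found")
captured = buffer.getvalue()
```

Writing diagnostics to stderr keeps them out of any results that are piped or redirected from stdout.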