compact.utils

module with generally useful functions used in multiple modules

compact.utils.mcl_available()

check whether MCL tool is in PATH and marked as an executable
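A minimal sketch of such a check, assuming the MCL binary is named `mcl`. The `exe_name` parameter is hypothetical and only added for testability; the documented function takes no arguments.

```python
import shutil


def mcl_available(exe_name: str = "mcl") -> bool:
    """Return True if an executable called exe_name is found on PATH."""
    # shutil.which searches PATH and only returns files that are executable
    # (assumption: the MCL binary is installed under the name "mcl")
    return shutil.which(exe_name) is not None
```

`shutil.which` already applies the executable-permission check, so no separate `os.access` call is needed.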

compact.utils.download_sample_abuns(sample_id, out_fn, output_ids=['prot_ids'], prot_ids=[], id_type='prot_ids', url='https://www3.cmbi.umcn.nl/cedar/api/abundances')

fetch abundances of a sample from CEDAR

Args:
sample_id (int):

CEDAR CRS number of a sample

out_fn (str):

output file path

output_ids (list of strings, optional): Defaults to [‘prot_ids’].

the types of protein ids to include in the output. options: ‘prot_ids’, ‘prot_names’, ‘gene_names’

prot_ids (list, optional): Defaults to [].

if empty: fetch the complete complexome profile; otherwise: only fetch abundances for proteins matching the given ids

id_type (str, optional): Defaults to ‘prot_ids’.

type of identifier used when providing prot_ids

url (str, optional): Defaults to ‘https://www3.cmbi.umcn.nl/cedar/api/abundances’.

URL of the CEDAR fetch_abundances API endpoint

compact.utils.map_df_index(df, mapping)

rename df’s index using the given mapping {index: new_id}

the original id is kept for ids that have no mapping

Args:
df (pd.DataFrame):

table to map index of

mapping (dict):

identifier mapping to use

Returns:

pd.DataFrame: table with mapped index
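A minimal sketch of the documented behaviour (unmapped ids keep their original label), assuming the input DataFrame should not be modified in place:

```python
import pandas as pd


def map_df_index(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Return a copy of df whose index labels are renamed via mapping."""
    mapped = df.copy()
    # dict.get(idx, idx) falls back to the original id when no mapping exists
    mapped.index = pd.Index([mapping.get(idx, idx) for idx in df.index], name=df.index.name)
    return mapped
```

For example, mapping `{"a": "A", "c": "C"}` over the index `["a", "b", "c"]` yields `["A", "b", "C"]`.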

compact.utils.get_stripped_mapping(full_id_list, sep='::')

strip appendix from full ids

stripped ids are stored in dict mapping back to full ids

Args:
full_id_list (list):

list of full ids to be stripped

sep (str, optional): Defaults to ‘::’.

separator between raw id and appendix

Returns:

dict: stripped ids as keys mapping to their original full ids with appendix
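A minimal sketch, assuming the raw id is everything before the first occurrence of `sep` and that ids without a separator map to themselves:

```python
def get_stripped_mapping(full_id_list, sep="::"):
    """Map stripped ids (text before the first sep) back to their full ids."""
    # split(sep, 1)[0] keeps only the part before the first separator;
    # if two full ids strip to the same raw id, the later one overwrites the earlier
    return {full_id.split(sep, 1)[0]: full_id for full_id in full_id_list}
```

`get_stripped_mapping(["P1::a", "P2::b", "P3"])` gives `{"P1": "P1::a", "P2": "P2::b", "P3": "P3"}`.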

compact.utils.invert_mapping(mapping_dict)

invert the given dict, so that values become keys and keys become values
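A minimal sketch of the inversion. Note the assumption that values are hashable; if several keys share a value, only the last one survives.

```python
def invert_mapping(mapping_dict):
    """Swap keys and values of mapping_dict."""
    # duplicate values collapse to a single key; the last key seen wins
    return {value: key for key, value in mapping_dict.items()}
```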

compact.utils.get_comp_mapping(left, right, nested_tags, mappings)

grab mapping for the current comparison of individual samples

Args:
left|right (str):

sample-level tags in this comparison

nested_tags (dict): dict with nested tag structure for profiles

keys: collection-level tags; values: sample-level tags

mappings (dict): contains id mappings between collections

keys: tuple with (query, subject) collection-level tags; values: dicts with id mappings from query to subject

Returns:

dict: mapping for given sample-level query,subject comparison
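A minimal sketch of how the lookup could work: resolve each sample-level tag to its collection-level tag via `nested_tags`, then fetch the `(query, subject)` entry from `mappings`. Returning `None` when both samples belong to the same collection is an assumption made for illustration, not documented behaviour.

```python
def get_comp_mapping(left, right, nested_tags, mappings):
    """Return the id mapping for comparing sample `left` against sample `right`."""

    def collection_of(sample_tag):
        # find the collection-level tag whose sample-level tags contain sample_tag
        for col_tag, sample_tags in nested_tags.items():
            if sample_tag in sample_tags:
                return col_tag
        raise KeyError(f"unknown sample tag: {sample_tag}")

    left_col, right_col = collection_of(left), collection_of(right)
    # assumption: samples in the same collection share ids, so no mapping is needed
    if left_col == right_col:
        return None
    return mappings[(left_col, right_col)]
```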

compact.utils.get_col_mapping(left, right, mappings)

grab mapping for comparison of 2 collections from mappings

Args:
left|right (str):

collection identifiers

mappings (dict of dicts):

dict with all available mappings

Returns:

dict: mapping between left and right collections

keys: ids in the left collection; values: corresponding ids in the right collection
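A minimal sketch of the lookup, assuming `mappings` is keyed by `(query, subject)` tuples as described for `get_comp_mapping`. The fallback that inverts a mapping stored in the opposite direction is a hypothetical design choice, not confirmed by the source.

```python
def get_col_mapping(left, right, mappings):
    """Return the id mapping from collection `left` to collection `right`."""
    # direct lookup: mapping stored as (query, subject)
    if (left, right) in mappings:
        return mappings[(left, right)]
    # hypothetical fallback: invert the mapping stored in the reverse direction
    if (right, left) in mappings:
        return {left_id: right_id for right_id, left_id in mappings[(right, left)].items()}
    raise KeyError(f"no mapping between {left!r} and {right!r}")
```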

compact.utils.get_sample_tags(nested_tags)

get list of all replicate tags from nested_tags

Args:

nested_tags (dict of dicts): collection-replicate id structure

Returns:

list: all replicate-level ids
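A minimal sketch of the flattening, assuming the inner containers are dicts keyed by replicate tag (iterating a dict yields its keys, so a list of tags would work the same way):

```python
def get_sample_tags(nested_tags):
    """Flatten the collection -> replicate structure into a list of replicate tags."""
    # iterate collections in insertion order, then each collection's replicate tags
    return [tag for sample_tags in nested_tags.values() for tag in sample_tags]
```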

compact.utils.get_comparison_matches(left, right, mapping=None)

determine id matches between the compared samples

gets the ids that match between the indexes of the samples being compared, optionally using an id mapping

Args:
left|right (list-like):

indexes of compared samples

mapping (dict or None, optional): Defaults to None.

dict with id mapping from left to right; if None, ids are compared directly

Returns:
list: ids that match between indexes (when mapping is None)

OR

dict: matching id pairs from left and right (when a mapping is given)

key: left id, value: right id
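A minimal sketch of both return modes, assuming matches are reported in the order of `left` and that a left id only counts as matched when its mapped id is actually present in `right`:

```python
def get_comparison_matches(left, right, mapping=None):
    """Return matching ids (list) or matching id pairs (dict) between two indexes."""
    right_ids = set(right)
    if mapping is None:
        # direct comparison: ids present in both indexes, in left order
        return [idx for idx in left if idx in right_ids]
    # mapped comparison: keep left ids whose mapped id exists in right
    return {idx: mapping[idx] for idx in left if idx in mapping and mapping[idx] in right_ids}
```

Because only iteration and membership are used, `left` and `right` may be lists or `pd.Index` objects.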

compact.utils.get_cluster_max_fraction(clusters, profile)

determine fraction in profile where cluster mean abundance is at max

Args:
clusters (dict):

cluster ids and lists of members

profile (pd.DataFrame):

table with protein (rows) abundances in a number of fractions (columns)

Returns:

dict: for each cluster, the fraction at which its mean abundance is at its maximum
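A minimal sketch with pandas, assuming fractions are the column labels of `profile`. Skipping clusters with no members in the profile is an assumption added to avoid taking the maximum of an all-NaN mean.

```python
import pandas as pd


def get_cluster_max_fraction(clusters, profile: pd.DataFrame):
    """Return, per cluster, the column (fraction) where the members' mean abundance peaks."""
    max_fractions = {}
    for cluster_id, members in clusters.items():
        # only use members that actually appear as rows in the profile
        present = [m for m in members if m in profile.index]
        if not present:
            continue  # assumption: clusters without profiled members are skipped
        # mean abundance per fraction across the cluster's proteins
        mean_profile = profile.loc[present].mean(axis=0)
        # idxmax returns the label of the first maximal fraction
        max_fractions[cluster_id] = mean_profile.idxmax()
    return max_fractions
```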

compact.utils.correlate_samples(samples, method='pearson')

compute interaction matrices for given samples

Args:
samples (dict of pd.df): samples to correlate

samples are dataframes of feature data: feature ids should be in the index and column values should contain the feature data

method (str, optional): Defaults to ‘pearson’.

correlation method to use. valid options: ‘pearson’, ‘kendall’, ‘spearman’

Returns:
dict of pd.df: int_matrices

contains a symmetric interaction matrix for each input sample
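A minimal sketch using `DataFrame.corr`, which accepts the same three method names. Since features are in the index and `corr` correlates columns, each sample is transposed first so the result is a feature-by-feature matrix.

```python
import pandas as pd


def correlate_samples(samples, method="pearson"):
    """Return a feature-by-feature correlation matrix for each sample."""
    int_matrices = {}
    for name, df in samples.items():
        # DataFrame.corr correlates columns, so transpose to correlate features (rows)
        int_matrices[name] = df.T.corr(method=method)
    return int_matrices
```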

compact.utils.eprint(*args, **kwargs)

like normal print, but writes to stderr
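A minimal sketch of such a helper. It forwards all arguments to the built-in `print` and only redirects the output stream, so keyword arguments such as `sep` and `end` keep working. Passing `file=` as well would raise a `TypeError`.

```python
import sys


def eprint(*args, **kwargs):
    """Print to standard error instead of standard output."""
    # sys.stderr is looked up at call time, so redirections are respected
    print(*args, file=sys.stderr, **kwargs)
```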