compact.utils
module with generally useful functions used in multiple modules
- compact.utils.mcl_available()
check whether the MCL tool is on the PATH and marked as executable
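One way to implement such a check is the standard library's `shutil.which`, which searches PATH and only returns files that pass an executable-permission check. This is a minimal sketch that assumes the MCL binary is named `mcl`; it is not necessarily how compact implements it.

```python
import shutil


def mcl_available():
    """Return True if an executable named 'mcl' is found on PATH."""
    # assumption: the MCL binary is called "mcl"
    # shutil.which returns the full path of an executable match, or None
    return shutil.which("mcl") is not None


print(mcl_available())
```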
- compact.utils.download_sample_abuns(sample_id, out_fn, output_ids=['prot_ids'], prot_ids=[], id_type='prot_ids', url='https://www3.cmbi.umcn.nl/cedar/api/abundances')
fetch abundances of a sample from CEDAR
- Args:
- sample_id (int):
CEDAR CRS number of a sample
- out_fn (str):
filepath, output location
- output_ids (list of strings, optional): Defaults to [‘prot_ids’].
the types of protein ids to include in the output. options: “prot_ids”, “prot_names”, “gene_names”
- prot_ids (list, optional): Defaults to [].
if empty, the complete complexome profile is fetched; otherwise only the abundances of proteins matching the given ids are fetched
- id_type (str, optional): Defaults to ‘prot_ids’.
type of identifier used when providing prot_ids
- url (str, optional): Defaults to ‘https://www3.cmbi.umcn.nl/cedar/api/abundances’.
URL of the CEDAR fetch_abundances API endpoint
- compact.utils.map_df_index(df, mapping)
rename df’s index using the given mapping {index: new_id}
ids without a mapping keep their original id
- Args:
- df (pd.DataFrame):
table whose index is mapped
- mapping (dict):
identifier mapping to use
- Returns:
pd.DataFrame: table with mapped index
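In pandas, the described behaviour matches `DataFrame.rename(index=...)`, which leaves labels that have no entry in the dict unchanged. A minimal sketch, not necessarily compact's own code; the protein ids in the usage lines are made up.

```python
import pandas as pd


def map_df_index(df, mapping):
    """Return a copy of df with its index renamed via mapping.

    Index labels without an entry in mapping keep their original id.
    """
    # rename with a dict only touches labels that are keys of the dict
    return df.rename(index=mapping)


# illustrative profile with made-up protein ids
profile = pd.DataFrame({"f1": [0.1, 0.5], "f2": [0.3, 0.2]}, index=["P1", "P2"])
print(map_df_index(profile, {"P1": "ProtA"}).index.tolist())  # ['ProtA', 'P2']
```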
- compact.utils.get_stripped_mapping(full_id_list, sep='::')
strip the appendix from full ids
the stripped ids are stored in a dict that maps them back to the full ids
- Args:
- full_id_list (list):
list of full ids to be stripped
- sep (str, optional): Defaults to ‘::’.
separator between raw id and appendix
- Returns:
- dict: stripped ids as keys mapping to their original
full ids with appendix
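A minimal sketch of the stripping step, assuming the appendix is everything after the first occurrence of `sep`, and that when two full ids strip to the same raw id the later one overwrites the earlier one. The example ids are made up.

```python
def get_stripped_mapping(full_id_list, sep="::"):
    """Map each stripped id (text before the first sep) back to its full id."""
    stripped = {}
    for full_id in full_id_list:
        # split only on the first sep; ids without sep are kept whole
        raw_id = full_id.split(sep, 1)[0]
        # assumption: a later full id with the same raw id overwrites the earlier one
        stripped[raw_id] = full_id
    return stripped


print(get_stripped_mapping(["P12345::sample1", "Q67890::sample1"]))
# {'P12345': 'P12345::sample1', 'Q67890': 'Q67890::sample1'}
```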
- compact.utils.invert_mapping(mapping_dict)
invert the given dict so that its values become keys and its keys become values
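Inverting a dict is a one-line comprehension. This sketch assumes the values are hashable and one-to-one; duplicate values would collapse onto a single key.

```python
def invert_mapping(mapping_dict):
    """Swap keys and values of mapping_dict."""
    # if several keys share a value, the last key seen wins
    return {value: key for key, value in mapping_dict.items()}


print(invert_mapping({"P1": "ProtA", "P2": "ProtB"}))  # {'ProtA': 'P1', 'ProtB': 'P2'}
```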
- compact.utils.get_comp_mapping(left, right, nested_tags, mappings)
get the id mapping for the current comparison of two individual samples
- Args:
- left|right (str):
sample-level tags in this comparison
- nested_tags (dict):
dict with nested tag structure for profiles; keys: collection-level tags, values: sample-level tags
- mappings (dict):
contains id mappings between collections; keys: (query, subject) tuples of collection-level tags, values: dicts with id mappings from query to subject
- Returns:
dict: mapping for the given sample-level (query, subject) comparison
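The lookup described above can be sketched in two steps: find the collection-level tag of each sample-level tag, then fetch the (query, subject) entry from mappings. This sketch assumes the nested_tags values are containers of sample-level tags and, as a hypothetical choice, returns None when no mapping is stored for the pair. The tags `colA`/`colB` and the ids are illustrative only.

```python
def get_comp_mapping(left, right, nested_tags, mappings):
    """Return the id mapping for a sample-level (left, right) comparison."""

    def collection_of(sample_tag):
        # find the collection-level tag whose sample-level tags include sample_tag
        for col_tag, sample_tags in nested_tags.items():
            if sample_tag in sample_tags:
                return col_tag
        raise KeyError(f"unknown sample tag: {sample_tag}")

    query_col = collection_of(left)
    subject_col = collection_of(right)
    # hypothetical choice: None when no mapping is stored for this pair,
    # e.g. when both samples belong to the same collection
    return mappings.get((query_col, subject_col))


# illustrative tags and ids only
nested_tags = {"colA": ["colA_r1", "colA_r2"], "colB": ["colB_r1"]}
mappings = {("colA", "colB"): {"P1": "Q1"}}
print(get_comp_mapping("colA_r1", "colB_r1", nested_tags, mappings))  # {'P1': 'Q1'}
```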
- compact.utils.get_col_mapping(left, right, mappings)
get the mapping for a comparison of two collections from mappings
- Args:
- left|right (str):
collection identifiers
- mappings (dict of dicts):
dict with all available mappings
- Returns:
- dict: mapping between left and right collections
keys: left collection ids, values: corresponding right collection ids
- compact.utils.get_sample_tags(nested_tags)
get a list of all replicate tags from nested_tags
- Args:
- nested_tags (dict of dicts):
collection-replicate id structure
- Returns:
list: all replicate-level ids
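Flattening nested_tags fits a nested comprehension. This sketch iterates over each inner value, which yields the replicate tags whether the inner values are lists or dicts keyed by replicate tag (the latter is assumed here, since the structure is only described as a dict of dicts). The tags and placeholder values are made up.

```python
def get_sample_tags(nested_tags):
    """Flatten a collection -> replicate structure into one list of replicate tags."""
    # iterating an inner dict yields its keys; iterating an inner list yields its items
    return [tag for replicate_tags in nested_tags.values() for tag in replicate_tags]


# illustrative structure; the inner values are placeholders
nested_tags = {"colA": {"colA_r1": None, "colA_r2": None}, "colB": {"colB_r1": None}}
print(get_sample_tags(nested_tags))  # ['colA_r1', 'colA_r2', 'colB_r1']
```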
- compact.utils.get_comparison_matches(left, right, mapping=None)
determine the id matches between the indexes of two to-be-compared samples, optionally using a mapping
- Args:
- left|right (list-like):
indexes of compared samples
- mapping (dict or None, optional): Defaults to None.
dict with id mapping from left to right; if None, ids are compared directly
- Returns:
- list: ids that match between indexes
OR
- dict: matching id pairs from left and right
key: left id, value: right id
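A minimal sketch of both return modes, assuming the mapping is one-to-one, that a list is returned when mapping is None and a dict otherwise, and that the order of left is preserved. This is an illustration, not compact's actual code; the ids are made up.

```python
def get_comparison_matches(left, right, mapping=None):
    """Match ids between two indexes, optionally translating left ids via mapping."""
    right_ids = set(right)  # set for O(1) membership tests
    if mapping is None:
        # direct comparison: keep left ids that also occur in right
        return [i for i in left if i in right_ids]
    # mapped comparison: keep left ids whose mapped id occurs in right
    return {i: mapping[i] for i in left if i in mapping and mapping[i] in right_ids}


print(get_comparison_matches(["P1", "P2", "P3"], ["P2", "P3", "P4"]))  # ['P2', 'P3']
print(get_comparison_matches(["P1", "P2"], ["Q1", "Q9"], mapping={"P1": "Q1", "P2": "Q2"}))  # {'P1': 'Q1'}
```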
- compact.utils.get_cluster_max_fraction(clusters, profile)
determine, for each cluster, the fraction in profile at which the cluster’s mean abundance is maximal
- Args:
- clusters (dict):
cluster ids and lists of members
- profile (pd.DataFrame):
table with protein (rows) abundances in a number of fractions (columns)
- Returns:
dict: for each cluster, the fraction at which its mean abundance is maximal
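One way to compute this with pandas is to average the member rows per fraction and take `idxmax` of the resulting Series. As assumptions, members missing from profile are ignored and clusters with no members in profile are left out of the result; the ids and abundances below are made up.

```python
import pandas as pd


def get_cluster_max_fraction(clusters, profile):
    """For each cluster, return the fraction (column) where mean abundance peaks."""
    max_fractions = {}
    for cluster_id, members in clusters.items():
        # assumption: members absent from the profile are skipped
        present = [m for m in members if m in profile.index]
        if not present:
            continue
        mean_profile = profile.loc[present].mean(axis=0)  # mean abundance per fraction
        max_fractions[cluster_id] = mean_profile.idxmax()  # column label of the maximum
    return max_fractions


profile = pd.DataFrame(
    {"F1": [0.1, 0.2, 0.9], "F2": [0.8, 0.6, 0.1], "F3": [0.1, 0.2, 0.0]},
    index=["P1", "P2", "P3"],
)
print(get_cluster_max_fraction({"c1": ["P1", "P2"], "c2": ["P3"]}, profile))
# {'c1': 'F2', 'c2': 'F1'}
```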
- compact.utils.correlate_samples(samples, method='pearson')
compute interaction matrices for the given samples
- Args:
- samples (dict of pd.DataFrame):
samples to correlate; each sample is a dataframe with feature ids in the index and feature data in the columns
- method (str, optional): Defaults to “pearson”.
correlation method to use; valid options: ‘pearson’, ‘kendall’, ‘spearman’
- Returns:
- dict of pd.DataFrame: int_matrices
contains a symmetric interaction matrix for each input sample
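pandas' `DataFrame.corr` correlates columns and accepts the same three method names, so a sketch only needs to transpose each sample so that the feature ids become columns. This assumes the interaction matrix is simply the feature-by-feature correlation matrix; the sample data is made up.

```python
import pandas as pd


def correlate_samples(samples, method="pearson"):
    """Correlate all features (rows) pairwise within each sample."""
    # DataFrame.corr correlates columns, so transpose to put feature ids in the columns
    return {tag: df.T.corr(method=method) for tag, df in samples.items()}


# made-up sample: rows are features (proteins), columns are fractions
sample = pd.DataFrame(
    {"F1": [1.0, 2.0, 3.0], "F2": [2.0, 4.0, 1.0], "F3": [3.0, 6.0, 2.0]},
    index=["P1", "P2", "P3"],
)
int_matrices = correlate_samples({"s1": sample})
print(int_matrices["s1"].round(2))
```

Here P2 is exactly twice P1 across the fractions, so their Pearson correlation is 1.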
- compact.utils.eprint(*args, **kwargs)
like the built-in print, but writes to stderr
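The usual Python idiom for this is to forward all arguments to `print` with `file=sys.stderr`. A minimal sketch:

```python
import sys


def eprint(*args, **kwargs):
    """Print to standard error instead of standard output."""
    # sys.stderr is looked up at call time, so redirections are respected
    print(*args, file=sys.stderr, **kwargs)


eprint("warning:", "no clusters found")
```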