mapqc.run_mapqc

mapqc.run_mapqc(adata, adata_emb_loc, ref_q_key, q_cat, r_cat, sample_key, n_nhoods, k_min, k_max, min_n_cells=10, min_n_samples_r=3, study_key=None, exclude_same_study=True, grouping_key=None, distance_metric='energy_distance', seed=None, overwrite=False, return_nhood_info_df=False, return_sample_dists_to_ref_df=False, verbose=True)

Calculate mapQC scores.

This function modifies the input AnnData object in-place by adding several new columns to adata.obs:

‘mapqc_score’: Contains the mapQC scores for query cells (NaN for reference cells)
‘mapqc_filtering’: Contains filtering information for query cells (None for reference cells)
‘mapqc_nhood_filtering’: Contains filtering information for each neighborhood
‘mapqc_nhood_number’: Contains the number of the neighborhood
‘mapqc_k’: Contains the size of the neighborhood

It also stores a dictionary of the input parameter values in adata.uns[‘mapqc_params’].

Finally, if return_nhood_info_df is True, the function returns a pandas DataFrame containing the neighborhood information; if return_sample_dists_to_ref_df is True, it returns a pandas DataFrame containing the sample distances to the reference for each neighborhood. If both are True, the two DataFrames are returned together as a tuple.

Parameters:

adata (AnnData) – The AnnData object including both the reference and the query cells. This object will be modified in-place. Important: the reference should include only control samples, and the query should include at least some control samples.
adata_emb_loc (str) – Key of the embedding in adata.obsm, or “X” if the embedding is stored in adata.X.
ref_q_key (str) – Key in adata.obs that contains reference/query labels
q_cat (str) – Category label for query samples
r_cat (str) – Category label for reference samples
sample_key (str) – Key in adata.obs that contains sample identifiers
n_nhoods (int) – Number of neighborhoods to analyze
k_min (int) – Minimum number of cells per neighborhood
k_max (int) – Maximum number of cells per neighborhood, used when a neighborhood of size k_min does not fulfill the filtering criteria.
min_n_cells (int (default: 10)) – Minimum number of cells required per sample within a neighborhood. Default is 10.
min_n_samples_r (int (default: 3)) – Minimum number of reference samples (with at least min_n_cells cells) required per neighborhood. Default is 3.
exclude_same_study (bool (default: True)) – Whether to exclude pairs of samples from the same study when calculating distances between reference samples. To prevent bias in inter-sample distances within the reference, we recommend excluding distances between samples from the same study, i.e. setting this argument to True. Default is True.
study_key (str | None (default: None)) – Key in adata.obs that contains study identifiers (required if exclude_same_study is True).
grouping_key (str | None (default: None)) – Key in adata.obs that contains grouping information, which will be used to sample center cells (i.e. the centers of neighborhoods). If not provided, center cells will be sampled randomly from the query. If provided, center cells will be sampled based on query and reference cell proportions per group of the grouping key. This can be set to e.g. a clustering performed on the joint reference and query, or a (preliminary) cell type annotation of reference and query.
distance_metric (Literal['energy_distance', 'pairwise_euclidean'] (default: 'energy_distance')) – Distance metric to use to calculate distances between samples (i.e. between two sets of cells). Default is “energy_distance”.
seed (int | None (default: None)) – Seed for random number generator. Set the seed to ensure reproducibility of results.
overwrite (bool (default: False)) – Whether to overwrite existing mapqc_score and mapqc_filtering columns in adata.obs. Default is False.
return_nhood_info_df (bool (default: False)) – Whether to return a pandas DataFrame containing detailed neighborhood information. This can be useful for debugging, or for getting a detailed understanding of your neighborhoods and the mapqc output. Default is False.
return_sample_dists_to_ref_df (bool (default: False)) – Whether to return a pandas DataFrame containing the sample distances to reference for each neighborhood. Default is False.
verbose (bool (default: True)) – Whether to print progress messages. Default is True.
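The interplay of k_min, k_max, min_n_cells and min_n_samples_r can be illustrated with a minimal plain-Python sketch. It assumes that a neighborhood consists of the k cells closest to a center cell, that k is increased in steps of a hypothetical k_step until the filtering criterion passes or k_max is reached, and that the only criterion is the reference-sample requirement described above. The helpers passes_filter and choose_nhood_size are illustrative, not part of the mapqc API, and mapqc's actual expansion schedule and filtering may differ.

```python
from collections import Counter
from typing import List, Optional, Tuple

# One entry per cell: (sample_id, is_reference). This sketch assumes the list is
# ordered by increasing distance to the neighborhood's center cell.
Cell = Tuple[str, bool]


def passes_filter(cells: List[Cell], min_n_cells: int = 10, min_n_samples_r: int = 3) -> bool:
    """True if at least min_n_samples_r reference samples have >= min_n_cells cells."""
    ref_counts = Counter(sample for sample, is_ref in cells if is_ref)
    n_valid_ref_samples = sum(1 for n in ref_counts.values() if n >= min_n_cells)
    return n_valid_ref_samples >= min_n_samples_r


def choose_nhood_size(
    sorted_cells: List[Cell],
    k_min: int,
    k_max: int,
    min_n_cells: int = 10,
    min_n_samples_r: int = 3,
    k_step: int = 50,  # hypothetical step size, not a mapqc parameter
) -> Optional[int]:
    """Return the first tried k in [k_min, k_max] whose neighborhood passes, else None."""
    k = k_min
    while True:
        k_eff = min(k, k_max)
        if passes_filter(sorted_cells[:k_eff], min_n_cells, min_n_samples_r):
            return k_eff
        if k_eff >= k_max:
            return None  # even the largest allowed neighborhood fails: filtered out
        k += k_step


# Toy neighborhood: the third reference sample only appears beyond the 40 nearest cells.
cells = [("r1", True)] * 10 + [("r2", True)] * 10 + [("q1", False)] * 20 + [("r3", True)] * 10
print(choose_nhood_size(cells, k_min=30, k_max=100, k_step=10))  # 50
```

Here the neighborhood fails at k = 30 and k = 40 (only two reference samples with at least 10 cells) and passes once r3 is included; with k_max=40 it would be filtered out instead.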

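To make the distance_metric and exclude_same_study options more concrete, the sketch below computes the textbook (V-statistic) energy distance between two sets of cells, 2·E‖X−Y‖ − E‖X−X′‖ − E‖Y−Y′‖, and applies it to all pairs of reference samples in a neighborhood, skipping pairs that share a study. It uses only the standard library; the function names and the input layout (a dict from sample ID to embedding coordinates, and a dict from sample ID to study ID) are hypothetical, and mapqc's own estimator or normalization may differ.

```python
import math
from itertools import combinations
from statistics import fmean
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def mean_cross_distance(a: List[Point], b: List[Point]) -> float:
    """Mean Euclidean distance over all (x, y) pairs with x in a and y in b."""
    return fmean(math.dist(x, y) for x in a for y in b)


def energy_distance(a: List[Point], b: List[Point]) -> float:
    """V-statistic energy distance: 2*E|X-Y| - E|X-X'| - E|Y-Y'|."""
    return (
        2 * mean_cross_distance(a, b)
        - mean_cross_distance(a, a)
        - mean_cross_distance(b, b)
    )


def ref_sample_distances(
    cells_by_sample: Dict[str, List[Point]],
    study_of_sample: Dict[str, str],
    exclude_same_study: bool = True,
) -> Dict[Tuple[str, str], float]:
    """Energy distances between pairs of reference samples within one neighborhood."""
    dists = {}
    for s1, s2 in combinations(sorted(cells_by_sample), 2):
        if exclude_same_study and study_of_sample[s1] == study_of_sample[s2]:
            continue  # skip within-study pairs to avoid within-study bias
        dists[(s1, s2)] = energy_distance(cells_by_sample[s1], cells_by_sample[s2])
    return dists


# Toy 2-D "embeddings": r1 and r2 come from the same study.
cells = {
    "r1": [(0.0, 0.0), (0.0, 1.0)],
    "r2": [(0.0, 0.5), (0.0, 1.5)],
    "r3": [(3.0, 0.0), (3.0, 1.0)],
}
studies = {"r1": "A", "r2": "A", "r3": "B"}
print(sorted(ref_sample_distances(cells, studies)))  # [('r1', 'r3'), ('r2', 'r3')]
```

With exclude_same_study=False, the within-study pair ('r1', 'r2') would also be included.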
Return type:

None | DataFrame | tuple[DataFrame, DataFrame]

Returns:

None or pd.DataFrame or tuple

This function modifies the input AnnData object in-place by adding the following columns to adata.obs:

‘mapqc_score’
‘mapqc_filtering’
‘mapqc_nhood_filtering’
‘mapqc_nhood_number’
‘mapqc_k’

It furthermore adds a dictionary including the input parameter values to adata.uns[‘mapqc_params’].

The return value depends on the input parameters:

If return_nhood_info_df is True, returns a pandas DataFrame containing detailed neighborhood information.
If return_sample_dists_to_ref_df is True, returns a pandas DataFrame containing the sample distances to reference.
If both are True, returns a tuple of (nhood_info_df, sample_dists_to_ref_df).
If neither is True, returns None.
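Because the shape of the return value depends on two flags, calling code may want to normalize it. A minimal sketch, using a hypothetical helper split_mapqc_result (not part of mapqc) that follows the four cases listed above and always yields a (nhood_info_df, sample_dists_to_ref_df) pair, with None for anything that was not requested:

```python
from typing import Any, Optional, Tuple


def split_mapqc_result(
    result: Any,
    return_nhood_info_df: bool,
    return_sample_dists_to_ref_df: bool,
) -> Tuple[Optional[Any], Optional[Any]]:
    """Return (nhood_info_df, sample_dists_to_ref_df); None for anything not requested."""
    if return_nhood_info_df and return_sample_dists_to_ref_df:
        nhood_info_df, sample_dists_to_ref_df = result  # both requested: a 2-tuple
        return nhood_info_df, sample_dists_to_ref_df
    if return_nhood_info_df:
        return result, None  # a single DataFrame
    if return_sample_dists_to_ref_df:
        return None, result  # a single DataFrame
    return None, None  # nothing returned; results live in adata.obs and adata.uns


# Hypothetical usage, assuming adata and the remaining arguments are already set up:
# result = mapqc.run_mapqc(adata, ..., return_nhood_info_df=True)
# nhood_info_df, _ = split_mapqc_result(result, True, False)
```

Passing the same flag values to both calls keeps the unpacking in sync with what run_mapqc was asked to return.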
