cvanmf.stability

Describe how stable signatures are across initialisations and ranks.

It can be useful to look at how similar signatures are across multiple random initialisations. When a signature is not frequently recovered across initialisations, this may indicate a poor choice of rank, or we may place less confidence in that signature. This can also serve as a method of rank selection, by looking for ranks where signatures show high similarity across initialisations.

Functions in this module are primarily concerned with looking at stability of each signature, rather than rank selection.

The main functions are signature_stability() and plot_signature_stability().

For rank selection using stability, see instead cvanmf.denovo.signature_similarity().

Attributes

Functions

align_signatures(→ List[pandas.DataFrame])

Give signature matrices matching indices.

compare_signatures(→ pandas.DataFrame)

Compare how similar signatures are between two models.

get_signatures_from_comparable(→ pandas.DataFrame)

Get the signature matrix from any of the comparable types.

match_signatures(→ pandas.DataFrame)

Match signatures between two models maximising cosine similarity.

plot_signature_stability(→ plotnine.ggplot)

Plot the similarity of signatures across multiple decompositions.

signature_stability(→ pandas.DataFrame)

Characterise how similar signatures are across random initialisations.

Module Contents

cvanmf.stability.align_signatures(*args) List[pandas.DataFrame][source]

Give signature matrices matching indices.

Signatures from different matrices may have some different features (species observed in one matrix and not another). The comparisons used here need all signatures to have the same set of features in the same order.

Pass any number of Comparable type objects, or a single iterable of Comparable type objects.

Uses exact matching of feature name strings from the index of the W matrix DataFrame.
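The alignment described above can be sketched with plain pandas. This is a minimal illustration rather than the module's implementation: the helper name align_feature_index is hypothetical, and filling features absent from a matrix with zero is an assumption.

```python
from typing import List

import pandas as pd


def align_feature_index(*mats: pd.DataFrame) -> List[pd.DataFrame]:
    """Reindex each matrix onto the sorted union of all feature names.

    Hypothetical helper for illustration; not part of cvanmf.
    """
    # Exact string matching on the row index, as for the W matrices above
    features = sorted(set().union(*(m.index for m in mats)))
    # Assumption: a feature missing from a matrix has zero weight there
    return [m.reindex(features, fill_value=0.0) for m in mats]


# Two toy W matrices (features on rows, signatures on columns)
w_a = pd.DataFrame({"S1": [0.5, 0.5], "S2": [1.0, 0.0]}, index=["sp_a", "sp_b"])
w_b = pd.DataFrame({"S1": [0.2, 0.8]}, index=["sp_b", "sp_c"])

aligned_a, aligned_b = align_feature_index(w_a, w_b)
```

After alignment both matrices share the same row index in the same order, so their columns can be compared as vectors.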

cvanmf.stability.compare_signatures(a: Comparable, b: Comparable) pandas.DataFrame[source]

Compare how similar signatures are between two models.

For models learnt on similar data, signatures recovered may also be similar. We can characterise the similarity by the angle between the signature vectors. This function aligns the W matrices (so the features are the union of those in a and b), and calculates pairwise cosine similarity between signatures.

This returns a DataFrame with signatures of A on rows, and B on columns with entry i,j being the cosine of the angle between aligned signature vectors A[,i] and B[,j].

Parameters:
  • a – Set of signatures to be on rows

  • b – Set of signatures to be on columns

Returns:

Pairwise cosine of angle between signatures
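To make the pairwise calculation concrete, here is a minimal numpy sketch. It assumes the two matrices have already been aligned to the same feature index; the function name pairwise_cosine is hypothetical and is not the library's code.

```python
import numpy as np
import pandas as pd


def pairwise_cosine(w_a: pd.DataFrame, w_b: pd.DataFrame) -> pd.DataFrame:
    """Cosine of the angle between each column of w_a and each column of w_b.

    Hypothetical helper; assumes w_a and w_b share the same row index.
    """
    a = w_a.to_numpy(dtype=float)
    b = w_b.to_numpy(dtype=float)
    # Scale every signature (column) to unit length.
    # An all-zero signature would divide by zero here.
    a_unit = a / np.linalg.norm(a, axis=0, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=0, keepdims=True)
    # Dot products of unit vectors are cosines: rows from a, columns from b
    return pd.DataFrame(a_unit.T @ b_unit, index=w_a.columns, columns=w_b.columns)


features = ["sp_a", "sp_b", "sp_c"]
w_a = pd.DataFrame({"A1": [1.0, 0.0, 0.0], "A2": [0.0, 1.0, 1.0]}, index=features)
w_b = pd.DataFrame({"B1": [0.0, 2.0, 2.0], "B2": [3.0, 0.0, 0.0]}, index=features)

cos = pairwise_cosine(w_a, w_b)
```

In this toy example A1 points in the same direction as B2 and A2 in the same direction as B1, so those entries are 1 and the other two are 0. Scaling a signature does not change the result.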

cvanmf.stability.get_signatures_from_comparable(c: Comparable) pandas.DataFrame[source]

Get the signature matrix from any of the comparable types.

cvanmf.stability.match_signatures(a: Comparable, b: Comparable) pandas.DataFrame[source]

Match signatures between two models maximising cosine similarity.

Find the pairing of signatures which are most similar. More technically, this finds the pairing of signatures which maximises the total cosine similarity using the Hungarian algorithm. It is possible that a signature gets paired with another for which the cosine similarity is not highest, suggesting a potentially bad match between some signatures in the model.

The return value is a DataFrame with columns a and b giving which signatures are paired, the cosine similarity of the pairing, and the maximum ‘off-target’ cosine value for any of the signatures to which it was not assigned. Ideally the off-target score is low and the paired similarity high: each signature matches its paired signature well while being dissimilar to all others.

Parameters:
  • a – Signature matrix, or object with signature matrix

  • b – Signature matrix, or object with signature matrix

Returns:

DataFrame with pairing and scores
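The pairing step can be sketched with SciPy's linear_sum_assignment, which solves the same assignment problem as the Hungarian algorithm. This is an illustration under stated assumptions, not the library's code: the function name pair_by_cosine and the output column names are hypothetical, and the off-target score is taken here to be the highest cosine between a signature from a and any signature in b it was not paired with.

```python
import numpy as np
import pandas as pd
from scipy.optimize import linear_sum_assignment


def pair_by_cosine(cos: pd.DataFrame) -> pd.DataFrame:
    """Pair rows with columns so that the total cosine similarity is maximal.

    Hypothetical helper; cos has signatures of a on rows and b on columns.
    """
    sim = cos.to_numpy(dtype=float)
    rows, cols = linear_sum_assignment(sim, maximize=True)
    records = []
    for i, j in zip(rows, cols):
        # Similarities of signature i to every signature it was NOT paired with
        others = np.delete(sim[i], j)
        records.append({
            "a": cos.index[i],
            "b": cos.columns[j],
            "pair_cosine": sim[i, j],
            "max_off_target": others.max() if others.size else np.nan,
        })
    return pd.DataFrame(records)


# A1 is most similar to B1, but pairing A2 with B1 gives a higher total
cos = pd.DataFrame(
    [[0.90, 0.80],
     [0.95, 0.10]],
    index=["A1", "A2"], columns=["B1", "B2"],
)
paired = pair_by_cosine(cos)
```

Here the optimal pairing is A1 with B2 and A2 with B1 (total 1.75, against 1.0 for the alternative). A1 ends up with an off-target value (0.90) above its paired similarity (0.80), which is the kind of potentially poor match the description warns about.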

cvanmf.stability.plot_signature_stability(stability: pandas.DataFrame, colors: List[str] | None = None, ncol: int = 6, geom_boxplot: Dict[str, Any] = None, geom_line: bool | Dict[str, Any] | None = None) plotnine.ggplot[source]

Plot the similarity of signatures across multiple decompositions.

The distribution of cosine similarity between each signature and its paired signature in each of the other models is shown as boxplots. Each panel shows the distribution for one rank.

Parameters:
  • stability – DataFrame in format returned by signature_stability().

  • colors – Colours to be applied to signatures. If list is shorter than the number of signatures, excess will be grey.

  • ncol – Number of columns in the plot.

  • geom_boxplot – Arguments to pass to plotnine’s geom_boxplot class.

  • geom_line – Arguments to pass to plotnine’s geom_line class. If set to True, will draw lines connecting signatures from each model with default styling; pass dictionary to alter styling of lines.

cvanmf.stability.signature_stability(decomps: List[cvanmf.denovo.Decomposition], to: Comparable = None) pandas.DataFrame[source]

Characterise how similar signatures are across random initialisations.

Compare the signatures in to to those in decomps. If to is None, the first value in decomps will be used.

Each model in decomps is compared to reference to using match_signatures.

Note that if you compare signatures for multiple ranks, the orders of signatures are not related across ranks, i.e. S1 in k=2 and S1 in k=3 are not related.

Parameters:
  • decomps – Decompositions to compare to. If this is a list, all decompositions must have the same rank.

  • to – Reference decomposition to compare each of decomps to. If this is None then the first item in the list will be used.

cvanmf.stability.signature_stability(decomps: Dict[int, List[cvanmf.denovo.Decomposition]], to: Comparable = None) pandas.DataFrame[source]

Characterise how similar signatures are across random initialisations.

This version accepts a dictionary of results, with keys being ranks and values being lists of decompositions (the output format of cvanmf.denovo.decompositions()). For each rank, the first (best) decomposition is compared to the others.

cvanmf.stability.Comparable
cvanmf.stability.logger: logging.Logger