cvanmf.stability
================

.. py:module:: cvanmf.stability

.. autoapi-nested-parse::

   Describe how stable signatures are across initialisations and ranks.

   It can be useful to look at how similar signatures are across multiple
   random initialisations. When a signature is not frequently repeated
   across iterations, this may indicate a poor choice of rank, or suggest
   placing less confidence in that signature. This can also serve as a
   method of rank selection, by looking for ranks where signatures show
   high similarity across ranks.

   Functions in this module are primarily concerned with the stability of
   each signature, rather than with rank selection. The main functions are
   :func:`signature_stability` and :func:`plot_signature_stability`. For
   rank selection using stability, see :func:`cvanmf.denovo.signature_similarity`
   instead.



Attributes
----------

.. autoapisummary::

   cvanmf.stability.Comparable
   cvanmf.stability.logger


Functions
---------

.. autoapisummary::

   cvanmf.stability.align_signatures
   cvanmf.stability.compare_signatures
   cvanmf.stability.get_signatures_from_comparable
   cvanmf.stability.match_signatures
   cvanmf.stability.plot_signature_stability
   cvanmf.stability.signature_stability


Module Contents
---------------

.. py:function:: align_signatures(*args) -> List[pandas.DataFrame]

   Give signature matrices matching indices.

   Signatures from different matrices may have different features (for
   example, species observed in one matrix but not in another). The
   comparisons we use require all signatures to have the same set of
   features in the same order.

   Pass any number of ``Comparable`` objects, or a single iterable of
   ``Comparable`` objects. Features are matched by exact string comparison
   of the index of the W matrix DataFrame.


.. py:function:: compare_signatures(a: Comparable, b: Comparable) -> pandas.DataFrame

   Compare how similar signatures are between two models.

   For models learnt on similar data, the recovered signatures may also be
   similar. We can characterise this similarity by the angle between the
   signature vectors. This function aligns the W matrices (so that the
   features are the union of those in ``a`` and ``b``) and calculates the
   pairwise cosine similarity between signatures. It returns a DataFrame
   with the signatures of ``a`` on rows and those of ``b`` on columns,
   where entry (i, j) is the cosine of the angle between the aligned
   vectors of signature i in ``a`` and signature j in ``b``.

   :param a: Set of signatures to be on rows
   :param b: Set of signatures to be on columns
   :return: Pairwise cosine of angle between signatures


.. py:function:: get_signatures_from_comparable(c: Comparable) -> pandas.DataFrame

   Get the signature matrix from any of the comparable types.


.. py:function:: match_signatures(a: Comparable, b: Comparable) -> pandas.DataFrame

   Match signatures between two models, maximising cosine similarity.

   Find the pairing of signatures which are most similar. More
   technically, this finds the pairing of signatures which maximises the
   total cosine similarity, using the Hungarian algorithm. A signature may
   therefore be paired with one that is not its most similar counterpart,
   which suggests a potentially poor match between some signatures in the
   models.

   The return is a DataFrame with columns ``a`` and ``b`` giving which
   signatures are paired, the cosine similarity of the pairing, and the
   maximum 'off-target' cosine similarity to any of the signatures it was
   not assigned to. Ideally the off-target score is low and the paired
   similarity high: each signature matches its partner well while being
   dissimilar to all others.

   :param a: Signature matrix, or object with signature matrix
   :param b: Signature matrix, or object with signature matrix
   :returns: DataFrame with pairing and scores


.. py:function:: plot_signature_stability(stability: pandas.DataFrame, colors: Optional[List[str]] = None, ncol: int = 6, geom_boxplot: Dict[str, Any] = None, geom_line: Optional[Union[bool, Dict[str, Any]]] = None) -> plotnine.ggplot

   Plot the similarity of signatures across multiple decompositions.

   For each signature, the distribution of cosine similarities to its
   paired signature in each of the other models is shown as a boxplot.
   Each panel shows the distributions for one rank.

   :param stability: DataFrame in the format returned by
       :func:`signature_stability`.
   :param colors: Colours to be applied to signatures. If the list is
       shorter than the number of signatures, the remaining signatures
       will be grey.
   :param ncol: Number of columns in the plot.
   :param geom_boxplot: Arguments to pass to plotnine's geom_boxplot
       class.
   :param geom_line: Arguments to pass to plotnine's geom_line class. If
       set to True, lines connecting signatures from each model are drawn
       with default styling; pass a dictionary to alter the styling of the
       lines.


.. py:function:: signature_stability(decomps: List[cvanmf.denovo.Decomposition], to: Comparable = None) -> pandas.DataFrame

   Characterise how similar signatures are across random initialisations.

   Compare the signatures in ``to`` to those in ``decomps``. If ``to`` is
   None, the first value in ``decomps`` will be used. Each model in
   ``decomps`` is compared to the reference ``to`` using
   :func:`match_signatures`.

   Note that if you compare signatures for multiple ranks, the order of
   signatures is not related across ranks, i.e. S1 in k=2 and S1 in k=3
   are not related.

   :param decomps: Decompositions to compare to. If this is a list, all
       decompositions must have the same rank.
   :param to: Reference decomposition to compare each of ``decomps`` to.
       If this is None, the first item in the list will be used.


.. py:function:: signature_stability(decomps: Dict[int, List[cvanmf.denovo.Decomposition]], to: Comparable = None) -> pandas.DataFrame

   Characterise how similar signatures are across random initialisations.

   This version accepts a dictionary of results, with keys being ranks and
   values being lists of decompositions (the output format of
   :func:`cvanmf.denovo.decompositions`). For each rank, the first (best)
   decomposition is compared to the others.


.. py:data:: Comparable


.. py:data:: logger
   :type: logging.Logger

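
The alignment and comparison performed by :func:`align_signatures` and
:func:`compare_signatures` can be sketched with pandas and NumPy. This is a
minimal illustration under stated assumptions, not the library's
implementation: the helper names ``align_sketch`` and ``cosine_sketch`` are
hypothetical, W matrices are assumed to be plain DataFrames with features
on rows and signatures on columns, and features missing from one matrix are
assumed to be filled with zero.

.. code-block:: python

   from typing import List

   import numpy as np
   import pandas as pd


   def align_sketch(*ws: pd.DataFrame) -> List[pd.DataFrame]:
       """Reindex each W matrix onto the union of all feature names.

       Hypothetical helper; missing features are filled with 0 (an
       assumption of this sketch).
       """
       features = ws[0].index
       for w in ws[1:]:
           # Exact matching of feature name strings via Index.union.
           features = features.union(w.index)
       return [w.reindex(features, fill_value=0.0) for w in ws]


   def cosine_sketch(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
       """Cosine similarity with signatures of a on rows, b on columns."""
       a, b = align_sketch(a, b)
       # Scale each signature (column) to unit Euclidean length.
       a_unit = a / np.linalg.norm(a.to_numpy(), axis=0)
       b_unit = b / np.linalg.norm(b.to_numpy(), axis=0)
       # Dot products of unit vectors are cosines of the angles between them.
       return a_unit.T @ b_unit


   w1 = pd.DataFrame({"S1": [1.0, 0.0, 0.0], "S2": [0.0, 1.0, 1.0]},
                     index=["sp_a", "sp_b", "sp_c"])
   w2 = pd.DataFrame({"S1": [0.0, 2.0], "S2": [3.0, 0.0]},
                     index=["sp_b", "sp_a"])
   sim = cosine_sketch(w1, w2)

Here ``w2`` lacks ``sp_c`` and lists its features in a different order.
After alignment, ``sim.loc["S1", "S1"]`` is 1.0 and ``sim.loc["S2", "S2"]``
is about 0.707, because ``w1``'s S2 splits its weight between ``sp_b`` and
``sp_c``.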
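
The pairing step of :func:`match_signatures` can likewise be sketched with
SciPy's ``linear_sum_assignment``, which solves the same assignment problem
as the Hungarian algorithm. This sketch assumes a precomputed similarity
DataFrame (signatures of ``a`` on rows, ``b`` on columns). The function
name ``match_sketch`` and its output column names are hypothetical, and
its off-target score only considers the other signatures of ``b`` for each
signature of ``a``, which may differ from what the library computes.

.. code-block:: python

   import numpy as np
   import pandas as pd
   from scipy.optimize import linear_sum_assignment


   def match_sketch(sim: pd.DataFrame) -> pd.DataFrame:
       """Pair rows and columns of sim to maximise total cosine similarity."""
       values = sim.to_numpy()
       # maximize=True finds the assignment with the largest total score.
       rows, cols = linear_sum_assignment(values, maximize=True)
       records = []
       for r, c in zip(rows, cols):
           # Similarities of row signature r to every column except its partner.
           others = np.delete(values[r, :], c)
           records.append({
               "a": sim.index[r],
               "b": sim.columns[c],
               "pair_cosine": values[r, c],
               "max_off_target_cosine": others.max() if others.size else np.nan,
           })
       return pd.DataFrame(records)


   sim = pd.DataFrame([[0.9, 0.8], [0.85, 0.1]],
                      index=["A1", "A2"], columns=["B1", "B2"])
   pairs = match_sketch(sim)

Pairing greedily would give A1 to B1 (0.9) and leave A2 with B2 (0.1), a
total of 1.0. The optimal assignment instead pairs A1 with B2 and A2 with
B1 (total 1.65), so A1's off-target score of 0.9 exceeds its paired
similarity of 0.8. This is the situation the :func:`match_signatures`
description warns about, where a signature is not paired with its most
similar counterpart.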