cvanmf.reapply
==============

.. py:module:: cvanmf.reapply

.. autoapi-nested-parse::

   Reapply existing Enterosignature models to new abundance data.

   The easiest way to do this is through the :func:`reapply` function, which is
   most flexible about parameter types. The other functions perform individual
   steps, which are useful if you want fine control of a given step, but probably
   not necessary for most uses.


Attributes
----------

.. autoapisummary::

   cvanmf.reapply.logger


Classes
-------

.. autoapisummary::

   cvanmf.reapply.FeatureMapping
   cvanmf.reapply.FeatureMatch
   cvanmf.reapply.InputValidation


Functions
---------

.. autoapisummary::

   cvanmf.reapply.cli
   cvanmf.reapply.match_genera
   cvanmf.reapply.match_identical
   cvanmf.reapply.nmf_transform
   cvanmf.reapply.reapply
   cvanmf.reapply.validate_genus_table


Module Contents
---------------

.. py:class:: FeatureMapping(target_features: Set[str], source_features: Set[str], hard_map: Optional[Dict[str, str]] = None)

   Connect new data table features to those in the model.

   Manage the mappings from input features to model features. Source features
   are the features in the new abundance table we want to fit to the model;
   target features are the features in the model we're trying to match to.
   User-defined mappings can be provided via `hard_map`; any subsequent mappings
   for a source taxon in `hard_map` will be ignored. New mappings are added via
   :meth:`add`. When mappings are fully defined, the model W matrix and the new
   data table can be matched using :meth:`transform_w` and
   :meth:`transform_abundance`.

   :param target_features: Model features to map to
   :type target_features: Set[str]
   :param source_features: Input features to be mapped from
   :type source_features: Set[str]
   :param hard_map: User-defined mappings, as a dictionary with source as key
                    and target as value.
   :type hard_map: Optional[Dict[str, str]]


   .. py:method:: add(feature_from: str, feature_to: str) -> None

      Add a mapping.

      If there is already a mapping from this feature, we will append this one.
      Use :meth:`conflicts` to identify where more than one mapping exists.

      :param feature_from: Feature in the new table
      :type feature_from: str
      :param feature_to: Model feature to map to
      :type feature_to: str
      :raises EnteroException: Feature not in the relevant sets


   .. py:method:: missing() -> Collection[str]

      Identify input features which currently have no mapping.

      :return: Source features which are not mapped to any model feature
      :rtype: Collection[str]


   .. py:method:: to_df() -> pandas.DataFrame

      Produce a dataframe of the mapping.

      Where mappings are ambiguous, multiple rows will be included. Where
      mappings are missing, one row with a blank target will be included.

      :return: DataFrame with two columns, first source feature, second target
               feature.
      :rtype: pd.DataFrame


   .. py:method:: transform_abundance(abd_tbl: pandas.DataFrame) -> pandas.DataFrame

      Apply mapping to the input table.

      Make a table with renamed and combined rows based on the identified
      mappings.

      :param abd_tbl: New table, samples on columns
      :type abd_tbl: pd.DataFrame
      :return: Table with mappings applied
      :rtype: pd.DataFrame


   .. py:method:: transform_w(w: pandas.DataFrame, abd_tbl: pandas.DataFrame) -> pandas.DataFrame

      Match the model W matrix to the new table.

      Make a W matrix which has features not in the abundance table removed,
      and rows added for features which are in the abundance table but not the
      model.

      :param w: Model W matrix
      :type w: pd.DataFrame
      :param abd_tbl: New matrix. Should `not` have been transformed with
                      :meth:`transform_abundance`.
      :type abd_tbl: pd.DataFrame
      :return: W matrix matched to new table
      :rtype: pd.DataFrame


   .. py:property:: conflicts
      :type: List[Tuple[str, List[str]]]

      Features for which more than one target exists.


   .. py:property:: mapping
      :type: Dict[str, List[str]]

      Mapping from source to target features.


.. py:class:: FeatureMatch

   Bases: :py:obj:`Protocol`

   Signature for functions which perform feature matching.


.. py:class:: InputValidation

   Bases: :py:obj:`Protocol`

   Signature for functions which perform input validation.


.. py:function:: cli(input: str, model: str, hard_mapping: str, rollup: bool, separator: str, output_dir: str) -> None

   Command line interface to fit new data to an existing NMF Signatures model.

   The new data must use the same features as the model, though there can be
   some difference (features in new data not in model and vice versa).
   Currently this is GTDB r207 for the 5 Enterosignatures model.

   For more on Enterosignatures see:

   * Frioux et al. 2023 (doi:10.1016/j.chom.2023.05.024)
   * https://enterosignatures.quadram.ac.uk


.. py:function:: match_genera(w: pandas.DataFrame, y: pandas.DataFrame, hard_mapping: Optional[Dict[str, str]] = None, family_rollup: bool = True, **kwargs) -> FeatureMapping

   Match taxonomic names in the input table and the Enterosignatures W matrix.

   This function is currently based on the R script provided by Clemence in the
   Enterosignatures (ES) GitLab repo (prepare_matrices.R)
   https://gitlab.inria.fr/cfrioux/enterosignature-paper/. This will attempt to
   match names. Mappings in the ``hard_mapping`` parameter are new names to ES
   names, and will be applied before any other matches that are identified.

   :param w: Enterosignatures W matrix
   :type w: pd.DataFrame
   :param y: Abundance table being transformed
   :type y: pd.DataFrame
   :param hard_mapping: Mapping from input to ES name
   :type hard_mapping: Dict[str, str]
   :param family_rollup: Move abundance of genera which are not matched to the
                         family level entry if one exists
   :type family_rollup: bool
   :param logger: Function to log messages
   :type logger: Callable[[Any], None]
   :returns: Transformed abundance table, es W matrix, and mapping object
   :rtype: Tuple[pd.DataFrame, pd.DataFrame, List[str]]


.. py:function:: match_identical(w: pandas.DataFrame, y: pandas.DataFrame, **kwargs) -> FeatureMapping

   Match features by identical labels only.

   :param w: W matrix from model
   :param y: Table of new data


.. py:function:: nmf_transform(new_abd: pandas.DataFrame, w_prime: pandas.DataFrame) -> pandas.DataFrame

   Transform the input data into model weights.

   Takes the matched up W matrix and feature matrix. Expects the row ordering
   of W and feature matrix to be the same. Any NA values will be filled with 0.

   :param new_abd: Feature matrix matched to W
   :param w_prime: Model weights
   :return: Model weights for the given model and abundances. Note that these
            are not relative abundances (they do not sum to 1).


.. py:function:: reapply(y: Union[str, pandas.DataFrame], model: Union[str, cvanmf.models.Signatures] = '5es', hard_mapping: Optional[Union[str, pandas.DataFrame]] = None, separator: str = '\t', output_dir: Optional[str] = None, **kwargs) -> denovo.Decomposition

   Load and transform abundances to an existing model.

   The new data must be annotated against the same taxonomy the model uses.
   Currently for the 5 ES models this is GTDB r207. Feature names will be
   automatically matched between the abundance table and model where possible
   (see :func:`match_genera`).

   Most of the work is done in :func:`transform_table`; this function mostly
   provides the convenience of allowing parameters to be paths or DataFrames,
   and models to be specified as a string or an object.

   :param y: Feature matrix to transform. Can be a string giving path, or a
             DataFrame.
   :param model: Model to use. Can be a Signature object, or the name of one of
                 the provided Signature objects. Currently this is '5es' for
                 the 5ES model of Frioux et al.
                 (2023, https://doi.org/10.1016/j.chom.2023.05.024).
   :param hard_mapping: Define matchups between feature identifiers in y and
                        those in model W matrix. These will be used in
                        preference to any automated matches. Should be a table
                        with index being y matrix identifier, and first column
                        the model W identifier. Can be either a path, or
                        DataFrame.
   :param separator: Separator to use when reading and writing matrices.
   :param output_dir: Directory to write results to. Directory will be created
                      if it does not exist. Pass None for no output to disk.
   :param \*\*kwargs: Passed to the Signature validate_input and match_feature
                      functions.


.. py:function:: validate_genus_table(abd_tbl: pandas.DataFrame, **kwargs) -> pandas.DataFrame

   Basic checks and transformations of the abundance table.

   Some transformations may be made here, such as transposition. Any
   transformation will be written out to inform the user. Transformations are
   done in place.

   :param abd_tbl: Abundance table to check
   :type abd_tbl: pd.DataFrame
   :param logger: Function to report errors
   :type logger: Callable[[str], None]
   :returns: Validated, potentially transformed, dataframe
   :rtype: pd.DataFrame


.. py:data:: logger
   :type: logging.Logger
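
The mapping rules described for :class:`FeatureMapping` (user-defined mappings take precedence, :meth:`add` appends further candidates, and conflicting or missing features can be listed) can be illustrated with a small stand-alone class. This is only a sketch under stated assumptions: ``ToyFeatureMapping`` and the feature labels are hypothetical, the code does not use or reproduce the cvanmf implementation, and it raises ``ValueError`` where the documentation names ``EnteroException``.

```python
from typing import Dict, List, Optional, Set, Tuple


class ToyFeatureMapping:
    """Hypothetical stand-in for the documented mapping rules (not cvanmf code)."""

    def __init__(
        self,
        target_features: Set[str],
        source_features: Set[str],
        hard_map: Optional[Dict[str, str]] = None,
    ) -> None:
        self.target_features = set(target_features)
        self.source_features = set(source_features)
        hard_map = hard_map or {}
        # Sources fixed by the user; later mappings for these are ignored.
        self._hard_sources = set(hard_map)
        # Each source feature maps to a list of candidate target features.
        self.mapping: Dict[str, List[str]] = {
            src: [tgt] for src, tgt in hard_map.items()
        }

    def add(self, feature_from: str, feature_to: str) -> None:
        """Append a candidate mapping unless the source is user-defined."""
        if feature_from not in self.source_features:
            raise ValueError(f"{feature_from!r} is not a source feature")
        if feature_to not in self.target_features:
            raise ValueError(f"{feature_to!r} is not a target feature")
        if feature_from in self._hard_sources:
            return  # user-defined mapping wins
        self.mapping.setdefault(feature_from, []).append(feature_to)

    def missing(self) -> List[str]:
        """Source features with no mapping at all."""
        return sorted(self.source_features - set(self.mapping))

    @property
    def conflicts(self) -> List[Tuple[str, List[str]]]:
        """Source features with more than one candidate target."""
        return [(s, t) for s, t in sorted(self.mapping.items()) if len(t) > 1]


# Made-up feature labels, for illustration only.
fm = ToyFeatureMapping(
    target_features={"genus_a", "genus_b", "genus_c"},
    source_features={"genus_a", "genus_b_1", "genus_x", "genus_y"},
    hard_map={"genus_x": "genus_c"},
)
fm.add("genus_a", "genus_a")
fm.add("genus_b_1", "genus_b")
fm.add("genus_b_1", "genus_a")  # second candidate, so this becomes a conflict
fm.add("genus_x", "genus_a")    # ignored because genus_x is in hard_map

print(fm.mapping["genus_x"])
print(fm.missing())
print(fm.conflicts)
```

In this toy version, a source feature listed in ``hard_map`` keeps its single user-defined target, an unmapped source appears in ``missing()``, and a source with two candidates appears in ``conflicts``.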