cvanmf.reapply
Reapply existing Enterosignature models to new abundance data.
The easiest way to do this is through the reapply() function, which accepts
the widest range of parameter types. The other functions perform individual
steps; they are useful when you want fine control over a given step, but
are probably unnecessary for most uses.
Attributes
logger |
Classes
FeatureMapping | Connect new data table features to those in the model.
FeatureMatch | Signature for functions which perform feature matching.
InputValidation | Signature for functions which perform input validation.
Functions
cli() | Command line interface to fit new data to an existing NMF Signatures model.
match_genera() | Match taxonomic names in the input table and the Enterosignatures W matrix.
match_identical() | Match features by identical labels only.
nmf_transform() | Transform the input data into model weights.
reapply() | Load and transform abundances to an existing model.
validate_genus_table() | Basic checks and transformations of the abundance table.
Module Contents
- class cvanmf.reapply.FeatureMapping(target_features: Set[str], source_features: Set[str], hard_map: Dict[str, str] | None = None)
Connect new data table features to those in the model.
Manage the mappings from input features to model features. Source features are the features in the new abundance table we want to fit to the model; target features are the features in the model we’re trying to match to. User-defined mappings can be provided via hard_map; any subsequent mapping for a source taxon in hard_map will be ignored. New mappings are added via add(). When mappings are fully defined, the model W matrix and the new data table can be matched using transform_w() and transform_abundance().
- Parameters:
target_features (Set[str]) – Model features to map to
source_features (Set[str]) – Input features to be mapped from
hard_map (Optional[Dict[str, str]]) – User defined mappings, as a dictionary with source as key and target as value.
- add(feature_from: str, feature_to: str) -> None
Add a mapping. If a mapping from this feature already exists, the new one is appended to it. Use the conflicts property to identify where more than one mapping exists.
- Parameters:
feature_from (str) – Feature in the new table
feature_to (str) – Model feature to map to
- Raises:
EnteroException – Feature not in the relevant sets
- missing() -> Collection[str]
Identify input features which currently have no mapping.
- Returns:
Source features which are not mapped to any model feature
- Return type:
Collection[str]
- to_df() -> pandas.DataFrame
Produce a dataframe of the mapping. Where mappings are ambiguous, multiple rows will be included. Where mappings are missing, one row with a blank target will be included.
- Returns:
DataFrame with two columns, first source feature, second target feature.
- Return type:
pd.DataFrame
- transform_abundance(abd_tbl: pandas.DataFrame) -> pandas.DataFrame
Apply the mapping to the input table.
Make a table with renamed and combined rows based on the identified mappings.
- Parameters:
abd_tbl (pd.DataFrame) – New table, samples on columns
- Returns:
Table with mappings applied
- Return type:
pd.DataFrame
- transform_w(w: pandas.DataFrame, abd_tbl: pandas.DataFrame) -> pandas.DataFrame
Match the model W matrix to the new table.
Make a W matrix which has features not in the abundance table removed, and rows added for features which are in the abundance table but not the model.
- Parameters:
w (pd.DataFrame) – Model W matrix
abd_tbl (pd.DataFrame) – New matrix. Should not have been transformed with transform_abundance().
- Returns:
W matrix matched to new table
- Return type:
pd.DataFrame
- property conflicts: List[Tuple[str, List[str]]]
Features for which more than one target exists.
- property mapping: Dict[str, List[str]]
Mapping from source to target features.
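The bookkeeping described for FeatureMapping can be illustrated with a small stand-in written in plain Python. This is only a sketch under stated assumptions, not cvanmf's implementation: the names `ToyMapping` and `combine_rows` are hypothetical, `ValueError` stands in for `EnteroException`, the genus labels are made up for the example, and the choices to sum rows that share a target and to skip unmapped or ambiguous rows are assumptions about what "renamed and combined rows" could mean.

```python
class ToyMapping:
    """Hypothetical stand-in for illustration; not cvanmf's FeatureMapping."""

    def __init__(self, target_features, source_features, hard_map=None):
        self.targets = set(target_features)
        self.sources = set(source_features)
        self.hard = dict(hard_map or {})
        # User-defined mappings are installed first.
        self.mapping = {src: [tgt] for src, tgt in self.hard.items()}

    def add(self, feature_from, feature_to):
        if feature_from not in self.sources or feature_to not in self.targets:
            raise ValueError("feature not in the relevant set")
        if feature_from in self.hard:
            return  # a hard mapping exists, so later mappings are ignored
        self.mapping.setdefault(feature_from, []).append(feature_to)

    def missing(self):
        return sorted(self.sources - set(self.mapping))

    def conflicts(self):
        return [(s, t) for s, t in sorted(self.mapping.items()) if len(t) > 1]


def combine_rows(mapping, abundance):
    """Rename rows to their target and sum rows sharing a target (assumed)."""
    combined = {}
    for source, values in abundance.items():
        targets = mapping.mapping.get(source, [])
        if len(targets) != 1:
            continue  # assumption: skip unmapped or ambiguous features
        row = combined.setdefault(targets[0], [0.0] * len(values))
        for i, value in enumerate(values):
            row[i] += value
    return combined


# Toy data: rows are features, each list holds one value per sample.
model_features = {"Bacteroides", "Prevotella", "Blautia"}
table = {
    "Bacteroides": [0.5, 0.1],
    "Bacteroides_A": [0.1, 0.0],
    "Prevotella_X": [0.2, 0.6],
    "Unknown": [0.2, 0.3],
}
fm = ToyMapping(model_features, set(table), hard_map={"Prevotella_X": "Prevotella"})
fm.add("Bacteroides", "Bacteroides")
fm.add("Bacteroides_A", "Bacteroides")
fm.add("Prevotella_X", "Blautia")  # ignored: the hard mapping takes precedence

unmapped = fm.missing()
renamed = combine_rows(fm, table)
```

In this toy run, `unmapped` lists the one feature with no mapping, and `renamed` holds one row per model feature that received abundance. With the real class, the equivalent steps would go through add(), missing(), and transform_abundance() as documented above.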
- class cvanmf.reapply.FeatureMatch
Bases: Protocol
Signature for functions which perform feature matching.
- class cvanmf.reapply.InputValidation
Bases: Protocol
Signature for functions which perform input validation.
- cvanmf.reapply.cli(input: str, model: str, hard_mapping: str, rollup: bool, separator: str, output_dir: str) -> None
Command line interface to fit new data to an existing NMF Signatures model. The new data must use the same features as the model, though some differences are tolerated (features in the new data but not in the model, and vice versa). For the 5 Enterosignatures model, this currently means GTDB r207.
For more on Enterosignatures see:
Frioux et al. 2023 (doi:10.1016/j.chom.2023.05.024)
- cvanmf.reapply.match_genera(w: pandas.DataFrame, y: pandas.DataFrame, hard_mapping: Dict[str, str] | None = None, family_rollup: bool = True, **kwargs) -> FeatureMapping
Match taxonomic names in the input table and the Enterosignatures W matrix.
This function is currently based on the R script provided by Clemence in the Enterosignatures (ES) GitLab repo (prepare_matrices.R) https://gitlab.inria.fr/cfrioux/enterosignature-paper/. It attempts to match names. Mappings in the hard_mapping parameter map new names to ES names, and are applied before any other matches are identified.
- Parameters:
w (pd.DataFrame) – Enterosignatures W matrix
y (pd.DataFrame) – Abundance table being transformed
hard_mapping (Dict[str, str]) – Mapping from input to ES name
family_rollup (bool) – Move abundance of genera which are not matched to the family level entry if one exists
logger (Callable[[Any], None]) – Function to log messages
- Returns:
Mapping object connecting input features to ES features
- Return type:
FeatureMapping
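The family rollup described for family_rollup can be sketched in plain Python. This is an illustration under stated assumptions rather than the logic of match_genera() itself, which follows prepare_matrices.R and may differ in detail: the helper names `family_of` and `roll_up` are hypothetical, and the semicolon-separated lineage format with `f__` and `g__` rank prefixes is assumed for the example.

```python
def family_of(lineage):
    """Return the lineage truncated at the family rank, or None if no f__ rank."""
    ranks = lineage.split(";")
    for i, rank in enumerate(ranks):
        if rank.startswith("f__"):
            return ";".join(ranks[: i + 1])
    return None


def roll_up(model_features, abundance):
    """Assign each input lineage to a model feature: exact match, then family."""
    matched, unmatched = {}, []
    for lineage, value in abundance.items():
        if lineage in model_features:
            target = lineage
        else:
            family = family_of(lineage)
            target = family if family in model_features else None
        if target is None:
            unmatched.append(lineage)
        else:
            # Abundance moved to the same target is accumulated.
            matched[target] = matched.get(target, 0.0) + value
    return matched, unmatched


# Hypothetical model features and one sample's abundances.
model = {"f__Lachnospiraceae;g__Blautia", "f__Lachnospiraceae"}
sample = {
    "f__Lachnospiraceae;g__Blautia": 0.4,
    "f__Lachnospiraceae;g__Novelgenus": 0.1,
    "f__Otherfamily;g__Other": 0.5,
}
matched, unmatched = roll_up(model, sample)
```

Here the unmatched genus in a known family contributes its abundance to the family-level entry, while the genus whose family is absent from the model stays unmatched.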
- cvanmf.reapply.match_identical(w: pandas.DataFrame, y: pandas.DataFrame, **kwargs) -> FeatureMapping
Match features by identical labels only.
- Parameters:
w – W matrix from model
y – Table of new data
- cvanmf.reapply.nmf_transform(new_abd: pandas.DataFrame, w_prime: pandas.DataFrame) -> pandas.DataFrame
Transform the input data into model weights.
Takes the matched up W matrix and feature matrix. Expects the row ordering of W and feature matrix to be the same. Any NA values will be filled with 0.
- Parameters:
new_abd – Feature matrix matched to W
w_prime – Model weights
- Returns:
Model weights for the given model and abundances. Note that these are not relative abundances (they do not sum to 1).
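To make the idea of transforming data into model weights concrete, the sketch below estimates a non-negative weight matrix H such that X ≈ W·H while W is held fixed. The documentation above does not say which solver nmf_transform() uses, so the multiplicative update rule used here is an assumption chosen for illustration, and `fit_weights` is a hypothetical name. Matrices are plain nested lists with features on rows, matching the stated expectation that W and the feature matrix share row order; missing values (None) are treated as 0.

```python
def fit_weights(w, x, n_iter=200, eps=1e-12):
    """Estimate H >= 0 with X ~ W @ H, holding W fixed.

    w: features x signatures, x: features x samples (same row order).
    None entries in x are treated as 0.
    """
    n_feat, k = len(w), len(w[0])
    n_samp = len(x[0])
    x = [[0.0 if v is None else float(v) for v in row] for row in x]
    # Precompute W^T W (k x k) and W^T X (k x samples).
    wtw = [[sum(w[f][a] * w[f][b] for f in range(n_feat)) for b in range(k)]
           for a in range(k)]
    wtx = [[sum(w[f][a] * x[f][s] for f in range(n_feat)) for s in range(n_samp)]
           for a in range(k)]
    h = [[1.0] * n_samp for _ in range(k)]
    for _ in range(n_iter):
        # Multiplicative update: H <- H * (W^T X) / (W^T W H), element-wise.
        denom = [[sum(wtw[a][b] * h[b][s] for b in range(k)) + eps
                  for s in range(n_samp)] for a in range(k)]
        h = [[h[a][s] * wtx[a][s] / denom[a][s] for s in range(n_samp)]
             for a in range(k)]
    return h


# Toy model with 3 features and 2 signatures, applied to 2 samples.
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [[2.0, 1.0], [3.0, 2.0], [5.0, 3.0]]
h = fit_weights(w, x)
column_sums = [sum(h[a][s] for a in range(len(h))) for s in range(len(h[0]))]
```

In this toy case the data were built as an exact product of W and a known H, so the fitted weights approach that H. The column sums of `h` are well above 1, which illustrates why the returned weights should not be read as relative abundances.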
- cvanmf.reapply.reapply(y: str | pandas.DataFrame, model: str | cvanmf.models.Signatures = '5es', hard_mapping: str | pandas.DataFrame | None = None, separator: str = '\t', output_dir: str | None = None, **kwargs) -> denovo.Decomposition
Load and transform abundances to an existing model.
The new data must be annotated against the same taxonomy the model uses. Currently, for the 5 ES model, this is GTDB r207. Feature names will be matched automatically between the abundance table and the model where possible (see match_genera()). Most of the work is done in transform_table(); this function mostly provides the convenience of allowing parameters to be paths or DataFrames, and models to be specified as a string or an object.
- Parameters:
y – Feature matrix to transform. Can be a string giving path, or a DataFrame.
model – Model to use. Can be a Signatures object, or the name of one of the provided Signatures objects. Currently this is ‘5es’ for the 5ES model of Frioux et al. (2023, https://doi.org/10.1016/j.chom.2023.05.024).
hard_mapping – Define matchups between feature identifiers in y and those in the model W matrix. These will be used in preference to any automated matches. Should be a table whose index is the y matrix identifier and whose first column is the model W identifier. Can be either a path or a DataFrame.
separator – Separator to use when reading and writing matrices.
output_dir – Directory to write results to. Directory will be created if it does not exist. Pass None for no output to disk.
**kwargs – Passed to the Signature validate_input and match_feature functions.
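A minimal usage sketch for reapply(), based only on the signature documented above. The wrapper name `fit_to_5es` and any file or directory names passed to it are hypothetical, and running it requires cvanmf to be installed; the import is placed inside the function so that the sketch can be loaded without the package.

```python
def fit_to_5es(abundance_path, output_dir=None):
    """Fit a tab-separated abundance table to the bundled '5es' model.

    abundance_path: path to the new abundance table (hypothetical file).
    output_dir: directory for results, or None to skip writing to disk.
    """
    # Imported lazily so this sketch can be loaded without cvanmf installed.
    from cvanmf.reapply import reapply

    return reapply(
        y=abundance_path,
        model="5es",
        separator="\t",
        output_dir=output_dir,
    )
```

A call such as `fit_to_5es("genus_abundance.tsv", output_dir="es_results")` would, under these assumptions, return the denovo.Decomposition object named in the signature and write results to the given directory. A DataFrame could be passed for y instead of a path, as the parameter description allows.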
- cvanmf.reapply.validate_genus_table(abd_tbl: pandas.DataFrame, **kwargs) -> pandas.DataFrame
Basic checks and transformations of the abundance table.
Some transformations may be made here, such as transposition. Any transformation will be written out to inform the user. Transformations are done in place.
- Parameters:
abd_tbl (pd.DataFrame) – Abundance table to check
logger (Callable[[str], None]) – Function to report errors
- Returns:
Validated, potentially transformed, dataframe
- Return type:
pd.DataFrame
- cvanmf.reapply.logger: logging.Logger