{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "# Reapply existing signatures\n",
    "\n",
    "Simple example of fitting the provided example data to the 5 ES model.\n",
    "In this example, we load some abundance data provided in `cvanmf` using\n",
    "the `data.example_abundance()` function.\n",
    "This provides the data as a pandas DataFrame, with each row being a feature\n",
    "(in this case a bacterial genus) and each column a sample.\n",
    "**If you are using your own data**, you will first need to load it using pandas,\n",
    "typically via\n",
    "[read_csv](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-06T14:48:38.117964Z",
     "start_time": "2025-01-06T14:48:36.910754Z"
    }
   },
   "outputs": [],
   "source": [
    "from cvanmf import models, data\n",
    "import numpy as np\n",
    "\n",
    "# The example data is quite large, so we'll take a subset of samples\n",
    "result = models.five_es().reapply(\n",
    "    # The input should be as a pandas DataFrame, with features on rows and\n",
    "    # samples on columns. Data does not need to be normalised in this case,\n",
    "    # the `reapply` method of the `five_es` object will apply the appropriate\n",
    "    # transformations.\n",
    "    y=data.example_abundance().iloc[:, :30]\n",
    ")\n",
    "\n",
    "result.model_fit.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `reapply` method on one of the existing models in the `models` package often\n",
    "includes method to normalise and process the data into the correct format. For\n",
    "the `five_es` model this includes normalising to relative abundance, and\n",
    "attempting to match the names of taxa between input and the model. Check the\n",
    "description of each more in the API reference for details.\n",
    "\n",
    "The results object is a Decomposition class object. The Enterosignature weights\n",
    "for each sample are in the H matrix of the results object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-06T14:49:01.881093Z",
     "start_time": "2025-01-06T14:49:01.853922Z"
    }
   },
   "outputs": [],
   "source": [
    "result.h.head().iloc[:, :5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can get a scaled version of the H matrix (so each sample sums to 1), and visualise this using a mix of the inbuilt methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert np.allclose(result.scaled('h').sum(), 1.0)\n",
    "result.scaled('h').iloc[:, :5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The relative weights can be plotted as a stacked bar, using the `plot_relative_weight` method. By default, the top part of this plot shows the model fit of each sample (the cosine angle between the sample and X and in WH), with 1 being good and 0 bad. A default line of 0.4 for \"poor model fit\" is given, based on the Enterosignatures paper. Below this is a stacked bar plot, showing the relative abundance of each signature in each sample.\n",
    "\n",
    "Most plots in the package are produced using `plotnine`. `plot_relative_weight` uses [Marsilea](https://marsilea.readthedocs.io/en/stable/) instead, as the options for combining multiple plotnine plots are currently limitted. The object returned is a `WhiteBoard` object, which has a method `render()` which can be used to display the plot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The heights parameter controls the relative height of each components\n",
    "result.plot_relative_weight(heights=[0.5, 0.7, 0.25], width=6, height=3, sample_label_size=6).render()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Examples of extra visualisations can be found in the \"De-novo signatures\" section."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  },
  "mystnb": {
   "execution_timeout": -1
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}