Example application to cell composition data¶
In this example, we will look at the compositions of samples from a non-small cell lung cancer dataset.
The data orirginates from to this study. It was subsequently added to cellxgene, which is the source we have taken the data from. The data on cellxgene is gene expression, but also includes cell type labels for each cell. From this we derived the relative cell type composition of each sample, the proportion of each cell type in each sample.
from cvanmf import denovo, combine, data, stability
import pandas as pd
import plotnine as pn
import numpy as np
import matplotlib.pyplot as plt
import pathlib
lung_data = data.lung_cancer_cells()
lung_data
| Non-small cell lung cancer | |
| cvanmf example data | |
| shape | (33, 224) |
| rank | [] |
| description | Relative cell-compositions from non-small cell lung cancer studies. Gives the number of cells of different types in lung tissue samples from a non-small cell lung cancer atlas, which was compiled from 29 studies and includes 556 samples, from 318 individuals (86 of which are healthy controls). The data was uploaded to cellxgene using their standard ontologies, which is the source we have taken the data from. Metadata provided here is a mixture of metadata from cellxgene, and some from the original paper. We have selected out only the tissues samples labelled as "lung". In total, this gives 224 samples, and 33 cell types. The data here is total-sum-scaled, i.e. each sample sums to 1. |
| doi | https://doi.org/10.1016/j.ccell.2022.10.008 |
| citation | Salcher, S. et al. High-resolution single-cell atlas reveals diversity and plasticity of tissue-resident neutrophils in non-small cell lung cancer. Cancer Cell 40, 1503-1520.e8 (2022). Program, C. S.-C. B. et al. CZ CELL×GENE Discover: A single-cell data platform for scalable exploration, analysis and modeling of aggregated data. 2023.10.30.563174 Preprint at https://doi.org/10.1101/2023.10.30.563174 (2023). |
This is a named tuple object, with properties giving tables and metadata associated to the data. The data is in the data property.
composition = lung_data.data
composition.head().iloc[:,:3]
| Adams_Kaminski_2020_001C | Adams_Kaminski_2020_002C | Adams_Kaminski_2020_003C | |
|---|---|---|---|
| cell_type | |||
| B cell | 0.019485 | 0.030466 | 0.038913 |
| CD1c-positive myeloid dendritic cell | 0.012571 | 0.007168 | 0.032539 |
| CD4-positive, alpha-beta T cell | 0.006285 | 0.021505 | 0.023482 |
| CD8-positive, alpha-beta T cell | 0.012571 | 0.014337 | 0.042603 |
| alveolar macrophage | 0.293526 | 0.367384 | 0.300906 |
The data has been total-sum-scaled, i.e. each sample sums to 1, which we can check.
np.allclose(composition.sum(axis=0), 1.0)
True
The metadata for each column (sample) is stored in the col_metadata property. One of the labels for disease is quite long, so here we’ll tidy it a bit.
# Disease labels are very long, so we will recode these for more compact plots
sample_md = lung_data.col_metadata.copy()
sample_md['disease_long'] = sample_md['disease']
disease_code = {
"chronic obstructive pulmonary disease": "NC: COPD",
"normal": "NC",
"non-small cell lung carcinoma": "C: non-small cell",
"squamous cell lung carcinoma": "C: squamous cell",
"lung adenocarcinoma": "C: adenocarcinoma"
}
sample_md['disease'] = sample_md['disease_long'].apply(lambda x: disease_code[x])
sample_md = sample_md.loc[~sample_md.index.duplicated()]
sample_md.head().iloc[:, :3]
| assay | assay_ontology_term_id | development_stage | |
|---|---|---|---|
| patient | |||
| Adams_Kaminski_2020_001C | 10x 3' v2 | EFO:0009899 | 22-year-old human stage |
| Adams_Kaminski_2020_002C | 10x 3' v2 | EFO:0009899 | 25-year-old human stage |
| Adams_Kaminski_2020_003C | 10x 3' v2 | EFO:0009899 | 67-year-old human stage |
| Adams_Kaminski_2020_034C | 10x 3' v2 | EFO:0009899 | 49-year-old human stage |
| Adams_Kaminski_2020_052CO | 10x 3' v2 | EFO:0009899 | 62-year-old human stage |
Rank Selection¶
We’ll search from ranks 2 to 10. You may want to search a wider range in other data, but here we will see a clear rank is indicated within this range.
rank_res = denovo.rank_selection(
x=composition,
ranks=list(range(2, 11)),
seed=4298,
shuffles=100,
progress_bar=False
)
We can plot the results to look for an elbow point in the values. Throughout the notebook we apply some customisations to make publication ready figures, but these are not neccessary. For most figures the customisation is done using plotnine, which has very similar syntax to ggplot in R. The purpose of any complex changes to the plot are indicated by comments.
# This is a flag indicating whether to write figure produced
PUB = True
def figure_output(plot, name, pub):
"""Output a figure and the underlying data in PNG, PDF and TSV format."""
if not pub:
return plot
prefix = pathlib.Path("cell_imgs")
out_path = prefix / name
plot.save(out_path.with_suffix('.png'), dpi=300)
plot.save(out_path.with_suffix('.pdf'))
plot.data.to_csv(out_path.with_suffix('.tsv'), sep='\t')
return plot
plt_ranksel = (
denovo.plot_rank_selection(rank_res, jitter=False)
+ pn.guides(fill = "none")
+ pn.theme(figure_size=(2.66 * 2, 1.8))
+ pn.ggtitle("Bicrossvalidation Rank Selection")
+ pn.xlab("Rank")
+ pn.ylab("")
# We're adding a little more room for the stars - the internal calculation doesn't always get the right space for custom size plots
+ pn.scale_y_continuous(expand=[0.1, 0, 0.1, 0])
)
figure_output(plt_ranksel, "a_ranksel", PUB)
# You can show the figure in the notebook by using it as the final line of the cell
plt_ranksel
/home/kam24goz/repos/cvanmf/src/cvanmf/denovo.py:1194: FutureWarning: The previous implementation of stack is deprecated and will be removed in a future version of pandas. See the What's New notes for pandas 2.1.0 for details. Specify future_stack=True to adopt the new implementation and silence this warning.
/home/kam24goz/repos/cvanmf/src/cvanmf/denovo.py:1298: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
/home/kam24goz/repos/cvanmf/src/cvanmf/denovo.py:1303: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:606: PlotnineWarning: Saving 5.32 x 1.8 in image.
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:607: PlotnineWarning: Filename: cell_imgs/a_ranksel.png
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:606: PlotnineWarning: Saving 5.32 x 1.8 in image.
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:607: PlotnineWarning: Filename: cell_imgs/a_ranksel.pdf
Regularisation selection¶
We would like a sparse solution so we will use L1 regularisation, and select a value for parameter alpha. Regularisation selection is performed in much the same way, however the results returned are a tuple with the first element the suggested best value for alpha based on a heuristic, the second element being results in the same structure as rank selection.
best_alpha, regu_res = denovo.regu_selection(
x=composition,
rank=3,
seed=4298,
l1_ratio=1.0,
progress_bar=False
)
plt_regusel = (
denovo.plot_regu_selection(regu_res, jitter=False)
+ pn.theme(
figure_size=(2.66, 1.8),
axis_text_x=pn.element_text(rotation=45, hjust=1.0)
)
+ pn.guides(fill = "none")
+ pn.ggtitle("Regularisation Selection")
# We're adding a little more room for the stars
+ pn.scale_y_continuous(expand=[0.1, 0, 0.1, 0])
)
figure_output(plt_regusel, "a_regusel", PUB)
plt_regusel
/home/kam24goz/repos/cvanmf/src/cvanmf/denovo.py:1194: FutureWarning: The previous implementation of stack is deprecated and will be removed in a future version of pandas. See the What's New notes for pandas 2.1.0 for details. Specify future_stack=True to adopt the new implementation and silence this warning.
/home/kam24goz/repos/cvanmf/src/cvanmf/denovo.py:1298: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:606: PlotnineWarning: Saving 2.66 x 1.8 in image.
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:607: PlotnineWarning: Filename: cell_imgs/a_regusel.png
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:606: PlotnineWarning: Saving 2.66 x 1.8 in image.
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:607: PlotnineWarning: Filename: cell_imgs/a_regusel.pdf
best_alpha
0.008928571428571428
The heuristic suggested value is quite a drop in \(R^2\) compared to the previous, so we will instead take the value one before.
Decomposition with selected parameters¶
decomps = denovo.decompositions(
x=composition,
ranks=[3],
alpha=2.23e-03,
l1_ratio=1.0,
seed=4298,
# We keep the 100 best decompositions so we can look at how similar signatures are across them later
# You could chose to retain a lower number
top_n=100
)
k3 = decomps[3][0]
100%|███████████████████████████████████| 100/100 [00:00<00:00, 142.97it/s]
Model fit for disease status¶
Dose our model describe certain diseases types better than others? We can plot the model fit for each sample, grouped by disease status. The model fit is relatively good for all groups, though slightly higher for all the cancer samples. We could hypothesise that this might be that there is more variation in composition among healthy participants; or perhaps as there are more cancer samples the model has learnt those types better.
plt_modelfit = (
k3.plot_modelfit(group=sample_md['disease'])
+ pn.coord_flip()
+ pn.theme(figure_size=(3, 1.5))
+ pn.ggtitle("Model Fit")
)
plt_modelfit
We can do the same for tumour stage
Interpreting signatures¶
Knowing which features contribute to each signature can help give a domain-specific interpretation, such as understanding which types of cells are contributing to the cancer associated signature.
The default plot for this is plot_feature_weight, which shows all features above a certain proportion of the overall signature weights (by default 4%).
Here we see S2 is largely inflammatory immune type / T-cells, S3 malignant cancer cells, and S1 a mix including functional lung cells (pneumocyte, alveolar macrophage).
plt_weight = (
k3.plot_feature_weight()
+ pn.theme(
figure_size=(4.2, 2.5)
)
+ pn.ggtitle("Feature Weight")
+ pn.scale_x_discrete(expand=(0.1, 0.0, 0.15, 0.0)) # This adds a little more room left and right of the bubbles
)
plt_weight
To make understanding future plots simpler, we can rename signatures. Here we retain the numbering (S1 .. S3), and add a brief tag indicating the broad types of cells represented.
k3.names = ['S1 (Func.)', 'S2 (Infl.)', 'S3 (Tumor)']
plt_weight = (
k3.plot_feature_weight()
+ pn.theme(
figure_size=(4.2, 2.8)
)
+ pn.ggtitle("Feature Weight")
+ pn.scale_x_discrete(expand=(0.1, 0.0, 0.15, 0.0)) # This adds a little more room left and right of the bubbles
)
figure_output(plt_weight, "f_feature_weight", PUB)
plt_weight
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/scales/scales.py:48: PlotnineWarning: Scale for 'x' is already present.
Adding another scale for 'x',
which will replace the existing scale.
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:606: PlotnineWarning: Saving 4.2 x 2.8 in image.
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:607: PlotnineWarning: Filename: cell_imgs.png
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:606: PlotnineWarning: Saving 4.2 x 2.8 in image.
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:607: PlotnineWarning: Filename: cell_imgs.pdf
Ordination of samples¶
To look at broad patterns in the data, we can look at PCoA ordination of samples. With few signatures, these willl largely divide by primary signature. In the PCoA below, we can see cancer and non-cancer samples separating along the first axis, and early and late stage tumors separating on the second axis.
We could run the PCoA on the input data, or WH, using the on argument of plot_pcoa/pcoa.
# We're defining a colour scale for tumor stage which gives an intution of progression
stage_color = dict(
none="grey",
early="#c8b1e4",
advanced="#532b88"
)
plt_pcoa = (
k3.plot_pcoa(on="h", color=sample_md['tumor_stage'], shape="signature")
+ pn.theme(figure_size=(4.3, 4), legend_position="bottom")
+ pn.ggtitle("PCoA on Signature Weights")
+ pn.scale_color_manual(
values=list(stage_color.values()),
breaks=list(stage_color.keys())
)
# We're removing the point colour legend, as it will be indicated on another figure in the publication
+ pn.guides(
color="none"
)
)
figure_output(plt_pcoa, "c_pcoa", PUB)
plt_pcoa
/home/kam24goz/repos/cvanmf/src/cvanmf/denovo.py:3831: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
/home/kam24goz/repos/cvanmf/src/cvanmf/denovo.py:3835: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:606: PlotnineWarning: Saving 4.3 x 4 in image.
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:607: PlotnineWarning: Filename: cell_imgs/c_pcoa.png
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:606: PlotnineWarning: Saving 4.3 x 4 in image.
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:607: PlotnineWarning: Filename: cell_imgs/c_pcoa.pdf
Visualising signature composition¶
An overview of the signature composition for samples can be a useful exploratory step. The plot_relative_weight methods produces a combination of stacked bar plot, a ribbon indicating some categorical metadata value for each sample, and a point plot of model fit.
Here we can see again what we observed above - there is a shift in S2 / S3 abundance between early and late stage tumours, and S3 is more abundant in advanced tumors.
fig = k3.plot_relative_weight(
# The metadata to show groups for. It will appear in the order and with the name we give it here in this series.
group=(
sample_md['tumor_stage']
.rename('Tumor Stage')
.sort_values(ascending=False)
),
# The colours should be provided as a series
# There is currently a bug here, where the colours are applied reversed.
group_colors=pd.Series(
reversed(list(stage_color.values())),
index=list(stage_color.keys())
),
model_fit=False,
heights=[0.25, 1, 0],
sample_label_size=False,
legend_cols_grp=2,
legend_cols_sig=2,
width=4, point_size=0.1,
threshold=None,
legend_side="top"
)
# The relative weight plot is produced by Marsilea, so unfortunately currently has a slightly different way to display and save
fig.render()
plt.show()
fig.save("cell_imgs/d_relative_weight.pdf")
fig.save("cell_imgs/d_relative_weight.png", dpi=300)
Relationships to metadata¶
Throught the previous plots we have seen what look like some associations between tumor stage and signature. We can plot this relationship to metadata as box or scatter plots. plot_metadata takes a single datframe with all the metadata we want to plot against, and returns two figures with discrete and continuous variables respectively.
By default, statistical tests are run for discrete comparison: a Kruskal-Wallis test for more than two classes, and a Mann-Whitney U-test for two samples. Significance is indicated with stars.
Here, we see that there are differences in signature weight between disease and tumour stage.
disc.data.head()
| sample | metadata_field | metadata_value | signature | signature_weight | |
|---|---|---|---|---|---|
| 0 | Adams_Kaminski_2020_001C | Disease | NC | S1 (Func.) | 0.961407 |
| 1 | Adams_Kaminski_2020_001C | Disease | NC | S2 (Infl.) | 0.028294 |
| 2 | Adams_Kaminski_2020_001C | Disease | NC | S3 (Tumor) | 0.010300 |
| 3 | Adams_Kaminski_2020_001C | Disease | NC | modelfit | 0.914390 |
| 4 | Adams_Kaminski_2020_001C | Tumor Stage | none | S1 (Func.) | 0.961407 |
disc, cont = k3.plot_metadata(sample_md[['disease', 'tumor_stage', 'age']].rename(
# Rename our metadata fields to more readable form
columns=dict(tumor_stage="Tumor Stage", disease="Disease")
))
disc = (
disc
+ pn.guides(fill="none")
+ pn.theme(
figure_size=(3,4),
axis_text_x=pn.element_text(rotation=60, hjust=1, vjust=1)
)
+ pn.ggtitle("Metadata Plots")
# We're adding a little more room for the stars - the internal calculation doesn't always get the right space for custom size plots
+ pn.scale_y_continuous(expand=[0.1, 0, 0.2, 0])
# This plot is facetted by default, but we're redefining it so we can add linebreaks to row values
# To see the data and columns which the plot is based on, your can look at disc.data
+ pn.facet_grid(
rows='signature',
cols='metadata_field',
labeller=pn.labeller(rows=lambda x: x.replace("(", "\n(").replace('model', '\nmodel')),
scales='free_x',
space='free_x'
)
)
figure_output(disc, "e_metadata", PUB)
disc
/home/kam24goz/repos/cvanmf/src/cvanmf/denovo.py:5147: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
/home/kam24goz/repos/cvanmf/src/cvanmf/denovo.py:4310: FutureWarning: The provided callable <function min at 0x7f88c86d0360> is currently using DataFrameGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
/home/kam24goz/repos/cvanmf/src/cvanmf/denovo.py:4326: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
/home/kam24goz/repos/cvanmf/src/cvanmf/denovo.py:4331: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/scales/scales.py:48: PlotnineWarning: Scale for 'y' is already present.
Adding another scale for 'y',
which will replace the existing scale.
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:606: PlotnineWarning: Saving 3 x 4 in image.
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:607: PlotnineWarning: Filename: cell_imgs/e_metadata.png
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:606: PlotnineWarning: Saving 3 x 4 in image.
/home/kam24goz/miniforge3/envs/cvanmf/lib/python3.12/site-packages/plotnine/ggplot.py:607: PlotnineWarning: Filename: cell_imgs/e_metadata.pdf
Here it is might be interesting to look which of the tumor stages are different in post-hoc tests. That isn’t available in the plot currently, but we can run the tests and get a dataframe of the results
univar_results = k3.univariate_tests(sample_md[['disease', 'tumor_stage']].rename(
# Rename our metadata fields to more readable form
columns=dict(tumor_stage="Tumor Stage", disease="Disease")
))
univar_results.head()
| statistic | p | test | signature | md | max_median | max_mean | posthoc_str | posthoc_mat | alpha | local_reject | local_adj_p | global_reject | global_adj_p | adj_method | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S1 (Func.) | 111.551123 | 0.0 | kruskal | S1 (Func.) | Disease | NC | NC | C: adenocarcinoma|C: squamous cell(0.041462685... | [[1.0, 0.326832357772083, 0.04146268546446127,... | 0.05 | True | 0.0 | True | 0.0 | fdr_bh |
| S2 (Infl.) | 56.195448 | 0.0 | kruskal | S2 (Infl.) | Disease | C: adenocarcinoma | C: non-small cell | C: adenocarcinoma|C: squamous cell(4.287959966... | [[1.0, 0.6297354012486505, 4.287959966752344e-... | 0.05 | True | 0.0 | True | 0.0 | fdr_bh |
| S3 (Tumor) | 60.076882 | 0.0 | kruskal | S3 (Tumor) | Disease | C: squamous cell | C: squamous cell | C: adenocarcinoma|C: squamous cell(0.000366053... | [[1.0, 0.4083706712324559, 0.00036605306670249... | 0.05 | True | 0.0 | True | 0.0 | fdr_bh |
| modelfit | 5.183176 | 0.269014 | kruskal | modelfit | Disease | C: squamous cell | C: adenocarcinoma | [[1.0, 0.8855997748258781, 0.8855997748258781,... | 0.05 | False | 0.269014 | False | 0.269014 | fdr_bh | |
| S1 (Func.) | 115.403315 | 0.0 | kruskal | S1 (Func.) | Tumor Stage | none | none | advanced|early(0.0004988496148587597);advanced... | [[1.0, 0.0004988496148587597, 9.12382034786168... | 0.05 | True | 0.0 | True | 0.0 | fdr_bh |
# To extract only the modelfit / stage tests
univar_results[
(univar_results['signature'] == "modelfit")
& (univar_results['md'] == 'Tumor Stage')
]['posthoc_str']
modelfit advanced|none(0.049496854141262696)
Name: posthoc_str, dtype: object
This indicates that only the comparison between advanced and none was significant below the default alpha of 0.05.
We also get a plot for continuous metadata variables, though there is not much interesting to see here.
Stability of Signatures¶
NMF solutions vary depending on their initialisation - most methods offered include some randomisation, so solutions can vary. If highly similar signatures are obtained across random initialisations, this can help characterise how robust each signature is. Using the 100 decompositions we produced earlier, we can visualise how similar the signatures in other decompositions are to the best decomposition which we used for analysis.
sig_stability = stability.signature_stability(
decomps[3],
decomps[3][0]
)
plt_stability = (
stability.plot_signature_stability(
sig_stability,
k3.colors
) +
pn.ggtitle("Signature\nStability") +
pn.theme(figure_size=(2.5, 3)) +
pn.guides(fill="none")
)
plt_stability
Signatures are mostly highly similar across decompositions, with a handful of outliers where the solution is more different. In particular, signature S3 seems to have no similar signature in some decompositions.
We can compare this across decompositions of multiple ranks to see if signature stability supports our choice of rank.
decomps_multi = denovo.decompositions(
x=composition,
ranks=range(2, 11),
# Suitable alpha hasn't been selected for all ranks, so set to 0
alpha=0.0,
l1_ratio=1.0,
seed=4298,
# We want to retain all our decompositions so we can compare them
top_n=101,
# We choose 101 so we can take the best, and compare it to 100 others
random_starts=101,
progress_bar=False
)
sig_stability_multi = stability.signature_stability(
decomps_multi
)
plt_stability_multi = (
stability.plot_signature_stability(
sig_stability_multi,
geom_boxplot=dict(outlier_size=0.1),
geom_line=True
) +
pn.ggtitle("Signature\nStability") +
pn.theme(figure_size=(7, 5)) +
pn.guides(fill="none") +
pn.facet_wrap(facets="k", ncol=6, scales="free_x")
)
plt_stability_multi
The signatures here show much more variation across intialisations, suggesting a stable solution is not being found.