Toy Example

Contents

Toy Example#

[1]:

import scanpy as sc
import singleCellHaystack as hs

sc.set_figure_params(facecolor="white", dpi=90)

Load toy data#

[2]:

adata = hs.load_toy()
adata

[2]:

AnnData object with n_obs × n_vars = 601 × 500
    obsm: 'X_tsne'

[3]:

sc.pl.scatter(adata, basis="tsne")

../_images/tutorials_toy_4_0.png

Run haystack#

[4]:

res = hs.haystack(adata, coord="tsne", n_randomizations=100, n_genes_to_randomize=100, spline_method="ns")

> starting haystack ...
> entering array method ...
> scaling coordinates ...
> calculating feature stds ...
> calculating grid points ...
> calculating distance to cells ...
> calculating densities ...
> calculating Q dist ...
> calculating KLD for 500 features ...

100%|███████████████████████████████████████████████████████████████████████████████| 500/500 [00:00<00:00, 7301.81it/s]

> calculating feature's CV ...
> selecting genes to randomize ...
> calculating randomized KLD ...

100%|█████████████████████████████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 81.12it/s]

> calculating P values ...
> done.

QC#

We can examine some of the QC plots. First the randomization fits. These are used to calculate KLD from randomized expression levels for a subset of genes, in order to estimate the values to the entire gene set.

[5]:

hs.plot_rand_fit(res, "mean")
hs.plot_rand_fit(res, "sd")

../_images/tutorials_toy_8_0.png

../_images/tutorials_toy_8_1.png

The ranking of logpval and distribution of pval gives us some idea of how many significant genes we can detect.

[6]:

hs.plot_pval_rank(res)
hs.plot_pval_hist(res)

../_images/tutorials_toy_10_0.png

../_images/tutorials_toy_10_1.png

Results#

A pandas DataFrame with the results can be obtained. By default the results are sorted by logpval_adj.

[7]:

import pandas as pd

[8]:

sum = res.top_features(10)
sum

[8]:

	gene	KLD	CV	pval	pval_adj	logpval	logpval_adj
241	gene_242	1.738191	2.627666	1.066598e-37	5.332992e-35	-36.971999	-34.273029
338	gene_339	1.861424	2.718406	2.894843e-37	1.447422e-34	-36.538375	-33.839405
496	gene_497	2.007779	2.860687	1.749626e-35	8.748131e-33	-34.757055	-32.058085
274	gene_275	1.733409	2.698842	1.510932e-34	7.554659e-32	-33.820755	-31.121785
350	gene_351	1.849559	2.781079	1.538332e-34	7.691660e-32	-33.812950	-31.113980
61	gene_62	2.027718	2.931154	3.283019e-33	1.641509e-30	-32.483727	-29.784757
136	gene_137	1.791516	2.784830	6.520173e-33	3.260087e-30	-32.185741	-29.486771
316	gene_317	1.762123	2.776867	1.982998e-32	9.914989e-30	-31.702678	-29.003708
243	gene_244	1.664939	2.719325	7.226574e-32	3.613287e-29	-31.141068	-28.442098
78	gene_79	2.238983	3.174348	1.517525e-30	7.587625e-28	-29.818864	-27.119894

Plot top 4 genes.

[9]:

sc.pl.tsne(adata, color=sum.gene.iloc[:4], ncols=2, cmap="Spectral_r")

../_images/tutorials_toy_16_0.png

Export results#

[10]:

#sum.to_csv("toy-results.tsv")