Introduction

Here we introduce the usage of the celltype package. Its main function uses a dictionary of immune cell markers, consisting of the expression levels of marker genes in different immune cell types, to predict the likely cell type of each cell in an experimental dataset of single-cell transcriptome measurements. Each cell is assigned by correlating the expression levels of the markers in the dictionary with the expression levels of the same genes in that cell.
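The assignment rule can be written out as follows. This assumes Pearson correlation, since the description above only says "correlation", and the package may use a different measure or preprocessing. Let G be the set of marker genes present in both the dictionary and the data, x_{gk} the dictionary expression of gene g in cell type k, and y_{gj} the measured expression of gene g in cell j.

```latex
r_{kj} = \frac{\sum_{g \in G} (x_{gk} - \bar{x}_k)(y_{gj} - \bar{y}_j)}
              {\sqrt{\sum_{g \in G} (x_{gk} - \bar{x}_k)^2}\,\sqrt{\sum_{g \in G} (y_{gj} - \bar{y}_j)^2}},
\qquad
\bar{x}_k = \frac{1}{|G|}\sum_{g \in G} x_{gk}, \quad
\bar{y}_j = \frac{1}{|G|}\sum_{g \in G} y_{gj}

\hat{k}_j = \arg\max_{k} \; r_{kj}
```

Here r_{kj} is the entry for cell type k and cell j of the correlation matrix, and \hat{k}_j is the cell type assigned to cell j.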

Dataset

Immune cell dictionary

As immune gene markers we use the set defined by the ImmGen project, with gene symbols for both human and mouse.

head(markers)
#>       human mouse
#> Bcr     BCR   Bcr
#> Ccr7   CCR7  Ccr7
#> Cd14   CD14  Cd14
#> Cd19   CD19  Cd19
#> Cd1d1 CD1D1 Cd1d1
#> Cd207 CD207 Cd207

The dictionary, immgen.db, gives the ImmGen expression level of each marker gene in each cell type, in long format (one row per cell type and gene), together with the tissue of origin.

head(immgen.db)
#> # A tibble: 6 x 5
#>   celltype human mouse expression tissue
#>   <chr>    <chr> <chr>      <dbl> <chr> 
#> 1 B.FO.LN  BCR   Bcr      -0.0446 LN    
#> 2 B.FO.LN  CCR7  Ccr7      1.06   LN    
#> 3 B.FO.LN  CD14  Cd14      0      LN    
#> 4 B.FO.LN  CD19  Cd19      3.49   LN    
#> 5 B.FO.LN  CD1D1 Cd1d1    -0.452  LN    
#> 6 B.FO.LN  CD207 Cd207     0      LN
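Because immgen.db is stored in long format, correlating it with expression data requires a gene x cell type matrix. The following is a minimal base R sketch of that reshaping, using the six rows printed above plus a hypothetical cell type (TypeX.SP) with made-up values. Filtering on the tissue column first is an assumed analogue of the tissue argument of predict_celltype(), not its documented behavior.

```r
# First six rows of immgen.db as printed above (mouse symbols only), plus a
# hypothetical cell type "TypeX.SP" with made-up values for illustration
long_db <- data.frame(
  celltype   = rep(c("B.FO.LN", "TypeX.SP"), each = 6),
  mouse      = rep(c("Bcr", "Ccr7", "Cd14", "Cd19", "Cd1d1", "Cd207"), 2),
  expression = c(-0.0446, 1.06, 0, 3.49, -0.452, 0,
                 0.3, -0.8, 1.5, 0, 0.2, -1.1),
  tissue     = rep(c("LN", "SP"), each = 6),
  stringsAsFactors = FALSE
)

# Keep a single tissue (assumed to mirror the tissue argument)
ln_only <- long_db[long_db$tissue == "LN", ]

# Reshape to a gene x cell type matrix; each gene/cell type pair has one value
to_matrix <- function(d) tapply(d$expression, list(d$mouse, d$celltype), sum)

wide_all <- to_matrix(long_db)
wide_ln  <- to_matrix(ln_only)
wide_all
```

Each column of the result is then the expression profile of one cell type across the marker genes.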

Experimental data

As the experimental dataset we use sce1, a test SingleCellExperiment object with 461 genes measured in 500 murine splenic cells.

sce1
#> class: SingleCellExperiment 
#> dim: 461 500 
#> metadata(0):
#> assays(2): counts logcounts
#> rownames(461): 2810417H13Rik|ENSMUST00000045802.6|chr9:65902119
#>   Ada|ENSMUST00000017841.3|chr2:163726816 ...
#>   Zfp36l2|NM_001001806.2|Reference_end
#>   Zfp36|NM_011756.4|Reference_end
#> rowData names(5): id symbol accession note entrezgene
#> colnames(500): GA01075_169602 GA01075_95437 ... GA01075_717854
#>   GA01075_319415
#> colData names(4): samplename filename cell_index group
#> reducedDimNames(0):
#> spikeNames(0):
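The row names of sce1 join a gene symbol, an accession and a note with "|". To match rows against the mouse symbols in the dictionary, one option is to keep the text before the first "|", as in this sketch on row names copied from the printout. The rowData column symbol may already hold the same information, and how predict_celltype() matches genes internally is not shown here, so treat this as an illustration only.

```r
# Row names copied from the sce1 printout above
ids <- c("2810417H13Rik|ENSMUST00000045802.6|chr9:65902119",
         "Ada|ENSMUST00000017841.3|chr2:163726816",
         "Zfp36l2|NM_001001806.2|Reference_end",
         "Zfp36|NM_011756.4|Reference_end")

# Drop everything from the first "|" onwards to recover the gene symbol
symbols <- sub("\\|.*$", "", ids)
symbols
```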

Analysis

Using the ImmGen dictionary

We can use predict_celltype() to obtain a matrix of cell type correlations: for each cell in the dataset (columns) we obtain its correlation with each cell type in the dictionary (rows), computed over the marker genes present in both. By default the ImmGen dictionary is used; here tissue = "SP" restricts it to splenic cell types.

celltype1 <- predict_celltype(sce1, tissue = "SP")
celltype1[1:5, 1:3]
#>         GA01075_169602 GA01075_95437 GA01075_200935
#> B.FO.SP      0.6358489    -0.1884949      0.5800941
#> B.GC.SP      0.4047710    -0.2828901      0.3869595
#> B.MZ.SP      0.5631228    -0.2308329      0.6293923
#> B.T1.SP      0.6580691    -0.1932920      0.6432285
#> B.T2.SP      0.6284439    -0.1999266      0.6013123
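The layout of this matrix can be reproduced with base R's cor(): correlating a gene x cell type dictionary with a gene x cell expression matrix over their shared genes gives one row per cell type and one column per cell, as in celltype1 above. This is a minimal sketch on made-up data (db, expr, the gene names and the cell type names are all hypothetical), assuming Pearson correlation; predict_celltype() may normalize, filter or correlate differently.

```r
# Hypothetical dictionary: 6 marker genes (rows) x 3 cell types (columns)
db <- cbind(
  TypeA = c(5, 1, 0, 2, 0, 1),
  TypeB = c(0, 4, 3, 0, 1, 0),
  TypeC = c(1, 0, 0, 5, 4, 2)
)
rownames(db) <- paste0("g", 1:6)

# Hypothetical expression data: 7 genes (rows) x 2 cells (columns);
# gX is not in the dictionary and must be dropped before correlating
expr <- cbind(
  cell1 = c(9, 4, 1, 0, 3, 0, 1),
  cell2 = c(2, 1, 0, 1, 5, 3, 2)
)
rownames(expr) <- c("gX", paste0("g", 1:6))

# Keep only the genes present in both, in the same order
shared <- intersect(rownames(db), rownames(expr))

# cor() on two matrices correlates every column of the first with every
# column of the second: the result is cell types (rows) x cells (columns)
cor_mat <- cor(db[shared, ], expr[shared, ])
round(cor_mat, 3)
```

Aligning the rows with a shared gene list matters: cor() pairs rows by position, not by name.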

To assign a single cell type to each cell we can use choose_celltype(), which selects the cell type with the highest correlation.

celltype1 <- choose_celltype(celltype1)
head(celltype1)
#> # A tibble: 6 x 3
#>   cell_index     celltype   correlation
#>   <chr>          <chr>            <dbl>
#> 1 GA01075_169602 B.T1.SP          0.658
#> 2 GA01075_95437  T.4FP3-.SP       0.631
#> 3 GA01075_200935 B.T1.SP          0.643
#> 4 GA01075_848724 T.8MEM.SP        0.604
#> 5 GA01075_490889 B.FO.SP          0.404
#> 6 GA01075_574367 B.FO.SP          0.444
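The maximum-correlation rule can be sketched in base R on the 5 x 3 block of celltype1 printed earlier. choose_max() is a hypothetical helper, not the package's choose_celltype(); with ties it keeps the first cell type, because which.max() returns the first maximum.

```r
# 5 x 3 block of the ImmGen correlation matrix, copied from the printout
cor_mat <- matrix(
  c(0.6358489, 0.4047710, 0.5631228, 0.6580691, 0.6284439,
    -0.1884949, -0.2828901, -0.2308329, -0.1932920, -0.1999266,
    0.5800941, 0.3869595, 0.6293923, 0.6432285, 0.6013123),
  nrow = 5,
  dimnames = list(c("B.FO.SP", "B.GC.SP", "B.MZ.SP", "B.T1.SP", "B.T2.SP"),
                  c("GA01075_169602", "GA01075_95437", "GA01075_200935"))
)

# Hypothetical helper: for each cell (column) keep the best-correlated cell type
choose_max <- function(m) {
  best <- apply(m, 2, which.max)
  data.frame(
    cell_index  = colnames(m),
    celltype    = rownames(m)[best],
    correlation = m[cbind(best, seq_len(ncol(m)))],
    stringsAsFactors = FALSE
  )
}

res <- choose_max(cor_mat)
res
```

On this subset the first and third cells are assigned B.T1.SP, as with choose_celltype(), whereas the second cell gets B.FO.SP because its best match in the full matrix, T.4FP3-.SP, is not among the five printed rows.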

The ImmGen cell type labels are very fine-grained. The function simplify_immgen_celltype() reduces each label to the top level of the hierarchy (for example, B.T1.SP becomes B), which makes it easier to focus on the main lineages.

library(dplyr)  # provides %>% and mutate()
celltype1 <- celltype1 %>% mutate(celltype_simple = simplify_immgen_celltype(celltype))

head(celltype1)
#> # A tibble: 6 x 4
#>   cell_index     celltype   correlation celltype_simple
#>   <chr>          <chr>            <dbl> <chr>          
#> 1 GA01075_169602 B.T1.SP          0.658 B              
#> 2 GA01075_95437  T.4FP3-.SP       0.631 T              
#> 3 GA01075_200935 B.T1.SP          0.643 B              
#> 4 GA01075_848724 T.8MEM.SP        0.604 T              
#> 5 GA01075_490889 B.FO.SP          0.404 B              
#> 6 GA01075_574367 B.FO.SP          0.444 B
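For the labels shown here, the simplification amounts to keeping the text before the first dot. The sketch below assumes exactly that rule; simplify_sketch() is hypothetical, and the package's simplify_immgen_celltype() may treat other ImmGen labels differently.

```r
# Hypothetical stand-in: keep the part of the label before the first "."
simplify_sketch <- function(x) sub("\\..*$", "", x)

simplify_sketch(c("B.T1.SP", "T.4FP3-.SP", "T.8MEM.SP", "B.FO.SP"))
```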

Using the Immuno-Navigator dictionary

Alternative dictionaries can also be used. The package includes a dictionary based on the Immuno-Navigator database, selected with name = "immnav". Note that the cell type hierarchies of different dictionaries need not be identical, nor do their predictions need to agree.

celltype2 <- predict_celltype(sce1, name = "immnav")
celltype2[1:5, 1:2]
#>                            GA01075_169602 GA01075_95437
#> CD4                           0.014040968    0.70506105
#> CD8                           0.003266078    0.65287022
#> Common DC progenitor          0.210933611   -0.02917296
#> Common lymphoid progenitor    0.192048030    0.05134000
#> Common myeloid progenitor     0.183723883   -0.02631517

Again, we choose the cell type with the highest correlation for each cell.

celltype2 <- choose_celltype(celltype2)
head(celltype2)
#> # A tibble: 6 x 3
#>   cell_index     celltype correlation
#>   <chr>          <chr>          <dbl>
#> 1 GA01075_169602 Mature B       0.769
#> 2 GA01075_95437  CD4            0.705
#> 3 GA01075_200935 Mature B       0.770
#> 4 GA01075_848724 CD8            0.781
#> 5 GA01075_490889 Mature B       0.668
#> 6 GA01075_574367 Mature B       0.741

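Using the MCA dictionary

A third dictionary, based on the Mouse Cell Atlas (MCA), is selected with name = "mca"; here tissue = "spleen" restricts it to splenic cell types.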

celltype3 <- predict_celltype(sce1, name = "mca", tissue = "spleen")
celltype3[1:5, 1:2]
#>                                     GA01075_169602 GA01075_95437
#> Dendritic cell_S100a4 high(Spleen)       0.6370907  -0.006726544
#> Dendritic cell_Siglech high(Spleen)      0.7132803   0.019037986
#> Erythroblast(Spleen)                     0.5962325   0.130955725
#> Granulocyte(Spleen)                      0.5556167   0.111002183
#> Macrophage(Spleen)                       0.5194207  -0.035899145

celltype3 <- choose_celltype(celltype3)
head(celltype3)
#> # A tibble: 6 x 3
#>   cell_index     celltype                            correlation
#>   <chr>          <chr>                                     <dbl>
#> 1 GA01075_169602 Marginal zone B cell(Spleen)              0.765
#> 2 GA01075_95437  T cell(Spleen)                            0.443
#> 3 GA01075_200935 Marginal zone B cell(Spleen)              0.606
#> 4 GA01075_848724 T cell(Spleen)                            0.435
#> 5 GA01075_490889 Dendritic cell_Siglech high(Spleen)       0.526
#> 6 GA01075_574367 Marginal zone B cell(Spleen)              0.525

Using custom dictionaries

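Custom dictionaries can also be defined. Here we build a toy dictionary for CD4 and CD8 single-positive T cells, double-positive (DP) and double-negative (DN) T cells, and B cells. All markers start at 0; four markers are then set to 1 in the cell types expected to express them and to -1 in the others.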

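# Cell types of the custom dictionary
cells <- c("CD4", "CD8", "DP", "DN", "B")
# Empty dictionary: one row per mouse marker gene, one column per cell type
my.db <- matrix(0, nrow = nrow(markers), ncol = length(cells), dimnames = list(markers[["mouse"]], cells))

# Expected pattern of each marker (1 = expressed, -1 = not expressed)
my.db["Cd69", ] <- c(-1, -1, -1, -1, 1)
my.db["Cd19", ] <- c(-1, -1, -1, -1, 1)
my.db["Cd4", ] <- c(1, -1, 1, -1, -1)
my.db["Cd8a", ] <- c(-1, 1, 1, -1, -1)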

my.db[c("Cd69", "Cd19", "Cd4", "Cd8a"), ]
#>      CD4 CD8 DP DN  B
#> Cd69  -1  -1 -1 -1  1
#> Cd19  -1  -1 -1 -1  1
#> Cd4    1  -1  1 -1 -1
#> Cd8a  -1   1  1 -1 -1

celltype4 <- predict_celltype(sce1, db = my.db)
celltype4[1:5, 1:5]
#>     GA01075_169602 GA01075_95437 GA01075_200935 GA01075_848724
#> CD4     -0.3010388    0.21350421   0.0005910621    -0.13893030
#> CD8     -0.3010388   -0.05337605   0.0005910621     0.13699145
#> DP      -0.3672402    0.13176157  -0.0792944953     0.05027676
#> DN      -0.2395571    0.02777778   0.0848141308    -0.05501438
#> B        0.3672402   -0.13176157   0.0792944953    -0.05027676
#>     GA01075_490889
#> CD4     -0.3815523
#> CD8     -0.3815523
#> DP      -0.4418452
#> DN      -0.3285187
#> B        0.4418452
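Two patterns in this output follow from how my.db was built. The B column is exactly the negative of the DP column, so their correlations with any cell have equal size and opposite sign. CD4 and CD8 differ only by swapping the Cd4 and Cd8a entries, so a cell with equal Cd4 and Cd8a expression gets the same correlation with both. This sketch checks both facts on a toy version of my.db (six genes instead of the full marker set) and a hypothetical B-like cell, assuming Pearson correlation over the dictionary genes.

```r
# Toy version of my.db: the four markers that were set, plus two markers
# left at 0 as all other rows of my.db are
genes <- c("Cd69", "Cd19", "Cd4", "Cd8a", "Ccr7", "Cd14")
toy_db <- cbind(
  CD4 = c(-1, -1,  1, -1, 0, 0),
  CD8 = c(-1, -1, -1,  1, 0, 0),
  DP  = c(-1, -1,  1,  1, 0, 0),
  DN  = c(-1, -1, -1, -1, 0, 0),
  B   = c( 1,  1, -1, -1, 0, 0)
)
rownames(toy_db) <- genes

# Hypothetical cell expressing Cd69 and Cd19 but neither Cd4 nor Cd8a
cell <- c(Cd69 = 1.2, Cd19 = 2.5, Cd4 = 0, Cd8a = 0, Ccr7 = 0.8, Cd14 = 0)

# Correlation of the cell with each cell type profile (named vector)
r <- cor(toy_db, cell[genes])[, 1]
round(r, 3)
```

As in the first and fifth cells above, B comes out on top, DP mirrors it with the opposite sign, and CD4 and CD8 tie.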

celltype4 <- choose_celltype(celltype4)
head(celltype4)
#> # A tibble: 6 x 3
#>   cell_index     celltype correlation
#>   <chr>          <chr>          <dbl>
#> 1 GA01075_169602 B             0.367 
#> 2 GA01075_95437  CD4           0.214 
#> 3 GA01075_200935 DN            0.0848
#> 4 GA01075_848724 CD8           0.137 
#> 5 GA01075_490889 B             0.442 
#> 6 GA01075_574367 B             0.402

Compare all predictions

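Finally, we can compare the predictions of the four dictionaries side by side. The columns of each table are first renamed so that the tables can be joined by cell_index, and the correlations are formatted for display.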

celltype1 <- celltype1 %>% select(-celltype_simple) %>%
  dplyr::rename(cell_img = celltype, cor_img = correlation) %>%
  mutate(cor_img = format(cor_img, digits = 3))

celltype2 <- celltype2 %>%
  dplyr::rename(cell_nav = celltype, cor_nav = correlation) %>%
  mutate(cor_nav = format(cor_nav, digits = 3))

celltype3 <- celltype3 %>%
  dplyr::rename(cell_mca = celltype, cor_mca = correlation) %>%
  mutate(cor_mca = format(cor_mca, digits = 3))

celltype4 <- celltype4 %>%
  dplyr::rename(cell_cus = celltype, cor_cus = correlation) %>%
  mutate(cor_cus = format(cor_cus, digits = 3))


Reduce(function(x, y) left_join(x, y, by = "cell_index"),
       list(celltype1, celltype2, celltype3, celltype4)) %>%
  DT::datatable(rownames = FALSE)
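To see how the labels of two dictionaries correspond, a contingency table is a compact summary. The sketch below cross-tabulates the first six cells shown earlier, simplified ImmGen labels against Immuno-Navigator labels. The vectors are copied by hand from those printouts; with the full data one would use the corresponding columns of the joined table instead.

```r
# Labels of the first six cells, copied from the printouts above
immgen_simple <- c("B", "T", "B", "T", "B", "B")
immnav        <- c("Mature B", "CD4", "Mature B", "CD8", "Mature B", "Mature B")

# Rows: simplified ImmGen label; columns: Immuno-Navigator label
agreement <- table(ImmGen = immgen_simple, ImmunoNavigator = immnav)
agreement
```

Every cell labeled B by ImmGen is Mature B according to Immuno-Navigator, and the two T cells split into CD4 and CD8.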

Identification of cell types from single cell RNA-seq data

Diego Diez

2019-04-01