[Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells](doi:10.1038/nbt.4259)
The authors used microfluidics to amplify full-length cDNA from single cells in a sample. The cDNA from each cell was barcoded to enable cell-of-origin identification and then split into two pools: one for short-read Illumina 3′ sequencing to measure gene expression, and the other for long-read sequencing with Pacific Biosciences (PacBio) or Oxford Nanopore to identify full-length RNA isoforms.
Filter cells, retaining reads confidently mapped to genes. Then use these short reads to cluster the cells.
Perform a second, independent replicate (rep2) at threefold sequencing depth.
The Jaccard index shows that the marker genes of matching clusters in rep1 and rep2 are similar.
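As a minimal sketch of the replicate comparison, the Jaccard index of two marker-gene sets is the size of their intersection divided by the size of their union. The gene names and the `jaccard` helper below are hypothetical placeholders, not values or code from the paper.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two collections of gene names."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


# Hypothetical marker-gene lists for one matching cluster in each replicate.
rep1_markers = ["geneA", "geneB", "geneC", "geneD"]
rep2_markers = ["geneA", "geneB", "geneC", "geneE"]

score = jaccard(rep1_markers, rep2_markers)
print(f"Jaccard index: {score:.2f}")
```

A value near 1 means the two replicates recover nearly the same markers for that cluster; a value near 0 means almost no overlap.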
Generated ~5.2 million PacBio circular consensus reads (CCS).
Cellular barcodes are located close to the polyA-tail, so they first searched for polyA-tails.
61.6% of CCS contained a T9 (a run of nine consecutive Ts).
Error-free sequencing of the theoretical construct (21-bp adaptor sequence, 16-bp cellular barcode, 10-bp UMI, and polyA-tail) would yield a T9 starting at position 48. ~97% of T9-CCS had a T9 starting between positions 45 and 51.
CCS with a T9 at an unexpected position had lower T-content, whereas CCS with a T9 at the expected position had 30-bp T-content.
CCS with a T9 at the expected position showed a higher barcode identification rate than CCS with a T9 in other positions.
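The expected T9 position follows from the construct lengths: 21 + 16 + 10 = 47 bases precede the polyT stretch, so it starts at 1-based position 48. The sketch below checks a read against the 45–51 window and slices out the barcode and UMI. The function names and the synthetic read are assumptions for illustration; this is not the authors' pipeline.

```python
ADAPTOR_LEN = 21  # bp adaptor, from the construct described above
BARCODE_LEN = 16  # bp cellular barcode
UMI_LEN = 10      # bp UMI

# 1-based position where the T stretch should start in an error-free read.
EXPECTED_T9_START = ADAPTOR_LEN + BARCODE_LEN + UMI_LEN + 1


def first_t9_position(read):
    """Return the 1-based start of the first run of nine Ts, or None if absent."""
    idx = read.find("T" * 9)
    return None if idx < 0 else idx + 1


def has_expected_t9(read, lo=45, hi=51):
    """True if the first T9 starts inside the window reported in the notes."""
    pos = first_t9_position(read)
    return pos is not None and lo <= pos <= hi


def extract_barcode_and_umi(read, t9_start):
    """Slice the barcode and UMI lying immediately upstream of the T9."""
    umi_end = t9_start - 1  # 0-based exclusive end of the UMI
    umi = read[umi_end - UMI_LEN:umi_end]
    barcode = read[umi_end - UMI_LEN - BARCODE_LEN:umi_end - UMI_LEN]
    return barcode, umi


# Synthetic, error-free read: adaptor + barcode + UMI + polyT + some cDNA.
toy_read = "ACG" * 7 + "ACGTACGTACGTACGA" + "GCGCGCGCGC" + "T" * 30 + "GATTACA"
toy_pos = first_t9_position(toy_read)
toy_barcode, toy_umi = extract_barcode_and_umi(toy_read, toy_pos)
print(toy_pos, has_expected_t9(toy_read), toy_barcode, toy_umi)
```

Anchoring the slices on the observed T9 start, rather than on fixed offsets, tolerates small insertions or deletions in the adaptor, which is one plausible reason for allowing a window instead of requiring exactly position 48.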
For 92.7% of barcodes, the minimal (Levenshtein) distance to any other barcode was 3 or greater, and for the remaining barcodes it was 2. Thus, for most barcodes there was only one specific error pattern (three errors) that would result in a mis-identified cell. Simulation indicated that all of these false-positive barcodes were discarded.
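The minimal-distance statistic above can be reproduced in principle with a plain dynamic-programming Levenshtein distance. The sketch below is an assumption about how such a check might look, using toy 8-mer barcodes rather than real 16-mer 10x Genomics barcodes.

```python
from itertools import combinations


def levenshtein(s, t):
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def min_distance_per_barcode(barcodes):
    """For each barcode, the smallest distance to any other barcode in the set."""
    best = {b: None for b in barcodes}
    for a, b in combinations(barcodes, 2):
        d = levenshtein(a, b)
        for x in (a, b):
            if best[x] is None or d < best[x]:
                best[x] = d
    return best


# Toy barcodes (hypothetical).
toy_barcodes = ["AAAACCCC", "AAAACCGG", "TTTTGGGG"]
toy_min = min_distance_per_barcode(toy_barcodes)
print(toy_min)
```

The all-pairs loop is quadratic in the number of barcodes, so for several thousand cells in pure Python it would be slow; a compiled edit-distance library or an early-exit bound would be a natural substitution.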
Aligned PacBio reads to the mouse genome (version mm10)
The authors analyzed novel isoforms with respect to mouse Gencode version 10 to produce a long-read-enhanced and cell-type-resolved annotation. For these isoforms, they required all splice sites to be known in Gencode (version 10) and each junction and internal exon to be either annotated or observed at least twice in ScISOr-Seq. To minimize the effect of PCR artifacts on the improved annotation while still allowing transcripts expressed at low levels to be added, they also produced an enhanced cell-type-resolved annotation with good six-cycle PCR short-read support: for each added isoform, each intron and internal exon was required to be annotated in Gencode or supported by two or more six-cycle PCR short reads.
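The support rule for added isoforms can be written as a simple predicate: every intron and internal exon must be either annotated or backed by at least two short reads. The feature encoding, the function name, and the toy data below are hypothetical; only the "annotated or ≥2 reads" rule comes from the notes.

```python
def isoform_is_supported(features, annotated, short_read_counts, min_reads=2):
    """Check each intron/internal exon of an isoform against the support rule.

    features: iterable of hashable feature IDs (here: (kind, chrom, start, end)).
    annotated: set of feature IDs present in the reference annotation.
    short_read_counts: dict mapping feature ID -> number of supporting short reads.
    """
    return all(
        f in annotated or short_read_counts.get(f, 0) >= min_reads
        for f in features
    )


# Toy data (hypothetical coordinates).
annotated = {("intron", "chr1", 100, 200), ("exon", "chr1", 200, 300)}
counts = {("intron", "chr1", 300, 400): 3, ("intron", "chr1", 500, 600): 1}

iso_ok = [("intron", "chr1", 100, 200), ("exon", "chr1", 200, 300),
          ("intron", "chr1", 300, 400)]
iso_bad = [("intron", "chr1", 100, 200), ("intron", "chr1", 500, 600)]

print(isoform_is_supported(iso_ok, annotated, counts))
print(isoform_is_supported(iso_bad, annotated, counts))
```

Because the rule is applied per feature with `all`, a single unsupported intron or internal exon is enough to reject the whole isoform.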
To validate the correct calling of the individual cell of origin for each isoform, the authors performed immunopanning (antibody-based purification of specific cell types).
Examined alternative splicing in the Bin1 gene
* In addition to the four annotated alternate exons (A1, A3, A4 and A5) in mouse Gencode for Bin1, the authors found two more alternate exons, A2 and A6, in ≥3 reads.
Coordination of alternate exons is of crucial biological importance, so the authors searched for it in their ScISOr-Seq data. They found 25 genes with coordination of alternate exons that were separated by intermediate exons.
Testing all exon pairs, adjacent or separated by intermediate exons, they found 633 genes with coordination, including all 25 with intermediate exons. Thus, most coordinated pairs were adjacent exon pairs.
20% (5 of 25) of the coordination events between alternate exons separated by constitutive exons were a result of differences in isoform abundance.
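One way to test a pair of alternate exons for coordination is to tabulate, across long reads that span both exons, how often each exon is included, and then test the 2×2 table for association. The notes do not say which statistical test the authors used; the two-sided Fisher exact test below is an assumption, and the read representation and function names are hypothetical.

```python
from math import comb


def inclusion_table(reads, exon_a, exon_b):
    """Count reads as (both, a_only, b_only, neither) from per-read exon sets."""
    both = a_only = b_only = neither = 0
    for exons in reads:
        has_a, has_b = exon_a in exons, exon_b in exons
        if has_a and has_b:
            both += 1
        elif has_a:
            a_only += 1
        elif has_b:
            b_only += 1
        else:
            neither += 1
    return both, a_only, b_only, neither


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Toy reads (hypothetical): each read is the set of alternate exons it includes.
toy_reads = [{"A2", "A6"}] * 10 + [set()] * 10
table = inclusion_table(toy_reads, "A2", "A6")
p_value = fisher_exact_two_sided(*table)
print(table, p_value)
```

Only reads that cover both exon positions should enter the table, and testing many exon pairs across many genes would call for a multiple-testing correction, which this sketch leaves out.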
Multiple deeply sequenced replicates are needed for precise quantification. The use of long-read technology in ScISOr-Seq makes accurate quantification expensive for now. The authors' estimates for specificity and sensitivity of barcode recognition in long reads are based on using 16-mer 10x Genomics barcodes for 6,000–7,000 cells. If the number of cells is increased to >1 million while still relying on 16-mer barcodes, the authors would advise reassessing specificity and sensitivity, as specificity is likely to drop.