1. Introduction
Single-cell (sc) biology is a rapidly evolving field that interrogates various ‘omics layers (genomics, epigenomics, proteomics) at SC resolution to define cellular heterogeneity in homeostasis and disease [
1,
2,
3]. In order to ensure reliable results that provide the resolution and quality necessary to make meaningful scientific conclusions and to improve reproducibility, experimental protocols need to be optimized, evaluated, and standardized. This proves to be a challenge because several factors could affect the quality of the analyzed samples, leading to poor data quality. These include the type of tissue (PBMC, tumor, bone marrow, or cell line) which can impact the nuclear isolation protocol; the biological condition being studied (homeostasis, development, cancer, infection, or inflammation); the method of sample collection, handling, storage, and maintenance; cell viability and the number of cells available. Moreover, most protocols are typically defined using ideal samples that do not reflect the clinical and translational research environment, the so-called “real world” samples.
Epigenomic profiling of DNA accessibility (open chromatin regions) helps create a genome-wide characterization of transcription factor (TF) binding events that orchestrate gene expression [
4]. Open chromatin regions represent complementary information to gene expression that allows identifying TF and its targets, which are typically missed by transcriptome assays, given its low expression [
5]. TF activity signatures, also known as TF footprinting, can be traced back to the identity of the cell [
6,
7], making it a valuable resource for characterizing tissue heterogeneity. Combining single-cell level data from both gene expression and open chromatin allows us to define sub-populations at higher resolution and study the complex interactions between the genes and the TFs that regulate them. The 10x Genomics Chromium single nucleus snATAC + snRNA platform allows us to profile both open chromatins using the assay for transposase-accessible chromatin (ATAC) [
8] and gene expression (GEX) information from the same nuclei, bypassing the need for computational integration of this information retrieved from non-paired cells.
In this study, we characterize experimental protocol optimization for the snATAC + snRNA assay that has been utilized in our laboratory in a variety of studies, including mouse mammary stem cells [
9], patient-derived xenografts (PDX) and patient samples from ovarian cancer, and blood samples from COVID-19 patients [
10]. Performing simultaneous snATAC and snRNA-seq on the same cells requires careful sample preparation and sequencing involving several steps that will be discussed here: cell isolation (single-cell suspension from the tissue or sample of interest using appropriate methods, such as enzymatic digestion or mechanical dissociation); nuclei isolation (separate the nuclei from the cells using appropriate methods, such as lysis and centrifugation); data quality analysis (analyze the sequencing data to perform integrated analysis of snATAC-seq and snRNA-seq data using appropriate bioinformatics tools).
In contrast to the snATAC + snRNA assay that captures only nuclear transcripts (snRNA), scRNA-seq is an assay that captures transcripts from the whole cell (scRNA), which includes both the nucleus and the cytoplasm. Each method has specific advantages and disadvantages. While scRNA provides a complete transcriptomic profile about individual cells and allows for better identification of rare cell types, the requirement for enzymatic dissociation of samples which may disrupt gene expression profiles, and the time-sensitive nature of the technique makes obtaining high-quality RNA from individual cells challenging [
11]. snRNA, on the other hand, can provide more consistent and reliable data since nuclei are easier to isolate and RNA is more stable inside of nuclei than in whole cells [
12]. This is especially useful when working with tissues that cannot be easily dissociated [
13,
14]. The ability to work with frozen samples has also made snRNA a promising tool for the characterization of tissue atlases [
15,
16]. A disadvantage of this approach is that using nuclei instead of whole cells can result in the loss of information about cell-cell interactions and other critical cellular processes occurring outside the nucleus [
17]. To better understand the differences between the transcriptomic signatures captured by the two assays, we performed both assays on matched PBMC samples originating from the same 18 patients. On analyzing the data obtained from both assays, we compare and characterize the power of each method in identifying immune cell subpopulations and their corresponding gene markers.
3. Results
3.1. Nuclei Isolation for snATAC + snRNA Sequencing
PBMC samples (n=18) were processed according to protocol A (
Figure 1A). Blood was derived from patients with symptomatic COVID-19, CHIP, or both (Methods). From cell suspensions, nuclei can be extracted without a dissociation step according to protocol A (
Figure 1A), facilitating the acquisition of high-quality nuclei (
Figure 2A), according to the standard (
Figure 2B).
snATAC + snRNA sequencing was also performed on 18 patients diagnosed with ovarian cancer. The tumor tissue was collected from debulking surgery. For solid tumor samples ensuring adequate tissue dissociation poses several challenges, especially with cancer samples, recognized for their complexity and significant inter-patient variability. A suitable protocol has to promote tissue dissociation, preserving nuclei membrane integrity across all the samples.
Tissue digestion status evaluation during sample preparation was critical. The presence of debris, nuclei aggregates, and membrane damage can negatively impact the assay and the sequencing quality. Over or under-digestion is equally detrimental to the data quality. It is important to obtain complete tissue dissociation of tissue and cell aggregates to avoid clogging the microfluidic system during the capture step while maintaining the integrity of the nuclear membrane. Residual peri-nuclear cytoplasm can also reduce the quality of GEX sequencing results.
Several measures can be taken if nuclei suspension contamination is recognized. Utilizing a fine needle syringe was helpful in dissociating nuclei aggregates; if needed, a strainer was used to remove unwanted debris and cell aggregates. Choosing a protocol tailored for sample characteristics minimizes several of these issues. Lastly, optimization of the duration of cell lysis, especially in the case of tissue dissociation, was also critical (see below).
In 9 OC samples, nuclei isolation for single cells followed collagenase-based tissue digestion (
Figure 1A; protocol B). Two samples using protocol B were removed from further processing due to low-quality nuclei (
Figure 2A; protocol B). The nuclei isolation protocol for complex tissues C, D (
Figure 1), and E (Figure S1) was tested in 11 OC samples.
Protocol performance was compared during sample preparation by nuclei morphology, and its implications on downstream data quality are further discussed. Nuclei isolation following protocol B resulted in one sample clogging the microfluidic system. Under microscopic visualization, this sample had numerous debris (tissue remnants and lipid components), the most likely reason for sample clogging (
Figure 2A; protocol B). Another specimen was discarded due to the amount of debris. Also, sample 7 (
Figure 2A) exemplifies that some nuclei had intact membranes for protocol B, but frequently membrane blebs were noticed. Fewer aggregates were visualized with complex tissue dissociation (protocols C and D,
Figure 2A), and no clogging events were observed after incubation time increment. In addition, protocols C and D showed better preservation of nuclear membranes (
Figure 2A).
The incubation step for protocols C and D requires sample-based time optimization. Five and thirty minutes incubation times were tested using the same OC sample (sample 8). Both time periods yielded single nuclei suspension and minimal debris or aggregates (
Figure 2A, protocol C). Interestingly, the shorter incubation time (5 minutes) failed the barcoding step. This resulted in less sample volume after the capturing step and uneven mixing compared to the 30 minutes incubation.
Targeting 5,000 cells was considered ideal for both samples, PBMC and OC. However, due to the lower viability of the PBMC samples, 4000 was the higher target number of cells. For OC samples, the majority were targeted at 5000 (n=16). The ratio between the number of cells retrieved and the goal, known as capture efficiency, is demonstrated in Figure S2A.
Ovarian cancer samples (n=18, 90%) that met nuclei preparation quality criteria from all protocols and PBMC samples (n=18) were sequenced according to the 10x Genomics Multiome platform (Methods).
3.2. Sequencing Data QC for PBMC (n=18) and Fresh OC Samples (n=11)
Sequenced reads (
Figure 1B) from ATAC and GEX libraries were processed using 10x Genomics Cell Range ARC. Five representative QC statistics from the ATAC libraries and six from the gene expression (GEX) libraries, as reported by Cell Ranger ARC, were considered for quantitative sample quality comparison (
Figure 3A). Downstream data analysis was performed using Signac, where QC statistics per cell were analyzed, allowing for filtering high-quality cells, if necessary. Here PBMC and OC samples QC statistics are compared according to their nuclei isolation protocol (
Figure 3 A-D).
Globally, the PBMC samples performed below the cut-off for ATAC fraction of high-quality reads and ATAC TSS enrichment score (
Figure 3A). Cells under inflammatory stress are expected to produce poor-quality results, especially under stressors such as viral infection and CHIP (typically caused by clones carrying mutations in epigenetic regulators like
TET2,
DNMT3A and
ASXL1 [
33]), contributing to a lower fraction of high-quality reads. In this situation, cell-level quality control analysis and filtering (
Figure 3D) are crucial to ensure that only high-quality cells are used to make biological inferences. Despite that, the mean TSS fragment length distribution was adequate (
Figure 3C).
Regarding gene expression, results from samples processed using protocol A were satisfactory. Overall, 55 percent of the samples (n=10) archived higher than 60% GEX fraction of transcriptomic read. Also, 78% of the samples (n=14) had higher than 1,000 GEX median unique molecules identified (UMI) counts per cell (
Figure 3).
Significant heterogeneity among samples on ATAC and GEX components was observed for ovarian cancer samples (
Figure 3A), possibly due to the high complexity of cancer tissue components. ATAC fraction of high-quality reads, ATAC TSS enrichment score, GEX median unique molecules identified (UMI) counts per cell, and GEX fraction of transcriptomic reads parameters can guide analysis decision-making since they have a recommended threshold regardless of the specimen.
ATAC fraction of high-quality fragments represents the percentage of high-quality fragments with a valid barcode associated with cell-containing partitions [Technical Note – Cell Ranger ARC Web Summary Files for Single Cell snATAC + snRNA ATAC + Gene Expression Assay • Rev A]. When this fraction is less than 40%, it indicates that a significant fraction of ATAC fragments were not associated with a barcoded cell. This can be explained by high ambient ATAC contamination, increased cell accessibility, or poor nuclei preparation. All samples processed according to protocol B (n=7) did not meet the minimum threshold (
Figure 3A). Remarkably, all samples (n=10) with the nuclei extracted following protocol C had a higher than 40% threshold for ATAC fraction of high-quality fragments (
Figure 3A; Protocol C mean = 76.9%; Protocol D mean = 54.4%).
The median UMI count represents gene expression elements (RNA molecules converted into cDNA) per cell, thus correlating with sequencing depth. Given that all the samples had an adequate number of reads (
Figure 1B), read depth was further evaluated by the number of unique genes detected per cell, equally consistent between protocols (
Figure 3D).
ATAC fragment size distribution also correlates with sample quality. To characterize DNA accessibility, samples must have fragments of the inter-nucleosome region around them. This distribution is only achieved with protocol C for fresh ovarian cancer samples (
Figure 3C), suggesting that protocol B is not ideal for fresh cancer samples.
GEX fraction of transcriptomic reads translates the RNA percentage aligned with the genome and assigned to a barcoded cell [Technical Note – Cell Ranger ARC Web Summary Files for Single Cell snATAC + snRNA ATAC + Gene Expression Assay • Rev A]. Lower values can be due to environmental RNA contaminating the single-nuclei droplet. In accordance, the protocol C sample outperformed protocol B preparation. Considering the apparent superiority trend, a paired comparison was conducted to control for sample specificities.
The complex tissue dissociation protocol was better than protocol B for OC samples in the paired comparison (
Figure 3B, sample 7). Protocol C had an ATAC fraction of high-quality reads of 70% compared to 26% obtained by the B protocol. Also, the complex tissue dissociation protocols’ GEX fraction of transcriptomic reads was 2.4 times higher (
Figure 3B). As was foreseeable, the quality differences impacted cell identification and clustering (Figure S3A, sample 7). Taken together, optimal nuclei preparation has been proven crucial to assay performance.
To further optimize protocol C, two incubation times were tested. Sample 8 (
Figure 3B) was dissociated for 5 and 30 minutes. As mentioned, the shorter incubation period for protocol C failed the barcode step, resulting in empty droplets that were wrongly counted as cells (20000 against 5752 for the 30 minutes incubation), culminating in a poor GEX fraction of transcriptomic reads per cell and a higher percentage of filtered-out cells (Figure S2B). These technical issues culminated with inadequate clustering (ATAC and GEX) for sample 8 when dissociated for only five minutes (Figure S3B), showing the same pattern observed in compromised sample examples in 10x genomics technical notes [
34].
An intermediary incubation time point was tested to assess for possible tissue over-digestion within 30 minutes of incubation. Fresh preparation of sample 9 was incubated for 10 and 30 minutes. Both preparations yielded good QC values, yet the longer incubation had a lower GEX fraction of transcriptomic reads (
Figure 3B). As previously mentioned, tissue over-digestion can impact GEX reads, even if membrane changes are not detected (
Figure 2B, protocol C); this probably happened in the 30-minute incubation time. Also, the longer incubation had fewer cells retrieved (1,937, compared to 3,740 for 10 minutes). However, both targeted five thousand cells and had similar sample concentrations (4850 nuclei/µL for 10 minutes, against 3520 nuclei/µL), decreasing capture efficiency for the 30 minutes incubation (Figure S2A). Similar clustering was obtained for ATAC and GEX, demonstrating that, regardless of the absolute QC values, both incubation time points were feasible for the OC preparation (Figure S3C).
3.3. Frozen OC Sample Optimization
After optimizing nuclei extraction protocols for fresh tissue, the same process was performed for frozen tissue. The advantages of frozen tissue include allowing retrospective sample studies and curated specimen selection. Sample 9, previously discussed in the fresh sample preparation, was also processed after being frozen for paired comparisons.
The OC frozen preparation of sample 9 was sequenced before and after FACS sorting (protocols D and E, respectively) (
Figure 1A and S1A). The rationale for sorting was to minimize the need for longer incubation and debris removal. This condition was tested only in frozen tissue since it has lower viability and higher nuclear remnants. FACS nuclei were selected based on 7-AAD signal (Figure S1B). A head-to-head comparison of nuclei, QC analysis, and clustering was performed.
Nuclei morphology was satisfactory throughout. Albeit the nuclei suspension for the FACS samples had no debris (Figure S1C), both samples performed similarly in the barcoding step. Furthermore, all QC values met the standard threshold and were consistent within samples (
Figure 3B). Clustering for GEX and ATAC was also similar between both frozen preparations (Figure S3C).
Considering all the samples processed according to protocol C (n=5) and protocol D/E (n=6), DNA accessibility and RNA expression are consistent throughout (
Figure 3A). Interestingly, frozen samples detected more unique genes per cell, ensuring greater sequencing depth and cell population representation (
Figure 3D). These results suggest that frozen tissue recapitulates the quality and biology of fresh tissue. In that manner, frozen tissue preparation became the standard of our laboratory for OC samples.
3.4. Comparison of the Transcriptomes Profiled from snATAC + snRNA and scRNA-Seq from a Matched Cohort
By performing both scRNA-seq and snATAC + snRNA assays on matched samples originating from the same individuals, we compared nuclear transcriptomic signatures against whole cell transcriptomes and their ability to capture immune cell subpopulations. scRNA profiles of 70,184 cells and snRNA profiles of 43,160 nuclei from matched PBMC samples from 18 patients were pooled. Dimensionality reduction using principal component analysis (PCA) and uniform manifold approximation and projection (UMAP), followed by cell type identification using SingleR, allowed us to identify the typical immune cell populations of B-cells, natural killer (NK) cells, CD4+ T-cells, CD8+ T-cells, CD14 monocytes, CD16 monocytes, and an intermediate monocytic population among others (
Figure 4A). The cell type identification method used here is a reference-based method wherein the transcriptomic profiles of individual cells are mapped to reference profiles of known cell types which in this case come from the Monaco immune data - bulk RNA-seq samples of sorted immune cell populations from GSE107011 [
28]. To evaluate the effect of the reference used for cell type identification, we compared two approaches for labeling cell types in the snRNA data - first using the Monaco reference and secondly by mapping the snRNA data onto the scRNA data and then transferring labels using Azimuth. Since the snRNA and scRNA data come from matched samples, the latter approach should be able to make more biologically meaningful mapping of cell states. Both snRNA and scRNA captured a very similar distribution of cell types, with subtle differences between the two cell type identification methods (
Figure 4A, B and S4A). We observed a decreased proportion of NK cells in snRNA when using scRNA as a reference compared to the Monaco reference, making the data more in line with scRNA proportions. However, when comparing the distribution of CD4+ T-cells and gamma delta T-cells, there was an increased proportion of both types in snRNA (using scRNA reference) that differed from both scRNA and snRNA (using Monaco reference). scRNA identified a larger proportion of CD8+ T-cells than snRNA in both approaches.
To better distinguish the two platforms in identifying cell type-specific transcriptomic signatures, we performed PCA on pseudo-bulk profiles obtained by aggregating the counts from each cell type from each platform. The PCA revealed that the first two principal components (PC) cluster the profiles by platform, while the third and the fourth PCs cluster transcriptomic signatures by major cell subpopulations (myeloid and lymphoid lineages) (
Figure 4C). This suggests that the batch effect between scRNA and snRNA is captured by the first two PCs, so removing these PCs should facilitate pooled data analysis from both platforms. This effect is consistent regardless of the reference for snRNA cell type identification.
In
Figure 4D, the number of cells labeled differently by the two methods of cell type identification were compared. Overall there is a high concordance between the two methods. We observe some cross-labeling of cell types within the lymphoid and the myeloid lineages suggesting that the two methods perform similarly when identifying broad transcriptomic signatures. But they differ in the resolution of subtle signals between cell subtypes. We also compared the number of overlaps between the markers of each cell type - genes that are significantly differentially expressed (p-value < 0.05) between cells of each cell type and all other cells across the two platforms (Figure S4B) with snRNA cell types identified using Monaco reference on the left and using scRNA reference on the right. High concordance in marker genes within the lymphoid and the myeloid lineages cell type was observed, reinforcing the previous inference that both cell type identification methods and platforms perform equally in resolving broad cell lineages. At the same time, there are subtle differences in capturing subpopulations.
On analyzing the correlation of gene expression, averaged by cell type and sample, between scRNA and snRNA, the genes were both positively correlated and negatively correlated (Spearman’s correlation test p < 0.05) (
Figure 4E and S4C). But positively correlated genes were, on average, expressed higher than the negatively correlated genes (
Figure 4F). This could mean that genes with a very low detection threshold are recognized with different sensitivity by the two platforms. All the marker genes were positively correlated (
Figure 4G), implying that the cell type-specific genes are detected similarly by both scRNA and snRNA.
4. Discussion
Performing ATAC and RNA sequencing from the same nuclei increases assay definition and ability to characterize cell states, including the activity of key transcription factors. As described here, it is critical to ensure good nuclei preparations, and different specimens will need specific dissociation steps. For example, nuclei could be directly extracted from PBMC (protocol A), but human tissue samples required a well-defined dissociation process.
Two different tissue digestion methods were tested in OC samples, one based on collagenase and the other using NP-40 as its major detergent (Methods). When performing the assay, the only step in evaluating sample quality is through visual evaluation of nuclei morphology. According to the standard (
Figure 2B), both methods could yield good-quality nuclei when analyzed under the microscope. Although the microfluidic system capture step failed with both protocols (
Figure 2) protocol B and sample 8 protocol C with 5 minutes incubation, it was more frequent in protocol B, resulting in two samples lost. Furthermore, the collagenase-based method (protocol B) had worse QC sequencing values than the NP-40-based method (protocols C, D, and E).
Poor performance in QC correlates with the inability to identify cells, hence poor sub-population characterization (Figure S3). This occurs mainly due to the extensive disposal of reads for improper barcoded nuclei or DNA and RNA fragments arising from contamination. Consequently, fewer cells achieve quality thresholds and can be used for downstream analysis (Figure S2B). Ensuring that the number of cells obtained is close to the target value (capture efficiency, Figure S2A) is also important for the optimal usage of sequenced reads. This refined protocol optimization is particularly important when dealing with scarce patient samples.
Beyond choosing the appropriate dissociation method, accessing optimal incubation time is needed. Sample 8 exemplifies that for the same specimen, the longer incubation time was decisive for the success of the experiment. Five minutes of incubation was not enough to dissociate the tissue (
Figure 2A). Consequently, capturing a single-cell suspension and barcoding was not ideal, jeopardizing all the following steps.
The methods proved to be very different when post-sequencing quality control statistics for both ATAC and GEX were compared. All samples from protocol B had their high-quality ATAC fragments lower than recommended threshold. The open chromatin region was adequately assessed when OC samples were processed using the complex tissue dissociation technique (with fresh and frozen tissue, protocols C, D, and E). The previous protocols also exceeded method B in the GEX parameters.
Notwithstanding the importance of Cell Ranger ARC parameters, careful interpretation is needed since sample biology influences the results. For example, the PBMC cohort had inadequate high-quality ATAC fragments per cell, most likely due to the nature of the samples rather than poor sample preparation, emphasizing context-specific interpretation of these scores. In addition, the raw QC statistics do not translate directly to sequencing quality; multifactorial analysis is needed.
The single-cell definition can be used to favor sample quality assessment. All the mentioned criteria can be applied to each individual cell (Figure S3). This separation allows QC filtering by cell, ensuring that only high-quality data is used for downstream analysis.
The snATAC + snRNA platform is at its full potential when used to characterize different cell populations that compose a particular tissue, embryonic development, disease process, or even inter-patient variability. Ensuring adequate preparation for frozen tissue increases assay versatility. Sample 9 paired comparison assured results reproducibility with frozen preparations (Figure S3C). In addition, the high-quality QC values for protocols D and E endorse the feasibility of handling frozen cancer tissues.
Furthermore, the implication of FACS sorting for frozen samples was investigated. Sample 9 was sequenced before and after FACS. There were no significant differences in sequencing QC results (
Figure 3), making FACS sorting an optional step. Nevertheless, the goal of FACS sorting in this paper was to increase sample purity. Yet, FACS is a powerful tool to separate different populations based on cell surface markers or ploidy [
35].
Lastly, since the snATAC + snRNA platform quantifies gene expression only from within the nucleus, a comparative analysis of data obtained from scRNA and snRNA measurements from matched samples was done to help inform better decisions in study design. Although similar in their output, scRNA, and snRNA approaches vary in identifying different cell types and their respective proportions. We show that adjusting the reference dataset used for the cell type identification, such as using matched scRNA data as a reference for snRNA cell type identification, may further align the two modalities in a biologically meaningful sense. The choice between scRNA and snRNA sequencing should be driven by the biological question, sample composition, and preservation as fresh or frozen tissue.
Each high throughput sequencing strategy has advantages and limitations. scRNA-seq measures the expression profile of the whole cell and preserves an intact membrane allowing for further downstream applications such as CITE-Seq, which can profile cell surface protein interactions [
36]. On the other hand, snRNA-seq is an especially beneficial approach when working with frozen samples, such as those from tissue biobanks and tissue samples that are difficult to dissociate. In a direct comparison of snRNA-seq and scRNA-seq of the adult kidney, Wu et al. showed that snRNA-seq achieves comparable gene detection and spatial information while eliminating dissociation-induced changes in gene expression [
37]. Andrews et al. [
38] showed that snRNA-seq could identify rare mesenchymal and precursor cell types in the healthy human liver but that certain subtypes of immune cells were only distinguishable in scRNA-seq. While the preparation process of working with scRNA-seq data is a significant source of variability [
39], nuclei provide greater RNA stability and have a reproducible expression of transcripts when compared to whole cells [
40].
In addition, using nuclei instead of whole cells can result in loss of information about important cellular processes that occur outside the nucleus. While there are differences between the transcripts obtained from the two platforms, our analysis suggests that both platforms equally capture the underlying cell type signatures, and it is possible to pool information from both platforms removing the batch effect by removing the principal components that correlate with the sequencing platforms. Moreover, we observe that highly expressed genes and marker genes are positively correlated between the two platforms. In contrast, genes that are at low detection levels are negatively correlated, suggesting that the two platforms vary in detecting specific low-expression genes. Careful consideration is necessary when choosing the platforms when the genes of interest are lowly expressed. In summary, both scRNA-seq and snRNA-seq are powerful tools for analyzing gene expression, and the choice between them depends on the specific research question and the type of cells being analyzed.
In summary, we analyzed how different nuclei preparation can impact the output of snATAC + snRNA sequencing for PBMC and OC samples. To compare methodologies, we utilized several parameters, such as nuclei morphology, sequencing depth, QC values per sample and for each cell, capture efficiency, and dimensionality reduction techniques, such as UMAP. To control for sample specificity, several paired comparisons between protocols were performed. Also, the PBMC cohort had both snRNA and scRNA sequencing results compared for differences in cell identification. Our results show how careful selection of an appropriate nuclei isolation protocol is needed for good sequencing results and the importance of context-based analysis of QC results. Finally, snRNA and scRNA results were consistent, especially considering cell specific markers and abundant transcripts.
Figure 1.
Overview of protocols and sequencing depth statistics. (A) Schematic of the main steps pertaining to the four single nuclei isolation protocols discussed. (B) Box plot showing the number of sequenced read pairs in ATAC and GEX, grouped by protocol. The middle line represents the median, the lower and upper edges of the rectangle represent the first and third quartiles, and the lower and upper whiskers represent the interquartile range (IQR) × 1.5. Outliers beyond the end of the whiskers are plotted individually. Protocol A, n = 18; Protocol B, n = 7, Protocol C, n = 5; Protocol D, n = 5.
Figure 1.
Overview of protocols and sequencing depth statistics. (A) Schematic of the main steps pertaining to the four single nuclei isolation protocols discussed. (B) Box plot showing the number of sequenced read pairs in ATAC and GEX, grouped by protocol. The middle line represents the median, the lower and upper edges of the rectangle represent the first and third quartiles, and the lower and upper whiskers represent the interquartile range (IQR) × 1.5. Outliers beyond the end of the whiskers are plotted individually. Protocol A, n = 18; Protocol B, n = 7, Protocol C, n = 5; Protocol D, n = 5.
Figure 2.
Representative images of single nuclei from the various protocols and samples examined in this study. (A) Protocol A shows nuclei isolated from PBMC (samples 14 and 28). Protocol B shows nuclei prepared from ovarian cancer tissues. Shown in order are examples of eliminated samples (inside red square) with undigested tissue aggregates (white arrow), lipid droplet contaminants (orange arrow), and debris (blue arrow), as well as a successful nuclei preparation (sample 7). Protocol C shows nuclei isolated with a 5 minutes incubation and 30 minutes incubation from sample 8. Protocol D shows nuclei preparation from samples 9 and 10. (B) Representative images of single nuclei organized in decreasing order of preparation quality were used as a standard. Scale bars throughout represent 50 µm.
Figure 2.
Representative images of single nuclei from the various protocols and samples examined in this study. (A) Protocol A shows nuclei isolated from PBMC (samples 14 and 28). Protocol B shows nuclei prepared from ovarian cancer tissues. Shown in order are examples of eliminated samples (inside red square) with undigested tissue aggregates (white arrow), lipid droplet contaminants (orange arrow), and debris (blue arrow), as well as a successful nuclei preparation (sample 7). Protocol C shows nuclei isolated with a 5 minutes incubation and 30 minutes incubation from sample 8. Protocol D shows nuclei preparation from samples 9 and 10. (B) Representative images of single nuclei organized in decreasing order of preparation quality were used as a standard. Scale bars throughout represent 50 µm.
Figure 3.
Quality control statistics comparing the protocols examined. (A) Estimated number of cells, confidently mapped read pairs (ATAC), a fraction of high-quality fragments in cells (ATAC), mean raw read pairs per cell (ATAC), number of peaks (ATAC), TSS enrichment score (ATAC), a fraction of transcriptomic reads in cells (GEX), mean raw reads per cell (GEX), median UMI counts per cell (GEX), median genes per cell (GEX), percent duplicates (GEX) and total genes detected (GEX) per sample grouped by protocol, shown as a boxplot where the middle line represents the median, the lower and upper edges of the rectangle represent the first and third quartiles and the lower and upper whiskers represent the interquartile range (IQR) × 1.5. Outliers beyond the end of the whiskers are plotted individually. Protocol A, n = 18; Protocol B, n = 7, Protocol C, n = 5; Protocol D, n = 5. (B) The fraction of high-quality fragments in cells (ATAC) and a fraction of transcriptomic reads in cells (GEX) are shown for samples 7, 8, and 9, for which paired protocol and different incubation time comparisons were made. (C) Fragment length distribution and mean TSS enrichment score - the ratio of fragments centered at the TSS to fragments in TSS flanking regions from all samples grouped by tissue and protocol. (D) Violin plots showing the number of unique genes detected (RNA), number of unique peaks detected (ATAC), and the number of reads mapping to mitochondrial genes (RNA) calculated per cell in all samples and grouped by protocol. Protocol A, n = 57817; Protocol B, n = 17277; Protocol C, n = 33759; Protocol D, n = 31623.
Figure 3.
Quality control statistics comparing the protocols examined. (A) Estimated number of cells, confidently mapped read pairs (ATAC), a fraction of high-quality fragments in cells (ATAC), mean raw read pairs per cell (ATAC), number of peaks (ATAC), TSS enrichment score (ATAC), a fraction of transcriptomic reads in cells (GEX), mean raw reads per cell (GEX), median UMI counts per cell (GEX), median genes per cell (GEX), percent duplicates (GEX) and total genes detected (GEX) per sample grouped by protocol, shown as a boxplot where the middle line represents the median, the lower and upper edges of the rectangle represent the first and third quartiles and the lower and upper whiskers represent the interquartile range (IQR) × 1.5. Outliers beyond the end of the whiskers are plotted individually. Protocol A, n = 18; Protocol B, n = 7, Protocol C, n = 5; Protocol D, n = 5. (B) The fraction of high-quality fragments in cells (ATAC) and a fraction of transcriptomic reads in cells (GEX) are shown for samples 7, 8, and 9, for which paired protocol and different incubation time comparisons were made. (C) Fragment length distribution and mean TSS enrichment score - the ratio of fragments centered at the TSS to fragments in TSS flanking regions from all samples grouped by tissue and protocol. (D) Violin plots showing the number of unique genes detected (RNA), number of unique peaks detected (ATAC), and the number of reads mapping to mitochondrial genes (RNA) calculated per cell in all samples and grouped by protocol. Protocol A, n = 57817; Protocol B, n = 17277; Protocol C, n = 33759; Protocol D, n = 31623.
Figure 4.
Comparison of the transcriptomes profiled from snATAC + snRNA (GEX) and scRNA-seq from a matched cohort. (A) UMAP projection showing 70184 cells from scRNA colored by cell types identified using SingleR with Monaco reference (left) and 43160 cells from snRNA, cell types identified using Monaco reference (middle) or using scRNA from the matched cohort as reference (right). The bar at the bottom shows the proportion of each cell type identified using each method. Treg, regulatory T-cells; gdT, gamma delta T-cells; MAIT, Mucosal-associated invariant T cells; NK, natural killer cells; Mono, monocytes; Int, intermediate; cDC, classical dendritic cells; pDC, plasmacytoid dendritic cells; HSPC, hematopoietic stem, and progenitor cells. (B) Radar plot showing the comparison of proportions of each cell type identified by each method. (C) Principal component analysis of the transcript counts aggregated by platform and cell type with snRNA cell types identified using Monaco reference (top) and scRNA from matched cohorts as reference (bottom). (D) Heatmap showing the log (base 10) of the number of cells that overlap cell type identification by the two references - Monaco and scRNA from matched cohort. (E) Histogram showing the distribution of genes significantly correlated (Spearman correlation, test for association p < 0.05) in expression (averaged over both sample and cell type) between scRNA and snRNA. (F) Log (base 10) average expression (averaged over all cells) of genes separated by positive and negative Spearman correlation coefficients. (G) Histogram showing the distribution of marker genes significantly correlated (Spearman correlation, test for association p < 0.05) in average expression between scRNA and snRNA.
Figure 4.
Comparison of the transcriptomes profiled from snATAC + snRNA (GEX) and scRNA-seq from a matched cohort. (A) UMAP projection showing 70184 cells from scRNA colored by cell types identified using SingleR with Monaco reference (left) and 43160 cells from snRNA, cell types identified using Monaco reference (middle) or using scRNA from the matched cohort as reference (right). The bar at the bottom shows the proportion of each cell type identified using each method. Treg, regulatory T-cells; gdT, gamma delta T-cells; MAIT, Mucosal-associated invariant T cells; NK, natural killer cells; Mono, monocytes; Int, intermediate; cDC, classical dendritic cells; pDC, plasmacytoid dendritic cells; HSPC, hematopoietic stem, and progenitor cells. (B) Radar plot showing the comparison of proportions of each cell type identified by each method. (C) Principal component analysis of the transcript counts aggregated by platform and cell type with snRNA cell types identified using Monaco reference (top) and scRNA from matched cohorts as reference (bottom). (D) Heatmap showing the log (base 10) of the number of cells that overlap cell type identification by the two references - Monaco and scRNA from matched cohort. (E) Histogram showing the distribution of genes significantly correlated (Spearman correlation, test for association p < 0.05) in expression (averaged over both sample and cell type) between scRNA and snRNA. (F) Log (base 10) average expression (averaged over all cells) of genes separated by positive and negative Spearman correlation coefficients. (G) Histogram showing the distribution of marker genes significantly correlated (Spearman correlation, test for association p < 0.05) in average expression between scRNA and snRNA.