TrAnnoScope: A Modular Snakemake Pipeline for Comprehensive Full-Length Transcriptome Analysis and Functional Annotation
How to cite: Pektas, A.; Panitz, F.; Thomsen, B. TrAnnoScope: A Modular Snakemake Pipeline for Comprehensive Full-Length Transcriptome Analysis and Functional Annotation. Preprints 2024, 2024110489. https://doi.org/10.20944/preprints202411.0489.v1
Abstract
Transcriptome assembly and functional annotation are essential for understanding gene expression and biological function. However, many existing tools lack the flexibility to integrate both short- and long-read sequencing data, or fail to provide a complete, customizable workflow for transcriptome analysis. Here, we present TrAnnoScope, a customizable transcriptome analysis pipeline that efficiently processes and integrates short- and long-read sequencing data to generate high-quality, full-length transcripts with detailed functional annotation. The pipeline encompasses steps from quality control to functional annotation, employing a range of tools and established databases, such as SwissProt, Pfam, Gene Ontology (GO), and the Kyoto Encyclopedia of Genes and Genomes (KEGG). As a case study, TrAnnoScope was applied to RNA-Seq data from zebra finch brain, ovary, and testis tissues. It produced a comprehensive transcriptome, demonstrating strong alignment with the reference genome (99.63%) and capturing a significant proportion of nearly complete protein sequences (92.7%). The functional annotation process yielded extensive matches to known protein databases and successfully assigned relevant functional terms to the majority of the transcripts. TrAnnoScope thus integrates multiple sequencing technologies to generate comprehensive transcriptomes with minimal user input. Its modular design, flexibility, and ease of use make it a valuable tool for researchers working with complex datasets, particularly for non-model organisms.
Keywords
RNA-Seq; reproducible pipeline; high-performance computing (HPC); transcriptome analysis; functional annotation; Iso-Seq; Snakemake; long-read sequencing
Copyright: This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.