A Genomic Language Model for Chimera Artifact Detection in Nanopore Direct RNA Sequencing

Oct 26, 2024 · Yangyang Li, Ting-You Wang, Qingxiang (Allen) Guo, Yanan Ren, Xiaotong Lu, Qi Cao, Rendong Yang
Abstract
Chimera artifacts in nanopore direct RNA sequencing (dRNA-seq) can significantly distort transcriptome analyses, yet their detection and removal remain challenging due to limitations in existing basecalling models. We present DeepChopper, a genomic language model that precisely identifies and removes adapter sequences from base-called dRNA-seq long reads at single-base resolution, operating independently of raw signal or alignment information to effectively eliminate chimeric read artifacts. By removing these artifacts, DeepChopper substantially improves the accuracy of critical downstream analyses, such as transcript annotation and gene fusion detection, thereby enhancing the reliability and utility of nanopore dRNA-seq for transcriptomics research.
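As a rough illustration of what removing adapters "at single-base resolution" can mean in practice, the sketch below splits a base-called read wherever a per-base mask marks an internal adapter, so a chimeric read becomes separate fragments. This is not DeepChopper's code or API: the function `chop_read`, the boolean mask `is_adapter` (standing in for a model's per-base predictions), and the `min_len` filter are all hypothetical names and assumptions introduced for this example.

```python
from typing import List, Sequence


def chop_read(seq: str, is_adapter: Sequence[bool], min_len: int = 20) -> List[str]:
    """Split a read at adapter runs and keep the non-adapter segments.

    `is_adapter[i]` is True when base i is predicted to be adapter sequence.
    In this sketch the mask is supplied by hand; a real tool would obtain it
    from a trained model (assumption, not the authors' implementation).
    """
    if len(seq) != len(is_adapter):
        raise ValueError("sequence and mask must have the same length")

    segments = []
    start = None  # start index of the current non-adapter run, if any
    for i, flag in enumerate(is_adapter):
        if not flag and start is None:
            start = i  # a non-adapter run begins here
        elif flag and start is not None:
            segments.append(seq[start:i])  # run ends just before the adapter
            start = None
    if start is not None:
        segments.append(seq[start:])  # final run extends to the read end

    # Drop fragments too short to be useful downstream (threshold is arbitrary).
    return [s for s in segments if len(s) >= min_len]


# Toy chimeric read: two transcript fragments joined by an internal adapter.
read = "ACGTACGTAC" + "GGGGG" + "TTGCATTGCA"
mask = [False] * 10 + [True] * 5 + [False] * 10
pieces = chop_read(read, mask, min_len=5)
print(pieces)  # ['ACGTACGTAC', 'TTGCATTGCA']
```

Because the split relies only on the read sequence and a per-base label, it needs neither raw signal nor an alignment, which mirrors the property the abstract highlights.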
Publication
bioRxiv
Qingxiang (Allen) Guo
Authors
Postdoctoral Scholar | Cancer Genomics & AI
Postdoctoral scholar at Northwestern University developing computational approaches for cancer genomics. My research integrates long-read sequencing, structural variant analysis, and deep learning to decode the regulatory complexity of cancer genomes.