OBJECTIVES: CRISPR-Cas9 nucleases are widely used to introduce targeted DNA double-strand breaks (DSBs) for genome engineering, but the long-term impact of these lesions on local epigenetic information remains poorly characterized. In a companion research article, we used Cas9-assisted targeted nanopore sequencing (CTS) to show that CRISPR-Cas9-induced DSBs can disrupt local epigenetic maintenance across multiple genomic contexts and cell systems. Here, we present a structured description of the raw and minimally processed datasets underlying that study. These datasets provide base-resolution measurements of 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) at the differentially methylated regions (DMRs) of several imprinted loci, at two heterochromatic regions, at a cancer-associated promoter epimutation region, and at the SNRPN DMR in early- and late-passage derivatives of a clonal line. They enable re-analysis and methodological benchmarking of DSB-associated epigenetic instability.
DATA DESCRIPTION: We provide aligned BAM files and per-CpG methylation calls for multiple genomic contexts under both CRISPR-targeted and non-targeting control conditions. Specifically, the collection includes: (i) imprinted loci in human embryonic stem cells (hESCs), including small nuclear ribonucleoprotein polypeptide N (SNRPN), paternally expressed 10 (PEG10), and KCNQ1 opposite strand/antisense transcript 1 (KCNQ1OT1); (ii) heterochromatic regions in hESCs, namely urothelial cancer associated 1 (UCA1) and cysteine rich C-terminal 1 (CRCT1); (iii) the epimutation locus of MutL homolog 1 (MLH1) in RKO cells; and (iv) the SNRPN DMR in early- and late-passage derivatives of a single hESC clone. Each dataset comprises the raw aligned nanopore sequencing reads (BAM), deposited in the NCBI Sequence Read Archive (SRA), and the corresponding processed per-CpG 5mC/5hmC matrices, deposited in Zenodo. All higher-level analyses in the research article, such as DMR calling, haplotype-resolved analyses, and structural variant (SV) characterization, are fully reproducible from these deposited data. Additional processed analyses are documented in the companion article and are therefore not duplicated here. Together, these datasets provide a resource for benchmarking long-read methylation analysis workflows and for further investigation of DSB-associated epigenetic instability across diverse genomic contexts.