Reports · Open Access

Target-seq: single workflow for detection of genome integration site, DNA translocation and off-target events

    Pei-Zhong Tang

    *Author for correspondence:

    E-mail Address: pei-zhong.tang@thermofisher.com

    Thermo Fisher Scientific, Inc., MA, USA

    ,
    Bo Ding

    Thermo Fisher Scientific, Inc., MA, USA

    ,
    Christopher Reyes

    Thermo Fisher Scientific, Inc., MA, USA

    ,
    David Papp

    Thermo Fisher Scientific, Inc., MA, USA

    &
    Jason Potter

    **Author for correspondence:

    E-mail Address: Jason.Potter@thermofisher.com

    Thermo Fisher Scientific, Inc., MA, USA

    Published Online: https://doi.org/10.2144/btn-2023-0013

    Abstract

    Designed donor DNA delivery through viral or nonviral systems to target loci in the host genome is a critical step for gene therapy. Adeno-associated virus and lentivirus are leading vehicles for in vivo and ex vivo delivery of therapeutic genes due to their high delivery and editing efficiency. Nonviral editing tools, such as CRISPR/Cas9, are getting more attention for gene modification. However, there are safety concerns; for example, tumorigenesis due to off-target effects and DNA rearrangement. Analysis tools to detect and characterize on-target and off-target genome modification post editing in the host genome are pivotal for evaluating the success and safety of gene therapy. We developed Target-seq combined with different analysis tools to detect the genome integration site, DNA translocation and off-target events.

    Method summary

    Target-seq is a modification of a method previously developed for CRISPR off-target detection. This simplified version enables genome-wide off-target detection, as well as DNA translocations and genome integration sites following donor DNA delivery through viral- and nuclease-mediated methods. The modified method can also be scaled up for high-throughput analyses in a 96-well format.

    Gene editing and therapy has become a promising technique that introduces or edits a gene to prevent or cure a disease or medical disorder. Donor DNA containing the desired change is delivered through viral or nonviral systems to the target loci in the host genome in the initial and critical step. For large donor or gene delivery, lentivirus [1–3] and adeno-associated virus (AAV) [4,5] are the vehicles of choice due to their large donor insertion capacity and robust delivery efficacy. Lentiviral vectors can hold up to 8 kb donor DNA [6], which is randomly integrated into the host genome to create a long-term effect, ideal for generation of a stable cell or animal model or gene replacement therapy [7,8]. Despite the significantly improved safety profile of modern lentiviral vectors in gene therapy trials, the unintended potential insertion of the vector into host genomes in transcriptionally active sites harbors an inherent genotoxic potential, especially in patients with pre-existing somatic mutations [9,10]. The modern recombinant AAV vector for gene therapy application has the capacity to insert up to 4 kb donor DNA, and its virus can infect both dividing and quiescent cells and persist in an extrachromosomal state without integrating into the genome of the host cell [5,11,12]. Interestingly, the early studies on AAV1-LPL(S447X) intramuscular injection showed controversial results regarding the random nuclear integration and hotspots in mitochondria [4–6]. Although hundreds of US FDA-approved lentivirus- or AAV-based gene therapy clinical trials are either completed or in progress [7,10,12–14], the possible viral integration concern still needs to be addressed.

    Non-viral-based genome editing tools including CRISPR/Cas9 [15,16] and transcription activator-like effector nucleases (TALENs) [17,18] are applied to directly modify genes through nonhomologous end joining, an error-prone process that produces insertions/deletions disrupting a target gene. If a repair donor DNA template with homology arms was supplied, the break could be repaired according to this template, allowing for precise gene editing in the host genome [19–21]. However, these nucleases not only cleave and create a double-strand DNA break (DSB) at the on-target site, but also carry the risk of making a DSB at off-target (OT) sites that contain similar sequences [22–25]. Even at the on-target site, DSBs could cause DNA rearrangements including DNA translocations (DTs) and large deletions, especially when trying to edit two or more genes simultaneously [26]. Several clinical trials applied this double knockout (DKO) strategy for TRAC and CD52 or TRAC and B2M to generate allogeneic off-the-shelf chimeric antigen receptor (CAR) T cells for cancer therapy [25–28].

    An analysis tool to detect and characterize where the donor DNA integrated, as well as whether any OT events or on-target DNA rearrangement occurred, is critical for judging the success of gene delivery systems and their potential therapeutic effect. Although different detection methods have been developed, each was designed with a particular application in mind. For example, GUIDE-seq was developed for CRISPR-induced OT detection in cells [29,30], linear amplification-mediated PCR [31–33] and targeted locus amplification [34] were developed to localize large DNA or transgene integration sites, and quantitative PCR (qPCR) was used for DTs [26]. Here we developed a next-generation sequencing (NGS)-based workflow, called ‘Target-seq’, that can be used for the various applications described above. We developed different analysis tools for each application. The genome integration site (GIS) tool detects the integration site and copy number of donor DNA delivered in host cells, while two additional tools permit the discovery and relative quantitation of OTs and DTs. We demonstrated Target-seq results in three therapeutic applications: GISs of the two important transgenes (Cas9 and CAR) delivered in induced pluripotent stem cells (iPSCs) and primary T cells through lentivirus and AAV; nuclease-induced on-target DTs in TRAC and CD52 DKO primary T cells; and genome-wide OT profiles from different Cas9 proteins on a broad panel of gRNAs targeting therapeutically relevant genes.

    Materials & methods

    Primary T cells

    Peripheral blood mononuclear cells (PBMCs) were purified from leukopak (ALLCELLS, CA, USA) using CTS™ Rotea™ Counterflow Centrifugation System (Thermo Fisher Scientific, MA, USA) and were then activated for 3 days to enrich the T cells (at 1 × 10⁶ cells/ml) with CTS Dynabeads™ CD3/CD28 in CTS OpTmizer™ T-Cell Expansion SFM medium containing 100 U/ml CTS IL-2 recombinant human protein, 6 mM GlutaMAX™ supplement and 2% CTS Immune Cell SR or human AB serum (Thermo Fisher Scientific). Before electroporation, the cells were separated from the magnetic beads using a magnetic separator for 1–2 min. Finally, the supernatant containing the T cells was transferred into a fresh tube.

    SpCas9 proteins, sgRNA, TALEN & double-strand DNA tag & adaptors

    The wild-type SpCas9 was purchased from Thermo Fisher Scientific. The Sniper HiFi Cas9 [35] nucleotide sequence for protein expression in bacteria was synthesized by GeneArt (Thermo Fisher Scientific) and protein was purified in-house using three resins (Thermo Fisher Scientific) packed in spin columns: POROS heparin, POROS 50 HQ and POROS 50 HS. The Alt-R® HiFi-Cas9 and TrueCut™ HiFi Cas9 were purchased from Integrated DNA Technologies (IA, USA) and Thermo Fisher Scientific, respectively.

    The sgRNA, TALEN mRNA, primers and adaptor oligos were ordered from Thermo Fisher Scientific. The sgRNA and TAL sequences are listed in Supplementary Table 1. The primers for GIS of Lenti-Cas9 Target-seq and validation are listed in Supplementary Table 2. The primers for GIS of AAV-CAR Target-seq and validation are listed in Supplementary Table 3. The oligo sequences for double-strand DNA tag (dsTag) and barcoded adaptor A (BC-A) and primers for OT detection are listed in Supplementary Table 4. The primers of Target-seq and validation for DT are listed in Supplementary Table 5. The dsTag and BC-A adaptors were annealed in a reaction containing 1 × TE buffer, 100 mM NaCl and 50 µM of top and bottom oligos at 85°C for 5 min and gradually cooled down to room temperature in 20–30 min.

    Generation of stable lentiviral-SpCas9 expression iPSC line

    The SpCas9 expression sequence was synthesized by Thermo Fisher Scientific and cloned into pLenti6 expression vector using pLenti6/V5™ Directional TOPO® Cloning Kit (Thermo Fisher Scientific). The virus was packaged using ViraPower™ Lentiviral Packaging Kit and ViraPower 293FT™ Cell Line (Thermo Fisher Scientific). The viral titer was determined by qPCR. The parental iPSC line was maintained using StemFlex™ medium (Thermo Fisher Scientific) in a rhLaminin-521-coated culture plate (Thermo Fisher Scientific). The virus was transduced into iPSC parental cells with 1:1 multiplicity of infection in a 24-well plate seeded with 1 × 10⁵ iPSCs per well and filled with 0.5 ml of StemFlex medium and StemFlex supplement with 8 µg/ml of polybrene. The plate was spun at 200 × g for 15 min to improve transduction efficiency. The plate was incubated at 37°C, 5% CO₂ for 24 h, then changed to StemFlex medium and incubated for another 48 h. Then we began selection, using StemFlex medium with puromycin (0.5 µg/ml) until single colonies were formed. After 2 weeks of cell expansion, several clones were picked for validation of Cas9 expression. Western blot was used to assess Cas9 protein expression. The clone with the highest expression by western blot was picked for karyotype analysis using KaryoStat™ (Thermo Fisher Scientific) and for pluripotency analysis using PluriTest® (Thermo Fisher Scientific). GIS of Cas9 integration in iPSCs was detected using Target-seq as described below.

    For validation of GIS, we designed primers flanking both putative GISs for Lenti-Cas9 (Supplementary Table 2). For full-length insert amplification, long-range PCR was performed using Phusion Hot Start II DNA Polymerase (Thermo Fisher Scientific) and primers from the host genome upstream and downstream of the GIS. The long-range PCR started with initial denaturation at 98°C for 30 s, followed by a 24-cycle (four cycles for each annealing temperature) touchdown program: 98°C for 5 s, different annealing temperatures (66, 64, 62, 60 and 58°C) for 10 s and 72°C for 2 min. After 24 cycles of touchdown program with different annealing temperatures, another 16 cycles were run comprising 98°C for 5 s, 56°C for 10 s and 72°C for 2 min. The final extension step was 72°C for 3 min. For the junction PCR at the 5′ and 3′ ends, the same Phusion Hot Start II DNA Polymerase and touchdown program were used except that the time of extension was changed to 40 s. The PCR products were resolved onto 2% E-Gel™ agarose gel (Thermo Fisher Scientific).

    CAR donor delivered by nuclease-mediated AAV into TRAC in T cells

    The CAR CD19-P13K expression sequence was ordered from Thermo Fisher Scientific and cloned into pAAV expression vector at NheI and HindIII sites. The donor carried 500-bp homology arms from TRAC flanking the DSB site, which was cleaved by CRISPR/Cas9 (TRAC sgRNA-0) (Supplementary Table 3) near the translation start site. The TALEN cleavage site using TRAC-TALEN (Supplementary Table 3) was 10 bp downstream of the CRISPR/Cas9 site. The AAV was produced using the Gibco™ AAV-MAX Helper-Free AAV Production System Kit (Thermo Fisher Scientific) following the manufacturer’s protocol. The viral titer was determined by qPCR.

    The delivery of CAR donor into TRAC was carried out in two steps. The first step was making a DSB in the target site (near the translation ATG) of TRAC in primary T cells; the second was adding AAV-CAR donor into the T cells edited by CRISPR/Cas9 or TALEN. For the first step, either CRISPR/Cas9 ribonucleoprotein (7.5 pmol gRNA and 1.25 µg TrueCut Cas9v2) or TALEN (0.25 µg mRNA from each TALEN, left and right) was electroporated into 0.5 × 10⁶ T cells using the Neon system (Thermo Fisher Scientific) following the manufacturer’s protocol. After 15–30 min, AAV donor (multiplicity of infection: 45,000) was added to the T cells transfected with CRISPR/Cas9 or TALEN. The T cells were harvested after 72 h and analyzed using Attune™ (Thermo Fisher Scientific) for TRAC knockout using an anti-TCR monoclonal antibody and for CAR knock-in using an anti-V5 antibody (Thermo Fisher Scientific). The T cells were collected for GIS of AAV-CAR knock-in using Target-seq as described below.
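
As a worked example of the dosing arithmetic implied above, the following minimal sketch computes how many AAV genomes correspond to a multiplicity of infection of 45,000 on 0.5 × 10⁶ T cells. It assumes the multiplicity of infection is expressed as viral genomes per cell (consistent with a qPCR-based titer, but not stated explicitly in the text); the stock titer used for the volume calculation is purely hypothetical.

```python
def viral_genomes_needed(moi: float, n_cells: float) -> float:
    """Total viral genomes required for a given MOI (genomes per cell, assumed)."""
    return moi * n_cells


def stock_volume_ul(genomes: float, titer_per_ml: float) -> float:
    """Volume of viral stock in microliters that contains `genomes` viral genomes."""
    return genomes / titer_per_ml * 1000.0


# Values taken from the text: MOI 45,000 and 0.5e6 T cells per electroporation.
aav_genomes = viral_genomes_needed(45_000, 0.5e6)

# Hypothetical stock titer (vg/ml), used only to illustrate the volume calculation.
HYPOTHETICAL_TITER_VG_PER_ML = 1e13
aav_volume_ul = stock_volume_ul(aav_genomes, HYPOTHETICAL_TITER_VG_PER_ML)

print(f"{aav_genomes:.3g} viral genomes, {aav_volume_ul:.2f} µl of stock")
```

With a different measured titer, only `HYPOTHETICAL_TITER_VG_PER_ML` needs to change.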

    For validation of GIS, we designed primers flanking both putative GISs of AAV-CAR (Supplementary Table 3). The same long-range PCR used for Lenti-Cas9 described above was performed for the full-length insert amplification of AAV-CAR.

    Genome-wide OT profiling for evaluation of different HiFi-Cas9 proteins

    To test whether Target-seq (simplified TEG-seq) (Figure 1A) maintained the same sensitivity and specificity for OT detection as TEG-seq (Tag-Enriched GUIDE-seq) [36], we chose HEK site4 sgRNA (Supplementary Table 1), from which high OTs were detected [29,36] and performed both Target-seq and TEG-seq. The major difference of the two workflows was that in the Target-seq, we eliminated the last step in TEG-seq, i.e. the enrichment of last PCR product from BC-A and P1 primers using a biotinylated oligo complementary to the tail sequence of BC-A adaptor and captured by streptavidin magnetic bead selection [36]. The comparison test showed that Target-seq obtained a similar OT profiling to that from TEG-seq (data not shown). We then used Target-seq to evaluate three HiFi-Cas9 proteins (Sniper, Alt-R HiFi-Cas9 and TrueCut HiFi Cas9) on HEK4 in HEK293 cells. We also performed 96-well-format Target-seq on 21 sgRNAs (Supplementary Table 1) in T cells, targeting four therapeutically relevant genes (TRAC, TRBC, CD52 and PD1) for broader evaluation on the two HiFi-Cas9s (Alt-R HiFi-Cas9 and TrueCut HiFi Cas9) and wt-Cas9. Some high OT sgRNAs (TRBC-4, PD1-4 and PD1-5) were purposely designed and chosen using the CRISPOR design tool [37] to facilitate OT detection and comparison between the two HiFi-Cas9 proteins.

    Figure 1. Diagram of Target-seq workflow and data analysis.

    (A) Target-seq library preparation workflow: (1) Target-specific primer design in target region. (2) Genomic DNA fragmentation to the size range (200–800 bp) using Ion Shear™, an enzyme-based partial digestion. (3) Ligation of a universal adaptor (P1) to DNA fragments. (4) Two rounds of PCR. The first PCR was performed in separate tubes for forward and reverse directions using P1 and TSP1 primers. The second round of PCR used P1 and a nested 5′ PHO (phosphorylated) target-specific primer (*TSP2) that generated the PCR product containing a 5′ phosphate only in the *TSP2-F/P1 or *TSP2-R/P1 products, but not in the P1/P1 product. (5) Ligation of a barcoded adaptor A (BC-A) to the *TSP2-F/P1 or *TSP2-R/P1 products, but not to the P1/P1 product. (6) Pooling of the barcoded samples and bulk amplification of the target using adaptor BC-A- and P1-complementary primers. (7) Next-generation sequencing. (B) Data analysis workflow for genome integration site. (C) Data analysis workflow for off-target. (D) Data analysis workflow for DNA translocation.

    DT detection in double gene-knockout T cells

    To detect DT by using Target-seq, we created DKO samples in T cells, in which two sgRNAs (TRAC sgRNA1 and CD52 sgRNA5) (Supplementary Table 1) were co-delivered with the wild-type SpCas9 protein using the Neon electroporation system (Thermo Fisher Scientific). The T cells were harvested 3 days post-transfection. The gDNA was extracted and applied for Target-seq using four target-specific (TS) primers (Supplementary Table 5) designed at about 100 bp upstream and downstream of the cleavage sites for both TRAC and CD52 genes. The detail of the Target-seq procedure is described below.

    To measure the overall percentage of DTs and repair in cis between TRAC and CD52 in the DKO samples, multiplex PCR was performed using Phusion Hot Start II DNA Polymerase with four primers, TRAC-Left (TL), TRAC-Right (TR), CD52-Left (CL) and CD52-Right (CR) (Supplementary Table 5), mixed in one reaction. The mixed amplicons were sequenced using Ion Torrent NGS S5 (Thermo Fisher Scientific). The reads were aligned to a set of references containing two in cis (wild-type) references (TRAC and CD52) and four artificially cross-translocated references (TRAC-L × CD52-L; TRAC-L × CD52-R; TRAC-R × CD52-L; and TRAC-R × CD52-R). The percentages of amplicons aligned to the six references were calculated.
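
The percentage calculation over the six references can be sketched as follows. This is an illustrative tally only, not the authors' pipeline: it assumes each read has already been assigned to its best-matching reference by the aligner, and the reference labels and read counts in the example are invented for illustration.

```python
from collections import Counter

# Two in cis (wild-type) references and four cross-translocated references.
REFERENCES = [
    "TRAC",
    "CD52",
    "TRAC-L x CD52-L",
    "TRAC-L x CD52-R",
    "TRAC-R x CD52-L",
    "TRAC-R x CD52-R",
]


def reference_percentages(assignments):
    """Return the percentage of assigned reads per reference.

    `assignments` is an iterable of reference names, one per aligned read.
    Reads assigned to anything outside REFERENCES are ignored.
    """
    counts = Counter(name for name in assignments if name in REFERENCES)
    total = sum(counts.values())
    if total == 0:
        return {ref: 0.0 for ref in REFERENCES}
    return {ref: 100.0 * counts[ref] / total for ref in REFERENCES}


# Toy input (made-up counts, not data from the study).
toy_reads = (
    ["TRAC"] * 60 + ["CD52"] * 36 + ["TRAC-L x CD52-R"] * 3 + ["TRAC-R x CD52-L"] * 1
)
toy_percentages = reference_percentages(toy_reads)
for ref, pct in toy_percentages.items():
    print(f"{ref}: {pct:.1f}%")
```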

    The validation of DT was performed by PCR using Phusion Hot Start II DNA Polymerase and different combinations of the four primers (TL, TR, CL and CR). The PCR products from the different primer combinations were resolved on 2% EX agarose gels.

    Target-seq sample preparation & NGS workflow

    Cells were harvested from the above experiments, and gDNA was isolated using PureLink™ Genomic DNA Mini Kit or PureLink Pro 96 Genomic DNA Purification Kit (Thermo Fisher Scientific). Target-seq library preparation was performed in 96-well format using Ion Xpress™ Plus gDNA Fragment Library Preparation reagents (Thermo Fisher Scientific). The workflow is described in Figure 1 and included seven steps as follows, followed by data analysis. (1) Target-specific primer design. (2) gDNA fragmentation to obtain fragments of 200–800 bp for short-read NGS using Ion Shear, an enzyme-based partial digestion. (3) Ligation of a universal adaptor (P1) to the DNA fragments. (4) Two rounds of PCR, with the second round of PCR using a nested 5′ phosphate target-specific primer paired with the P1 primer. (5) Ligation of a barcoded adaptor A (BC-A) specifically to the amplicon that contains the 5′ phosphate-ligated target sequence. (6) Pooling of the barcoded samples and bulk amplification of the target-containing amplicons using adaptor BC-A and P1 complementary primers. (7) NGS on Ion GeneStudio™ S5 Prime system (Thermo Fisher Scientific).

    • Primer design is critical for Target-seq. The main goal of Target-seq is to amplify and sequence the DNA fragment that contains a partial sequence from relevant donor or the vector containing donor and partial sequence from receiver (usually host genome). There are two important notes to consider when designing the primers: primer location and primer modification. The primer location varies depending on different applications and sequencing platforms. We used Geneious Prime (Biomatters, MA, USA) for primer design. For OT detection, two primers from each direction (forward and reverse) in the dsTag region were designed. For GIS, two to four primers were designed near the region of predicted GIS sequence (e.g., the long terminal repeat [LTR] region for Lenti-Cas9 integration or the inverted terminal repeat [ITR] and homology arm for AAV-CAR delivery). The proximal distance between the TS primer and the expected GIS should be in the range of 50–200 bp if using a short-read sequencing platform (e.g., Illumina and Ion Torrent) because their read length is 100–400 bp, from which the other half of the read could contain a 50- to 200-bp sequence from the host genome for GIS identification. The distance between target-specific primers and GIS could be longer and more flexible if using a Sanger or long-read sequencing platform. The drawback to using Sanger sequencing is that the low output reads may not be sensitive enough to detect GIS. In a case where the GIS cannot be predicted from the donor DNA template, for example, with random integration of a plasmid, we suggest designing more primers (e.g., two primers for every 300–500 bp). For DT, which usually occurs in multiplex-edited samples, two forward and reverse primers should be designed from each edited site. The second consideration for primer design is primer modification. The primer used for the first PCR should not be modified. The primers used for the second (nested) PCR need to be modified with a 5′ phosphate that facilitates the ligation between the target-containing PCR product and barcoded adaptor A (BC-A).

    • gDNA fragmentation. gDNA was fragmented using Ion Shear™ Plus Enzyme Mix (Thermo Fisher Scientific). Two reactions were set up for each sample, with two different incubation times (2 min and 4 min) at 37°C. Each reaction contains 25 µl with 1× reaction buffer, 2.5 µl Ion Shear Plus Enzyme Mix and 200–500 ng of gDNA. After 2 or 4 min of incubation at 37°C, the reactions were immediately cooled to 4°C and 2.5 µl (1/10) of 0.5 M EDTA added to stop the reaction. The size range was checked by loading 2.5 µl reaction with 7.5 µl of water onto a 2% EX gel. The size range of the majority of fragments should be 200–800 bp. Fragments from the two (2 and 4 min) reactions were then pooled; they can be further size-selected using AMPure XP beads (Beckman Coulter, CA, USA) following the two-step (0.5× volume and 1.25× volume) protocol from the manufacturer’s manual to further enrich for DNA fragments in the 200- to 800-bp range. After AMPure XP bead selection, the DNA fragments were eluted in 20 µl low TE buffer (10 mM Tris-HCl, pH 8.0 and 0.1 mM EDTA).

    • P1 adaptor ligation. If using the Ion Torrent platform for NGS, the P1 adaptor (Thermo Fisher Scientific) can be ligated to gDNA fragments in a 50-µl reaction containing 10 µl 5× ligase buffer, 1 µl dNTP mix (10 µM), 1 µl Ion P1 adaptor, 5 units of T4 DNA ligase, 1 µl of Platinum™ Tfi Exo(-) DNA polymerase and 10 µl (50–100 ng) of gDNA fragments. The reaction is incubated at 25°C for 15 min and ligase inactivated at 75°C for 5 min.

    • Target-containing sequence amplification was performed in two PCR reactions. The first PCR was performed using a TS primer located within the target region and a P1 primer ligated to the end of the DNA fragment in a 12.5-µl reaction containing 2.5 µl of 5× Ion AmpliSeq HiFi Mix (Thermo Fisher Scientific), 2 µl of P1 primer (1 µM), 2 µl of F1 or R1 primer (1 µM), 2 µl of P1-ligated DNA fragment and 4 µl water. The PCR program started with denaturation at 99°C for 2 min, then 17 cycles of 99°C for 15 s and 60°C for 60 s, with the final extension at 60°C for 5 min. The second PCR was performed using Phusion Hot Start II DNA Polymerase and nested 5′ phosphate primers and P1 primer in a 12.5-µl reaction containing 2.5 µl of 5× Phusion Green HF Buffer, 0.25 µl of 10 mM dNTP mix, 0.375 µl of 100% DMSO, 0.125 µl of Phusion HF polymerase, 2 µl of 5′ PHO nested F2 or R2 primer (1 µM), 2 µl of P1 primer (1 µM), 1 µl of the first PCR product and 4.25 µl of water. To reduce nonspecific amplification, we used a touchdown PCR program; this started with initial denaturation at 98°C for 2 min, followed by a ten-cycle (two cycles for each annealing temperature) touchdown program of 98°C for 15 s, different annealing temperatures (66, 64, 62, 60 and 58°C) for 15 s and 72°C for 20 s. After ten cycles of the touchdown program with different annealing temperatures, the run continued with 15 cycles of 98°C for 15 s, 56°C for 15 s and 72°C for 20 s. The final extension step was 72°C for 5 min. The PCR products were cleaned, and size-selected using AMPure XP beads following the two-step (0.5× volume and 1.25× volume) manufacturer’s protocol.

    • Ligation of a barcoded adaptor A (BC-A). Using the Ion Torrent NGS platform, the barcoded IonXpress adaptor A (Thermo Fisher Scientific) was specifically ligated to the amplicon containing the 5′ phosphate in a 15-µl reaction containing 3 µl of 5× ligase buffer, 0.3 µl dNTP mix (10 mM), 1.2 µl of IonXpress barcoded adaptor A (1:4 diluted), 0.3 µl of T4 DNA ligase (5 units/µl), 1 µl of Platinum Tfi Exo(-) DNA Polymerase and 2.8 µl of cleaned nested PCR product. The reaction was incubated at 16°C for 20 min, then at 25°C for 20 min, and the ligase inactivated at 75°C for 20 min.

    • All barcoded samples were pooled for bulk amplification. The barcode adaptor A-ligated products can be pooled based on the requirements of sample number, application and NGS read output. We usually make two pools, one for the forward-primer reaction and one for the reverse-primer reaction, because we find that the size ranges are sometimes quite different between the two, especially for OT detection, which creates high bias for the output read number from the forward and reverse reactions. For each pool, we usually combine 10–12 barcoded samples. Once the samples were pooled, they were cleaned and size-selected using AMPure XP beads following the two-step (0.5× volume and 1.25× volume) manufacturer’s protocol. After this bead cleanup, the barcode-ligated DNA fragments were eluted in 20 µl low TE buffer. The final bulk NGS-PCR was performed in a reaction containing 100 µl of Platinum PCR SuperMix High Fidelity, 5 µl of 25× Ion A and P1 primer mix and 5 µl of pooled DNA sample. The PCR program started with denaturation at 95°C for 5 min, then 14 cycles of 95°C for 15 s, 58°C for 15 s and 70°C for 30 s, with the final extension step at 70°C for 5 min. The PCR products were cleaned up using AMPure XP beads following the one-step (1.25× volume) protocol of the manufacturer’s manual, and the final DNA elution was performed in 30 µl of low TE buffer (10 mM Tris-HCl, pH 8.0 and 0.1 mM EDTA).

    • The Ion Chef™ (Thermo Fisher Scientific) and GeneStudio S5 system was used for NGS. The Ion 540™ Chip Kit was used, which usually generates 30–60 million reads. The concentration of the final sample was measured using the Qubit™ dsDNA High Sensitivity kit (Thermo Fisher Scientific). The sample was diluted to 200 pM with low TE buffer. Equal amounts (25 µl) of forward (containing 10–12 barcoded reactions) and reverse (containing 10–12 barcoded reactions) pooled samples (200 pM) were combined and loaded onto each well of the Ion Chef reagent cartridge for emulsion amplification, then loaded to one 540 Chip to generate 30–60 million reads per chip; in this way, each barcoded reaction is represented by about 1 million reads.
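
The touchdown program used for the second (nested) PCR in the steps above can be written out cycle by cycle. The sketch below is only an illustration of the cycling logic stated in the text (two cycles at each of 66, 64, 62, 60 and 58°C, then 15 cycles at 56°C); the function name and data layout are hypothetical, and the initial denaturation and final extension steps are omitted for brevity.

```python
def touchdown_program(td_temps, cycles_per_temp, final_anneal, final_cycles,
                      denature=(98, 15), anneal_s=15, extend=(72, 20)):
    """Expand a touchdown PCR program into a list of cycles.

    Each cycle is a list of (temperature_C, seconds) steps:
    denaturation, annealing, extension.
    """
    cycles = []
    # Touchdown phase: step through decreasing annealing temperatures.
    for temp in td_temps:
        for _ in range(cycles_per_temp):
            cycles.append([denature, (temp, anneal_s), extend])
    # Constant phase at the final annealing temperature.
    for _ in range(final_cycles):
        cycles.append([denature, (final_anneal, anneal_s), extend])
    return cycles


# Nested PCR parameters as described in the text.
nested_pcr = touchdown_program([66, 64, 62, 60, 58], 2, 56, 15)

print(f"total cycles: {len(nested_pcr)}")
for number, steps in enumerate(nested_pcr, start=1):
    print(number, " -> ".join(f"{t}°C/{s}s" for t, s in steps))
```

Writing the program this way makes the total cycle count explicit: ten touchdown cycles plus 15 constant cycles, i.e. 25 amplification cycles in the nested PCR.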

    For Target-seq data analysis, we developed three data analysis workflows because GIS, OT and DT focus on different targets.

    For the GIS workflow, the focus is to identify donor vector–host genome hybrid reads, which requires a two-step alignment. First, we aligned all reads to the donor vector with TMAP aligner (v. 5.12) from the Torrent Suite software (Thermo Fisher Scientific) using default settings. We used the donor vector for the first step because it is much smaller than the host genome and therefore aligns faster. The partially aligned reads were selected based on alignment flag = 0 (forward aligned) or 16 (reverse aligned) and Concise Idiosyncratic Gapped Alignment Report (CIGAR; https://github.com/samtools/hts-specs) code (soft-clipping >25 bp). Second, the soft-clipped parts of these partially aligned reads were aligned to the host genome, again with TMAP aligner using the default settings. Finally, if the reads aligned to the host genome (alignment flag = 0 [forward aligned] or 16 [reverse aligned] and aligned length >25 bp), they were collected and considered as putative GIS reads. The read numbers at each clustered genome location were counted. The genome location, 30-bp conjunction sequence and read counts for each putative GIS were output in a .csv file.
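
The read-selection rule in the first alignment step (flag 0 or 16, soft clip longer than 25 bp) can be sketched as below. This is a minimal illustration that works on the FLAG, CIGAR and SEQ fields of a SAM record as plain values; the helper names are hypothetical, and the authors' actual implementation is not described beyond the criteria above.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (left, right) soft-clip lengths of a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard-clipped bases are not present in SEQ, so they are skipped here.
    core = [item for item in ops if item[1] != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


def clipped_segment(flag, cigar, seq, min_clip=25):
    """Return the soft-clipped part of a partially aligned read, or None.

    Keeps only reads with flag 0 (forward) or 16 (reverse) whose longer
    soft clip exceeds `min_clip` bases.
    """
    if flag not in (0, 16):
        return None
    left, right = soft_clips(cigar)
    if max(left, right) <= min_clip:
        return None
    # SEQ includes soft-clipped bases, so the clip can be sliced off directly.
    return seq[:left] if left >= right else seq[len(seq) - right:]


# Toy read: 60 bases matching the vector followed by 40 unaligned bases.
toy_seq = "A" * 60 + "C" * 40
print(clipped_segment(0, "60M40S", toy_seq))
```

The returned segments would then be written out (for example as FASTQ) and aligned to the host genome in the second step.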

    For the DT workflow, as in the GIS analysis, the goal is to identify hybrid reads. Here, however, the hybrid reads contain sequences from different genomic loci. In the case of the DKO sample of TRAC and CD52, these were located at Chr14:23016459 for TRAC and Chr1:26644549 for CD52. So, similar to GIS, we first aligned the reads to the reference genome with TMAP (default setting) and selected partially aligned reads (alignment flag = 0 [forward aligned] or 16 [reverse aligned] and CIGAR code [soft-clipping >25 bp]). Second, the soft-clipped parts of the selected reads were aligned to the reference genome with TMAP (default setting). If the reads aligned to the reference genome (alignment flag = 0 [forward aligned] or 16 [reverse aligned] and aligned length >25 bp), they were identified as hybrid reads at the different genomic loci. The genome location, 30-bp conjunction sequence and read counts for each putative DT were output in a .csv file.
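
Once both halves of a hybrid read have genomic coordinates, deciding whether the read joins the two edited loci is a simple lookup. The sketch below illustrates that final classification under stated assumptions: the cut-site coordinates come from the text, whereas the chromosome naming, the 1,000-bp window around each cut site and the category labels are hypothetical choices made for this example.

```python
# Cut-site coordinates from the text; chromosome names follow a "chrN" convention (assumed).
TARGETS = {"TRAC": ("chr14", 23016459), "CD52": ("chr1", 26644549)}

# Assumed tolerance around each cut site; not specified in the text.
WINDOW_BP = 1000


def nearest_target(chrom, pos):
    """Return the name of the edited locus containing (chrom, pos), or None."""
    for name, (target_chrom, target_pos) in TARGETS.items():
        if chrom == target_chrom and abs(pos - target_pos) <= WINDOW_BP:
            return name
    return None


def classify_hybrid(primary, clipped):
    """Classify a hybrid read from the loci of its aligned and soft-clipped parts.

    `primary` and `clipped` are (chromosome, position) tuples.
    """
    first = nearest_target(*primary)
    second = nearest_target(*clipped)
    if first is None or second is None:
        return "other"
    return "same_locus" if first == second else "translocation"


print(classify_hybrid(("chr14", 23016400), ("chr1", 26644600)))
```

Reads classified as "other" may still be of interest, for example as junctions with an off-target site, and would need separate inspection.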

    The OT detection workflow is a motif-based search, in which the on-target gRNA binding site served as the motif. If a read contained a sequence similar to that of the motif, the genome location and read number were calculated and output as OT candidates. The Motif Search plug-in can be set for PAM (protospacer adjacent motif) sequence (consensus: NGG; non-consensus: NAG and NGA) and mismatch number. The settings for this study were seven mismatches for NGG and five mismatches for NAG and NGA. A Microsoft Excel® formula was used to filter out further background from random reads; for example, candidates with RPM (reads per million) <10 were removed. A candidate with RPM >10 that did not appear in at least two replicate samples was still considered random background. Final OTs were assigned high, medium or low confidence based on mismatch number, reproducibility in multiple samples and RPM from the final manually analyzed Excel file. For OT candidates with low-to-medium confidence, we validated the results by targeted amplicon-seq, amplifying the target region for each OT hit with target-specific primers and verifying the existence of indels.
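
The mismatch and PAM thresholds, together with the RPM and replicate filter, can be expressed compactly. The following is a minimal sketch, not the Motif Search plug-in itself: it compares a 20-nt guide with a 23-nt candidate site on one strand only, ignores bulges, and treats "appearing in at least two replicates" as having RPM of at least 10 in two or more replicates, which is one possible reading of the text. The guide sequence is invented for illustration.

```python
# Maximum mismatches allowed per PAM class, as stated in the text.
MAX_MISMATCHES = {"NGG": 7, "NAG": 5, "NGA": 5}


def pam_class(pam):
    """Return 'NGG', 'NAG' or 'NGA' for a 3-nt PAM, or None if it matches none."""
    if len(pam) != 3:
        return None
    for pattern in MAX_MISMATCHES:
        if pam[1:] == pattern[1:]:
            return pattern
    return None


def is_candidate(guide, site):
    """True if a 23-nt site (20-nt protospacer + 3-nt PAM) passes the thresholds."""
    if len(guide) != 20 or len(site) != 23:
        return False
    cls = pam_class(site[20:])
    if cls is None:
        return False
    mismatches = sum(a != b for a, b in zip(guide, site[:20]))
    return mismatches <= MAX_MISMATCHES[cls]


def rpm(read_count, total_reads):
    """Reads per million for one candidate in one sample."""
    return read_count / total_reads * 1_000_000


def passes_background_filter(replicate_rpms, min_rpm=10.0, min_replicates=2):
    """Keep a candidate only if enough replicates reach the RPM cutoff (assumed reading)."""
    return sum(value >= min_rpm for value in replicate_rpms) >= min_replicates


# Hypothetical guide and candidate sites, for illustration only.
toy_guide = "GACGTTACCGGATTCAGCTA"
site_3mm = "TACGTAACCGCATTCAGCTA"  # 3 mismatches to toy_guide
site_6mm = "TACGTAACCGCATTCCGGTT"  # 6 mismatches to toy_guide

print(is_candidate(toy_guide, site_3mm + "CAG"))   # NAG PAM, within 5 mismatches
print(is_candidate(toy_guide, site_6mm + "AAG"))   # NAG PAM, exceeds 5 mismatches
print(passes_background_filter([rpm(25, 1_000_000), rpm(12, 1_000_000)]))
```

A real search would also scan the reverse-complement strand and carry forward the mismatch count for the confidence ranking described above.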

    Results & discussion

    Target-seq workflow

    We previously developed the TEG-seq workflow for detection of in cellulo CRISPR-induced OTs using a short-read NGS method [36]. Here we simplified the workflow and named it ‘Target-seq’ (Figure 1A) because it can be applied to detect different ‘target’ DNA sequences that either exist natively in, or are artificially integrated into, the host genome. The target sequence to be detected in the host genome can be a known viral vector donor DNA, a known DNA tag inserted at OT sites, or a known sequence near DT sites. The brief workflow includes: first, target-specific primer design; second, gDNA fragmentation to the size range (200–800 bp) for short-read NGS using enzyme-based partial digestion; third, ligation of a universal adaptor (P1) to DNA fragments; fourth, two rounds of PCR – the first performed in two separate tubes for the forward and reverse directions using the P1 primer and a target-specific (forward or reverse) primer (TSP1), and the second using the nested 5′ phosphate target-specific (forward or reverse) primer (*TSP2) paired with the P1 primer, which generates a 5′ phosphate in the *TSP2/P1 products but not in the P1/P1 product; fifth, ligation of a barcoded adaptor (BC-A) to the *TSP2/P1 product but not to the P1/P1 product; sixth, pooling of barcoded samples followed by bulk amplification of the target-containing fragments using adaptor BC-A and P1 complementary primers; and finally, NGS followed by data analysis using various tools.

    To facilitate library preparation in a 96-well format, we tested and simplified our original TEG-seq workflow [36]. We found that the third round of PCR – using a barcoded adaptor A (BC-A) tailing primer paired with P1 primer after step 5 above, followed by biotinylated oligo/streptavidin bead enrichment – was a bottleneck step for high-throughput library preparation and resulted in little improvement in the specificity enrichment. We omitted these two steps from the original TEG-seq workflow [36] and found that using a 5′ phosphate target-specific primer (*TSP2) for the second round of PCR in step 4 was critical to reduce P1/P1 PCR and other unwanted products and thus increased the detection sensitivity.

    Data analysis tools were developed based on different applications. The workflow of GIS analysis (Figure 1B) is based on two steps of read alignment. The first step is aligning reads to the donor sequence and collecting the reads that contain both the donor sequence and an unaligned sequence (soft-clip sequence). The second step is aligning these soft-clip sequences to the host genome. The generated results show a list of vector–genome hybrid reads revealing their GIS sequence and genomic location. The workflow of OT analysis (Figure 1C) is a one-step sequence alignment of the 20-base CRISPR gRNA on-target binding sequence to the corresponding genome, followed by collecting reads that contain sequences similar to the on-target sequence with up to seven mismatches. Reads lacking a PAM sequence are then filtered out. The potential OTs are then counted and categorized as high, medium or low confidence based on reproducibility between samples, number of mismatched bases, consensus PAM, location of mismatches and read count. The workflow of DT analysis (Figure 1D) is also based on two steps of read alignment. First, the reads are aligned to the corresponding genome and collected if they contain a partially aligned sequence and a partially unaligned sequence (soft-clip sequence). Second, the soft-clip sequences are aligned to the same genome again. Finally, the reads that contain sequences from two distinct genomic loci are collected and counted.

    To detect low-frequency events of GIS, OT and DT, we used a short read-based (100–400 bp) NGS platform (Ion Torrent) for Target-seq. The Target-seq workflow can be applied to other NGS platforms, such as the Illumina platform. The first step of library preparation was to shear the gDNA from edited samples to small fragments within the proper size range for adaptor ligation and PCR amplification so that the final amplicon size could be sequenced by the specific sequencing platform. Usually, gDNA fragmentation was carried out using a sonicator (e.g., Covaris), after which the gDNA fragments require end repair for downstream adaptor ligation. This step limited the 96-well high-throughput library preparation. Although recent improvements came using transposon-based methods to facilitate gDNA fragmentation and adaptor ligation [38], it was hard to control gDNA fragment size and to avoid sequence-biased transposase adaptation. We used the IonShear enzyme mixture that can perform gDNA fragmentation in a 96-well format and offers better control over the desired gDNA fragment size range by controlling the incubation time or enzyme amount. The simplified whole workflow enabled us to perform Target-seq in a 96-well format. Although we used the short-read NGS platform, the Target-seq workflow can be applied to long-read sequencing platforms like Sanger sequencing or the Pacific Biosciences NGS platform to detect medium-to-high frequency editing events.

    GIS of Cas9 donor delivered by lentivirus into iPSCs

    To detect donor DNA integration through lentivirus, we constructed a wild-type SpCas9 expression sequence into an integrase-defective lentivirus vector. The packaged virus was transduced into iPSCs. The stable Cas9 iPSC line was confirmed to express Cas9 protein by western blot. To identify where the Lenti-Cas9 was integrated and how many copies were in the stable Cas9 iPSC line, we performed Target-seq using the short-read Ion Torrent NGS platform. We also used another method, targeted locus amplification [34], to confirm the GIS localization and copy number detected by Target-seq. For Target-seq, the location of the designed target-specific primers is critical for the success of GIS identification using short-read NGS. Because the read length for short-read NGS platforms (Illumina or Ion Torrent) ranges between 100 and 400 bp, the primers must be designed at the region of the known vector sequence in the proximity of the expected GIS at 50–200 bp, so that the other half of the amplicon (of 100–400 bp) contains 50–200 bp of host genome sequence. The GIS-containing amplicon should have a hybrid sequence from both the lentiviral vector and the host genome. Early studies indicated that lentiviral integration in the host genome was through an LTR site [33,39], but the exact region of LTR to be integrated into the genome was not determined. Therefore, we designed three primers on both 5′ and 3′ LTRs toward the genome (Figure 2A) for Target-seq. Two major clustered GISs were detected by Target-seq from all three TS primers at the 3′ LTR (Figure 2B). No hybrid reads were detected from three TS reverse primers at the 5′ LTR. Only partial lentiviral vector sequence at the right segment of the LTR sites was integrated with the Cas9 into the host genome (Figure 2A). This indicated that lentivirus integration was likely driven through the LTR sequences. 
Furthermore, due to this partial integration of the right segment of the LTR sites and the resulting loss of the TS reverse primer binding sites, we did not obtain any hybrid reads from Target-seq specific to this locus (Figure 2A).
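
The primer placement constraint described above (a 100–400 bp read, with the primer 50–200 bp from the expected junction) comes down to simple arithmetic, sketched below in Python. The function names are hypothetical, and treating 50 bp of host sequence as the minimum is an assumption taken from the lower bound quoted in the text.

```python
def host_bases_in_read(primer_to_junction_bp, read_length_bp):
    """Host-genome bases expected in a read that starts at the TS primer.

    primer_to_junction_bp: distance from the primer start to the
    vector/genome junction, i.e. the vector-derived part of the read.
    """
    return max(0, read_length_bp - primer_to_junction_bp)


def primer_is_usable(primer_to_junction_bp, read_length_bp, min_host_bp=50):
    """True if the read should still carry at least min_host_bp host bases.

    min_host_bp=50 is an assumed threshold for this illustration.
    """
    return host_bases_in_read(primer_to_junction_bp, read_length_bp) >= min_host_bp
```

For example, a primer 200 bp from the junction leaves about 200 bp of host sequence in a 400-bp read, whereas a primer 80 bp from the junction leaves only about 20 bp in a 100-bp read.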

    Figure 2. Genome integration site identification of Cas9 donor integrated through lentivirus in iPSC (induced pluripotent stem cells).

    (A) Diagram of the Lenti-Cas9 cassette. Six TS primers (three from each side of the LTR site) toward the host genome were designed in the LTR region for Target-seq. Two primers (GF and GR) were designed near the GIS for validation of full-length Lenti-Cas9 cassette amplification. Two more primers were designed at Cas9 C-terminal (C9F) and N-terminal (C9R) for validation of 5′ (using GF and C9R) and 3′ (using C9F and GR) junction side of GIS. The blue dashed lines indicate the region of Lenti-Cas9 cassette integrated into host genome and the partial region to the right of the LTR detected in the integrated Lenti-Cas9 cassette. The red dashed line is the partial vector sequence that is located at the right side of the LTR and connected to the host (iPSC) genome, indicated by the red arrow. (B) Output result of GIS: GIS ID, 30-bp iPSC genome sequences starting (red arrow) at the junction site between the vector and genome, chromosome position, gene annotation and normalized read number (RPM) from each TS primer. (C) Genotyping validation: full-length Lenti-Cas9 insert amplified by long-range PCR (left) and 5′ and 3′ junction PCR result (right). Red arrows indicate the PCR products with expected sizes indicated. The expected PCR size amplified by primers of GF and GR is ∼7.5 kb (full-length insert) from the insert knock-in allele and ∼0.1 kb from the non-insert allele. The expected PCR size at 5′ junction is ∼1.45 kb and at 3′ junction is ∼1.95 kb amplified by corresponding primers.

    GIS: Genome integration site; iPSC: Induced pluripotent stem cell; LTR: Long terminal repeat; RPM: Reads per million; TS: Target specific.

    For validation, we designed a pair of primers on the genome (GF and GR) flanking the GISs and performed long-range PCR to amplify the full-length Cas9 donor insert (∼7.5 kb). We expected to see two bands for the heterozygous knock-in editing: one small amplicon (150–200 bp) from the allele either unedited or edited with an indel, and one large amplicon (∼7.5 kb) from the allele housing the Cas9 donor insert. Two clear bands were observed for GIS-1 at Chr21:33558477 (Figure 2C), one small band at about 150 bp and one large band at about 7.5 kb. End-sequencing confirmed the 7.5 kb product was the Lenti-Cas9 donor DNA inserted in one allele. For GIS-2 (Chr22:38358563), a small PCR product was observed. However, we did not see a clear single large band as with GIS-1. Instead, it was a large-sized smear, which indicated that the Cas9 donor inserted into the GIS-2 locus might be different from that in GIS-1. To further verify GIS-2, we designed two extra primers at the N-terminal (C9R) and C-terminal (C9F) of Cas9 and performed junction PCR using GF and C9R primers for the 5′ junction and C9F and GR primers for the 3′ junction. Junction PCR showed the correct size of amplicon for both GIS-2 and GIS-1 (Figure 2C). Thus we confirmed that both GIS-1 and GIS-2 contained the Cas9 donor DNA. The reason for the inability to amplify a single large band of Cas9 donor insert for GIS-2 remains undetermined. It is possible that the Lenti-Cas9 cassette inserted at GIS-2 might have gone through unexpected recombination as a concatemer or insertional mutagenesis around GIS-2, which was possible due to the long and similar repeat sequence of the LTR at both 5′ and 3′ ends of the cassette [40]. If the Cas9 donor DNA inserted as multiple copies or insertional mutagenesis occurred around GIS-2, PCR could fail to amplify the long or mutated donor insert. Targeted locus amplification confirmed that both GIS-1 and GIS-2 contained the Lenti-Cas9 donor insert. 
A gene annotation search revealed that GIS-1 was located in the exon of SON coding for an RNA-binding protein and GIS-2 was located in the gene intronic region of a pseudogene, TPTEP2.

    Lentivirus can efficiently infect dividing and nondividing cells and integrate viral complementary DNA into the host genome. The ability of lentivirus to integrate into the host genome raises safety concerns for gene therapy. Early studies indicated that lentiviruses favor integration into active transcription units [40–43]. The two GISs in this study were in the exon or intron of active transcription units. The other concern for using lentivirus to deliver donor DNA is that the donor DNA integrates into the host genome through the LTRs (∼600 bp) flanking the 5′ and 3′ ends of the DNA insert constructed in the plasmid. The sequences of 5′ and 3′ LTR are very similar, which could potentially create unwanted recombination and mutagenesis around the integration site [40]. The GIS-2 in this study could be a possible case of this scenario, given that the genotyping validation was obviously different from that of GIS-1.

    GIS of CAR donor delivered by nuclease-mediated AAV into T cells

    To detect possible donor DNA integration through AAV in a therapeutically relevant model, we constructed a CAR-expressing sequence in an AAV6 vector. The packaged viral donor inserts contained a CAR cassette and flanking 500-bp homology arms to the TRAC gene. We expected the AAV CAR donor to be delivered and integrated through homologous recombination at the target site of TRAC exon 1 through a nuclease (CRISPR and TALEN)-cleaved DSB that disrupted the TCR function and expressed CAR function using the endogenous TRAC promoter in T cells. The targeted site was first cleaved by either CRISPR/Cas9 or TALEN, followed by transduction of the AAV viral particle. Our goal was to use Target-seq to identify whether there are any OT viral integrations besides the targeted integration through nuclease-mediated homologous recombination at the TRAC homology arm site. We assumed that the targeted knock-in of the CAR cassette would take place through homologous recombination facilitated by the TRAC homology arms at the nuclease-mediated DSB site, while the random integration of AAV might occur through the viral ITR sites like lentivirus. We designed target-specific (TS) primers in the regions of ITRs and TRAC homology arms (Figure 3A). Target-seq detected a major integration cluster site (GIS-1) from TS primers within the TRAC homology arm from both CRISPR/Cas9- and TALEN-mediated samples. The location of the integration site spanned about 60 bp near the 5′ TRAC homology arm. The total read numbers (RPM) from CRISPR/Cas9- and TALEN-mediated samples were similar (Figure 3B). Target-seq also detected some reads located at different chromosome positions from the TS primer designed in the ITR region. However, the read numbers were low, and none of them were detected in both CRISPR/Cas9- and TALEN-mediated samples as for GIS-1.

    Figure 3. Genome integration site identification of chimeric antigen receptor donor delivered through CRISPR- or transcription activator-like effector nuclease-mediated adeno-associated virus into the targeted TRAC site in T cells.

    (A) Diagram of the AAV-CAR cassette with homologous arm from TRAC. Six TS primers (three from each site) toward host genome were designed in the ITR and HR regions for Target-seq. Two primers (GF and GR) were designed in the host genome sequence outside of 5′ HR and 3′ HR for genotyping validation of full-length CAR cassette insertion. CAR was delivered through two steps: cleaving and creating a double-strand break at the TRAC target site to knock out TRAC, then delivering CAR to the target site of TRAC through AAV. Dashed line indicates the potential integration sites. (B) Output result of GIS: GIS ID, 60-bp host sequences next to GIS-1 from TRAC, chromosome position, gene annotation and read number (RPM) from CRISPR or TALEN cleaved samples. Red dashed line indicates the location of GIS-1 cluster sequence next to the TRAC 5′ HR. (C) Genotyping validation for GIS-1: the large band (~3.5 kb) is the full-length CAR insert amplified by the primers of GF and GR from insert knock-in allele; small bands at about 1 kb are PCR products from non-CAR insert allele, which contained wild-type and edited indel.

    AAV: Adeno-associated virus; CAR: Chimeric antigen receptor; GIS: Genome integration site; HA: Homology arm; ITR: Inverted terminal repeat; RPM: Reads per million; TALEN: Transcription activator-like effector nuclease; TS: Target specific.

    For validation, we designed a pair of primers on the genome (GF and GR) around GIS-1 and other potential loci and performed PCR to amplify the full-length CAR donor insert, which is about 3.5 kb. Only GIS-1 showed the expected genotyping pattern (Figure 3C), typical of heterozygous knock-in patterns from both CRISPR/Cas9 and TALEN edited samples. One large band (∼3.5 kb) containing the full-length CAR insert was amplified from one allele, and small PCR products (∼1 kb) were amplified from the non-insert allele, being either wild-type or indels. The negative control sample showed a homogeneous single band. Sequencing of the large PCR product confirmed it was the CAR insert with partial 5′ and 3′ homology arms from TRAC. Genotyping validation for other loci could not detect any CAR insert besides the single band as shown in the negative control.

    Although the wild-type AAV could potentially integrate into the host cell genome, especially at a specific site (designated AAVS1) in human chromosome 19 [44–49], the modified recombinant AAV as gene therapy vector has eliminated this integrative capacity by removal of the rep and cap from the DNA of the vector [50]. Early studies have shown that random integration from some modified AAVs was detectable but occurs at a very low frequency [51]. Concerns over the random integration drove us to perform Target-seq on samples infected with AAV to deliver donor DNA of CAR into a targeted region of the TRAC gene in T cells to make CAR-T cells for cancer therapy. We obtained high (>80%) CAR knock-in efficiency in the pool of T cells and detected a major cluster of integration site (GIS-1) in the TRAC gene. Sequencing confirmed partial TRAC homology arms near the GIS, but no ITR sequence. This indicated that the AAV CAR donor was delivered into the GIS-1 site through a nuclease-mediated homology directed repair (HDR) mechanism rather than random integration. We did not detect any integration site in chromosome 19; several other candidates were detected, but we could not confirm them by using PCR genotyping. The failed genotyping could be due to the sequence change caused by unfaithful recombination where the validation primers were designed near the integration site at the identical sequence of ITR at both 5′ and 3′ ends of the AAV insert. It could also be possible that some nonspecific amplification during Target-seq library preparation created background noise. Without genotyping data, we could not confirm any GIS candidates at other genome loci; neither could we exclude random integration if it was at very low frequency, or it was integrated in the region where the primers were not designed for genotyping. A long-read platform with deep sequencing may help solve this issue.

    Genome-wide OT screen for evaluation of HiFi Cas9

    Target-seq was also applied to evaluate the genome-wide OT profile for three common HiFi Cas9 proteins. Two of them (Alt-R HiFi-Cas9 and TrueCut HiFi Cas9) are commercially available. The third (Sniper) has been published [35]. We first evaluated a guide RNA targeting HEK site 4 (HEK4), which has been well characterized and shows a high OT number in HEK293 cells [29,30,36]. The HEK4 gRNA was delivered with the various Cas9 variant proteins in ribonucleoprotein format. Target-seq detected 31 OTs for the wild-type SpCas9 (Figure 4). All three HiFi Cas9 proteins generated fewer OTs than the wild-type SpCas9. Among the three HiFi Cas9 proteins, TrueCut HiFi Cas9 generated significantly fewer OTs than Sniper and Alt-R HiFi-Cas9.

    Figure 4. Off-target reads per million detected by Target-seq in samples edited by three HiFi-Cas9s on HEK4 sgRNA in HEK293 cells.

    (.) is identical base to on-target; (-) is a deleted base; lower-case letters are inserted bases. The numbers in the last columns are reads per million for each target.

    PAM: Protospacer adjacent motif.
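
The dot notation used in Figure 4 and the mismatch and PAM filters of the OT workflow can be sketched as follows. This is a simplified Python illustration rather than the authors' tool: it handles substitutions only (no inserted or deleted bases), accepts only a canonical NGG PAM as an assumed stand-in for the consensus PAM, and uses an arbitrary 20-base sequence, not a real gRNA. All function names are hypothetical.

```python
def dot_notation(on_target, candidate):
    """Show a candidate site relative to the on-target: '.' where bases match."""
    on_target, candidate = on_target.upper(), candidate.upper()
    if len(on_target) != len(candidate):
        raise ValueError("this sketch handles substitutions only, not indels")
    return "".join("." if a == b else b for a, b in zip(on_target, candidate))


def count_mismatches(on_target, candidate):
    """Number of substituted bases between candidate and on-target."""
    return sum(ch != "." for ch in dot_notation(on_target, candidate))


def has_ngg_pam(pam):
    """True for a 3-base PAM of the form NGG (assumption: canonical SpCas9 PAM only)."""
    pam = pam.upper()
    return len(pam) == 3 and pam[1:] == "GG"


def is_candidate_site(on_target, protospacer, pam, max_mismatches=7):
    """Keep sites with a PAM and at most max_mismatches substitutions."""
    return has_ngg_pam(pam) and count_mismatches(on_target, protospacer) <= max_mismatches


# Arbitrary illustrative 20-mer (not a real guide) and a site with two substitutions.
ON_TARGET = "GACCTTGCAAGGTCATGCAT"
SITE = "TACCTAGCAAGGTCATGCAT"
print(dot_notation(ON_TARGET, SITE))
```

The default of seven mismatches follows the threshold stated for the OT analysis; the confidence categories are not reproduced here because their cut-offs are not given in the text.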

    As described above in the Target-seq workflow, we eliminated two steps – third-round PCR and biotinylated oligo/streptavidin bead purification – from the original TEG-seq workflow [36]. This simplified workflow enabled us to perform genome-wide OT screening in a high-throughput format. We screened 21 gRNAs targeting four therapeutically relevant genes (CD52, TRAC, TRBC and PD1) to evaluate the fidelity of three Cas9 proteins in T cells (Figure 5). Some of the gRNAs (TRAC-1, TRAC-4, TRBC-1 and PD1-1) were chosen because they have been used for CAR-T cancer therapy studies [19–21], from which fewer OTs were detected. Some of them (TRBC-4, PD1-4 and PD1-5) with high predicted OTs were chosen for facilitating OT detection and comparison between different HiFi Cas9s. In this study we compared two commercial HiFi Cas9s (Alt-R HiFi-Cas9 and TrueCut HiFi Cas9) with the wild-type SpCas9. To present the total OT number as well as the OT event probability rate (%) relative to the corresponding on-target rate, we calculated the off/on ratio for each OT. As expected, Target-seq detected many OTs from three gRNAs (TRBC-4, PD1-4 and PD1-5). The percentage ratios from some OTs were even higher than their corresponding on-targets in the samples edited with the wild-type Cas9 and Alt-R HiFi-Cas9. We noticed that these gRNAs were more GC-rich. Among the three Cas9 proteins, the wild-type SpCas9 generated the highest number of OTs for all four genes and 21 gRNAs, while the TrueCut HiFi Cas9 generated the lowest number of OTs. Although the Alt-R HiFi-Cas9 generated fewer OTs than the wild-type Cas9, it generated many more OTs than the TrueCut HiFi Cas9. For those gRNAs (TRAC-1, TRAC-4, TRBC-1 and PD1-1) commonly used for CAR-T cancer therapy studies, no OT was detected from TrueCut HiFi Cas9, while many OTs were detected from wild-type Cas9 and Alt-R HiFi-Cas9.

    Figure 5. Broad genome-wide off-target comparison between two HiFi Cas9 proteins (Alt-R® HiFi-Cas9 and TrueCut™ HiFi Cas9) and wild-type SpCas9.

    The 21 sgRNAs targeting four genes (CD52, TRAC, TRBC and PD1) were studied in T cells for off-target profiling using Target-seq. The sgRNA and double-strand DNA tag were co-transfected with Cas9 protein using Neon electroporation. The off/on ratio was calculated based on RPM from individual off-targets divided by the RPM from the corresponding on-target. Red dots represent on-target events, set as 100%. Gray dots represent the off-target events of their off/on ratio. The percentage off/on ratios indicated the off-target cleavage probability relative to their corresponding on-target. The gray dots above the red dots (100%) indicate that the RPM from the off-target were higher than from the on-target. The sgRNA name and ID number are marked under the gene name. RPM: Reads per million.
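
The off/on ratio defined in this legend is a simple quotient of normalized read counts, sketched below in Python. The function names are hypothetical, and normalizing to the total number of mapped reads in the library is an assumption about how RPM was computed.

```python
def reads_per_million(read_count, total_mapped_reads):
    """Normalize a raw read count to reads per million (RPM).

    Assumption: the denominator is the total mapped reads of the library.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return read_count / total_mapped_reads * 1_000_000


def off_on_ratio_percent(off_target_rpm, on_target_rpm):
    """Off-target RPM as a percentage of the corresponding on-target RPM."""
    if on_target_rpm <= 0:
        raise ValueError("on-target RPM must be positive")
    return off_target_rpm / on_target_rpm * 100
```

With this definition the on-target is 100% by construction, and an off-target with more reads than its on-target gives a ratio above 100%, which corresponds to gray dots plotted above the red dots.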

    Genome-wide OT detection at a large scale using a high-throughput format is still challenging. We performed Target-seq library preparation in a 96-well format for 21 gRNAs targeting four therapeutically relevant genes in T cells to compare the OT profiles for some commercially available Cas9 proteins. To facilitate library preparation in a 96-well format, we used a simplified workflow and demonstrated that Target-seq can detect low OT events at 0.01% relative to the corresponding on-target using the Ion GeneStudio S5 system. Although other more sensitive OT detection methods have been developed [52–54], they measure the OT activity in vitro, by nature of the fact that the gDNA substrates are removed from a cellular context and stripped of all proteins. These methods tend to be used to identify all possible on-target and OT cleavage sites for a particular gRNA. It is a similar scenario to OT prediction tools [22–24,37,55], from which most predicted OTs are not accessible by nucleases due to the chromatin structure or protected by the cellular context. A considerable discrepancy for OT profiling between in vitro and in cellulo or between in silico and in cellulo has been observed [29]. Thus OT detection and relative quantitation (ratio of off/on) in in cellulo assays like Target-seq is important for evaluation of the editing reagents (gRNA and Cas9) used for gene therapy. Using Target-seq, we demonstrated that TrueCut HiFi Cas9 generated far fewer OTs than wild-type SpCas9 and Alt-R HiFi-Cas9 on a broad panel of therapeutic targets, which could be a beneficial gene editing tool for therapeutic applications.

    DT detection in double gene knockout T cells

    DTs often occur in multiple editing events (e.g., DKO TRAC and CD52 in T cells to facilitate allogeneic, off-the-shelf immune-cell therapy) [26]. Using this as an example, we performed DKO of TRAC and CD52 using CRISPR/Cas9 ribonucleoprotein in T cells and utilized Target-seq to measure repair in cis (repaired on the same gene) and repair by translocation (Figure 6). Several primers (TS-TL, TS-TR, TS-CL and TS-CR) around the nuclease cut sites were designed for Target-seq (Figure 6A & Supplementary Table 5). The libraries generated from each primer were barcoded and sequenced. The total mapped reads and DT reads containing hybrid sequences from different chromosomal locations – for example, TRAC (chr14:23016459) and CD52 (chr1:26644549) – were generated as shown in Figure 6B. Most DT reads represented DTs resulting from the cross between the TRAC and CD52 loci. Other DT reads were detected which contained hybrid reads partially from TRAC or CD52 and partially from chromosomal loci other than TRAC or CD52, though the read numbers were very low and comparable to the background. For DKO samples, the highest number of DT reads between TRAC and CD52 was detected from the TL primer (5.92%), while the CL and CR primers generated 4.42% and 4.66% of DT reads, respectively. Interestingly, the TR primer showed far fewer DT reads (0.83%). We checked some hybrid sequences from CL and CR libraries and found that most of the reads showed a translocation with the TL-side sequence. In the negative control samples, we observed ∼0.1% background DT. For validation, we performed PCR using a combination of different primers to check the repair-in-cis and repair-by-translocation products on a gel. For repair in cis, we observed products in both DKO and negative control samples (Figure 6C). For DT between TRAC and CD52, only two pairs of cross primers (TL/CL and TL/CR) amplified a visible PCR product from DKO but not negative control samples (Figure 6C). 
Primer pairs including TR (TR/CL and TR/CR) did not amplify any visible PCR products, which was consistent with the low read number from the TR primer of Target-seq. To measure the relative percentage of DTs between TRAC and CD52 in the DKO sample, we performed multiplex PCR with all four primers in one reaction. The amplicon was then sequenced. The average read numbers for repair in cis and translocation were calculated and are shown in Figure 6D. The percentage translocation was 13.5% in DKO and 0.3% (background) in negative control samples.
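
The relative quantitation from the multiplex PCR reduces to a proportion of read counts, as in the Python sketch below. The reference labels ("TL/TR", "CL/CR" for repair in cis and the four cross combinations for DT) and the read counts in the usage note are hypothetical and chosen only for illustration; they are not data from this study.

```python
CIS_REFS = ("TL/TR", "CL/CR")                      # two wild-type (repair in cis) references
DT_REFS = ("TL/CL", "TL/CR", "TR/CL", "TR/CR")     # four translocation references


def repair_percentages(read_counts, cis_refs=CIS_REFS, dt_refs=DT_REFS):
    """Percentage of reads assigned to repair in cis versus translocation.

    read_counts maps a reference label to the number of reads aligned to it;
    labels missing from the mapping count as zero.
    """
    cis = sum(read_counts.get(ref, 0) for ref in cis_refs)
    dt = sum(read_counts.get(ref, 0) for ref in dt_refs)
    total = cis + dt
    if total == 0:
        raise ValueError("no reads assigned to any reference")
    return {"repair_in_cis": 100 * cis / total, "translocation": 100 * dt / total}
```

For instance, made-up counts of 500 and 400 reads on the two cis references and 60, 30, 6 and 4 reads on the DT references would give 90% repair in cis and 10% translocation. Averaging across replicate reactions is left out of this sketch.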

    Figure 6. DNA translocation detection in double gene-knockout T cells.

    (A) Primers designed for TRAC and CD52 double knock-out (DKO) samples in T cells. The two sgRNAs targeting TRAC and CD52 were codelivered with Cas9 protein using electroporation. The genomic DNA samples were extracted and used for Target-seq and PCR validation. The graph illustrates the DNA repair result in DKO samples. (B) Target-seq result from four TS primers. Most DNA translocations were from TRAC–CD52 cross-reads, especially from primer TL, CL and CR (highlighted with gray) in the table. Other non-TRAC–CD52 crossing reads were detected, but at a level as low as the background from the NC sample. (C) Genotyping validation. PCR validation using the primers designed in (A) in DKO and NC samples showed visible products in DKO with repair in cis and NC samples (left panel). For repair with DT, only CL/TL and CR/TL primers amplified visible PCR products in DKO samples (right panel). (D) Relative quantitation of DNA translocation. The four primers designed from (A) were pooled in a single reaction to amplify DNA repair products including repair in cis and repair with DT between TRAC and CD52. Next-generation sequencing reads were aligned to the two wild-type (repair in cis) and four DT references. The percentage of repair in cis and DTs from the DKO and NC samples were calculated and plotted.

    CL: CD52-Left; CR: CD52-Right; DT: DNA translocation; NC: Negative control; TL: TRAC-Left; TR: TRAC-Right.

    Nuclease-induced DNA rearrangement at the target site is a concern for gene therapy. DTs are common and occur at a high frequency, especially when the genome in the cell is simultaneously cleaved at different loci. We detected different rates (0.86–5.92%) of DTs from different primers using Target-seq. This may relate to the sequence near the DSB, which affects the rate of local microhomologous recombination; the more microhomologous sequence between two cleavage ends, the more likely that translocation would occur. Although DT between two known target genes (e.g., TRAC and CD52) can be detected and semi-quantitated by qPCR with the primers across two cleavage sites [26], DT between a known target gene (e.g., TRAC or CD52) and an unknown sequence (e.g., natural DNA break or OT DSB) cannot be detected by qPCR. Single-primer based Target-seq could be used to detect this kind of DT.

    Conclusion

    Target-seq is a sequencing-based method that could be applied for genome-wide screening of GIS, OT events and DT. Depending on the sequencing platform and depth, the sensitivity of detection varies. The short-read deep sequencing platform mainly described in this study is sensitive and robust for OT and DT. The simplified workflow and 96-well high-throughput sample preparation format enable genome-wide OT profiling on a large panel of sgRNAs, targeting therapeutically relevant genes for comparison of different HiFi-Cas9 fidelities. The TrueCut HiFi Cas9 showed significantly higher fidelity than other HiFi-Cas9s. For viral-delivered samples, Target-seq detected the expected major GISs for both Lenti-Cas9 and AAV-CAR. It also detected some minor GIS hits, especially from AAV-CAR delivery. We tried to verify those minor hits by genotyping validation, but failed; this might be due to the nonspecific background of the method or unwanted viral recombination near the GIS flanking sequence, leading to loss of primer binding sites and a resulting failed amplification. Long-read deep sequencing may complement our method in the validation of minor GIS.

    Future perspective

    The genome-wide detection of GIS, OT and DNA rearrangements, including DTs and large deletions, is still challenging. The ideal method is direct sequencing of the whole host genome after the genome is edited. However, to detect low-frequency events, this requires high sequencing depth and whole-genome coverage, which is not currently cost-effective or efficient. Current methods rely on PCR amplification at the target sites using primers specific to the DNA tag added to the cells during transfection (e.g., for OT) or specific to a known-sequence region (e.g., LTR for GIS and the DSB site for DT) as described in this report. In this case, nonspecific amplification could occur, leading to some background in the output sequencing data for the candidate hits. This usually requires validation after discovering candidate hits to ensure that the GIS, OT and DT are real events. The short-read NGS mainly described in this report showed high sensitivity for detection, but it can still miss some events if the primer was not designed at the right place. Long-read deep NGS with higher accuracy would serve as a good alternative method for discovering GIS, OT and DNA rearrangement. Ultimately, whole-genome sequencing would be the ideal approach if more affordable and efficient tools become available.

    Executive summary

    Introduction

    • The detection of genome integration site (GIS), off-target (OT) events and DNA translocation (DT) is critical for the success of gene and cell therapies.

    • Target-seq is a simplified version of TEG-seq (Tag-Enriched GUIDE-seq) that enables high-throughput detection of GIS, OT and DT.

    Methods

    • A detailed workflow for sample preparation and analysis for each detection of GIS, OT and DT was described.

    • The method was optimized for a 96-well format that allows for high-throughput detection.

    Results & discussion

    • The large-scale OT profiling comparison showed that the fidelity of TrueCut™ HiFi Cas9 was significantly higher than that of other HiFi Cas9s, which could be a benefit to gene and cell therapy.

    • Target-seq can be applied to detect on-target DTs with a relative quantitation for different translocation patterns.

    • Nuclease-mediated targeted delivery of chimeric antigen receptor through adeno-associated virus to the TCR genomic region was highly efficient and specific.

    Conclusion

    • We demonstrated that Target-seq can be efficiently used for the detection of GIS, OT and DT in nuclease- and viral-edited cells.

    Supplementary data

    To view the supplementary data that accompany this paper please visit the journal website at: www.future-science.com/doi/suppl/10.2144/btn-2023-0013

    Author contributions

    P Tang developed the method, designed and performed most of the bench work, analyzed and generated data and drafted the manuscript. B Ding developed bioinformatic tools. C Reyes and D Papp performed some of the bench work. D Papp and J Potter edited the manuscript.

    Financial & competing interests disclosure

    All authors are currently employed by Thermo Fisher Scientific. P Tang was the inventor of the TEG-seq patent application mentioned in this paper and described in a previous paper (reference 36). The remaining authors declare no competing interests. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

    No writing assistance was utilized in the production of this manuscript.

    Open access

    This work is licensed under the Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/

    Papers of special note have been highlighted as: • of interest

    References

    • 1. Yanez-Munoz RJ, Balaggan KS, MacNeil A et al. Effective gene therapy with nonintegrating lentiviral vectors. Nat. Med. 12(3), 348–353 (2006).
    • 2. Cartier N, Hacein-Bey-Abina S, Bartholomae CC et al. Hematopoietic stem cell gene therapy with a lentiviral vector in X-linked adrenoleukodystrophy. Science 326(5954), 818–823 (2009).
    • 3. Mátrai J, Cantore A, Bartholomae CC et al. Hepatocyte-targeted expression by integrase-defective lentiviral vectors induces antigen-specific tolerance in mice with low genotoxic risk. Hepatology 53(5), 1696–1707 (2011).
    • 4. Kaeppel C, Beattie SG, Fronza R et al. A largely random AAV integration profile after LPLD gene therapy. Nat. Med. 19(7), 889–891 (2013).
    • 5. Cogné B, Snyder R, Lindenbaum P et al. NGS library preparation may generate artifactual integration sites of AAV vectors. Nat. Med. 20(6), 577–578 (2014).
    • 6. Kaeppel C, Beattie SG, Fronza R et al. Reply to: NGS library preparation may generate artifactual integration sites of AAV vectors. Nat. Med. 20(6), 578–579 (2014).
    • 7. Kuzmin DA, Shutova MV, Johnston NR et al. The clinical landscape for AAV gene therapies. Nat. Rev. Drug Discov. 20(3), 173–174 (2021).
    • 8. Cockrell AS, Kafri T. Gene delivery by lentivirus vectors. Mol. Biotechnol. 36(3), 184–204 (2007).
    • 9. Kuehle J, Turan S, Cantz T et al. Modified lentiviral LTRs allow Flp recombinase-mediated cassette exchange and in vivo tracing of ‘factor-free’ induced pluripotent stem cells. Mol. Ther. 22(5), 919–928 (2014).
    • 10. Gene therapy at the crossroads. Nat. Biotechnol. 40(5), 621 (2022).
    • 11. Garcia Casado J, Janda J, Wei J et al. Lentivector immunization induces tumor antigen-specific B and T cell responses in vivo. Eur. J. Immunol. 38(7), 1867–1876 (2008).
    • 12. Shi Q, Wilcox DA, Fahs SA et al. Lentivirus-mediated platelet-derived factor VIII gene therapy in murine haemophilia A. J. Thromb. Haemost. 5(2), 352–361 (2007).
    • 13. Khimani AH, Thirion C, Srivastava A. AAV vectors advance the frontiers of gene therapy. Genet. Eng. Biotechnol. News 42(1), 38–40 (2022).
    • 14. Xiao X, Li J, Samulski RJ. Production of high-titer recombinant adeno-associated virus vectors in the absence of helper adenovirus. J. Virol. 72(3), 2224–2232 (1998).
    • 15. Cong L, Ran FA, Cox D et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339(6121), 819–823 (2013).
    • 16. Mali P, Yang L, Esvelt KM et al. RNA-guided human genome engineering via Cas9. Science 339(6121), 823–826 (2013).
    • 17. Bogdanove AJ, Voytas DF. TAL effectors: customizable proteins for DNA targeting. Science 333(6051), 1843–1846 (2011).
    • 18. Sander JD, Cade L, Khayter C et al. Targeted gene disruption in somatic zebrafish cells using engineered TALENs. Nat. Biotechnol. 29(8), 697–698 (2011).
    • 19. Schumann K, Lin S, Boyer E et al. Generation of knock-in primary human T cells using Cas9 ribonucleoproteins. Proc. Natl Acad. Sci. USA 112(33), 10437–10442 (2015).
    • 20. Roth TL, Puig-Saus C, Yu R et al. Reprogramming human T cell function and specificity with non-viral genome targeting. Nature 559(7714), 405–409 (2018).
    • 21. Stadtmauer EA, Fraietta JA, Davis MM et al. CRISPR-engineered T cells in patients with refractory cancer. Science 367(6481), 1–20 (2020).
    • 22. Haeussler M, Schonig K, Eckert H et al. Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biol. 17(1), 148 (2016).
    • 23. Bae S, Park J, Kim JS. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30(10), 1473–1475 (2014).
    • 24. Grau J, Boch J, Posch S. TALENoffer: genome-wide TALEN off-target prediction. Bioinformatics 29(22), 2931–2932 (2013).
    • 25. Schwarze LI, Głów D, Sonntag T, Uhde A, Fehse B. Optimisation of a TALE nuclease targeting the HIV co-receptor CCR5 for clinical application. Gene Ther. 28(9), 588–601 (2021).
    • 26. Poirot L, Philip B, Schiffer-Mannioui C et al. Multiplex genome-edited T-cell manufacturing platform for ‘off-the-shelf’ adoptive T-cell immunotherapies. Cancer Res. 75(18), 3853–3864 (2015).
    • 27. Graham C, Jozwik A, Pepper A, Benjamin A. Allogeneic CAR-T cells: more than ease of access? Cells 7(10), 155 (2018).
    • 28. Sheridan C. Off-the-shelf, gene-edited CAR-T cells forge ahead, despite safety scare: race to the clinic reignites for an off-the-shelf alternative to autologous CAR-T cell therapy, even as concerns over chromosomal abnormalities linger. Nat. Biotechnol. 40(1), 5–8 (2022).
    • 29. Tsai SQ, Zheng Z, Nguyen NT et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 33(2), 187–197 (2015).
    • 30. Tsai SQ, Joung JK. Defining and improving the genome-wide specificities of CRISPR-Cas9 nucleases. Nat. Rev. Genet. 17(5), 300–312 (2016).
    • 31. Schmidt M, Hoffmann G, Wissler M et al. Detection and direct genomic sequencing of multiple rare unknown flanking DNA in highly complex samples. Hum. Gene Ther. 12(7), 743–749 (2001).
    • 32. Schmidt M, Schwarzwaelder K, Bartholomae C et al. High-resolution insertion-site analysis by linear amplification-mediated PCR (LAM-PCR). Nat. Methods 4(12), 1051–1057 (2007).
    • 33. Gabriel R, Kutschera I, Bartholomae CC, von Kalle C, Schmidt M. Linear amplification mediated PCR-localization of genetic elements and characterization of unknown flanking DNA. J. Vis. Exp. 102(88), e51543 (2014).
    • 34. de Vree PJ, de Wit E, Yilmaz M et al. Targeted sequencing by proximity ligation for comprehensive variant detection and local haplotyping. Nat. Biotechnol. 32(10), 1019–1027 (2014).
    • 35. Lee JK, Jeong E, Lee J et al. Directed evolution of CRISPR-Cas9 to increase its specificity. Nat. Commun. 9(1), 3048 (2018).
    • 36. Tang PZ, Ding B, Peng L, Mozhayskiy V, Potter J, Chesnut JD. TEG-seq: an Ion Torrent-adapted NGS workflow for in cellulo mapping of CRISPR specificity. BioTechniques 65(5), 259–267 (2018). • The original method developed for CRISPR/Cas9 off-target detection and used here as the basis for the Target-seq method.
    • 37. Concordet JP, Haeussler M. CRISPOR: intuitive guide selection for CRISPR/Cas genome editing experiments and screens. Nucleic Acids Res. 46(W1), W242–W245 (2018).
    • 38. Schmid-Burgk JL, Gao L, Li D et al. Highly parallel profiling of Cas9 variant specificity. Mol. Cell 78(4), 794–800 (2020).
    • 39. Schmidt M, Zickler P, Hoffmann G et al. Polyclonal long-term repopulating stem cell clones in a primate model. Blood 100(8), 2737–2743 (2002).
    • 40. Suleman S, Payne A, Bowden J et al. HIV-1 lentivirus tethering to the genome is associated with transcription factor binding sites found in genes that favor virus survival. Gene Ther. 29, 720–729 (2022).
    • 41. Ciuffi A. Mechanisms governing lentivirus integration site selection. Curr. Gene Ther. 8(6), 419–429 (2008).
    • 42. Schroder AR, Shinn P, Chen H, Berry C, Ecker JR, Bushman F. HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110(4), 521–529 (2002).
    • 43. Lewinski MK, Yamashita M, Emerman M et al. Retroviral DNA integration: viral and cellular determinants of target-site selection. PLoS Pathog. 2(6), e60 (2006).
    • 44. Kotin RM, Menninger JC, Ward DC, Berns KI. Mapping and direct visualization of a region-specific viral DNA integration site on chromosome 19q13-qter. Genomics 10(3), 831–834 (1991).
    • 45. Kotin RM, Siniscalco M, Samulski RJ et al. Site-specific integration by adeno-associated virus. Proc. Natl Acad. Sci. USA 87(6), 2211–2215 (1990).
    • 46. Samulski RJ, Zhu X, Xiao X et al. Targeted integration of adeno-associated virus (AAV) into human chromosome 19. EMBO J. 10(12), 3941–3950 (1991).
    • 47. Surosky RT, Urabe M, Godwin SG et al. Adeno-associated virus Rep proteins target DNA sequences to a unique locus in the human genome. J. Virol. 71(10), 7951–7959 (1997).
    • 48. Daya S, Berns KI. Gene therapy using adeno-associated virus vectors. Clin. Microbiol. Rev. 21(4), 583–593 (2008).
    • 49. Berns KI. The unusual properties of the AAV inverted terminal repeat. Hum. Gene Ther. 31(9–10), 518–523 (2020).
    • 50. Chirmule N, Propert K, Magosin S, Qian Y, Qian R, Wilson J. Immune responses to adenovirus and adeno-associated virus in humans. Gene Ther. 6(9), 1574–1583 (1999).
    • 51. Crosetto N, Mitra A, Silva MJ et al. Nucleotide-resolution DNA double-strand break mapping by next-generation sequencing. Nat. Methods 10(4), 361–368 (2013).
    • 52. Frock RL, Hu J, Meyers RM et al. Genome-wide detection of DNA double-stranded breaks induced by engineered nucleases. Nat. Biotechnol. 33(2), 179–187 (2015).
    • 53. Tsai SQ, Nguyen NT, Malagon-Lopez J, Topkar VV, Aryee MJ, Joung JK. CIRCLE-seq: a highly sensitive in vitro screen for genome-wide CRISPR-Cas9 nuclease off-targets. Nat. Methods 14(6), 607–614 (2017).
    • 54. Montague TG, Cruz JM, Gagnon JA, Church GM, Valen E. CHOPCHOP: a CRISPR/Cas9 and TALEN web tool for genome editing. Nucleic Acids Res. 42(Web Server Issue), W401–W407 (2014).
    • 55. Anderson KR, Haeussler M, Watanabe C et al. CRISPR off-target analysis in genetically engineered rats and mice. Nat. Methods 15(7), 512–514 (2018).