Customizing CRISPR–Cas PAM specificity with protein language models

  • Collias, D. & Beisel, C. L. CRISPR technologies and the search for the PAM-free nuclease. Nat. Commun. 12, 555 (2021).

  • Karvelis, T. et al. Rapid characterization of CRISPR–Cas9 protospacer adjacent motif sequence elements. Genome Biol. 16, 253 (2015).

  • Gasiunas, G. et al. A catalogue of biochemically diverse CRISPR–Cas9 orthologs. Nat. Commun. 11, 5512 (2020).

  • Yan, W. X. et al. Functionally diverse type V CRISPR–Cas systems. Science 363, 88–91 (2019).

  • Anzalone, A. V., Koblan, L. W. & Liu, D. R. Genome editing with CRISPR–Cas nucleases, base editors, transposases and prime editors. Nat. Biotechnol. 38, 824–844 (2020).

  • Christie, K. A. et al. Towards personalised allele-specific CRISPR gene editing to treat autosomal dominant disorders. Sci. Rep. 7, 16174 (2017).

  • Nishimasu, H. et al. Engineered CRISPR–Cas9 nuclease with expanded targeting space. Science 361, 1259–1262 (2018).

  • Walton, R. T., Christie, K. A., Whittaker, M. N. & Kleinstiver, B. P. Unconstrained genome targeting with near-PAMless engineered CRISPR–Cas9 variants. Science 368, 290–296 (2020).

  • Kleinstiver, B. P. et al. Engineered CRISPR–Cas12a variants with increased activities and improved targeting ranges for gene, epigenetic and base editing. Nat. Biotechnol. 37, 276–282 (2019).

  • Kleinstiver, B. P. et al. Broadening the targeting range of Staphylococcus aureus CRISPR–Cas9 by modifying PAM recognition. Nat. Biotechnol. 33, 1293–1298 (2015).

  • Kleinstiver, B. P. et al. Engineered CRISPR–Cas9 nucleases with altered PAM specificities. Nature 523, 481–485 (2015).

  • Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57–63 (2018).

  • Huang, T. P. et al. High-throughput continuous evolution of compact Cas9 variants targeting single-nucleotide-pyrimidine PAMs. Nat. Biotechnol. 41, 96–107 (2023).

  • Miller, S. M. et al. Continuous evolution of SpCas9 variants compatible with non-G PAMs. Nat. Biotechnol. 38, 471–481 (2020).

  • Nijkamp, E., Ruffolo, J. A., Weinstein, E. N., Naik, N. & Madani, A. ProGen2: exploring the boundaries of protein language models. Cell Syst. 14, 968–978 (2023).

  • Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).

  • Ruffolo, J. A. et al. Design of highly functional genome editors by modelling CRISPR–Cas sequences. Nature 645, 518–525 (2025).

  • Meeske, A. J. & Marraffini, L. A. RNA guide complementarity prevents self-targeting in type VI CRISPR Systems. Mol. Cell 71, 791–801 (2018).

  • Marraffini, L. A. & Sontheimer, E. J. CRISPR interference: RNA-directed adaptive immunity in bacteria and archaea. Nat. Rev. Genet. 11, 181–190 (2010).

  • Camargo, A. P. et al. IMG/VR v4: an expanded database of uncultivated virus genomes within a framework of extensive functional, taxonomic, and ecological metadata. Nucleic Acids Res. 51, D733–D743 (2023).

  • Camargo, A. P. et al. IMG/PR: a database of plasmids from genomes and metagenomes with rich annotations and metadata. Nucleic Acids Res. 52, D164–D173 (2024).

  • Ciciani, M. et al. Automated identification of sequence-tailored Cas9 proteins using massive metagenomic data. Nat. Commun. 13, 6474 (2022).

  • Adler, B. A. et al. CasPEDIA Database: a functional classification system for class 2 CRISPR–Cas enzymes. Nucleic Acids Res. 52, D590–D596 (2024).

  • Gleditzsch, D. et al. PAM identification by CRISPR–Cas effector complexes: diversified mechanisms and structures. RNA Biol. 16, 504–517 (2019).

  • Anders, C., Niewoehner, O., Duerst, A. & Jinek, M. Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature 513, 569–573 (2014).

  • Ruffolo, J. A. et al. Adapting protein language models for structure-conditioned design. Preprint at bioRxiv https://doi.org/10.1101/2024.08.03.606485 (2024).

  • Wei, J. et al. Closely related type II-C Cas9 orthologs recognize diverse PAMs. eLife 11, e77825 (2022).

  • Wimmer, F., Mougiakos, I., Englert, F. & Beisel, C. L. Rapid cell-free characterization of multi-subunit CRISPR effectors and transposons. Mol. Cell 82, 1210–1224 (2022).

  • Sun, W. et al. Structures of Neisseria meningitidis Cas9 complexes in catalytically poised and anti-CRISPR-inhibited states. Mol. Cell 76, 938–952 (2019).

  • Huang, X. et al. Decoding CRISPR-Cas PAM recognition with UniDesign. Brief. Bioinform. 24, bbad133 (2023).

  • Hirano, S., Nishimasu, H., Ishitani, R. & Nureki, O. Structural basis for the altered PAM specificities of engineered CRISPR–Cas9. Mol. Cell 61, 886–894 (2016).

  • Anders, C., Bargsten, K. & Jinek, M. Structural plasticity of PAM recognition by engineered variants of the RNA-guided endonuclease Cas9. Mol. Cell 61, 895–902 (2016).

  • Schmidheini, L. et al. Continuous directed evolution of a compact CjCas9 variant with broad PAM compatibility. Nat. Chem. Biol. 20, 333–343 (2024).

  • Amrani, N. et al. NmeCas9 is an intrinsically high-fidelity genome-editing platform. Genome Biol. 19, 1–25 (2018).

  • Tsui, T. K. M., Hand, T. H., Duboy, E. C. & Li, H. The impact of DNA topology and guide length on target selection by a cytosine-specific Cas9. ACS Synth. Biol. 6, 1103–1113 (2017).

  • Luscombe, N. M., Laskowski, R. A. & Thornton, J. M. Amino acid-base interactions: a three-dimensional analysis of protein–DNA interactions at an atomic level. Nucleic Acids Res. 29, 2860–2874 (2001).

  • Walton, R. T., Hsu, J. Y., Joung, J. K. & Kleinstiver, B. P. Scalable characterization of the PAM requirements of CRISPR–Cas enzymes using HT-PAMDA. Nat. Protoc. 16, 1511–1547 (2021).

  • Grathwohl, W., Swersky, K., Hashemi, M., Duvenaud, D. & Maddison, C. Oops I took a gradient: scalable sampling for discrete distributions. In Proceedings of the 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) (PMLR, 2021).

  • Li, W. & Godzik, A. CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).

  • Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).

  • van Dongen, S. Graph clustering via a discrete uncoupling process. SIAM J. Matrix Anal. Appl. 30, 121–141 (2008).

  • Deorowicz, S., Debudaj-Grabysz, A. & Gudyś, A. FAMSA: fast and accurate multiple sequence alignment of huge protein families. Sci. Rep. 6, 33964 (2016).

  • Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).

  • Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).

  • Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).

  • Letunic, I. & Bork, P. Interactive Tree of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49, W293–W296 (2021).

  • Liu, Z. et al. Versatile and efficient genome editing with Neisseria cinerea Cas9. Commun. Biol. 5, 1–7 (2022).

  • Hand, T. H., Das, A. & Li, H. Directed evolution studies of a thermophilic type II-C Cas9. Methods Enzymol. 616, 265–288 (2019).

  • Hirano, H. et al. Structure and engineering of Francisella novicida Cas9. Cell 164, 950–961 (2016).

  • Cui, Z. et al. FrCas9 is a CRISPR/Cas9 system with high editing efficiency and fidelity. Nat. Commun. 13, 1425 (2022).

  • Kim, E. et al. In vivo genome editing with a small Cas9 orthologue derived from Campylobacter jejuni. Nat. Commun. 8, 14500 (2017).

  • Hirano, S. et al. Structural basis for the promiscuous PAM recognition by Corynebacterium diphtheriae Cas9. Nat. Commun. 10, 1968 (2019).

  • Zetsche, B. et al. Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR–Cas system. Cell 163, 759–771 (2015).

  • Zetsche, B., Abudayyeh, O. O., Gootenberg, J. S., Scott, D. A. & Zhang, F. A survey of genome editing activity for 16 Cas12a orthologs. Keio J. Med. 69, 59–65 (2020).

  • Strecker, J. et al. Engineering of CRISPR–Cas12b for human genome editing. Nat. Commun. 10, 212 (2019).

  • Harrington, L. B. et al. A scoutRNA is required for some type V CRISPR–Cas systems. Mol. Cell 79, 416–424 (2020).

  • Burstein, D. et al. New CRISPR–Cas systems from uncultivated microbes. Nature 542, 237–241 (2017).

  • Karvelis, T. et al. PAM recognition by miniature CRISPR–Cas12f nucleases triggers programmable double-stranded DNA target cleavage. Nucleic Acids Res. 48, 5016 (2020).

  • Wang, Y. et al. A highly specific CRISPR-Cas12j nuclease enables allele-specific genome editing. Sci. Adv. 9, eabo6405 (2023).

  • Strecker, J. et al. RNA-guided DNA insertion with CRISPR-associated transposases. Science 365, 48–53 (2019).

  • Urbaitis, T. et al. A new family of CRISPR-type V nucleases with C-rich PAM recognition. EMBO Rep. 23, e55481 (2022).

  • Wu, W. Y. et al. The miniature CRISPR–Cas12m effector binds DNA to block transcription. Mol. Cell 82, 4487–4502 (2022).

  • Al-Shayeb, B. et al. Diverse virus-encoded CRISPR–Cas systems include streamlined genome editors. Cell 185, 4574–4586 (2022).

  • Zhang, Y. et al. Catalytic-state structure and engineering of Streptococcus thermophilus Cas9. Nat. Catal. 3, 813–823 (2020).

  • Tran, M. H. et al. A more efficient CRISPR–Cas12a variant derived from MA2020. Mol. Ther. Nucleic Acids 24, 40–53 (2021).

  • Gao, L. et al. Engineered Cpf1 variants with altered PAM specificities. Nat. Biotechnol. 35, 789–792 (2017).

  • Russel, J., Pinilla-Redondo, R., Mayo-Muñoz, D., Shah, S. A. & Sørensen, S. J. CRISPRCasTyper: automated identification, annotation, and classification of CRISPR–Cas loci. CRISPR J. 3, 462–469 (2020).

  • Chen, Z. & Zhao, H. A highly sensitive selection method for directed evolution of homing endonucleases. Nucleic Acids Res. 33, e154 (2005).

  • Rohland, N. & Reich, D. Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome Res. 22, 939–946 (2012).

  • Gooden, A. A., Evans, C. N., Sheets, T. P., Clapp, M. E. & Chari, R. dbGuide: a database of functionally validated guide RNAs for genome editing in human and mouse cells. Nucleic Acids Res. 49, D871–D876 (2020).

  • Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).

  • Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224–226 (2019).
