
Searching for fat tails in CRISPR-Cas systems: Data analysis and mathematical modeling

Published in:

PLOS Computational Biology 17(3), 21 (Mar 2021)

Author(s):

Pavlova, Y. S., Paez-Espino, D., Morozov, A. Y., Belalov, I. S.

DOI:

10.1371/journal.pcbi.1008841

Abstract:

Understanding CRISPR-Cas systems (the adaptive defence mechanism that about half of bacterial species and most archaea use to neutralise viral attacks) is important for explaining the biodiversity observed in the microbial world as well as for editing animal and plant genomes effectively. The CRISPR-Cas system learns from previous viral infections and integrates small pieces of phage genomes, called spacers, into the microbial genome. The resulting library of spacers collected in CRISPR arrays is then compared with the DNA of potential invaders. One of the most intriguing and least well understood questions about CRISPR-Cas systems is the distribution of spacers across the microbial population. Here, using empirical data, we show that the global distribution of spacer numbers in CRISPR arrays across multiple biomes worldwide typically exhibits scale-invariant power law behaviour, and the standard deviation is greater than the sample mean. We develop a mathematical model of spacer loss and acquisition dynamics which fits observed data from almost four thousand metagenomes well. In analogy to the classical 'rich-get-richer' mechanism of power law emergence, the rate of spacer acquisition is proportional to the CRISPR array size, which allows a small proportion of CRISPRs within the population to possess a significant number of spacers. Our study provides an alternative explanation for the rarity of all-resistant super microbes in nature and why proliferation of phages can be highly successful despite the effectiveness of CRISPR-Cas systems.

Author summary:

About half of bacterial species and most archaea are equipped with CRISPR-Cas systems of adaptive immunity to protect them from their natural enemies, bacteriophages. The memory of CRISPR-Cas contains a catalogue of the fingerprints of previously experienced offenders, which is passed down to the bacterial progeny. Microbial resistance to viruses largely depends on the number of records in this CRISPR array. Our analysis, combining metagenomics data and mathematical modelling, shows that the size of CRISPR arrays in microbial populations generally follows a power law distribution. Power law distributions have been found in many other complex systems (earthquakes, financial markets, animal movement). We argue that our model explains the presence of a power law in CRISPR arrays and the rarity of all-resistant super microbes.
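
The 'rich-get-richer' mechanism mentioned in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's model: it omits spacer loss entirely and instead uses the classical Simon (Yule-type) process, in which each step either founds a new array with one spacer or adds a spacer to an existing array chosen with probability proportional to its current size. The function name `simulate_array_sizes` and the parameters `steps`, `p_new` and `seed` are hypothetical, chosen only for illustration.

```python
import random
import statistics


def simulate_array_sizes(steps, p_new, seed=0):
    """Toy Simon-type 'rich-get-richer' process (an assumption, not the paper's model).

    Each step adds exactly one spacer to the population:
      * with probability p_new, a new array is founded with a single spacer;
      * otherwise an existing array gains a spacer, chosen with probability
        proportional to its current size.
    Returns the list of array sizes (spacers per array).
    """
    rng = random.Random(seed)
    sizes = [1]   # sizes[i] = number of spacers in array i
    owners = [0]  # one entry per spacer, holding the index of its array

    for _ in range(steps):
        if rng.random() < p_new:
            sizes.append(1)
            owners.append(len(sizes) - 1)
        else:
            # Picking a spacer uniformly at random selects its array with
            # probability proportional to that array's size.
            idx = rng.choice(owners)
            sizes[idx] += 1
            owners.append(idx)
    return sizes


if __name__ == "__main__":
    sizes = simulate_array_sizes(steps=50_000, p_new=0.1, seed=1)
    mean = statistics.mean(sizes)
    std = statistics.pstdev(sizes)
    # For the Simon process the tail of the size distribution is expected to
    # decay roughly as k ** -(1 + 1 / (1 - p_new)), i.e. a heavy tail.
    print(f"arrays: {len(sizes)}, mean size: {mean:.2f}, "
          f"std: {std:.2f}, largest array: {max(sizes)}")
```

In a run of this kind, a few arrays typically accumulate a large share of all spacers, which is the qualitative pattern the abstract describes (a standard deviation exceeding the mean). A model fitted to real metagenomes would also need the loss term described in the abstract.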
