Extreme-Scale Many-against-Many Protein Similarity Search

Published in:

00, 1-12 (2022)

Author(s):

Selvitopi, Oguz; Ekanayake, Saliya; Guidi, Giulia; Awan, Muaaz G.; Pavlopoulos, Georgios A.; Azad, Ariful; Kyrpides, Nikos; Oliker, Leonid; Yelick, Katherine; Buluç, Aydin

DOI:

10.1109/sc41404.2022.00006

Abstract:

Similarity search is one of the most fundamental computations that are regularly performed on ever-increasing protein datasets. Scalability is of paramount importance for uncovering novel phenomena that occur at very large scales. We unleash the power of over 20,000 GPUs on the Summit system to perform all-vs-all protein similarity search on one of the largest publicly available datasets with 405 million proteins, in less than 3.5 hours, cutting the time-to-solution for many use cases from weeks. The variability of protein sequence lengths, as well as the sparsity of the space of pairwise comparisons, make this a challenging problem in distributed memory. Due to the need to construct and maintain a data structure holding indices to all other sequences, this application has a huge memory footprint that makes it hard to scale the problem sizes. We overcome this memory limitation by innovative matrix-based blocking techniques, without introducing additional load imbalance.
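
The blocking idea mentioned in the abstract can be pictured with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that candidate pairs are found by counting shared k-mers through an inverted index, and the names `kmers`, `build_index`, and `blocked_candidates` are hypothetical. Only one block of sequences is indexed at a time, so the index stays small while the set of candidate pairs is the same as in the unblocked search.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(seqs, ids, k):
    """Inverted index (k-mer -> sequence ids) covering one block only."""
    index = defaultdict(list)
    for sid in ids:
        for km in kmers(seqs[sid], k):
            index[km].append(sid)
    return index


def blocked_candidates(seqs, k=3, min_shared=2, block_size=2):
    """All-vs-all candidate pairs, computed one block pair at a time.

    Returns {(i, j): shared_kmer_count} for i < j with count >= min_shared.
    """
    n = len(seqs)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    result = {}
    for bj, cols in enumerate(blocks):
        # Assumption for illustration: only this column block is indexed,
        # so peak index size depends on block_size, not on n.
        index = build_index(seqs, cols, k)
        for rows in blocks[:bj + 1]:
            for r in rows:
                shared = defaultdict(int)
                for km in kmers(seqs[r], k):
                    for c in index.get(km, ()):
                        if r < c:  # count each unordered pair once
                            shared[c] += 1
                for c, count in shared.items():
                    if count >= min_shared:
                        result[(r, c)] = count
    return result


if __name__ == "__main__":
    toy = ["MKVLAAG", "MKVLTTG", "GGGGGGG", "AAGMKVL"]
    print(blocked_candidates(toy, block_size=2))
```

Setting `block_size` to the number of sequences gives the unblocked search, which is a convenient check that blocking changes memory use but not the result. The sketch is serial and ignores alignment, load balancing, and distribution across nodes, all of which the paper addresses.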
