Researchers at Arizona State University have developed new tools to advance microbiome research, making the identification and analysis of microbes more efficient and accurate. These developments address challenges in understanding microbial communities, which play important roles in human health and ecosystems.
The team, led by Zhu of the Biodesign Center for Fundamental and Applied Microbiomics and ASU's School of Life Sciences, published two studies detailing their work. The first, appearing in Nature Communications, introduces TMarSel (Tree-based Marker Selection), a tool that automates the selection of marker genes used to build evolutionary trees of microbes. Traditional methods rely on a fixed set of marker genes. With metagenomics now producing millions of genomes, often incomplete or uneven, TMarSel instead evaluates thousands of gene families and selects those that yield the most reliable evolutionary trees.
This approach helps researchers track disease evolution, monitor environmental changes affecting microbial communities, and improve studies on topics such as the gut microbiome’s impact on health.
The second study, published in Nature Methods, describes scikit-bio, an open-source software library. Scikit-bio provides more than 500 functions tailored for analyzing the large biological datasets typical of microbiome research. It is maintained by a community of more than 80 contributors and has been cited extensively across fields including medicine, ecology, climate science, and cancer biology.
"Scikit-bio gives scientists the tools they need to analyze huge biological datasets," said Zhu. "It is particularly useful for studying microbiomes - communities of microbes that live in a specific environment, such as the human gut."
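To give a sense of the kind of calculation such a library packages, here is a minimal sketch in plain Python of two measures widely used in microbiome studies: Shannon diversity, which describes how many microbes are in one sample and how evenly they are spread, and Bray-Curtis dissimilarity, which describes how different two samples are. This is not scikit-bio's own code. The function names, sample names, and counts below are invented for illustration, and a real analysis would more likely call the library's tested implementations.

```python
import math


def shannon_diversity(counts):
    """Shannon diversity (natural log) of one sample's per-taxon counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    proportions = [c / total for c in counts if c > 0]
    # Written as p * log(1/p), which equals -p * log(p).
    return sum(p * math.log(1.0 / p) for p in proportions)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples' count vectors."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same taxa in the same order")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Hypothetical counts of four microbial taxa in three samples.
samples = {
    "sample_A": [10, 10, 10, 10],  # four taxa, perfectly even
    "sample_B": [40, 0, 0, 0],     # dominated by a single taxon
    "sample_C": [5, 15, 10, 10],   # same taxa as A, slightly uneven
}

for name, counts in samples.items():
    print(name, round(shannon_diversity(counts), 3))

print("A vs B:", bray_curtis(samples["sample_A"], samples["sample_B"]))
print("A vs C:", bray_curtis(samples["sample_A"], samples["sample_C"]))
```

In this toy data, the evenly mixed sample scores highest on diversity and the single-taxon sample scores zero, while sample A is far more similar to sample C than to sample B.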
Both TMarSel and scikit-bio aim to make large-scale biological research more reproducible and reliable as datasets continue to grow rapidly with advances in DNA sequencing technology.
Zhu's contributions highlight ASU's focus on combining computational approaches with biology to produce resources used globally by scientists. As DNA sequencing becomes faster and cheaper, these tools are expected to help transform increasing amounts of genetic data into meaningful scientific knowledge.