MAG Catalogues

Author

Affiliation

Sandy Rogers

Published

October 5, 2023

MAG Catalogues as a resource

Building MAGs ¹ is an approach to deriving genome-resolved information from metagenomic datasets.

MGnify’s MAG Catalogues are biome-specific, clustered, annotated collections of MAGs. Biomes are selected on the grounds of data availability, community interest, and project objectives.

Practical 1: finding MAGs by taxonomy on the MGnify website

Search the MGnify website

Search the All genomes list for the genus Jonquetella

Question

In which catalogues is that genus found?

What do these biomes have in common, and how does this align with the species found? ²

Now, we want to get a FASTA sequence for this genus.

Find the “best” MAG

Using what we’ve learned about QC on the course, look at the statistics on the detail pages of the Jonquetella MAGs. Which one is best? ³
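One way to make “best” concrete is to combine the QC statistics into a single score. The sketch below ranks candidate MAGs by completeness minus five times contamination, a heuristic used in several genome-dereplication workflows, and breaks ties on N₅₀. The weighting, the `MagStats` and `best_mag` names, and the example values are all assumptions for illustration: they are not real Jonquetella statistics, and this is not necessarily the rule MGnify applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MagStats:
    accession: str        # hypothetical identifiers, for illustration only
    completeness: float   # percent, 0-100
    contamination: float  # percent, 0-100
    n50: int              # base pairs


def quality_score(mag: MagStats, contamination_weight: float = 5.0) -> float:
    """Completeness penalised by contamination (assumed weighting)."""
    return mag.completeness - contamination_weight * mag.contamination


def best_mag(mags: list[MagStats]) -> MagStats:
    """Return the MAG with the highest score; ties go to the larger N50."""
    if not mags:
        raise ValueError("no MAGs to compare")
    return max(mags, key=lambda m: (quality_score(m), m.n50))


# Made-up values: replace with the numbers from each MAG's overview tab.
candidates = [
    MagStats("MAG_A", completeness=98.0, contamination=1.5, n50=40_000),
    MagStats("MAG_B", completeness=95.0, contamination=0.2, n50=120_000),
    MagStats("MAG_C", completeness=99.0, contamination=3.0, n50=25_000),
]
print(best_mag(candidates).accession)
```

Note that the most complete candidate is not automatically the winner: a small amount of extra contamination can outweigh a few points of completeness under this weighting.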

Download the DNA sequence FASTA file of the “best” MAG

We will use it later.

Practical 2: query MGnify catalogues using sourmash

Sourmash is a tool to compare DNA sequences against each other. The MGnify Genomes resource uses the sourmash library to create sketches (hashes) of every genome in every catalogue. You can query this index using your own sequences (typically MAGs you have retrieved from elsewhere or assembled yourself).
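To make the idea of a sketch concrete, here is a minimal pure-Python illustration of the kind of subsampled k-mer hashing that sourmash builds on. It is a conceptual sketch only: the hash function, the `ksize` and `scaled` values, and the helper names are assumptions chosen for readability. It also ignores details such as reverse complements, so it will not reproduce sourmash's actual signatures.

```python
import hashlib
import random

MAX_HASH = 2**64  # hashes are treated as 64-bit unsigned integers


def kmer_hash(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer (illustrative hash, not sourmash's)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(sequence: str, ksize: int = 21, scaled: int = 10) -> set[int]:
    """Keep roughly 1 in `scaled` of the distinct k-mer hashes."""
    threshold = MAX_HASH // scaled
    seq = sequence.upper()
    hashes = (kmer_hash(seq[i:i + ksize]) for i in range(len(seq) - ksize + 1))
    return {h for h in hashes if h < threshold}


def containment(query: set[int], target: set[int]) -> float:
    """Fraction of the query sketch that is also present in the target sketch."""
    return len(query & target) / len(query) if query else 0.0


# Toy data: a random "genome" and a fragment cut out of it.
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(2000))
fragment = genome[500:1500]
print(containment(sketch(fragment), sketch(genome)))
```

Every k-mer of the fragment also occurs in the full sequence, so the fragment's sketch is a subset of the full sequence's sketch. Because only a fraction of the hashes is kept, comparing sketches is far cheaper than comparing whole genomes, which is what makes searching every catalogue at once practical.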

Query the catalogues using the Jonquetella MAG

Use the MAG sequence FASTA file you retrieved earlier. ⁴

Question

In which catalogues is a match found for that query genome?

What use cases can you think of for this kind of cross-catalogue search? ⁵

Practical 3: query MGnify catalogues using sourmash, programmatically

The MGnify website is just a client of the MGnify API ⁶.
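As a rough sketch of what being a client of the API means, the snippet below builds a request URL for the genomes endpoint and pulls a few fields out of a JSON:API-style response. The base URL is the public MGnify API. The `taxon_lineage` filter and the attribute names (`completeness`, `contamination`, `taxon-lineage`) are assumptions that should be checked against the API's own browsable documentation, and the sample payload is invented so that the parsing can be shown without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://www.ebi.ac.uk/metagenomics/api/v1"


def genomes_url(**filters: str) -> str:
    """Build a genomes-list URL; filter names are passed through unchecked."""
    query = urlencode(filters)
    return f"{API_BASE}/genomes?{query}" if query else f"{API_BASE}/genomes"


def summarise(payload: dict) -> list[dict]:
    """Flatten JSON:API records into small dicts (assumed attribute names)."""
    rows = []
    for record in payload.get("data", []):
        attrs = record.get("attributes", {})
        rows.append({
            "accession": record.get("id"),
            "completeness": attrs.get("completeness"),
            "contamination": attrs.get("contamination"),
            "lineage": attrs.get("taxon-lineage"),
        })
    return rows


def fetch_genomes(**filters: str) -> list[dict]:
    """Fetch one page of results from the live API (requires network)."""
    with urlopen(genomes_url(**filters)) as response:
        return summarise(json.load(response))


# Invented payload in the assumed response shape, for offline illustration.
sample = {"data": [{"id": "EXAMPLE0001", "attributes": {
    "completeness": 97.5, "contamination": 0.4,
    "taxon-lineage": "g__Jonquetella"}}]}
print(genomes_url(taxon_lineage="Jonquetella"))
print(summarise(sample))
```

Calling `fetch_genomes(taxon_lineage="Jonquetella")` would issue the real request, assuming the filter name is right. The course notebook covers the actual endpoints in more detail.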

For this part of the practical, there is a Jupyter Notebook to follow along with. Try to complete its code blocks as you go.

To open it on your training VM:

Step

cd ~/mgnify-notebooks
git status
# make sure you're on the "comparative_practice_2023" branch
task edit-notebooks

After a few seconds, some URLs will be printed in the terminal. Open the last one (http://127.0.0.1:8888/lab?token=.....) by right-clicking on the URL and selecting “Open Link”, or by copying and pasting it into a web browser such as Chromium or Firefox.

Find and open the ‘Search MGnify Genomes (course practical 2023)’ notebook in the ‘Python examples’ directory.

Follow the steps in the notebook, completing the unfinished code blocks as you go.

Use the Jupyter Notebook after the course

This notebook is based on a publicly accessible version. You can use this at any time.

It is available to use from your web browser, no installation needed: notebooks.mgnify.org
You can see a completed version of it, with all the outputs, on docs.mgnify.org
You can use a prebuilt docker image and our public notebooks repository: github.com/ebi-metagenomics/notebooks. This should work on any computer you can install Docker on.
You can try and install all the dependencies yourself ¯\_(ツ)_/¯

Footnotes

¹ Metagenome Assembled Genomes
² Hint… what does anthropi in the species J. anthropi derive from?
³ Hint… each MAG’s detail page overview tab shows stats including completeness, contamination, and N₅₀.
⁴ If you got lost earlier, download it from MGYG000304175.fna
⁵ There are interesting use cases for researchers (checking which environments a species is found in, checking whether a newly assembled genome is novel etc), as well as use cases for services like MGnify (cross-linking genomes between catalogues where those datasets are not clustered together).
⁶ Application Programming Interface

Reuse

Apache 2.0