MAG Catalogues
MAG Catalogues as a resource
MAGs[1] are an approach to deriving genome-resolved information from metagenomic datasets.
MGnify’s MAG Catalogues are biome-specific, clustered, annotated collections of MAGs. Biomes are selected on the grounds of data availability, community interest, and project objectives.
Practical 1: finding MAGs by taxonomy on the MGnify website
Now, we want to get a FASTA sequence for this genus.
Practical 2: query MGnify catalogues using sourmash
Sourmash is a tool for comparing DNA sequences against each other. The MGnify Genomes resource uses the sourmash library to create sketches (compact sets of hashes) of every genome in every catalogue. You can query the resulting index using your own sequences (typically MAGs you have retrieved from elsewhere or assembled yourself).
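To make the idea of a sketch concrete, here is a minimal pure-Python illustration of hashing k-mers and comparing two sequences by containment. It is not sourmash's implementation: the hash function (SHA-256 truncated to 64 bits), the default `k`, and the helper names `canonical_kmers`, `hash_kmer`, `sketch`, and `containment` are all assumptions chosen for readability.

```python
import hashlib

# Illustrative only: sourmash uses its own hash function and file formats.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
MAX_HASH = 2**64  # size of the 64-bit hash space used below


def canonical_kmers(sequence, k):
    """Yield each k-mer as the smaller of itself and its reverse complement."""
    sequence = sequence.upper()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing ambiguous bases such as N
        reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
        yield min(kmer, reverse_complement)


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (hypothetical choice of hash)."""
    digest = hashlib.sha256(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(sequence, k=21, scaled=1):
    """Keep only hashes in the lowest 1/scaled fraction of the hash space."""
    threshold = MAX_HASH // scaled
    return {h for h in map(hash_kmer, canonical_kmers(sequence, k)) if h < threshold}


def containment(query, reference):
    """Fraction of the query sketch's hashes that also occur in the reference."""
    if not query:
        return 0.0
    return len(query & reference) / len(query)
```

With `scaled=1` every hash is kept, so the sketch is simply the full set of k-mer hashes; larger values of `scaled` keep a smaller subsample, which is what makes comparing many genomes cheap.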
Practical 3: query MGnify catalogues using sourmash, programmatically
The MGnify website is just a client of the MGnify API[6].
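Because the website is only a client, anything it shows can in principle be requested directly. The sketch below is one way that might look using only the Python standard library. The base URL, the `genomes/<accession>` path, the JSON:API-style response layout (`data` containing `id` and `attributes`), and the attribute names are assumptions here, so check them against the API documentation before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed base URL of the MGnify API; verify against the API documentation.
API_BASE = "https://www.ebi.ac.uk/metagenomics/api/v1"


def genome_url(accession):
    """Build the (assumed) detail URL for one genome accession."""
    return f"{API_BASE}/genomes/{accession}"


def fetch_json(url, timeout=30):
    """GET a URL and decode the JSON body. Requires network access."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarise_genome(document, fields=("completeness", "contamination")):
    """Pull selected attributes out of a JSON:API-style document.

    Missing keys come back as None rather than raising, because the
    attribute names used here are assumptions.
    """
    data = document.get("data", {})
    attributes = data.get("attributes", {})
    summary = {"accession": data.get("id")}
    for field in fields:
        summary[field] = attributes.get(field)
    return summary
```

A call such as `summarise_genome(fetch_json(genome_url("MGYG000304175")))` would then return a small dictionary for that genome, provided the assumed endpoint and field names match the live API.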
For this part of the practical, there is a Jupyter Notebook you can follow along with, completing the code blocks as you go.
To open it on your training VM:
Use the Jupyter Notebook after the course
This notebook is based on a publicly accessible version, which you can use at any time.
- It is available to use from your web browser, no installation needed: notebooks.mgnify.org
- You can see a completed version of it, with all the outputs, on docs.mgnify.org
- You can use a prebuilt Docker image and our public notebooks repository: github.com/ebi-metagenomics/notebooks. This should work on any computer you can install Docker on.
- You can try to install all the dependencies yourself ¯\_(ツ)_/¯
Footnotes
Metagenome Assembled Genomes↩︎
Hint… what does anthropi in the species J. anthropi derive from?↩︎
Hint… each MAG’s detail page overview tab shows stats including completeness, contamination, and N50.↩︎
If you got lost earlier, download it from MGYG000304175.fna↩︎
There are interesting use cases for researchers (checking which environments a species is found in, checking whether a newly assembled genome is novel, etc.), as well as use cases for services like MGnify (cross-linking genomes between catalogues where those datasets are not clustered together).↩︎
Application Programming Interface↩︎