3 Environmental Metagenomics

Chapter taken from Hozzein, 2020 [1] available under the Creative Commons Attribution 3.0 License.

1. Introduction

Metagenomics can be defined as the techniques and procedures that are used for the culture-independent analysis of the total genomic content of microorganisms living in a certain environment [2]. It has many useful applications with very promising potential in both medical and environmental microbiology. The most common use of metagenomics in environmental microbiology is studying the diversity of microbial communities in particular environments through the analysis of rRNA genes and how these communities change in response to changes in physical and chemical properties of these environments [3].
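As a rough illustration of how community diversity might be summarized once rRNA gene reads have been assigned to taxa, the sketch below computes the Shannon index for two samples. The Shannon index is one common diversity measure and is chosen here as an assumption; the source does not name a specific metric. The taxon names and read counts are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads counted")
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total          # relative abundance of this taxon
            h -= p * math.log(p)
    return h

# Hypothetical read counts per taxon for one site under two sets of conditions.
before = {"Acidobacteria": 40, "Proteobacteria": 30, "Actinobacteria": 20, "Firmicutes": 10}
after = {"Acidobacteria": 85, "Proteobacteria": 10, "Actinobacteria": 4, "Firmicutes": 1}

print("H' before:", round(shannon_index(before), 3))
print("H' after: ", round(shannon_index(after), 3))
```

A lower value for the second sample would indicate that the community has become dominated by fewer taxa, which is the kind of shift such studies aim to detect.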

Metagenomics also provides an opportunity to obtain and identify novel enzymes with industrial applications from extreme environments where unculturable extremophiles live. In such circumstances, functional metagenomics enables the isolation of genes coding for extremozymes, enzymes that are capable of being catalytically active in extreme conditions, or genes that will allow for better understanding of the mechanisms that make such organisms resistant to extreme environmental conditions [4].

Metagenomics is especially important for soil microbiology. It is estimated that the number of distinct microorganisms in 1 gram of soil exceeds the number of microbial species cultured so far [5]. Metagenomics therefore appears to be the ideal culture-independent technique for unraveling the biodiversity of soil microorganisms and for studying how this biodiversity is affected by continuously changing conditions.

2. Sequencing technologies and metagenomics

Taxonomic profiling, characterization, and analysis of microbial communities are now performed mostly on next-generation sequencing (NGS) platforms. These platforms generate high-throughput, short-read sequence data at a steadily decreasing cost. They also avoid the need to clone DNA fragments before sequencing [6].

NGS technologies have been developed to suit a range of applications, costs, and capabilities [7]. The most commonly used platforms are the 454 Life Sciences (Roche) and Illumina (Solexa) systems [8]. The 454 sequencing technology, which was the first commercially available next-generation technology, is based on the pyrosequencing technique. It provides high-throughput and relatively cheap analysis [9]. During the sequencing reaction, nucleotide incorporation into the growing chain is detected by capturing the released pyrophosphate, which is converted into light through an enzymatic reaction. The four nucleotides are added sequentially, one type at a time, so each light signal can be attributed to a specific nucleotide. Finally, the light signals are converted into sequence information. In the 454 pyrosequencer, the DNA fragments are amplified after being fixed on beads in a water-oil emulsion [10]. Pyrosequencing has been employed widely in the analysis of microbial diversity in many environments, including marine environments [11] and different soil environments [12, 13].
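The way light signals are turned back into sequence can be sketched as follows. This is a simplified model, not the instrument's actual software: it assumes a cyclic dispensing order (`FLOW_ORDER`, a hypothetical choice) and assumes that the light signal in each flow is proportional to the number of identical bases incorporated in a row.

```python
FLOW_ORDER = "TACG"  # assumed cyclic dispensing order, for illustration only

def simulate_flowgram(read, n_flows, flow_order=FLOW_ORDER):
    """Ideal light signal per flow while `read` is synthesized base by base."""
    signals = []
    pos = 0
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        run = 0
        # Every consecutive matching base is incorporated during this one flow.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append(float(run))
    return signals

def decode_flowgram(signals, flow_order=FLOW_ORDER):
    """Convert (possibly noisy) light signals back into a base sequence."""
    seq = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        seq.append(base * int(round(signal)))  # rounded signal = run length
    return "".join(seq)

ideal = simulate_flowgram("TTAGGC", 8)
print(ideal)
noisy = [2.1, 0.9, 0.1, 1.8, 0.0, 0.2, 1.1, 0.0]  # hypothetical measured values
print(decode_flowgram(noisy))  # prints TTAGGC
```

Because the nucleotide dispensed in each flow is known, the position of a signal in the flow series identifies the base, and the rounded signal height gives the number of copies.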

Illumina sequencing technology relies on fluorescently labeled reversible terminator nucleotides. Unlike the dideoxynucleotides used in Sanger sequencing, which are chemically modified to prevent further DNA synthesis, these terminator nucleotides carry a blocking group that can be removed from the nucleotide in a single step. DNA synthesis takes place on a chip to which primers are attached. After each cycle, the dyes attached to the nucleotides are excited by a laser and the incorporated bases are scanned. Before the next synthesis cycle can proceed, the blocking group and the dye are removed by a chemical reaction. The Illumina sequencing platform has been used successfully to study microbial diversity in many environments [14, 15, 16].
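The cycle described above (incorporate one blocked base, image it, remove the block and dye, repeat) can be sketched as a toy simulation. The intensity values and the one-reading-per-base layout are assumptions made for illustration; they do not reflect a real instrument's optics or file formats.

```python
CHANNELS = ("A", "C", "G", "T")
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sequence_by_synthesis(template, n_cycles):
    """Simulate one cluster: exactly one blocked, dye-labeled base is added per cycle.

    `template` is given in the order in which it is copied.
    """
    images = []
    for cycle in range(min(n_cycles, len(template))):
        incorporated = COMPLEMENT[template[cycle]]
        # Imaging step: the dye of the incorporated base dominates the signal;
        # the other channels show only background (assumed values).
        images.append({b: (1.0 if b == incorporated else 0.05) for b in CHANNELS})
        # Removing the blocking group and dye is implicit here: the next loop
        # iteration is allowed to extend the strand by one more base.
    return images

def call_bases(images):
    """Call the brightest channel in each cycle as the incorporated base."""
    return "".join(max(CHANNELS, key=lambda b: img[b]) for img in images)

images = sequence_by_synthesis("ATGC", n_cycles=4)
print(call_bases(images))  # prints TACG
```

The read length is bounded by the number of cycles run, since each cycle contributes exactly one base.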

In addition to the abovementioned technologies, more recently developed sequencing technologies are available and are being employed in metagenomic studies. These include the SOLiD 5500 W Series developed by Applied Biosystems, single-molecule real-time (SMRT) DNA sequencing from Pacific Biosciences, and Ion Torrent semiconductor sequencing [8]. More innovative technologies are being developed that could be of great use for metagenomic studies in the near future. Strand sequencing technologies, currently being developed by Oxford Nanopore Technologies, enable the sequencing of an intact DNA strand as it passes through a protein nanopore [17]. Irys Technology, developed by BioNano Genomics, is one of the very promising new technologies of the genomics era [8].

3. The metagenomic approaches

A metagenomics research strategy starts with selecting an ecological or biological environment of interest that hosts a wide variety of microbial communities with potential biotechnological applications. The environments that attract metagenomic researchers are mainly those characterized by extreme or unique conditions. These include environments with highly acidic or alkaline pH; high metal concentrations, pressures, or radiation; and high salinity or extreme temperatures [4].

Metagenomic analysis starts with isolating genomic DNA that represents the whole community in the soil sample, constructing a DNA library from the isolated DNA, and screening the library for a target gene. It is important to select a DNA extraction method that provides sufficient yield and DNA that represents the diversity of the whole microbial community in the target environment. This is still one of the most challenging steps of metagenomic analysis. The chemical and physical characteristics of soils are complex and vary widely with soil type, which makes it difficult to develop a reference method for DNA extraction from soils. In addition, soils contain many substances, such as humic and fulvic acids, that are co-extracted with genomic DNA and inhibit downstream processing of the extracted DNA [18]. Therefore, optimization and comparison of different extraction methods are usually required for each type of soil [19, 20, 21, 22].

A DNA library is then constructed from the genomic DNA isolated from the target environment. The isolated DNA is first broken, by either restriction enzyme digestion or mechanical shearing, into fragments of a size suitable for cloning. The resulting fragments are cloned into an appropriate cloning vector. Plasmid vectors are used for small DNA fragments, and the libraries generated are called small-insert genomic libraries. Large inserts are cloned into cosmid or fosmid vectors, which can hold inserts of up to 40 kb, or into BAC vectors, which can carry inserts larger than 40 kb [23].
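The fragmentation and vector-selection logic can be sketched in a few lines. The example uses the EcoRI recognition site (GAATTC, cut after the first G) as one possible enzyme; the source does not name one. The 40 kb boundary comes from the text, while the 10 kb upper limit for "small" inserts (`small_insert_max_bp`) is a hypothetical threshold chosen only for illustration.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut `sequence` at every occurrence of `site`, `cut_offset` bases into the site."""
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return [f for f in fragments if f]  # drop empty pieces

def choose_vector(insert_bp, small_insert_max_bp=10_000):
    """Suggest a vector class for an insert of the given length in base pairs.

    small_insert_max_bp is an assumed cutoff; the 40 kb limit follows the text.
    """
    if insert_bp <= small_insert_max_bp:
        return "plasmid"
    if insert_bp <= 40_000:
        return "cosmid/fosmid"
    return "BAC"

print(digest("AAGAATTCCCGAATTCTT"))  # prints ['AAG', 'AATTCCCG', 'AATTCTT']
for size in (3_000, 35_000, 120_000):
    print(size, "bp ->", choose_vector(size))
```

Real libraries usually rely on partial digestion or shearing followed by size selection, so fragment sizes in practice are not determined by site positions alone.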

DNA libraries are usually constructed in a well-studied microorganism that is easy to manipulate in the laboratory, such as Escherichia coli. When the genes within the DNA inserts need to be expressed in other microorganisms, shuttle vectors are used to transfer the libraries into a suitable host [24].

Finally, a screening assay is applied to search for a gene with a particular function, and the gene product is functionally analyzed. Two metagenomic strategies are commonly used in research. The first, called targeted metagenomics, focuses on marker genes such as the ribosomal genes 16S rRNA [25] and 18S rRNA [26] to study the composition of the microbial community in a certain environment, or on specific protein-coding genes of medical or industrial importance [27, 28, 29]. The second, shotgun metagenomics, achieves wide coverage of genomic DNA sequences using high-throughput next-generation sequencing to assess the entire taxonomic structure or functional potential of microbial communities [30].

The most challenging aspect of the screening process in metagenomics is the analysis of the huge amount of sequence data generated from the constructed library. A wide range of bioinformatic tools has been developed over the years to help analyze metagenomic data and compare them with available online databases.
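To give a feel for what comparing reads with a reference database involves, the sketch below scores a read against a tiny in-memory set of reference sequences by shared k-mers. This is a deliberately simplified stand-in: real tools use alignment or indexed k-mer databases at far larger scale, and the reference names and sequences here are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def best_match(read, references, k=4):
    """Return (name, score) of the reference sharing the largest fraction of read k-mers."""
    read_kmers = kmers(read, k)
    if not read_kmers:               # read shorter than k
        return None, 0.0
    best_name, best_score = None, 0.0
    for name, ref in references.items():
        score = len(read_kmers & kmers(ref, k)) / len(read_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Hypothetical toy "database" of two reference sequences.
references = {
    "ref_A": "ATGGCGTACGTTAGC",
    "ref_B": "TTTTCCCCGGGGAAAA",
}

print(best_match("GCGTACGT", references))  # prints ('ref_A', 1.0)
```

Each reference is re-scanned for every read here, which is why production tools index the database once and look k-mers up instead.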


This work was funded by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University, through the Research Groups Program Grant no. (RGP-1438-0006).


  1. Wael N. Hozzein (March 25th 2020). Introductory Chapter: Metagenomics and Metagenomic Approaches, Metagenomics – Basics, Methods and Applications, Wael N. Hozzein, IntechOpen, DOI: 10.5772/intechopen.87949. Available from: https://www.intechopen.com/chapters/68040
  2. Daniel, R. (2005). The metagenomics of soil. Nature Reviews Microbiology, 3(6), 470–478. https://doi.org/10.1038/nrmicro1160
  3. Delmont, T. O., Robe, P., Cecillon, S., Clark, I. M., Constancias, F., Simonet, P., Hirsch, P. R., & Vogel, T. M. (2011). Accessing the Soil Metagenome for Studies of Microbial Diversity. Applied and Environmental Microbiology, 77(4), 1315–1324. https://doi.org/10.1128/AEM.01526-10
  4. Mirete, S., Morgante, V., & González-Pastor, J. E. (2016). Functional metagenomics of extreme environments. Current Opinion in Biotechnology, 38, 143–149. https://doi.org/10.1016/j.copbio.2016.01.017
  5. Torsvik, V., & Øvreås, L. (2002). Microbial diversity and function in soil: from genes to ecosystems. Current Opinion in Microbiology, 5(3), 240–245. https://doi.org/10.1016/S1369-5274(02)00324-7
  6. Ansorge, W. J. (2009). Next-generation DNA sequencing techniques. New Biotechnology, 25(4), 195–203. https://doi.org/10.1016/j.nbt.2008.12.009
  7. Scholz, M. B., Lo, C.-C., & Chain, P. S. G. (2012). Next generation sequencing and bioinformatic bottlenecks: the current state of metagenomic data analysis. Current Opinion in Biotechnology, 23(1), 9–15. https://doi.org/10.1016/j.copbio.2011.11.013
  8. Oulas, A., Pavloudi, C., Polymenakou, P., Pavlopoulos, G. A., Papanikolaou, N., Kotoulas, G., Arvanitidis, C., & Iliopoulos, I. (2015). Metagenomics: Tools and Insights for Analyzing Next-Generation Sequencing Data Derived from Biodiversity Studies. Bioinformatics and Biology Insights, 9, BBI.S12462. https://doi.org/10.4137/BBI.S12462
  9. Ronaghi, M. (2001). Pyrosequencing Sheds Light on DNA Sequencing. Genome Research, 11(1), 3–11. https://doi.org/10.1101/gr.150601
  10. Goodwin, S., McPherson, J. D., & McCombie, W. R. (2016). Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics, 17(6), 333–351. https://doi.org/10.1038/nrg.2016.49
  11. Egge, E., Bittner, L., Andersen, T., Audic, S., de Vargas, C., & Edvardsen, B. (2013). 454 Pyrosequencing to Describe Microbial Eukaryotic Community Composition, Diversity and Relative Abundance: A Test for Marine Haptophytes. PLOS ONE, 8(9), e74371. https://doi.org/10.1371/journal.pone.0074371
  12. Wang, X., Hu, M., Xia, Y., Wen, X., & Ding, K. (2012). Pyrosequencing Analysis of Bacterial Diversity in 14 Wastewater Treatment Systems in China. Applied and Environmental Microbiology, 78(19), 7042–7047. https://doi.org/10.1128/AEM.01617-12
  13. Alex, A., & Antunes, A. (2015). Pyrosequencing Characterization of the Microbiota from Atlantic Intertidal Marine Sponges Reveals High Microbial Diversity and the Lack of Co-Occurrence Patterns. PLOS ONE, 10(5), e0127455. https://doi.org/10.1371/journal.pone.0127455
  14. Fan, W., Huo, G., Li, X., Yang, L., Duan, C., Wang, T., & Chen, J. (2013). Diversity of the intestinal microbiota in different patterns of feeding infants by Illumina high-throughput sequencing. World Journal of Microbiology and Biotechnology, 29(12), 2365–2372. https://doi.org/10.1007/s11274-013-1404-3
  15. Lentini, V., Gugliandolo, C., Bunk, B., Overmann, J., & Maugeri, T. L. (2014). Diversity of Prokaryotic Community at a Shallow Marine Hydrothermal Site Elucidated by Illumina Sequencing Technology. Current Microbiology, 69(4), 457–466. https://doi.org/10.1007/s00284-014-0609-5
  16. Hong, C., Si, Y., Xing, Y., & Li, Y. (2015). Illumina MiSeq sequencing investigation on the contrasting soil bacterial community structures in different iron mining areas. Environmental Science and Pollution Research, 22(14), 10788–10799. https://doi.org/10.1007/s11356-015-4186-3
  17. Jain, M., Olsen, H. E., Paten, B., & Akeson, M. (2016). The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biology, 17(1), 239. https://doi.org/10.1186/s13059-016-1103-0
  18. Young, J. M., Rawlence, N. J., Weyrich, L. S., & Cooper, A. (2014). Limitations and recommendations for successful DNA extraction from forensic soil samples: A review. Science & Justice, 54(3), 238–244. https://doi.org/10.1016/j.scijus.2014.02.006
  19. Finley, S. J., Lorenco, N., Mulle, J., Robertson, B. K., & Javan, G. T. (2016). Assessment of microbial DNA extraction methods of cadaver soil samples for criminal investigations. Australian Journal of Forensic Sciences, 48(3), 265–272. https://doi.org/10.1080/00450618.2015.1063690
  20. Lim, N. Y. N., Roco, C. A., & Frostegård, Å. (2016). Transparent DNA/RNA Co-extraction Workflow Protocol Suitable for Inhibitor-Rich Environmental Samples That Focuses on Complete DNA Removal for Transcriptomic Analyses. Frontiers in Microbiology, 7, 1588. https://www.frontiersin.org/article/10.3389/fmicb.2016.01588
  21. Gupta, P., Manjula, A., Rajendhran, J., Gunasekaran, P., & Vakhlu, J. (2017). Comparison of Metagenomic DNA Extraction Methods for Soil Sediments of High Elevation Puga Hot Spring in Ladakh, India to Explore Bacterial Diversity. Geomicrobiology Journal, 34(4), 289–299. https://doi.org/10.1080/01490451.2015.1128995
  22. Mazziotti, M., Henry, S., Laval-Gilly, P., Bonnefoy, A., & Falla, J. (2018). Comparison of two bacterial DNA extraction methods from non-polluted and polluted soils. Folia Microbiologica, 63(1), 85–92. https://doi.org/10.1007/s12223-017-0530-y
  23. Simon, C., & Daniel, R. (2010). Construction of Small-Insert and Large-Insert Metagenomic Libraries. In W. Streit & R. Daniel (Eds.), Metagenomics. Methods in Molecular Biology (Methods and Protocols), vol. 668. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-823-2_2
  24. Lam, K. N., Cheng, J., Engel, K., Neufeld, J. D., & Charles, T. C. (2015). Current and future resources for functional metagenomics. Frontiers in Microbiology, 6, 1196. https://www.frontiersin.org/article/10.3389/fmicb.2015.01196
  25. Païssé, S., Valle, C., Servant, F., Courtney, M., Burcelin, R., Amar, J., & Lelouvier, B. (2016). Comprehensive description of blood microbiome from healthy donors assessed by 16S targeted metagenomic sequencing. Transfusion, 56(5), 1138–1147. https://doi.org/10.1111/trf.13477
  26. West, D. (2018). Use of an 18S rRNA metagenomics approach as a method of detection of multiple infections in field blood samples collected on FTA cards from cattle. MSc by research thesis, University of Salford.
  27. Nurdiani, D., Ito, M., Maruyama, T., Terahara, T., Mori, T., Ugawa, S., & Takeyama, H. (2015). Analysis of bacterial xylose isomerase gene diversity using gene-targeted metagenomics. Journal of Bioscience and Bioengineering, 120(2), 174–180. https://doi.org/10.1016/j.jbiosc.2014.12.022
  28. Ufarté, L., Laville, É., Duquesne, S., & Potocki-Veronese, G. (2015). Metagenomics for the discovery of pollutant degrading enzymes. Biotechnology Advances, 33(8), 1845–1854. https://doi.org/10.1016/j.biotechadv.2015.10.009
  29. Lanza, V. F., Baquero, F., Martínez, J. L., Ramos-Ruíz, R., González-Zorn, B., Andremont, A., Sánchez-Valenzuela, A., Ehrlich, S. D., Kennedy, S., Ruppé, E., van Schaik, W., Willems, R. J., de la Cruz, F., & Coque, T. M. (2018). In-depth resistome analysis by targeted metagenomics. Microbiome, 6(1), 11. https://doi.org/10.1186/s40168-017-0387-y
  30. Quince, C., Walker, A. W., Simpson, J. T., Loman, N. J., & Segata, N. (2017). Shotgun metagenomics, from sampling to analysis. Nature Biotechnology, 35(9), 833–844. https://doi.org/10.1038/nbt.3935



Microbiomes: Health and the Environment by Dylan Parks is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.