Citation
Genome-enabled resolution of archaeal diversity in methane-cycling wetlands

Material Information

Title:
Genome-enabled resolution of archaeal diversity in methane-cycling wetlands
Creator:
Narrowe, Adrienne Beth
Place of Publication:
Denver, CO
Publisher:
University of Colorado Denver
Publication Date:
Language:
English

Thesis/Dissertation Information

Degree:
Doctorate (Doctor of Philosophy)
Degree Grantor:
University of Colorado Denver
Degree Divisions:
Department of Integrative Biology, CU Denver
Degree Disciplines:
Integrative and systems biology
Committee Chair:
Greene, Michael J.
Committee Members:
Miller, Christopher S.
Kechris, Katerina
Lozupone, Catherine
Mosier, Annika C.
Roane, Timberley M.

Notes

Abstract:
As the only known organisms that can produce methane, archaea are key players in the global methane cycle. Despite their critical role, archaea are often understudied compared to bacteria. With recent advances in DNA sequencing, we are discovering entire archaeal phyla, new members within known archaeal groups, and unexpected metabolic potential within the archaeal domain. Temperate freshwater wetlands are among the environments needing additional study of the archaeal community. Such mid-latitude, naturally occurring freshwater wetlands are estimated to be large contributors to global methane cycling, but the composition and function of their microbial communities are not well incorporated in global models of methane emissions. Here, using multiple methods of high-throughput DNA sequencing, we describe the microbial community structure and functional potential in Old Woman Creek (OH, USA), a model freshwater wetland, finding genomic, metabolic, and habitat diversity within and among archaeal groups that would not have been predicted based on previous knowledge. First, we developed a new domain-specific sequencing protocol, producing a more highly resolved archaeal community profile than was achievable with existing methods. This deep view of the archaeal community demonstrated that highly diverse assemblages of methane-cycling and non-methane-cycling archaea are present across the wetland, with habitat distributions that suggest environmentally defined niches. Next, using metagenomic sequencing to reconstruct archaeal genomes from the wetland soils, we identified multiple phylogenetically conserved evolutionary anomalies in the complement of fundamental information processing genes among the recently described Asgard archaea. Finally, further genomic reconstructions identified metabolic features among the Bathyarchaeota, which correspond to the intra-population habitat specificity we identified in the initial community profile.
Metabolic reconstructions highlight broad potential for Bathyarchaeota to play key roles in facilitating methane production by providing substrate to the methanogenic archaea, and, surprisingly, members of this group may themselves contribute directly to methane cycling in this system. These findings offer additional insight into the evolution, diversity, and function of multiple archaeal groups within a model freshwater wetland, and provide the foundation for further study to better understand factors controlling methane cycling.
General Note:
Embargo ended 11/29/2018

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
Copyright Adrienne Beth Narrowe. Permission granted to University of Colorado Denver to digitize and display this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.

Downloads

This item has the following downloads:


Full Text
GENOME-ENABLED RESOLUTION OF ARCHAEAL DIVERSITY IN METHANE-CYCLING WETLANDS
by
ADRIENNE BETH NARROWE
B.A., Northwestern University, 1991
B.S., Wright State University, 2009
M.S., University of Colorado Denver, 2012
A dissertation submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Integrative and Systems Biology Program
2018


This dissertation for the Doctor of Philosophy degree by Adrienne Beth Narrowe has been approved for the Integrative and Systems Biology Program by
Michael J. Greene, Chair
Christopher S. Miller, Advisor
Katerina Kechris
Catherine Lozupone
Annika C. Mosier
Timberley M. Roane
Date: May 12, 2018


Narrowe, Adrienne Beth (Ph.D., Integrative and Systems Biology Program)
Genome-Enabled Resolution of Archaeal Diversity in Methane-Cycling Wetlands
Dissertation directed by Assistant Professor Christopher S. Miller
ABSTRACT
As the only known organisms that can produce methane, archaea are key players in the global methane cycle. Despite their critical role, archaea are often understudied compared to bacteria. With recent advances in DNA sequencing, we are discovering entire archaeal phyla, new members within known archaeal groups, and unexpected metabolic potential within the archaeal domain. Temperate freshwater wetlands are among the environments needing additional study of the archaeal community. Such mid-latitude, naturally occurring freshwater wetlands are estimated to be large contributors to global methane cycling, but the composition and function of their microbial communities are not well incorporated in global models of methane emissions.
Here, using multiple methods of high-throughput DNA sequencing, we describe the microbial community structure and functional potential in Old Woman Creek (OH, USA), a model freshwater wetland, finding genomic, metabolic, and habitat diversity within and among archaeal groups that would not have been predicted based on previous knowledge. First, we developed a new domain-specific sequencing protocol, producing a more highly resolved archaeal community profile than was achievable with existing methods. This deep view of the archaeal community demonstrated that highly diverse assemblages of methane-cycling and non-methane-cycling archaea are present across the wetland, with habitat distributions that suggest environmentally defined niches. Next, using metagenomic sequencing to reconstruct archaeal genomes from the wetland soils, we identified multiple phylogenetically conserved evolutionary anomalies in the complement of fundamental information processing genes among the recently described Asgard archaea. Finally, further genomic reconstructions identified metabolic features among the Bathyarchaeota, which correspond to the intra-population habitat specificity we identified in the initial community profile. Metabolic reconstructions highlight broad potential for Bathyarchaeota to play key roles in facilitating methane production by providing substrate to the methanogenic archaea, and, surprisingly, members of this group may themselves contribute directly to methane cycling in this system. These findings offer additional insight into the evolution, diversity, and function of multiple archaeal groups within a model freshwater wetland, and provide the foundation for further study to better understand factors controlling methane cycling.
The form and content of this abstract are approved. I recommend its publication.
Approved: Christopher S. Miller


TABLE OF CONTENTS
CHAPTER
I. INTRODUCTION
II. HIGH-RESOLUTION SEQUENCING REVEALS UNEXPLORED ARCHAEAL DIVERSITY IN FRESHWATER WETLAND SOILS
III. COMPLEX EVOLUTIONARY HISTORY OF TRANSLATION ELONGATION FACTOR 2 AND DIPHTHAMIDE BIOSYNTHESIS IN ARCHAEA AND PARABASALIDS
IV. BATHYARCHAEOTA POPULATIONS IN WETLAND SOILS CONTAIN PREVIOUSLY UNKNOWN METABOLIC COMPLEXITY
V. CONCLUSIONS AND FUTURE DIRECTIONS
REFERENCES
APPENDIX
A. Supplementary Material For Chapter II
B. Supplementary Material For Chapter III
C. Supplementary Material For Chapter IV


CHAPTER I
INTRODUCTION
Marker-gene sequencing-based exploration of microbial diversity
Understanding the composition of a microbial community in the environment is a first step to developing hypotheses regarding the community's role in biogeochemical cycling and the controls on that activity. Currently, this can be best achieved through the use of environmental DNA sequencing. Sequencing-based approaches to environmental microbiology, from marker-gene sequencing to metagenomics, have in recent years greatly improved our view of bacterial and archaeal diversity in terrestrial, aquatic, and host-associated systems, and vast swaths of the tree of life are just now being discovered (Pace 1997; Ley et al. 2006; Wrighton et al. 2012; Brown et al. 2015; Castelle et al. 2015; Probst and Moissl-Eichinger 2015; Spang et al. 2015; Anantharaman et al. 2016; Hug, Baker, et al. 2016; Seitz et al. 2016; Zaremba-Niedzwiedzka et al. 2017).
The finding that the sequence of the small ribosomal subunit 16S rRNA gene could be used as a molecular phylogenetic marker to describe the relationships among microorganisms ushered in a new way of understanding the diversity of prokaryotic life on earth, and this gene was used to define the Archaea as a domain of life distinct from the Bacteria (Woese and Fox 1977). Polymerase chain reaction (PCR)-based approaches followed, which allowed for the amplification and subsequent sequencing of these gene sequences directly from environmental DNA samples (Lane et al. 1985; Pace 1997). It became apparent that the number of cultured and culturable microorganisms was a minute fraction of the microbial diversity found in any system (Staley and Konopka 1985). As improvements in DNA sequencing technology balanced massive increases in throughput with decreases in read length, accuracy, and cost, it became possible to sequence multiple samples concurrently, and at higher sequencing depth per sample (Caporaso et al. 2011; Caporaso et al. 2012). Greater sequencing depth makes lower abundance sequences more likely to be detected, and the concurrent increases in sample number opened the door to the adaptation of classical ecological metrics to microbial community analyses (Lozupone and Knight 2007; Caporaso et al. 2010; Lozupone et al. 2011; Martiny et al. 2011; McMurdie and Holmes 2014). With these increases in scale, the known diversity in a multitude of environments exploded (Schloss and Handelsman 2004; Lozupone and Knight 2007; Huttenhower et al. 2012; Schloss et al. 2016), and the 16S rRNA public database SILVA (release 132) now contains almost 700,000 unique high quality sequences (Quast et al. 2013).
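The relationship between sequencing depth and detection of low-abundance sequences can be illustrated with a simple calculation. The sketch below assumes an idealized model in which each read is an independent draw from the community, so that a sequence at relative abundance p is seen at least once in N reads with probability 1 - (1 - p)^N. This model ignores amplification and primer bias, and the function names and example abundance are illustrative assumptions rather than values from this study.

```python
import math


def detection_probability(rel_abundance: float, n_reads: int) -> float:
    """Probability of sampling at least one read from a sequence present at
    `rel_abundance`, assuming reads are independent draws (idealized model)."""
    if not 0.0 <= rel_abundance <= 1.0:
        raise ValueError("rel_abundance must be between 0 and 1")
    if n_reads < 0:
        raise ValueError("n_reads must be non-negative")
    if rel_abundance == 1.0:
        return 1.0 if n_reads > 0 else 0.0
    # 1 - (1 - p)^N, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_reads * math.log1p(-rel_abundance))


def reads_needed(rel_abundance: float, target: float = 0.95) -> int:
    """Smallest number of reads giving at least `target` detection probability."""
    if not 0.0 < rel_abundance < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("rel_abundance and target must be strictly between 0 and 1")
    return math.ceil(math.log1p(-target) / math.log1p(-rel_abundance))


if __name__ == "__main__":
    p = 1e-4  # hypothetical sequence making up 0.01% of amplicons
    for depth in (1_000, 10_000, 100_000):
        print(depth, round(detection_probability(p, depth), 4))
    print("reads for 95% detection:", reads_needed(p))
```

Under this model, detection of a rare sequence moves from unlikely to near-certain over two orders of magnitude of depth, which is the intuition behind deep, targeted sequencing of low-abundance community members.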
While freed from the biases associated with laboratory culturing, a new set of biases applies to 16S rRNA-based amplicon studies (Engelbrektson et al. 2010; Degnan and Ochman 2012; Klindworth et al. 2013; Karst et al. 2018). The 16S rRNA is a critical component of the small ribosomal subunit, and as such contains conserved structural elements, which are reflected in regions of especially conserved primary sequence (Woese et al. 1975). The PCR amplification of 16S rRNA gene sequences relies on the use of these highly conserved sequences as priming sites for 'universal' primer sequences (Lane et al. 1985), while the intervening, less-conserved sequence is used for phylogenetic analysis. In principle, the highly conserved sequences would be truly universally conserved across archaea and bacteria (both of which contain variants of the 16S rRNA gene). However, these conserved regions are not 100% conserved, and even small variations can impact PCR primer binding and efficiency, rendering certain 16S rRNA genes either invisible or underrepresented in amplicon studies, skewing community composition and abundance estimates (Baker, Smith, and Cowan 2003; Frank et al. 2008; Teske and Sorensen 2008; Miller et al. 2011; Degnan and Ochman 2012; Pinto and Raskin 2012; Klindworth et al. 2013; Karst et al. 2018). As 'universal' PCR primer design is based on sequence databases, this bias itself becomes amplified as primer design reflects that which has already been identified (Klindworth et al. 2013). In addition to amplification bias, accurate relative abundance estimates are difficult to obtain using the 16S rRNA gene, because it is not present as a single copy in many bacterial and in some archaeal genomes (Case et al. 2007; Kembel et al. 2012; Sun et al. 2013; Stoddard et al. 2015). In particular, the bacteria typically contain many more copies of the 16S rRNA gene than the archaea, so in environments where the bacteria may already outnumber the archaea (Bates et al. 2011), archaeal 16S rRNA genes may be difficult to detect in amplicon-based 16S rRNA marker gene studies. Despite these limitations, the broad application of 16S rRNA amplicon sequencing to study the distribution of marker gene sequences over time and within and across environments has greatly expanded our view of microbial diversity, microbial community assembly, and correlation of taxa with environmental factors.
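The effect of 16S rRNA gene copy number on abundance estimates can be made concrete with a small worked example. The following is a minimal sketch, not the procedure used in this work: it divides each taxon's read count by an assumed per-genome copy number and renormalizes. The taxon labels, read counts, and copy numbers are hypothetical, and in practice copy numbers for uncultured taxa are themselves uncertain.

```python
def copy_number_corrected_abundance(read_counts, copies_per_genome):
    """Convert 16S read counts to estimated relative genome abundance by
    dividing each count by that taxon's assumed 16S copy number."""
    corrected = {}
    for taxon, reads in read_counts.items():
        copies = copies_per_genome[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        corrected[taxon] = reads / copies
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: value / total for taxon, value in corrected.items()}


if __name__ == "__main__":
    # Hypothetical community: equal numbers of cells of each organism.
    reads = {"bacterium_B": 500, "archaeon_A": 100}
    copies = {"bacterium_B": 5, "archaeon_A": 1}  # assumed copy numbers

    raw_total = sum(reads.values())
    raw = {taxon: n / raw_total for taxon, n in reads.items()}
    print("uncorrected:", raw)
    print("corrected:  ", copy_number_corrected_abundance(reads, copies))
```

In this toy case the archaeon accounts for one sixth of the reads although it represents half of the genomes, showing how a single-copy organism can appear rarer than it is.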
Metagenomics for discovery of functional and phylogenetic diversity
16S rRNA amplicon sequencing data cannot describe microbial community function, which is the ultimate goal in understanding the interactions of microorganisms with each other and the environment. Thorough characterization of the metabolic role of a microorganism traditionally entails biochemical assays performed on pure culture, coupled with sequencing and annotation of the organism's isolate genome (Brauer et al. 2011). However, while most environmental microorganisms remain uncultured, their function can be inferred from genomic analyses. As compared to amplicon sequencing, which targets a single gene, or shotgun genome sequencing of a single genome (Venter et al. 2001), shotgun metagenomic sequencing is the random fragmentation and sequencing of the total DNA from an environment (Tyson et al. 2004; Venter et al. 2004). Analysis of shotgun metagenomic data can be gene-centric, inferring the metabolic capacity of a system by cataloging the encoded genes in the entire community (Tringe et al. 2005), or genome-centric, where individual or population genomes are reconstructed (Tyson et al. 2004) and metabolisms described on the organism level rather than on the community level. In both cases, metagenomic assembly and homology-based gene annotations make possible the inference of unculturable microbial function (Tyson et al. 2004; Baker et al. 2010; Hess et al. 2011; Iverson et al. 2012; Wrighton et al. 2012).
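The distinction between gene-centric and genome-centric analysis can be sketched with a toy data structure. The example below is illustrative only and assumes that contigs have already been assembled, annotated, and (for the genome-centric view) assigned to bins. The contig names, bin names, and annotations are hypothetical placeholders.

```python
from collections import Counter, defaultdict


def gene_centric(contig_genes):
    """Community-level catalog: count every annotated gene across all
    contigs, without regard to which organism encodes it."""
    return Counter(gene for genes in contig_genes.values() for gene in genes)


def genome_centric(contig_genes, contig_bin):
    """Organism-level view: collect the set of annotated genes in each
    genome bin. Contigs without a bin assignment are skipped."""
    per_bin = defaultdict(set)
    for contig, genes in contig_genes.items():
        bin_name = contig_bin.get(contig)
        if bin_name is not None:
            per_bin[bin_name].update(genes)
    return dict(per_bin)


if __name__ == "__main__":
    # Hypothetical annotations per contig and hypothetical bin assignments.
    contig_genes = {
        "contig_1": ["geneA", "geneB"],
        "contig_2": ["geneB", "geneC"],
        "contig_3": ["geneA"],
    }
    contig_bin = {"contig_1": "bin_X", "contig_2": "bin_Y"}  # contig_3 unbinned

    print(gene_centric(contig_genes))
    print(genome_centric(contig_genes, contig_bin))
```

The gene-centric catalog retains genes on unbinned contigs but cannot say which genes co-occur in one organism, whereas the genome-centric view supports pathway reconstruction per population at the cost of discarding unbinned sequence.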
In addition to metabolic insight, shotgun metagenomics also offered additional, sometimes surprising phylogenetic insights. Use of 16S rRNA sequencing-based studies had revealed large sections of the microbial world that were known only from that sequencing approach. Then, as metagenomic assembly and binning methods improved, the limitations of amplicon-based surveys of microbial diversity became more apparent and it became clear that there are entire sections of the tree of life that were also hidden from view and which we now know only from shotgun metagenomic sequencing (Hug, Baker, et al. 2016). The recent discoveries of entire bacterial and archaeal phyla and superphyla remind us how much may yet remain to be discovered (Wrighton et al. 2012; Brown et al. 2015; Castelle et al. 2015; Spang et al. 2015; Eloe-Fadrosh et al. 2016; Seitz et al. 2016; Zaremba-Niedzwiedzka et al. 2017). That this phylogenetic diversity also encompasses additional functional diversity becomes more apparent with each new discovery (Wrighton et al. 2012; Kantor et al. 2013; Evans et al. 2015; Liu et al. 2018).


Large-scale functional inference; linking phylogeny and function
Many advances have been made using genome-centric metagenomics, particularly by providing insight into the microbial metabolisms relevant in biogeochemical cycling (Wrighton et al. 2012; Castelle et al. 2013; Baker et al. 2015; Anantharaman et al. 2016; Daly et al. 2016; Hug, Thomas, et al. 2016; Danczak et al. 2017). While the use of metagenomics to resolve unculturable genomes is expanding rapidly and being applied to an increasing number of environments, prokaryotic phylogenetic and metabolic diversity remains vastly undersampled at the genomic level, and for certain taxa, most of the described genomes arise from very few environments (Schulz et al. 2017). This scarcity of complete, environmentally-relevant genomes, from either cultured strains or from metagenomic reconstruction, means that function for any given detected taxon might be inferred based on relationships to an encompassing phylogenetic clade with a genomic sampling of size N=1. Making matters worse, available genomes are often derived from very different environments and at varying levels of phylogenetic relatedness to genomes in an environment of interest. For nearly all systems, it is unknown across what phylogenetic and environmental scales metabolic potential is conserved among related microorganisms.
Many marker-gene surveys of environments assume that the 16S rRNA marker gene and implied phylogeny is a reliable proxy for the full genomic (and thus functional) content of an organism. But the phenotypic variation among organisms that translates to environmental differentiation is not always captured by phylogenetic measures (Martiny, Treseder, and Pusch 2013; Barberan et al. 2014; Martiny et al. 2015; McLaren and Callahan 2018). Even for complex traits that are among the most phylogenetically conserved, such as methanogenesis (Martiny, Treseder, and Pusch 2013), phylogenetically varying patterns of gene loss and retention have resulted in the recent discovery of a methanogenic order within an otherwise non-methanogenic cluster (Paul et al. 2012; Borrel et al. 2013) and methanogens identified outside the Euryarchaeota (Evans et al. 2015; Vanwonterghem et al. 2016), previously thought to be the only archaeal phylum to contain methanogens. Now, as more genomes become available, it becomes possible to look at the evolutionary relationship between phylogeny and traits. Using methanogenesis as an example, it has been proposed on the basis of multiple archaeal genomes that the last archaeal common ancestor (LACA) was likely a methanogen (Borrel, Adam, and Gribaldo 2016; Sorokin et al. 2017; Spang and Ettema 2017; Adam, Borrel, and Gribaldo 2018). While many archaea harbor genes characteristic of methanogens, most archaeal taxa are not methanogenic, and the distribution of methanogens across the archaeal tree is much patchier than previously thought (Borrel, Adam, and Gribaldo 2016; Chistoserdova 2016; Adam, Borrel, and Gribaldo 2018). With a complex, multi-gene trait such as methanogenesis found to be variable at multiple phylogenetic levels, it is clear that inference of traits based on phylogenetic relatedness, while convenient, is risky and subject to many qualifications.
Even for apparent species-level similarity, seemingly identical organisms may harbor unexpected genomic and metabolic differences (Chase et al. 2017; McLaren and Callahan 2018). Evidence suggests both that nearly identical organisms as assessed by 16S rRNA sequence similarity can harbor environmentally relevant differences in metabolic potential when compared at the genome level, and conversely that a larger 16S rRNA nucleotide sequence difference can envelop multiple genotypes which are ecologically indistinguishable and may represent a population genomic continuum (Shapiro and Polz 2014; Chase et al. 2017; McLaren and Callahan 2018). Such environmentally-relevant functional 'microdiversity' among closely related organisms has been explored in marine systems documenting multiple co-existing ecotypes of bacterioplankton including Prochlorococcus and Pelagibacterales (SAR11) (Wilhelm et al. 2007; Kashtan et al. 2014) and has been similarly shown for nitrite-oxidizing Nitrospira (Gruber-Dorninger et al. 2015) within a built environment, for methanogens in estuarine sediments (Youngblut et al. 2015) and for terrestrial bacteria (Chase et al. 2017).
In the face of such phylogenetic, functional and ecological diversity it is clear that additional sampling is needed targeting a wide range of environments and taxa to better constrain the functional range associated with any phylogenetic grouping. Systematic attempts to address this broad undersampling have begun with efforts specifically targeting such 'microbial dark matter' using single-cell sequencing techniques (Ishoey et al. 2008; Wu et al. 2009; Rinke et al. 2013), and continue as recovery of genomes from metagenomes becomes more tractable and commonplace. Recently, metagenomic assembly and binning of more than 1,500 publicly available shotgun metagenomic sequencing datasets resulted in the deposition of over 8,000 new near-complete metagenome-assembled genomes (MAGs) into public databases (Parks et al. 2017); but while MAGs were assembled, the corresponding metabolic capacity was not described. Other recent studies have produced tens to thousands of MAGs from environments such as estuarine sediments (Baker et al. 2015), aquifers (Anantharaman et al. 2016), cow rumen (Stewart et al. 2018), the subseafloor (Jungbluth, Amend, and Rappe 2017), and ocean waters (Tully, Graham, and Heidelberg 2018), but many environments remain undersampled (including freshwater wetlands, which are the focus of the present study).


This project: Wetland archaeal community diversity
With this project, we address this sampling gap by characterizing archaeal diversity in a model methane-emitting freshwater wetland at the community, genome, and gene levels. Freshwater wetlands are a key source of atmospheric methane (Bastviken et al. 2011), and microorganisms, especially the archaea, are a critical component of the freshwater wetlands methane cycle, both producing and oxidizing this potent greenhouse gas (Nazaries et al. 2013). Thus, modeling how environmental factors affect the activity of methane-cycling microbial communities, and subsequently impact net methane emissions, needs to take into account both variability in microbial community composition and the variability of microbial metabolic capacity within that community. While environmental methane cycling has long been explored (Jannasch 1975), novel methanogenic and methanotrophic taxa continue to be described (Raghoebarsing et al. 2006; Ettwig et al. 2009; Borrel et al. 2013; Mondav et al. 2014; Evans et al. 2015; Lang et al. 2015; Vanwonterghem et al. 2016; Sorokin et al. 2017), and additional metabolic capabilities identified, such as the direct coupling of anaerobic methane oxidation to the reduction of sulfate, nitrate and metals (Beal, House, and Orphan 2009; Ettwig et al. 2010; Haroon et al. 2013; Egger et al. 2014; Arshad et al. 2015; Timmers et al. 2015; Ettwig et al. 2016). However, much still remains to be learned regarding the microorganisms engaged in these processes, particularly in situ, including their distribution, habitat preferences, and their potential impact on biogeochemical cycling in specific habitats.
Old Woman Creek (Huron, OH, USA) is a freshwater estuarine wetland adjacent to Lake Erie. As a research station in the National Oceanic and Atmospheric Administration's (NOAA) National Estuarine Research Reserve System (NERRS), Old Woman Creek (OWC) has been studied for decades (Klarer and Millie 1992; Mitsch and Reeder 1992; Chin et al. 1998; Herdendorf, Klarer, and Herdendorf 2006; Bernal 2008); however, a comprehensive microbial census has not been performed. It is known that OWC is methane emitting (Nahlik and Mitsch 2010), but the identity and substrate utilization capacity of the methanogens present are not known. And while methanogens are directly responsible for the production of methane, the competing or cooperative members of the surrounding community are not known, nor are the ultimate biotic and abiotic controls on methane emissions from this wetland. An approach to understanding function in this wetland is to generate a full microbial community profile and a corresponding metabolic/genomic profile that is specific to this ecosystem. In addition to addressing specific questions about methane cycling, this in-depth profiling can be used to ask if the archaeal metabolisms in the wetland agree with predictions that would have been made based on phylogenetic relatedness to genomes sampled from other environments. With the paired community-level census and metagenome-assembled genomes, we examine genomic variation within specific archaeal populations, and also link this variation to observed ecological differentiation within these populations. How do the microbial community as a whole and the individual populations vary with site characteristics, both phylogenetically and functionally? Does observed phylogenetic variability correspond to functional variability? Do the genomically-inferred archaeal metabolisms in the wetland agree with predictions made based only on phylogenetic relatedness to genomes sampled from other sites in the wetland?
First, as discussed in Chapter II, we developed and employed a new approach for the targeted deep sequencing of archaeal and bacterial 16S rRNA genes and linked the resulting microbial community profiles to geochemical measures. A sampling strategy spanning hydrological and soil depth gradients enabled description of the spatial distribution of community members. We found that the archaea and the bacteria are arrayed along well-defined gradients corresponding to soil depth and geochemical measures. With this novel sequencing and bioinformatics strategy, we were able to describe the archaea at a level of resolution not typically achieved with amplicon sequencing, and we detected an unexpected level of archaeal diversity in the wetland. Multiple methanogenic taxa were identified, including newly described non-Euryarchaeotal phyla, and vast numbers of presumably non-methanogenic archaea were also identified; the role of these organisms within the wetland remains enigmatic. These archaea displayed specific distributions across the wetland gradients, which suggested specific, environmentally defined habitats. Remarkably, due to the resolution achieved with our sequencing approach, we were able to discern these habitat preferences at multiple taxonomic levels, even to the level of individual OTUs (operational taxonomic units; used as a proxy measurement for microbial species). As the differentiated distributions across the wetland were associated with specific geochemical measures, we hypothesized that the genomic content of the particular taxa associated with a particular site, and hence geochemical regime, would be selected by that habitat, and would differ from the content encoded in the genomes of even closely related taxa with different apparent habitat preferences.
To address this question, we selected soil samples for metagenomic sequencing where archaeal taxa of interest were predicted to be abundant based on the amplicon census. These samples also represented two ecosites with sharply differing geochemistry and archaeal community profiles, allowing us to begin to explore intra-population variation associated with site conditions. Metagenomic assembly indicated the presence of three Candidatus Thorarchaeota genomes across two samples. The Thorarchaeota candidate phylum falls within the recently described Asgard superphylum, and members of this superphylum are currently thought to be the closest extant relatives of the ancestral eukaryotic host cell (Zaremba-Niedzwiedzka et al. 2017). Comparative analysis of these Ca. Thorarchaeota genomes in the context of other archaeal and eukaryotic genomes uncovered anomalies in the complement of core information processing genes. These findings are discussed in Chapter III and include both losses and gains of genes associated with the elongation stage of protein synthesis. The apparent loss of diphthamide synthesis genes, otherwise universally present across archaea and eukaryotes, and the gain of a second copy of translation elongation factor 2, a single-copy gene with no known paralogs in the Archaea, raise multiple questions regarding the evolution and conservation of key components of protein synthesis.
Finally, in Chapter IV, we discuss the genomically-encoded metabolic versatility found within the Bathyarchaeota in the OWC wetland. The Bathyarchaeota were among the most abundant and numerous archaea that we identified in the initial amplicon census of the wetland. Within this phylum, multiple subgroups have been delineated, and broad habitat preferences described (Fillol, Auguet, et al. 2015; Lazar et al. 2015; Xiang et al. 2017). The frequent detection of Bathyarchaeota (previously known as Miscellaneous Crenarchaeota Group, MCG) from other methane-active sites (Biddle et al. 2006; Kubo et al. 2012; Lloyd et al. 2013; Evans et al. 2015) as well as this wetland led to an initial hypothesis that the Bathyarchaeota were integral members of methanogenic consortia, potentially supplying the methanogens with their required metabolic substrates. More recently, Bathyarchaeota have been found to be capable of acetogenesis, and even methanogenesis (Evans et al. 2015; He et al. 2016; Lazar et al. 2016). Our findings in the amplicon study that particular Bathyarchaeota subgroups are associated with specific sites within the wetland are reinforced by metagenomic analysis of these samples. We recovered multiple partial genome bins from several Bathyarchaeota subgroups, which we treat as subgroup-specific population genomes for pathway analysis. These population genomes have distributions that were predicted based on their presence/absence and abundance in the amplicon study. We find that the capacity for acetogenesis is broadly conserved across these newly described Bathyarchaeota subgroups, while differing capacity for carbon assimilation within these population genomes suggests links to the geochemistry at the sites from which they were recovered. Interestingly, we also recovered Bathyarchaeotal mcrABG genes from several metagenomic assemblies. These are the hallmark genes for methanogenesis, and suggest that the methane production capacity within the OWC wetland may be more complex than initially thought.


CHAPTER II
HIGH-RESOLUTION SEQUENCING REVEALS UNEXPLORED ARCHAEAL DIVERSITY IN FRESHWATER WETLAND SOILS (1)
Originality-Significance Statement
Freshwater wetlands represent significant sources of atmospheric methane, yet we know surprisingly little about the microbial communities in these terrestrial systems. We use a novel domain-specific rRNA amplicon sequencing approach in combination with extensive sampling across a freshwater wetland to provide the first high-resolution view of both the archaeal and bacterial communities in these soils. Our methodology is especially powerful when used to explore the low abundance archaeal taxa with high phylogenetic resolution. Many of these taxa have previously been shown to play critical roles in carbon cycling in soils and sediments. We uncover new phylogenetic diversity in both methane-cycling and underexplored non-methane-cycling archaeal populations. These populations show habitat preferences that are structured at multiple phylogenetic levels, and which suggest complex interactions governing methane cycling.
Summary
Despite being key contributors to biogeochemical processes, archaea are frequently outnumbered by bacteria, and consequently are underrepresented in combined molecular surveys. Here, we demonstrate an approach to concurrently survey the archaea alongside the bacteria with high-resolution 16S rRNA gene sequencing, linking these community data to geochemical parameters. We applied this integrated analysis to hydric soils sampled across a model methane-emitting freshwater wetland. Geochemical profiles, archaeal communities, and bacterial communities were independently correlated with soil depth and water cover. Centimeters of soil depth and corresponding geochemical shifts consistently affected microbial community structure more than hundreds of meters of lateral distance. Methanogens with diverse metabolisms were detected across the wetland, but displayed surprising OTU-level partitioning by depth. Candidatus Methanoperedens spp. archaea thought to perform anaerobic oxidation of methane linked to iron reduction were abundant. Domain-specific sequencing also revealed unexpectedly diverse non-methane-cycling archaeal members. OTUs within the underexplored Woesearchaeota and Bathyarchaeota were prevalent across the wetland, with subgroups and individual OTUs exhibiting distinct occupancy and abundance distributions aligned with environmental gradients. This study adds to our understanding of ecological range for key archaeal taxa in a model freshwater wetland, and links these taxa and individual OTUs to hypotheses about processes governing biogeochemical cycling.
(1) Portions of this chapter were previously published in: Narrowe, A.B., Angle, J.C., Daly, R.A., Stefanik, K.C., Wrighton, K.C., and Miller, C.S. (2017) High-resolution sequencing reveals unexplored archaeal diversity in freshwater wetland soils. Environ. Microbiol. 19: 2192-2209, and are included with the permission of the copyright holder. Author contributions are listed at the end of the chapter.
Introduction
Temperate freshwater wetlands are an important source of biogenic atmospheric methane (Bastviken et al. 2011; Bridgham et al. 2013; Kirschke et al. 2013). As a group, these habitats are estimated to account for up to 40% of all methane emissions (Denman et al. 2007), yet can also function as carbon sinks. However, the diverse classification of habitats labeled as freshwater wetlands reflects wide variation in hydrology, plant cover, soil type, nutrient content, and pH (Federal Geographic Data Committee 2013), and includes both natural and constructed wetlands. Microbial communities mediate carbon cycling in freshwater wetlands, but certain of these habitats are better studied microbiologically (Borrel et al. 2011; Bridgham et al. 2013), particularly rice fields (Conrad 2002; Conrad et al. 2012; Lee et al. 2014), lake and river sediments (Borrel et al. 2012; J. Wang et al. 2012; Bodelier et al. 2013), tidal estuaries (John Parkes et al. 2012; Webster et al. 2015), peatlands (Juottonen et al. 2012; Preston et al. 2012; Sun et al. 2012; Hawkins, Johnson, and Brauer 2014), and constructed wetlands (Nahlik and Mitsch 2010; Samso and Garcia 2013; Arroyo, Saenz de Miera, and Ansola 2015; He et al. 2015).
Although some similarity in key functional guilds may be common across wetland types, fundamental geochemical differences drive differences in microbial community structure, and thus inferred metabolic capabilities. Soil pH and redox, which vary by wetland type and within a single wetland, are two such controls on methanogenesis, and are explicitly incorporated as key parameters in predicting methane production in biogeochemical models (Meng et al. 2012). However, geochemistry can also control methane cycling via indirect interactions with non-methanogenic members of the community, and these interactions can be wetland-type-specific. For example, in acidic Sphagnum-dominated peatlands, oxygen-dependent breakdown of phenolics by non-methanogenic community members leads to a cascade of increased decomposition and ultimately provides the substrates facilitating methanogenesis (Fenner and Freeman 2011). Yet these complex interactions are probably not relevant outside of peatlands, due to a lower polyphenolic load linked to differing aboveground vegetation. Thus, a full characterization of microbial community structure and membership across the full range of freshwater wetland types is necessary to improve models of carbon cycling in these ecosystems (Riley et al. 2011).
Archaea are expected to be a critical component of the microbial community in freshwater wetlands, in part because all known methanogens and most anaerobic methanotrophs are archaea. Most biogenic methane production in wetlands is thought to be driven by Euryarchaeota performing acetoclastic or hydrogenotrophic methanogenesis, although much of that methane can be immediately consumed by bacterial and archaeal methane oxidizers (Segarra et al. 2015; Cai et al. 2016). Archaeal communities (and thus controls on methane emissions) vary by wetland type. Comparatively less is known about non-methane-cycling archaea in freshwater wetlands. Many microbial diversity surveys have potentially missed archaeal populations present by exclusively targeting methane-cycling microbes using functional marker genes for archaeal methanogenesis or bacterial methane oxidation (e.g. mcrA, pmoA) (Bourne, McDonald, and Murrell 2001; Luton et al. 2002).
There is a need to more extensively phylogenetically sample, as well as characterize the habitat preferences of, non-methane cycling archaea in freshwater wetlands, to develop hypotheses about the roles these organisms play in carbon cycling.
Archaea often constitute only a small fraction of the total microbial community in soils and sediments (Bates et al. 2011; Borrel et al. 2012; Webster et al. 2015), despite their large contributions to ecosystem-wide biogeochemical processes. Unfortunately, many 16S rRNA gene-based studies employ PCR primers that simultaneously amplify both bacterial and archaeal 16S rRNA genes (Caporaso et al. 2012), under-sampling the diversity of the less-abundant archaeal fraction (Y. Wang et al. 2012; Klindworth et al. 2013). To more deeply sample the archaeal members, primers specific to this domain have been developed (Baker, Smith, and Cowan 2003; Teske and Sorensen 2008; Klindworth et al. 2013), but these often produce longer amplicons, which has largely limited their use to lower-throughput sequencing methods such as Sanger sequencing (Gantner et al. 2011) and pyrosequencing (Lee et al. 2015; Webster et al. 2015).
An approach that incorporates archaeal domain specificity and exploits higher-throughput short-read sequencing would both increase access to low-abundance archaeal community members and allow for increased sampling and replication. One solution is to assemble longer rRNA gene amplicons from short-read shotgun sequencing libraries with the EMIRGE algorithm (Miller et al. 2011; Miller et al. 2013; Ong et al. 2013). EMIRGE uses a large database of candidate rRNA genes to perform a templated assembly of rRNA reads from a sample. In an iterative process, mapped reads are used to correct the candidate rRNA genes to reflect gene sequences found in the sample, and mappings to corrected candidate genes are then used to probabilistically assign each read to candidate genes for relative abundance estimates. Because longer amplicons can be assembled, this method permits domain-specific primer selection independent of sequencing read lengths, exploits the higher throughput afforded by short-read sequencing, and allows for more extensive sampling and replication within a single study. Domain-specific primers which exclusively target the V3-V6 region of the archaeal or bacterial 16S rRNA gene have been described, and increase phylogenetic resolution when compared to shorter amplicons (Gantner et al. 2011; Ong et al. 2013; Singer et al. 2016).
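The probabilistic read assignment described above can be illustrated with a toy expectation-maximization loop. This is a minimal sketch and not EMIRGE itself: the function name `em_abundances`, the per-read likelihood table, and the fixed iteration count are assumptions made for illustration, and the step that corrects candidate sequences is omitted.

```python
def em_abundances(likelihoods, n_iter=50):
    """Estimate candidate relative abundances from per-read likelihoods.

    likelihoods[r][c] is an assumed probability of observing read r
    given candidate gene c (for example, derived from a read mapping).
    """
    n_cand = len(likelihoods[0])
    priors = [1.0 / n_cand] * n_cand  # start from uniform abundances
    for _ in range(n_iter):
        totals = [0.0] * n_cand
        for row in likelihoods:
            # E-step: posterior probability that this read came from each candidate
            weighted = [p * l for p, l in zip(priors, row)]
            norm = sum(weighted)
            if norm == 0:
                continue  # read matches no candidate; skip it
            for c, w in enumerate(weighted):
                totals[c] += w / norm
        # M-step: abundances are the normalized expected read counts
        total = sum(totals)
        priors = [t / total for t in totals]
    return priors


# Hypothetical input: three reads unique to candidate 0, one unique to
# candidate 1, and two reads that map equally well to both.
reads = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] + [[0.5, 0.5]] * 2
abundances = em_abundances(reads)
```

In this toy case the ambiguous reads are split in proportion to the current abundance estimates, so the estimates settle near the 3:1 ratio set by the uniquely mapped reads.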
In this study, we investigate the full extent of archaeal diversity within hydric soils of a model temperate, circumneutral freshwater wetland. Soils were sampled from Old Woman Creek (OWC) National Estuarine Research Reserve, a naturally occurring palustrine freshwater emergent wetland adjacent to Lake Erie, Ohio, USA. This wetland has well-characterized macroecology, hydrology, and geochemistry (Klarer and Millie 1992; Mitsch and Reeder 1992; Chin et al. 1998; Herdendorf, Klarer, and Herdendorf 2006; Bernal 2008), with mean annual methane emissions up to 82 g CH4-C m-2 (Nahlik and Mitsch 2010). By applying a novel high-throughput, domain-specific V3-V6 16S rRNA sequencing approach, we asked how relative abundances of common and rare archaeal taxa co-vary with bacterial community structure and soil geochemistry across multiple phylogenetic and spatial scales.
Experimental Procedures
Experimental Design and sample collection
Old Woman Creek National Estuarine Research Reserve is a 573-acre freshwater wetland on the southern edge of Lake Erie near Huron, Ohio. Soil cores studied here were collected in October 2013 across two transects (labeled Transect 2, Transect 3). Three ~1 m2 sites were sampled from each transect, spanning hydrologic gradients from i) seasonally exposed mud-flat (mud, M), to ii) a submerged mud-flat which drained during the 24-hour collection period due to storm erosion of the barrier beach (mud transition, MT), to iii) permanently flooded sediments covered by 24" of open water (open water covered, O; Figure 2.1). Samples from a third transect (Transect 1) were collected but were not analyzed here. Plot location was marked with a Mobile Mapper 100 GPS unit. Cores were collected to a depth of approximately 35 cm using a modified Mooring System soil corer (3" diameter Cellulose Acetate Butyrate core liner), and stored on ice for transport. Four soil cores were collected from each of sites M2, M3, and O3, while 3 cores each were collected from sites O2, MT2, and MT3. Soil was extruded from core liners, and each core was sectioned into 4 samples by depth as measured from the core surface: 0-5 cm (D1), 6-12 cm (D2), 13-23 cm (D3), and 24-35 cm (D4), yielding a total of 84 samples. Samples were transferred to sterile Whirl-pak bags and homogenized by hand. 30 g of soil from each homogenized sample was removed and stored at 4°C for geochemical analysis, while the remaining soil was stored at -20°C for microbial analysis.
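The sampling design can be checked by enumerating every core and depth. The sketch below is illustrative only: the variable names are assumptions, and the sample ID pattern (site, core, depth, as in M2C4D3) follows the example given in the Figure 2.1 caption.

```python
# Cores collected per site, as stated in the text.
cores_per_site = {"M2": 4, "M3": 4, "O3": 4, "O2": 3, "MT2": 3, "MT3": 3}
depths = ["D1", "D2", "D3", "D4"]  # 0-5, 6-12, 13-23, and 24-35 cm

# One sample per site, core, and depth interval.
sample_ids = [
    f"{site}C{core}{depth}"
    for site, n_cores in cores_per_site.items()
    for core in range(1, n_cores + 1)
    for depth in depths
]

total_samples = len(sample_ids)  # 21 cores x 4 depths
```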
Geochemical measurements and analyses
For 76 of the 84 samples, pH, nitrate, nitrite, total soil carbon, acetate, Fe (II), sulfate, and phosphate were measured, although phosphate was below detection in all samples (Table S2.1). Cores MT2-core 3 and MT3-core 3 were not processed for geochemical data (8 samples). Fe (II) concentrations were measured by absorbance spectrophotometry at 510 nm using a Hach FerroVer Iron Reagent (Loveland, CO). Ion chromatography (Dionex ICS-2100 Ion Chromatography System with an AS 18 column) was used to determine concentrations of acetate, nitrite, nitrate, sulfate, and phosphate. For measures below detection, the value was set at half the detection limit for analyses (Table S2.1). 5 g of soil was added to 5 ml DI water in a 15 ml falcon tube (1:1 v/v) and vortexed to create a soil slurry. Soil slurry pH was measured with an Accumet AB150 pH/mV meter, and the slurry was then passed through a 0.2 µm filter. The filtered liquid was stored in 2 ml microcentrifuge tubes at -20°C until analysis. Soil samples were dried at 65°C for 24 hours in aluminum tins, ground to a powder using a mortar and pestle, and stored in a dark, room temperature cabinet until analysis. Total, organic, and inorganic carbon were measured using a Shimadzu-5000A Solid Sample Combustion Unit. Total carbon samples were combusted at 900°C, while inorganic carbon samples were saturated in a 1:2 mix of 85% H3PO4 to Milli-Q water and then combusted at 200°C. Organic carbon was determined by subtracting the inorganic carbon content from the total carbon content of the sample. Because TIC was below detection in many samples or was only a small component of total carbon (mean 5%), here we report total soil carbon, which approximates TOC values. Salinity data were obtained from the NERRS Centralized Data Management Office (http://cdmo.baruch.sc.edu/get/landing.cfm).
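Two of the data-handling rules in this section, substituting half the detection limit for non-detects and deriving organic carbon by difference, can be written out as a short sketch. The function names, the use of `None` to mark a non-detect, and all numeric values below are assumptions for illustration and are not measurements from this study.

```python
def substitute_below_detection(value, detection_limit):
    """Return half the detection limit when a measurement is below detection.

    A value of None is used here (an assumption) to mark a non-detect.
    """
    if value is None or value < detection_limit:
        return detection_limit / 2.0
    return value


def organic_carbon(total_carbon, inorganic_carbon):
    """Organic carbon as the difference between total and inorganic carbon."""
    return total_carbon - inorganic_carbon


# Hypothetical values for illustration only.
nitrite = substitute_below_detection(None, detection_limit=0.02)
toc = organic_carbon(total_carbon=4.0, inorganic_carbon=0.2)
```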
DNA extraction, amplification and 16S rRNA gene amplicon sequencing
For all 84 samples, total genomic DNA was extracted using the MoBio PowerSoil DNA Isolation Kit (Carlsbad, CA) from 0.4 g soil and quantified using the Invitrogen Qubit dsDNA HS Assay (Life Technologies, Waltham, MA). The V3-V6 region of the 16S rRNA gene was PCR amplified and sequenced twice for each sample, separately targeting the bacterial and archaeal domains. Bacterial 16S rRNA gene amplification used the primers F338 and 1061R (Ong et al. 2013). Archaeal 16S rRNA gene amplification used the primers F349 and 1041R (Gantner et al. 2011; Klindworth et al. 2013). Full primer names, primer sequences, and reaction conditions are shown in Table S2.5. Triplicate reactions per sample were pooled using Zymo Clean and Concentrator-5 (Irvine, CA). The V3-V6 amplicons (approx. 635 bp for Archaea and 723 bp for Bacteria) were fragmented using the Nextera XT shotgun metagenomic library preparation kit (Illumina, San Diego, CA) to produce a multiplexed sequencing library. To increase insert size, the tagmentation reagents (TD buffer, ATM) were reduced to 90% of specified volumes and 25 µl AMPure XP beads (Beckman Coulter, Indianapolis, IN) were used for the library cleanup step. Libraries for each domain were prepared and sequenced separately using 2x150 bp paired-end reads on the Illumina MiSeq at the University of Colorado Anschutz Medical Campus Genomics and Microarray Core.
EMIRGE reconstruction of 16S rRNA gene amplicons
Following sequencing, reads were preprocessed using SeqPrep (https://github.com/jstjohn/SeqPrep) to remove adapter sequences and to merge overlapping reads (-m 0.3 -n 0.7 -o 12 -Z 100000 -N 1 -A CTGTCTCTTA -B CTGTCTCTTA). Reads were quality trimmed using sickle (v1.33; -l 100 -q 2) (Joshi and Fass 2011). Merged reads were split at their midpoint in silico and added to non-overlapping reads for downstream analysis. Amplicons were reconstructed for each sample and for each domain using EMIRGE (Miller et al. 2011; Miller et al. 2013). EMIRGE candidate 16S rRNA databases (separate Bacterial and Archaeal) were produced using the Silva SSU Ref NR99 database (release 119) (Quast et al. 2013), trimmed to the expected V3-V6 amplicons (Werner et al. 2012) with PrimerProspector (Walters et al. 2011), using default parameters. To remove sequences likely to contain errors, artificial duplications, or chimeras, the databases were further length-filtered (610 bp < archaeal lengths < 650 bp, 2.1% removed; 675 bp < bacterial lengths < 775 bp, 0.8% removed). Candidate databases were sorted by length then clustered at 97% identity with usearch (-cluster_smallmem) version 5.2.236 (Edgar 2010), resulting in final candidate databases of 121,114 Bacterial and 2,550 Archaeal sequences. EMIRGE was parameterized to perform 80 iterations, to merge 100% identical sequences (-j 1.0), and to map all reads regardless of insert size (-i 150, -s 300). EMIRGE-reconstructed sequences for each sample were then clustered using a 97% sequence identity threshold (usearch -cluster_smallmem, -id 0.97), and EMIRGE NormPrior abundances for all members in a cluster were summed.
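The length filter applied to the candidate databases can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's actual code: the function names and the dictionary input format are hypothetical, and the text does not say whether candidates were sorted longest or shortest first, so the sort direction here is a guess.

```python
# Length windows (exclusive bounds, in bp) as stated in the text.
LENGTH_WINDOWS = {"archaea": (610, 650), "bacteria": (675, 775)}


def length_filter(sequences, domain):
    """Keep sequences whose length falls strictly inside the domain's window.

    sequences: dict mapping a sequence ID to its nucleotide string.
    """
    low, high = LENGTH_WINDOWS[domain]
    return {sid: seq for sid, seq in sequences.items() if low < len(seq) < high}


def sort_by_length(sequences):
    """Order sequence IDs longest first (the sort direction is an assumption)."""
    return sorted(sequences, key=lambda sid: len(sequences[sid]), reverse=True)


# Hypothetical trimmed candidates: only 'ok' falls inside the archaeal window.
candidates = {"short": "A" * 600, "ok": "A" * 635, "long": "A" * 700}
kept = length_filter(candidates, "archaea")
```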
OTU picking
We retained sequences of both domains as low as 0.02% estimated per-sample relative abundance after applying a minimum expected per-base coverage threshold of 20X. EMIRGE-reconstructed sequences above a 20X expected coverage threshold from all samples were pooled for each domain, removing sequences containing Ns and those predicted as chimeric using the DECIPHER webtool (Wright, Yilmaz, and Noguera 2012). Study-wide representative OTU sequences and per-sample abundances were identified using the following protocol. First, estimated read counts were computed as the product of the EMIRGE-estimated relative abundance and the number of reads mapped per sample. The combined set of sequences from all samples was sorted by per-sample estimated read counts and dereplicated at 100% identity, and read counts were summed using usearch (-cluster_smallmem, -id 1.0, -usersort). The resulting set of unique sequences was sorted by decreasing study-wide summed read counts, and used as input to the cluster_otus step of the UPARSE pipeline (Edgar 2013), using a 97% sequence identity to identify representative OTUs. Finally, all EMIRGE sequences from all samples (those above and below the 20X threshold) were mapped back to the OTUs at 97% identity using usearch_global (Edgar 2010) and estimated read counts were summed from mapped sequences to generate an OTU table. Taxonomy was assigned using the RDP classifier (Wang et al. 2007) (confidence level 0.8), which was retrained using the Greengenes 13_8 database (DeSantis et al. 2006) trimmed to the V3-V6 region using PrimerProspector as described above. To incorporate the most recent advances in archaeal taxonomy, the archaeal OTUs were also classified using the Silva SSURef NR99 database (version 128), using the same methods. Silva taxonomy labels were manually curated to remove non-taxonomic identifiers such as "uncultured archaeon". Where there was a discrepancy between Greengenes and Silva taxonomy or taxonomy was not assigned below "archaea," OTUs were manually assigned taxonomy based on their position in the Silva tree. Three samples representing a fifth sediment depth were removed from further analyses due to insufficient replication. The final OTU tables were filtered to retain only those OTUs that appeared in at least 3 of 84 samples.
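Two arithmetic steps of this protocol, converting relative abundances to estimated read counts and filtering the final table by prevalence, can be sketched as below. The function names, data layout, and numbers are hypothetical; treating "appeared in" as a count greater than zero is also an assumption.

```python
def estimated_read_counts(rel_abundance, mapped_reads):
    """Estimated reads per sequence: relative abundance x reads mapped in that sample."""
    return {seq_id: ra * mapped_reads for seq_id, ra in rel_abundance.items()}


def filter_by_prevalence(otu_table, min_samples=3):
    """Keep OTUs detected (count > 0) in at least min_samples samples.

    otu_table: dict mapping OTU ID -> dict of sample ID -> estimated count.
    """
    return {
        otu: counts
        for otu, counts in otu_table.items()
        if sum(1 for c in counts.values() if c > 0) >= min_samples
    }


# Hypothetical numbers for illustration only.
counts = estimated_read_counts({"seqA": 0.25, "seqB": 0.75}, mapped_reads=200000)
table = {
    "OTU_1": {"s1": 120.0, "s2": 40.0, "s3": 5.0},
    "OTU_2": {"s1": 300.0, "s2": 0.0, "s3": 0.0},
}
kept = filter_by_prevalence(table)
```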
Simulation of V4 region 16S rRNA amplicons and OTU picking
All EMIRGE-reconstructed sequences were trimmed in silico to the V4 region of the 16S rRNA gene using Primer Prospector v. 1.0.1 with default settings and universal primer sequences F515 5'-GTGCCAGCMGCCGCGGTAA-3' and 806R 5'-GGACTACHVGGGTWTCTAAT-3'. By using the same sequences that were input to V3-V6 OTU picking, this procedure maintained the same underlying community structure to estimate the effects of shorter amplicons and reduced sequencing depth on archaeal community resolution. Sequences that were predicted to amplify using this primer set were input to the OTU picking protocol as described above. The confidence threshold for the RDP classifier was changed from 0.8 to 0.5 to reflect the shorter sequence length. The final OTU table was converted to relative abundance and all relative abundances reduced tenfold to simulate an archaeal community comprising only 10% of the sequencing in any sample. To identify which equivalent V3-V6 Woesearchaeota OTUs would have been identified with this protocol, we searched the V3-V6 Woesearchaeota OTUs against a BLASTN database of the V4 Woesearchaeota OTU sequences.
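The tenfold reduction used to mimic a combined-domain library can be expressed as a one-line scaling of relative abundances. The sketch below is illustrative; the function name, the `archaeal_fraction` parameter, and the counts are assumptions.

```python
def simulate_archaeal_fraction(otu_counts, archaeal_fraction=0.10):
    """Convert counts to relative abundance, then scale by the archaeal fraction.

    With archaeal_fraction=0.10 every relative abundance is reduced tenfold,
    mimicking a library in which archaea make up 10% of the reads.
    """
    total = sum(otu_counts.values())
    return {otu: (n / total) * archaeal_fraction for otu, n in otu_counts.items()}


# Hypothetical counts for one sample.
scaled = simulate_archaeal_fraction({"OTU_a": 800, "OTU_b": 150, "OTU_c": 50})
```

An OTU at 5% of the archaeal community in this example drops to 0.5% of the simulated combined library, which shows how low-abundance archaeal OTUs can fall below a detection threshold.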
Microbial community analyses
OTU counts were normalized to within-sample relative abundance (total sum scaling, TSS). Microbial community analyses and visualizations were conducted in R (R Core Team 2014) using the phyloseq, vegan, VennDiagram, gplots, and ggplot2 packages (Wickham 2009; Chen and Boutros 2011; McMurdie and Holmes 2013; Oksanen et al. 2013; Warnes et al. 2016), and with QIIME (Caporaso et al. 2010). For analyses of microbial community data in conjunction with geochemical data, only the 76 samples with both microbial and geochemical measures were used. PERMANOVA (Anderson 2001) tests were implemented using the vegan::adonis function. The bioenv function in vegan was used to identify the subset of geochemical parameters that best correlated with the community membership dissimilarity matrix, and geochemical variables were tested for covariance using the Hmisc package (Harrell and Dupont 2015). Mantel and partial Mantel tests were conducted using vegan. GPS coordinates were converted to intersample distances as 30.78 m for 1 second of latitude and 24.38 m for 1 second of longitude (http://www.usgs.gov/faq/categories/9794/3022). All commands for these analyses are reported in Supplemental file 2.5.
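The GPS conversion can be illustrated with the two factors given in the text. This is a sketch under assumptions: the function name and coordinates are hypothetical, and combining the latitude and longitude offsets as a planar Euclidean distance is an assumption about how the intersample distances were computed.

```python
import math

# Conversion factors given in the text (meters per arcsecond at the study site).
M_PER_ARCSEC_LAT = 30.78
M_PER_ARCSEC_LON = 24.38


def intersample_distance_m(lat1, lon1, lat2, lon2):
    """Approximate planar distance in meters between two points in decimal degrees."""
    dlat_sec = (lat2 - lat1) * 3600.0  # degrees -> arcseconds
    dlon_sec = (lon2 - lon1) * 3600.0
    return math.hypot(dlat_sec * M_PER_ARCSEC_LAT, dlon_sec * M_PER_ARCSEC_LON)


# Hypothetical coordinates: two points 3 arcseconds apart in latitude
# and 4 arcseconds apart in longitude.
d = intersample_distance_m(41.0, -82.0, 41.0 + 3 / 3600, -82.0 + 4 / 3600)
```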
16S rRNA phylogenetic tree construction
All archaeal OTUs were aligned using SINA (Pruesse, Peplies, and Glockner 2012) and added to the ARB guide tree (SILVA SSURef NR99, v123) using the parsimony add method to retain tree topology (Ludwig et al. 2004). For the Bathyarchaeota tree, published representative Bathyarchaeota sequences (Kubo et al. 2012; Lloyd et al. 2013; Meng et al. 2014; Evans et al. 2015) were similarly added to the guide tree if not already present. The tree was pruned to retain only the representative sequences and the OTUs from this study, and representative sequences were used to identify subgroups. The Euryarchaeota phylogenetic tree was constructed using RAxML (GTRGAMMA, 5 searches, 100 bootstraps) (Stamatakis 2014) in ARB using all OWC Euryarchaeota OTUs and reference sequences, including cultured Euryarchaeota 16S rRNA gene sequences and sequences of the 2 nearest neighbors to the OWC OTUs as identified by the SINA alignment to SILVA SSURef NR99 v126 (Quast et al. 2013). Only the Candidatus Methanoperedens and Methanosaeta genera are shown in Figure 2.4. For the Woesearchaeota tree, all reference sequences and OTUs located within the Woesearchaeota candidate phylum were retained and a maximum likelihood tree was constructed using RAxML (GTRGAMMA, 1 search, 100 bootstraps) with representative DPANN genomes as the outgroup. The recently published partial 16S rRNA gene from the Woesearchaeota RC-V genome bin reported by Lazar et al. (2017) was added to this tree with the parsimony add function in ARB. Trees were visualized and annotated using iTOL (Letunic and Bork 2007).
Data availability
DNA sequencing data is available at the NCBI Sequence Read Archive under BioProject PRJNA325008.
Results
Site geochemistry
Soil geochemical parameters and corresponding microbiology samples (Table S2.1) were collected across two transects (Transect 2 and Transect 3). Each transect contained three sampling sites defined by hydrology [mud (M), mud transition (MT), open water covered (O)], and multiple cores were pulled from each of the six sites, with each core divided into four depths for sampling (Figure 2.1). Calculated water salinity over the sampling period ranged from 0.2 to 0.4 parts per thousand. With increasing soil depth, Fe (II) concentrations increased, while sulfate concentrations decreased, likely reflecting oxygen decrease with depth (Figure S2.1). Nitrite, pH, and total carbon were largely invariant across sites and depths within Transect 2, but were more variable across Transect 3. The Transect 3 open water covered site (site O3) was most distinct from all other sites, with significantly lower pH and total soil carbon concentrations (Tukey's HSD p < 0.001) and nitrite below detection in 13 of 16 samples (Figure S2.1, Table S2.1). Across sites, overall sample geochemistry varied more with soil depth than with lateral distance (Figure S2.2). Of the measured geochemical parameters (Table S2.1), pH, Fe (II), nitrite, sulfate, and total soil carbon were identified as the subset that best correlated with both archaeal and bacterial community inter-sample dissimilarities (bioenv Spearman r = 0.62 and 0.57, respectively).
Figure 2.1 - Site map and scale of wetland soil sampling. Hydric soil samples were collected from the Old Woman Creek freshwater wetland (star) in Ohio, USA (left). Samples were collected along two transects (red lines, transects T2 and T3). Each transect consists of three sites (M, MT, or O) defined by water cover, and 3 or 4 cores (C1-C4) per site were collected and sampled at 4 depths (D1-D4; right). A representative sample from the mud site of transect 2, core 4, at depth 3 is identified as M2C4D3. The 84 samples processed for geochemistry and microbial community analysis represent samples spatially separated at scales from centimeters to hundreds of meters.
Domain-specific sequencing recovers bacterial and archaeal diversity
We employed a domain-specific 16S rRNA amplicon sequencing and assembly approach to concurrently characterize the archaeal and bacterial communities in 84 wetland soil samples across two hydrological transects (Figure 2.1, Transect 2 and Transect 3). 16S rRNA V3-V6 amplicons were assembled from an average of 337,298 +/- 167,867 bacterial reads using the primers 338f/1061r (Ong et al. 2013), and 275,955 +/- 185,997 archaeal reads using the primers 349f/1041r (Gantner et al. 2011; Klindworth et al. 2013) per sample (+/- standard deviation; Table S2.2). In silico analyses of the selected PCR primers against an existing 16S rRNA database (Klindworth et al. 2013) predicted 84.4% coverage of the archaea and 95% coverage of the bacteria, with virtually no cross-domain amplification predicted. This domain specificity of the primers was confirmed in the sequencing: on average, 99.87% of the archaeal sequencing reads, and 99.99% of the bacterial sequencing reads mapped to sequences of the target domain (Figure S2.3; Table S2.2), and we exclusively assembled sequences of the targeted domain within each sequencing run. The archaeal primers produced some non-specific amplification of protein coding bacterial DNA, likely as a result of low annealing temperature (Gantner et al. 2011), and EMIRGE identified and discarded these reads.
Our pipeline identified 478 archaeal and 1082 bacterial 97% identity Operational Taxonomic Units (OTUs) distributed broadly across the wetland (Supplemental Files S2.1-S2.4). On average, each sample contained 31% (146 +/- 41) of archaeal and 44% (476 +/- 111) of bacterial OTUs. Across the wetland, 40% (188/478) of archaeal OTUs and 60% (655/1082) of bacterial OTUs were found at all six sampling sites (Figure S2.4). Hydrology-paired sites from the two transects shared most OTUs (Figure S2.4). For example, 93% of archaeal OTUs found at the Mud 3 site were also found at the Mud 2 site. The most geochemically distinct site (O3) still shared 90% of archaeal and 94% of bacterial OTUs with at least one of the other five sites.
Community-level biogeography corresponds to geochemistry
Across samples, archaeal and bacterial beta diversity was strikingly similar. Bray-Curtis inter-sample dissimilarity matrices based on the bacterial and archaeal communities were highly correlated (Mantel R = 0.86, p < 0.0001), and non-metric multidimensional scaling (NMDS) ordinations of these dissimilarities produced significantly similar clustering patterns of samples for both domains (Figure 2.2; Procrustes correlation = 0.90, p < 0.0001). Archaeal and bacterial communities found in individual samples were most similar in composition to communities from samples with similar geochemistry. Ordination of samples based on Euclidean distance of geochemical measurements alone uncovered a similar inter-sample structure to that observed for both the archaeal and bacterial community (Figure S2.2; Procrustes correlation = 0.65 and 0.66, respectively, p < 0.0001). Inter-sample community dissimilarity was not solely due to increases in spatial distance between samples (Angermeyer, Crosby, and Huber 2016). When controlling for the effects of lateral distance between sampling sites, both the archaeal and bacterial community structures still correlated significantly with measured geochemical parameters (Table S2.3; p < 0.0001; partial Mantel R = 0.58 and 0.54, respectively).
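For readers unfamiliar with the dissimilarity measure behind these comparisons, the sketch below computes Bray-Curtis dissimilarity between two samples after total sum scaling. It is a plain-Python illustration with hypothetical counts and function names; the analyses reported here were run in R with vegan and phyloseq.

```python
def total_sum_scale(counts):
    """Normalize raw OTU counts to within-sample relative abundance."""
    total = sum(counts.values())
    return {otu: n / total for otu, n in counts.items()}


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    otus = set(sample_a) | set(sample_b)
    diff = sum(abs(sample_a.get(o, 0.0) - sample_b.get(o, 0.0)) for o in otus)
    total = sum(sample_a.get(o, 0.0) + sample_b.get(o, 0.0) for o in otus)
    return diff / total


# Hypothetical OTU counts for two samples.
s1 = total_sum_scale({"OTU_1": 60, "OTU_2": 40})
s2 = total_sum_scale({"OTU_1": 20, "OTU_2": 40, "OTU_3": 40})
d = bray_curtis(s1, s2)
```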
[Figure 2.2 plot: NMDS ordination panels A-D (Transect 2 and Transect 3), plotted along NMDS1. Legend: symbols distinguish site type (Mud, Mud transition, Water covered) and soil depth (0-5 cm, 6-12 cm, 13-23 cm, 24-35 cm).]
Figure 2.2 - Statistically significant clustering of samples by water cover and soil depth as described by both archaeal and bacterial community composition. NMDS ordination of Bray-Curtis dissimilarity for soil archaeal (A, B) and bacterial (C, D) communities. A single ordination was performed for each domain and is shown twice in color to emphasize the samples from Transect 2 (A, C) and from Transect 3 (B, D) separately, with samples from the opposite transect shown in grey. Measured Bray-Curtis dissimilarities for both archaeal and bacterial communities across the wetland confirmed that soil depth defines microbial community structure. Within Transect 2 (A), samples cluster more by soil depth than by site (PERMANOVA p < 0.001; R2 = 0.52 and 0.13, respectively), whereas in Transect 3 community dissimilarity is dominated by the unique O3 site geochemistry, though samples are still primarily organized by soil depth.
Overall, bacterial and archaeal communities were structured at spatial scales that correspond to geochemical gradients. We collected multiple adjacent cores within ~1 m2 at each site (Figure 2.1). At the meter scale, samples from adjacent cores clustered significantly by depth for five of the six sites (PERMANOVA p < 0.001; Table S2.4). In contrast, samples from the same core but from depths separated by centimeters did not cluster together (Figure S2.5, Table S2.4). Within Transect 2, microbial communities were most similar at equivalent soil depth, despite being collected from hydrologically distinct sites ~10 meters apart (Figure 2.2; mean Bray-Curtis dissimilarity grouped by depth within site: 0.25, grouped by depth within Transect 2: 0.26, grouped by core within site: 0.38). Across transects (roughly 200 meters apart), the communities from both mud sites (M2, M3) were similar to each other at equivalent depths (Figure 2.2). However, for the water-covered soil samples (O2, O3), community similarity at equivalent depths was found only within and not across transects, in agreement with the geochemical differences in pH, nitrite, sulfate, and total carbon between the O3 site and all other sites (Figure 2.2).
Bacterial community membership and distribution
Bacterial OTUs (Supplemental files S2.2, S2.4) were most commonly from the Proteobacteria (44% of total bacterial relative abundance), Chloroflexi (17.5%), Bacteroidetes (9.8%), and Nitrospirae (7.6%). The Deltaproteobacteria were the most abundant of the Proteobacteria, followed by the Betaproteobacteria and Gammaproteobacteria. Most Chloroflexi OTUs were found within the Dehalococcoidetes, followed by Anaerolineae. The most abundant bacterial orders were the Bacteroidales and the Nitrospirales (8% each), followed closely by a Deltaproteobacteria order (BPC076) with 7% of bacterial abundance. Like many other bacterial orders, these three orders exhibited abundance distributions suggesting habitat preferences (Figure S2.6), in this case for shallow soils (Bacteroidales), deeper soils (Nitrospirales), and for the geochemically distinct O3 site (BPC076).
Many OTUs were associated with known sulfate or sulfur reducing lineages, including Deltaproteobacteria taxa Desulfarculaceae, Desulfobulbaceae, Desulfobacteraceae (including Desulfococcus), Syntrophobacterales, and Thermodesulfobacteriales. Additional potential for sulfate reduction was represented by 19 Thermodesulfovibrionaceae OTUs. A single high-abundance sulfur-oxidizing Thiobacillus sp. OTU (OWC_b8) was among the most abundant bacterial OTUs site-wide (site-wide mean 1.96%, max 5.2%).
In addition to sulfur cycling, we also inferred the potential for iron cycling and bacterial aerobic oxidation of methane (Figure S2.6). Fourteen OTUs were assigned to the metal reducing genus Geobacter, and 6 OTUs were assigned to the microaerophilic iron-oxidizing genus Gallionella. The bacterial methanotrophic OTUs were all Type I methanotrophs, with the exception of one low-abundance OTU assigned to the Methylosinus genus (Supplemental File S2.2). Potential methanotroph OTUs included 21 OTUs within the order Methylococcales (10 Methylococcaceae OTUs, 6 Crenotrichaceae; Supplemental File S2.2). Most Methylococcales OTUs decreased in abundance with soil depth. However, the most abundant Methylococcales OTU (OWC_b17; 99% identity to Methylobacter tundripaludum) usually appeared in deeper sediment samples than the other Methylococcales OTUs (Supplemental File S2.2).
Methane-cycling archaeal community membership and distribution
Domain-specific sequencing revealed diverse archaeal communities across the wetland. Euryarchaeota contributed 52% of total archaeal relative abundance, followed by the Bathyarchaeota (formerly Miscellaneous Crenarchaeota Group) with 36% (Figure 2.3; Supplemental File 2.1). Multiple individual OTUs from these phyla were found at relative abundances greater than 20% in individual samples. The next most abundant phylum, the Woesearchaeota, comprised 8% of total relative abundance. Multiple other low-abundance archaeal groups each at total relative abundances less than 1% of the archaea were consistently detected across the wetland (Figure 2.3).
[Figure 2.3: heatmap. Columns: Transect 2 and Transect 3, each with mud, mud transition, and open water sites. Rows: archaeal taxa colored by phylum (Bathyarchaeota, Euryarchaeota, Thaumarchaeota, Verstraetearchaeota, Woesearchaeota, Other) and annotated with the number of OTUs. Color scale: within-sample relative abundance (%), with ticks at 0.01, 0.1, 1, and 10% (max = 58%).]
Figure 2.3 - Archaeal taxa display abundance patterns corresponding to geochemical parameters and soil depth. Heatmap of relative abundance of archaeal OTUs summarized at the deepest informative level at or above genus according to Silva release 128 taxonomy. Only those taxa with a minimum 0.01% mean relative abundance across samples are included. Equivalent depth samples are grouped together within each site, and presented in order of increasing soil depth left to right.
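The summarization described in this caption (collapsing OTUs to a taxon label, then retaining taxa above a mean relative abundance cutoff) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in this study: the function name, the toy OTU table, and the taxon assignments are hypothetical, and relative abundances are assumed to be expressed in percent.

```python
from collections import defaultdict

def summarize_by_taxon(otu_table, otu_taxon, min_mean_pct=0.01):
    """Sum OTU relative abundances (%) per taxon, then keep taxa whose
    mean relative abundance across samples is at least min_mean_pct."""
    n_samples = len(next(iter(otu_table.values())))
    taxon_totals = defaultdict(lambda: [0.0] * n_samples)
    for otu, abundances in otu_table.items():
        totals = taxon_totals[otu_taxon[otu]]
        for i, value in enumerate(abundances):
            totals[i] += value
    return {
        taxon: totals
        for taxon, totals in taxon_totals.items()
        if sum(totals) / n_samples >= min_mean_pct
    }

# Hypothetical toy data: three samples, values in percent of archaeal reads.
otu_table = {
    "otu_1": [10.0, 20.0, 30.0],
    "otu_2": [1.0, 2.0, 3.0],
    "otu_3": [0.0, 0.0, 0.015],
}
otu_taxon = {"otu_1": "Methanosaeta", "otu_2": "Methanosaeta", "otu_3": "Rare group"}
summary = summarize_by_taxon(otu_table, otu_taxon)
```

In this toy table the two Methanosaeta OTUs are summed per sample, while the third OTU falls below the 0.01% mean cutoff and is dropped from the summary.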
OTUs were identified from 5 of the 7 known methanogenic Euryarchaeota orders, including Methanobacteriales; Methanomicrobiales (Methanoregula, Methanolinea, Methanospirillaceae); Methanosarcinales (Methanosaeta, Methanosarcina); Methanocellales; and Methanomassiliicoccaceae (Supplemental Files S2.1, S2.3). OTUs representing all known methanogenic metabolisms were present, with acetoclastic Methanosaeta spp. (max relative abundance 47%, mean 21% +/- 8%) and hydrogenotrophic Methanoregula spp. (max relative abundance 10%, mean 4% +/- 2%) being the most abundant methanogenic OTUs. We also identified 3 OTUs at 96-98% 16S rRNA gene identity to sequences in the newly described methanogenic candidate genera Candidatus Methanomethylicus and Candidatus Methanosuratus within the candidate phylum Verstraetearchaeota (Vanwonterghem et al. 2016). Of the 85 Bathyarchaeota OTUs, none shared more than 92% 16S rRNA gene identity with the 2 recently described methanogenic Bathyarchaeota (Evans et al. 2015). We identified 10 OTUs classified in the SILVA taxonomy within the proposed methanogenic WSA2 class (Nobu et al. 2016). However, all 10 OTUs were within the 20a-9 subclade, and shared less than 80% identity with 16S rRNA genes from described methanogenic WSA2 Candidatus Methanofastidiosa genomes.
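The identity values quoted above describe the share of matching positions between two 16S rRNA gene sequences. A minimal sketch of that calculation follows. It assumes the two sequences are already aligned to equal length with "-" marking gaps, and it ignores columns where either sequence has a gap; that gap convention is an assumption for illustration, since the convention used in this study is not stated here. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (assumed convention)
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Hypothetical aligned fragments: 9 ungapped columns, 8 of them matching.
identity = percent_identity("ACGTACGT-A", "ACGTTCGTGA")
```

Other conventions (for example, counting gaps as mismatches) give lower values for the same alignment, so thresholds such as 92% or 80% are only comparable when the same convention is applied throughout.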
The most abundant archaeal OTUs across the wetland were related to anaerobic methanotrophic Candidatus Methanoperedens spp. (ANME-2d). The 8 Candidatus Methanoperedens OTUs identified here were related to both the recently described Candidatus Methanoperedens sp. BLZ-1 (Arshad et al. 2015) (93.5-99.7% identity over the V3-V6 region of the 16S rRNA gene) and Candidatus Methanoperedens nitroreducens (Haroon et al. 2013) (93.8-97.3% identity over V3-V6; Figure 2.4). The relative abundance of this approximately genus-level group of OTUs (max relative abundance 58%; mean 12% +/- 13%; Figure 2.3,
Figure 2.4) approaches the abundance level observed for entire archaeal phyla. These OTUs increased in relative abundance with soil depth, and were particularly enriched at the M3 site due to the increase in a single Candidatus Methanoperedens sp. OTU (OWC_a2). OWC_a2 comprised, on average, 80% of the total ANME-2d abundance observed across all sites, and was the most abundant individual archaeal OTU in this study, reaching 45% of total archaeal abundance in a sample from the M3 site (Figure 2.3, Figure 2.4).
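The group-level statistics reported above (maximum, mean +/- standard deviation, and the share contributed by a single OTU) can be computed as in the following sketch. The OTU identifiers and abundance values are hypothetical, and taking the mean of per-sample shares (skipping samples in which the group is absent) is one assumed reading of "on average"; it is not confirmed as the calculation used in this study.

```python
from statistics import mean, stdev

def group_summary(otu_abundances):
    """Per-sample group totals with max, mean, and sample standard deviation."""
    totals = [sum(values) for values in zip(*otu_abundances.values())]
    return {"totals": totals, "max": max(totals), "mean": mean(totals), "sd": stdev(totals)}

def mean_share(otu_abundances, otu_id):
    """Mean per-sample share (%) of one OTU within its group,
    skipping samples where the group total is zero."""
    totals = [sum(values) for values in zip(*otu_abundances.values())]
    shares = [
        100.0 * value / total
        for value, total in zip(otu_abundances[otu_id], totals)
        if total > 0
    ]
    return mean(shares)

# Hypothetical relative abundances (%) for two OTUs of one group in four samples.
group = {"otu_x": [8.0, 16.0, 40.0, 0.0], "otu_y": [2.0, 4.0, 10.0, 0.0]}
summary = group_summary(group)
share = mean_share(group, "otu_x")
```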
Woesearchaeota, Bathyarchaeota, and other enigmatic archaea display notable diversity across the wetland
Separately targeting the archaea allowed for robust detection of archaeal groups that were composed of numerous low-abundance OTUs. The recently described Woesearchaeota phylum (previously DHVEG-6 and Parvarchaea) (Castelle et al. 2015) was represented by the largest number of archaeal OTUs (157), despite being lower in total relative abundance than the Euryarchaeota and Bathyarchaeota. Of the 157 Woesearchaeota OTUs, 145 individual OTUs had a relative abundance reaching 0.1% in at least one sample, though only 17 of those OTUs reached a relative abundance of at least 1% in any sample. We asked whether the high level of phylogenetic diversity within this phylum would have been detected with an amplicon approach using standard pan-domain PCR primers targeting the shorter V4 hypervariable region (Caporaso et al. 2012). In silico simulation using the V4 primer set and the same OTU picking procedure produced 241 Woesearchaeota OTUs. However, assuming archaea represent approximately 10% of the prokaryotic community (Schwarz, Eckert, and Conrad 2007; Prasse, Baldwin, and Yarwood 2015; Webster et al. 2015; Argiroff et al. 2016), only 11 of these 241 shorter Woesearchaeota OTU sequences would be reported above the 0.1% relative abundance threshold with universal V4
sequencing, and no single OTU reached 1% relative abundance within the archaea (Figure S2.7).
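The comparison above rests on simple rescaling: if archaea make up a fraction f of the prokaryotic community, an OTU at relative abundance r within the archaea would appear at roughly r x f in a universal-primer dataset. The sketch below illustrates that arithmetic with f = 0.10, as assumed in the text. The function names and the example abundances are hypothetical, and the sketch does not reproduce the in silico V4 simulation itself.

```python
def community_wide_abundance(within_archaea_pct, archaeal_fraction=0.10):
    """Rescale a within-archaea relative abundance (%) to the whole community."""
    return within_archaea_pct * archaeal_fraction

def count_detectable(max_within_archaea_pct, threshold_pct=0.1, archaeal_fraction=0.10):
    """Count OTUs whose rescaled maximum abundance meets the reporting threshold."""
    return sum(
        community_wide_abundance(value, archaeal_fraction) >= threshold_pct
        for value in max_within_archaea_pct
    )

# Hypothetical per-OTU maximum abundances (%) within the archaeal domain.
max_abundances = [0.05, 0.4, 0.9, 2.0, 5.0]
detectable = count_detectable(max_abundances)
```

With these toy values, four of the five OTUs exceed 0.1% within the archaea, but only two would remain above that threshold after rescaling to the whole community.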
The Bathyarchaeota candidate phylum (Meng et al. 2014) was represented by 85 OTUs that were collectively abundant across the wetland, almost exclusively assigned to phylogenetic subgroups 5b, 6, 7/17, 11, and 15 (Kubo et al. 2012). Of these 85 OTUs, 81 had a relative abundance of at least 0.1% in at least one sample, and 22 reached 1%, whereas with the simulated V4 approach only 20 of 96 OTUs reached a relative abundance of 0.1% in any single sample, and only 3 OTUs reached 1%. Individual Bathyarchaeota subgroups had site-specific distributions, which correlated with underlying geochemistry at the OTU level (Figure 2.5).
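The OTU-level correlations with geochemistry summarized in Figure 2.5 are Spearman correlations, that is, Pearson correlations computed on ranks. A minimal, dependency-free sketch is given below. The abundance and sulfate values are hypothetical, and the helper names are illustrative rather than taken from the study's analysis code.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical values: one OTU's relative abundance (%) and sulfate per sample.
otu_abundance = [0.1, 0.4, 0.9, 2.5, 3.0]
sulfate = [9.0, 7.5, 4.0, 2.0, 1.0]
rho = spearman(otu_abundance, sulfate)
```

Because the statistic depends only on rank order, it is insensitive to the skewed, log-scale distribution of relative abundances. An input with no variation (all values equal) has no defined correlation and raises a division error in this sketch.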
Numerous additional underexplored archaeal taxa were represented by multiple OTUs, despite contributing a small component of archaeal relative abundance, and for all these groups we discerned habitat preferences across the wetland. A preference for deeper soils was observed for the Lokiarchaeota (formerly Marine Benthic Group B), certain Thaumarchaeota groups (pSL12, AK59, formerly Marine Benthic Group A), Marine Benthic Group D (DHVEG-1) and Terrestrial Miscellaneous Group (TMEG) OTUs. A preference for the low-nitrite, low-carbon, low-pH O3 site was observed for the Aenigmarchaeota, Thermoplasmatales ASC21, Hadesarchaea, and a clade of Methanomicrobiaceae distinct from described genera within this family (Figure 2.3).
[Figure 2.4: selected clades of a 16S rRNA gene tree (tree scale: 0.01) with an adjacent heatmap. Panels: (A) Candidatus Methanoperedens (ANME-2d), (B) Methanosaeta. Tree tips: OWC OTUs alongside reference sequences, including Candidatus Methanoperedens sp. BLZ-1, Methanosaeta concilii, Methanosaeta harundinacea, and Methanosaeta thermophila. Heatmap columns: depths D1 to D4 at the mud, mud transition, and water-covered sites. Color scale: OTU mean relative abundance (%), 0 to 35.]
Figure 2.4 - Methanosaeta spp. and Ca. Methanoperedens spp. OTUs in OWC soils are abundant and display OTU-level habitat preferences. Selected clades from a Euryarchaeota maximum likelihood 16S rRNA gene tree show individual OTUs within (A) Candidatus Methanoperedens (ANME-2d) and (B) Methanosaeta genera. OTU mean relative abundance for each sampling depth at each site is shown. Bootstrap values >0.8 are shown with black circles.
[Figure 2.5: (A) heatmap of within-sample relative abundance (%; scale ticks at 0.01, 0.1, 1, and 10%) of Bathyarchaeota OTUs in subgroups 6, 15, 7/17, 5b, and 11 across the mud, mud transition, and water-covered sites of Transect 2 and Transect 3. (B) Phylogenetic tree of Bathyarchaeota OTUs, colored by subgroup, with a heatmap of Spearman correlation values.]
Figure 2.5 - Bathyarchaeota subgroups and OTUs display phylogenetically conserved abundance patterns in the OWC wetland, correlating to geochemical measures. (A) Bathyarchaeota OTUs in OWC soils are distributed across several subgroups associated globally with freshwater sediments and display subgroup-specific differentiated abundance. The heatmap shows within-sample relative abundance of OTUs within each Bathyarchaeota subgroup. Replicate depth samples are grouped together within each site, and presented in order of increasing soil depth left to right. Reference sequences (Kubo et al. 2012; Lloyd et al. 2013; Meng et al. 2014) were used to place OWC Bathyarchaeota OTUs within subgroups. (B) Phylogenetic tree of wetland Bathyarchaeota 16S rRNA gene sequences with reference sequences as listed above. Heatmap shows each OTU's Spearman correlation with geochemical measures. Correlations are phylogenetically conserved among groups, agreeing with their site-specific distributions as shown in panel A.
Discussion
Archaea play crucial roles in freshwater circumneutral wetlands, but studies to date have largely been limited in their ability to detect the full diversity of archaeal communities, or have used methods that focus only on methane-cycling archaea. We applied replicated, domain-specific amplicon sequencing to a model freshwater wetland, deeply characterizing both archaeal and bacterial communities across soil geochemical, depth, and hydrological gradients. With the targeted, increased sequencing depth afforded by this approach, correlated to geochemical data, we have identified patterns of microbial presence and abundance that suggest geochemical controls on community structure, and have provided insights into the lifestyles of both abundant and rare taxa.
Broadly, we infer that geochemical environment, rather than dispersal limitation (Horner-Devine et al. 2004; Martiny et al. 2011), appears to be the controlling factor defining archaeal and bacterial community structure in this wetland. First, the relatively high percentage of OTUs shared across the wetland (Figure S2.4) suggests that there are limited physical barriers to dispersal. Second, microbial communities varied more with geochemical measures changing over centimeters of depth than with lateral distances of hundreds of meters. Finally, these patterns held for both the archaeal and bacterial communities (Figure 2.2), even though these communities were measured independently. To our knowledge, microbial community heterogeneity across spatial scales has not been explored within freshwater wetland soils, and the broad spatial similarity observed here contrasts with the high level of spatial heterogeneity observed within some dry soils (O’Brien et al. 2016).
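The second argument above compares how much community composition changes across depth versus across lateral distance. One way to make such a comparison concrete is a pairwise dissimilarity measure. The sketch below uses Bray-Curtis dissimilarity purely as an illustrative choice; the metric, the function name, and the abundance profiles are assumptions made for this example and are not taken from the analysis described in this chapter.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both profiles are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total

# Hypothetical profiles (relative abundance, %) for three OTUs.
site1_shallow = [60.0, 30.0, 10.0]
site1_deep = [10.0, 30.0, 60.0]      # same site, a few centimeters deeper
site2_shallow = [55.0, 35.0, 10.0]   # same depth, a distant site

across_depth = bray_curtis(site1_shallow, site1_deep)
across_distance = bray_curtis(site1_shallow, site2_shallow)
```

In this toy case the dissimilarity across depth (0.5) is ten times that across distance (0.05), which is the kind of contrast the argument describes.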
OWC is a methane-emitting wetland (Nahlik and Mitsch 2010), so we asked if the membership and composition of the methanogenic archaeal community were similar to those
reported in other wetland ecosystems. Consistent with other freshwater wetlands (Borrel et al. 2011; Bridgham et al. 2013), we identified multiple archaea belonging to known methanogenic orders, with the most abundant being acetoclastic Methanosaeta spp., followed by hydrogenotrophic Methanoregula spp. These two genera are frequently identified together in wetland settings, with relative dominance depending on factors such as pH, season, and carbon availability (Kotsyurbenko et al. 2007; Sun et al. 2012; He et al. 2015). Although activity cannot be determined from relative abundance, based on the replicated abundance distributions of these methanogens, we hypothesize that acetoclastic methanogenesis may be more relevant in the wetland during the sampling season.
However, our targeted sequencing of the archaea also identified multiple additional methanogenic groups, spanning 5 of the 7 known methanogenic Euryarchaeota orders, including members of the Methanomassiliicoccaceae, as well as newly described methylotrophic methanogens within the phylum Verstraetearchaeota (Vanwonterghem et al. 2016). It is unlikely that the level of diversity of methanogenic taxa and inferred methanogenic substrates detected here represents some unique capacity of the OWC wetland. Rather, this result is more likely a product of our approach, which resulted in increased archaeal sampling depth and phylogenetic resolution. For example, previously unreported Verstraetearchaeota were recently found at very low relative abundances in shotgun metagenomic sequencing data from freshwater wetland sediments (Vanwonterghem et al. 2016). In our study, the most abundant of the 3 Verstraetearchaeota OTUs had a maximum relative abundance of only 0.6% within the archaeal domain, and would likely have been below detection in a study using more traditional universal primers. However, this OTU was present in 80 of the 84 samples across the wetland, and all three OTUs showed a distinct
abundance distribution that suggests a habitat preference for deeper soils, and for the O3 sampling site (Figure 2.3). Additional experiments are needed to determine the activity of the low-abundance Verstraetearchaeota in the wetland. Nonetheless, increased detection of low-abundance methanogens has important consequences for interpreting the methanogenic capacity of the wetland beyond what is typically inferred by examining only the most abundant methanogenic taxa.
Unexpectedly, the most abundant methane-cycling archaea within this wetland were not most similar to canonical methanogens, but rather were phylogenetically affiliated with anaerobic methanotrophic Candidatus Methanoperedens spp. (also known as ANME-2d or AAA). In separate enrichment cultures, Candidatus Methanoperedens spp. organisms have been shown to conduct anaerobic oxidation of methane (AOM) using nitrate (Raghoebarsing et al. 2006; Haroon et al. 2013; Arshad et al. 2015) or Fe(III) and Mn(IV) (Ettwig et al. 2016) as electron acceptors. Related sequences have also been linked to freshwater AOM coupled to sulfate reduction (Schubert et al. 2011; Timmers et al. 2015). ANME-2d have been identified in globally distributed freshwater ecosystems (Welte et al. 2016; Vaksmaa et al. 2017) including rice paddies (Lee et al. 2015), aquifer sediments (Flynn et al. 2013; Castelle et al. 2015), lake sediments (Stein et al. 2001; Lliros et al. 2010; Schubert et al. 2011; Kadnikov et al. 2012; Fan and Xing 2016), river sediments (Rastogi et al. 2009), estuarine sediments (Li et al. 2012; Prasse, Baldwin, and Yarwood 2015), minerotrophic fens (Cadillo-Quiroz et al. 2008), mud volcanoes (Wrede et al. 2012), and high altitude cold wetland sediments (G. Zhang et al. 2008). The broad distribution of this group suggests an important role in global carbon cycling in these habitats.
The wetland contained Candidatus Methanoperedens spp. 16S rRNA sequences similar to sequences from both metal-reducing and nitrate-reducing Candidatus Methanoperedens (Figure 2.4), suggesting that these two processes might both occur in natural freshwater wetland environments. Intriguingly, the dominant Candidatus Methanoperedens sp. OTU in this study (OWC_a2) shares 99.7% nucleotide identity with the metal-reducing Candidatus Methanoperedens sp. BLZ-1 (Ettwig et al. 2016) and is most abundant at a site (M3) that is marked by both a significant increase in the abundance of iron-oxidizing Gallionella spp. OTUs and a significant decrease in the abundance of metal-reducing Geobacter spp. OTUs (Figure S2.6). Based on the findings of this study, ongoing research is using pore-water dialysis samplers to correlate the abundance and transcripts of Candidatus Methanoperedens spp. populations to local methane flux measurements. However, our results here complement existing soil laboratory enrichment studies of iron-reducing Candidatus Methanoperedens spp. (Ettwig et al. 2016) and provide additional evidence that archaeal AOM linked to iron reduction may be a key part of the methane cycle in some freshwater ecosystems (Sivan et al. 2011; Sivan et al. 2014; Bar-Or et al. 2015).
We also examined our data for the prevalence of other known anaerobic methanotrophs. AOM is carried out in other freshwater settings by the nitrite-reducing, methane-oxidizing bacterium Candidatus Methylomirabilis oxyfera and related members of the NC10 phylum (Raghoebarsing et al. 2006; Ettwig et al. 2010; Deutzmann et al. 2014; Hu et al. 2014; Shen et al. 2016). Among the 6 low-abundance NC10 OTUs identified in the OWC wetland, none shared more than 91.8% 16S rRNA gene nucleotide identity with Ca. M. oxyfera. Further understanding of in situ anaerobic methane oxidation has important implications for modeling redox cycling in wetlands (Smemo and Yavitt 2011), especially as current climate
models assume methane oxidation is aerobic, and constrained by depth and saturation-dependent O2 concentration and competition (Riley et al. 2011).
It is well known that the combined effects of depth-correlated geochemical redox gradients (Cadillo-Quiroz et al. 2006; Lazar et al. 2015; Lee et al. 2015; Chu et al. 2016) and water cover (Kotiaho et al. 2010) can be strongly associated with changes in soil microbial communities. However, we also observed OTU-level differences in occupancy and abundance along soil depth gradients, which would not have been predicted based on redox requirements of closely related organisms. For example, although many anaerobic methanogenic Methanosaeta spp. OTUs increased in abundance with depth, one Methanosaeta OTU had the opposite abundance pattern, and was the most abundant archaeal OTU in 0-5 cm depth samples otherwise characterized by an abundance of aerobic bacterial taxa (Figure 2.4). It is possible that this taxon has unique antioxidant strategies for tolerating oxygen fluctuations, as has been suggested for other methanogens (Tholen, Pester, and Brune 2007; Angel et al. 2011; Jasso-Chavez et al. 2015). Similarly, while examining the methanotrophic community, we also identified OTUs from the bacterial order Methylococcales. Interestingly, these aerobic methanotrophic OTUs are persistent throughout the sampled depth profile, including to depths where known anaerobic methanotrophic taxa dominate (Figure S2.6). These apparently paradoxical co-occurrences of organisms representing aerobic and anaerobic processes may be the result of redox micro-sites in the soil (Jakobsen 2007). It is also possible that one or the other of the populations was not active at the time of sampling, due to death or dormancy from unfavorable redox conditions. Activity measurements will be needed to resolve the role of individual methane-cycling OTUs at different depths.
Despite detecting numerous methanogens and methanotrophs, the majority of the archaeal OTUs present in the wetland soils are not currently known to be involved directly in methane cycling. In fact, most of these taxa are from entirely uncultivated lineages. These taxa include the candidate phyla Woesearchaeota and Bathyarchaeota, whose metabolic potential is just beginning to be illuminated by limited metagenomics studies (Castelle et al. 2015; Lazar et al. 2016; Lazar et al. 2017). Here, Woesearchaeota OTUs display distinct abundance patterns suggesting occupancy of at least two environmental niches. Despite the high phylogenetic diversity represented by 157 OTUs, most of the individual Woesearchaeota OTUs shared the same strong association with shallow soils, and corresponding positive correlation with soil sulfate and presumably dissolved oxygen concentrations. Yet, in contrast to this broad trend, there are 33 (21%) sequences that displayed an opposite distribution across the wetland, enriched in deeper soils, suggesting divergent lifestyles among the Woesearchaeota warranting further metabolic exploration (Figure S2.7).
Despite the large preference of Woesearchaeota OTUs for shallow soils, the sampled Woesearchaeota genomes to date suggest fermentative metabolisms (Castelle et al. 2015; Lazar et al. 2017) and/or symbiotic lifestyles (Castelle et al. 2015), rather than aerobic metabolisms. The high overall correlation of this group with soil sulfate concentrations in this wetland may represent an environmental determinant on its distribution. Whereas one complete Woesearchaeota genome was found to encode an ATP sulfurylase, the remaining genes necessary for sulfur cycling metabolisms were absent from this genome (Castelle et al. 2015). These apparent inconsistencies between known genomic potential and the majority of our observed habitat preferences are resolved by phylogenetic comparison of the 16S rRNA
sequences from this study alongside those from the available sequenced genomes (Figure S2.7). Seven of the 11 Woesearchaeota genomic samples derive from a single aquifer groundwater source (Rinke et al. 2013; Castelle et al. 2015; Anantharaman et al. 2016; Lazar et al. 2017), and most available genomes cluster narrowly in our expanded phylogenetic tree. That is, reconstructed genomes thus far account for only a small, non-representative fraction of the Woesearchaeota phylogenetic diversity found in this study (Figure S2.7). This restricted genomic sampling limits inference of Woesearchaeota metabolism for most of our shallow-associated OTUs, and more broadly within wetlands and related freshwater lakes and sediments where Woesearchaeota have been observed (Amaral-Zettler et al. 2008; Ye et al. 2009; Borrel et al. 2012; Ortiz-Alvarez and Casamayor 2016). Future genomic sampling from this phylum should target the full breadth of Woesearchaeota metabolisms suggested by the phylogenetically diverse habitat preferences for shallow soils identified here.
Within the Bathyarchaeota candidate phylum, we also identified environmentally differentiated distribution patterns suggesting subgroup-specific habitat preference. The Bathyarchaeota are among the most abundant organisms reported in marine and freshwater sediments globally (Biddle et al. 2006; Borrel et al. 2012; Kubo et al. 2012; Lloyd et al. 2013; Fillol, Auguet, et al. 2015; Lazar et al. 2015). Approximate family-level subgroups have been shown to display habitat preferences (e.g. salinity, sulfate, and soil depth) that likely reflect conserved underlying metabolic capacities (Kubo et al. 2012; Fillol, Auguet, et al. 2015; Lazar et al. 2015; Lazar et al. 2016). The Bathyarchaeota OTUs in this study are confined almost exclusively to subgroups 5b, 6, 7/17, 11, and 15 and display group-specific abundance distributions that extend previously described habitat preferences (Figure 2.5).
Bathyarchaeota subgroups 6, 15, and 7/17 are broadly distributed across the OWC wetland, yet display soil-depth-linked habitat preferences. In agreement with findings from more saline estuarine sediments (Lazar et al. 2015), subgroup 6 was generally more abundant in shallower, sulfate-rich soils, extending this habitat preference to freshwater environments. Partial genomes reconstructed from estuarine sediment metagenomes (Lazar et al. 2016) revealed potential fermentative metabolisms for subgroups 6, 15, and 7/17. Acetate was hypothesized to be produced via the breakdown of plant-based carbohydrates, proteins, and amino acids, or potentially via autotrophy, rather than through competition with respiring metabolisms, and acetogenesis may be conserved across additional Bathyarchaeota subgroups (He et al. 2016). This metabolic strategy could explain the broad distribution of these subgroups here, and presents a potential link to acetoclastic methanogenesis performed by the Methanosaeta spp. and Methanosarcinales OTUs.
Bathyarchaeota subgroups 5b and 11 have a more restricted distribution across the wetland. Previously, subgroups 5b and 11 were suggested as freshwater indicator taxa, being found almost exclusively in freshwater habitats globally (Fillol, Auguet, et al. 2015), but we do not find these groups broadly distributed across these freshwater wetland soils. Instead, OTUs from subgroups 5b and 11 display preferential association with the water-covered O3 site and with deeper soils (Figure 2.5). Individual OTUs within subgroups 5b and 11 consistently correlate positively with Fe(II) and negatively with pH, sulfate, nitrite, and total soil carbon (Figure 2.5). These distributions suggest more finely partitioned habitat preferences beyond salinity tolerances for these subgroups. Subgroups 5b and 11 still lack representative genomes, but our data hint at very different lifestyles from other freshwater Bathyarchaeota groups.
Conclusions
Environmental differences among freshwater wetland ecosystems likely translate to differences in microbial communities, which in turn differentially impact biogeochemical cycling. By employing a high-resolution, domain-specific sequencing approach, we provide a more complete picture of the complex archaeal community in a model methane-emitting freshwater wetland. We found surprising diversity among known methanogens, and OTU-level habitat preferences suggesting the potential for subtle controls on methane emissions that may not be phylogenetically conserved. We also infer that archaeal anaerobic oxidation of methane performed by Candidatus Methanoperedens spp. could be a currently unknown mechanism regulating net methane emissions in this wetland. In addition to the functional potential represented by these characterized taxa, we detected a diverse set of archaea for which metabolic properties are largely unknown. However, for these groups, as exemplified by the Bathyarchaeota and Woesearchaeota, the phylum-level to OTU-level archaeal habitat preferences we describe serve as important ecological context within which to interpret emerging genomic and functional studies of archaea in similar habitats. Assuming that they are active, function of these less-characterized taxa will need to be incorporated into a community-level understanding of carbon cycling in freshwater wetland soils.
Acknowledgments
This research was supported in part by the Ohio Water Development Authority (#6835, to KCW). We thank the staff of the Old Woman Creek National Estuarine Research Reserve, in particular Kristi Arend and Frank Lopez, for site access, housing infrastructure, and space and equipment for analytical and sample processing. We thank Elmar Pruesse for assistance with ARB.
Author Contributions
Field sampling was designed and executed by Kelly Wrighton and Kay Stefanik. The domain-specific sequencing method was designed, tested in silico, and implemented by Adrienne Narrowe and Christopher Miller. Kay Stefanik and Jordan Angle conducted geochemical analyses. Jordan Angle and Rebecca Daly performed DNA extraction and PCR. Adrienne Narrowe performed sequencing library preparation. Adrienne Narrowe and Christopher Miller performed all post-sequencing bioinformatics and data analysis.
Adrienne Narrowe and Christopher Miller wrote the manuscript contained in this chapter, which was read and approved by all co-authors.
CHAPTER III
COMPLEX EVOLUTIONARY HISTORY OF TRANSLATION ELONGATION FACTOR 2 AND DIPHTHAMIDE BIOSYNTHESIS IN ARCHAEA AND
PARABASALIDS2
Abstract
Diphthamide is a modified histidine residue which is uniquely present in archaeal and eukaryotic elongation factor 2 (EF-2), an essential GTPase responsible for catalyzing the coordinated translocation of tRNA and mRNA through the ribosome. In part due to the role of diphthamide in maintaining translational fidelity, it was previously assumed that diphthamide biosynthesis genes (dph) are conserved across all eukaryotes and archaea. Here, comparative analysis of new and existing genomes reveals that some archaea (i.e., members of the Asgard superphylum, Geoarchaea, and Korarchaeota) and eukaryotes (i.e., parabasalids) lack dph. In addition, while EF-2 was thought to exist as a single copy in archaea, many of these dph-lacking archaeal genomes encode a second EF-2 paralog missing key residues required for diphthamide modification and for normal translocase function, perhaps suggesting functional divergence linked to loss of diphthamide biosynthesis. Interestingly, some Heimdallarchaeota previously suggested to be most closely related to the eukaryotic ancestor maintain dph genes and a single gene encoding canonical EF-2. Our findings reveal that the ability to produce diphthamide, once thought to be a universal feature in archaea and eukaryotes, has been lost multiple times during evolution, and suggest that anticipated compensatory mechanisms evolved independently.
2 Portions of this chapter were previously published online as: Narrowe AB*, Spang A*, Stairs CW, Caceres EF, Baker BJ, Miller CS, Ettema TJG (2018). Complex evolutionary history of translation Elongation Factor 2 and diphthamide biosynthesis in Archaea and parabasalids. bioRxiv, and are included with the permission of the copyright holder. An asterisk indicates equal contribution to the online publication. Full author contributions are listed at the end of the chapter.
Introduction
Elongation factor 2 (EF-2) is a critical component of the translational machinery that interacts with both the small and large ribosomal subunits. EF-2 functions at the decoding center of the ribosome, where it is necessary for the translocation of messenger RNA and associated tRNAs (Spahn et al. 2004). Archaeal and eukaryotic EF-2, as well as the homologous bacterial EF-G, are members of the highly conserved translational GTPase protein superfamily (Atkinson 2015). Gene duplications and subsequent neofunctionalizations have been inferred for eukaryotic EF-2 (eEF-2), with the identification of the spliceosome component Snu114 (Fabrizio et al. 1997), and Ria1, a 60S ribosomal subunit biogenesis factor (Becam et al. 2001). Bacterial EF-G is involved in both translocation and ribosome recycling and has undergone multiple duplications, including subfunctionalizations separating the translocation and ribosome recycling functions (Tsuboi et al. 2009; Suematsu et al. 2010) as well as neofunctionalizations including roles in back-translocation (Qin et al. 2006), translation termination (Freistroffer et al. 1997), regulation (Li et al. 2014) and tetracycline resistance (Donhofer et al. 2012). However, to date, archaea were thought to encode only a single essential protein within this superfamily, i.e. archaeal EF-2 (aEF-2) (Atkinson 2015).
Unlike bacterial EF-Gs, archaeal and eukaryotic EF-2s contain a post-translationally modified amino acid which is synthesized upon the addition of a 3-amino-3-carboxypropyl (ACP) group to a conserved histidine residue and its subsequent modification to diphthamide by the concerted action of 3 (in archaea) to 7 (in eukaryotes) enzymes (De Crecy-Lagard et al.; Schaffrath et al. 2014). While diphthamide is perhaps best known as the target site of bacterial ADP-ribosylating toxins (Iglewski, Liu, and Kabat 1977; Jorgensen et al. 2008) and as required for sensitivity to the antifungal sordarin (Botet et al. 2008), its exact role remains a subject of investigation. Yeast mutants incapable of synthesizing diphthamide have a higher rate of translational frameshifts, suggesting that this residue plays a critical role in reading frame fidelity during translation (Ortiz et al. 2006). Furthermore, structural studies of eEF-2 using high-resolution cryo-EM have indicated that diphthamide interacts directly with codon-anticodon bases in the translating ribosome, and facilitates translocation by displacing ribosomal decoding bases (Anger et al. 2013; Murray et al. 2016). In addition, diphthamide has been proposed to play a role in the regulation of translation, as it represents a site for reversible endogenous ADP-ribosylation (Schaffrath et al. 2014), and in the selective translation of certain genes in response to cellular stress (Arguelles et al. 2014). Given its anticipated role at the core of the translational machinery, it is not surprising that, with the sole exception of Korarchaeum cryptofilum (De Crecy-Lagard et al.; Elkins et al. 2008), the diphthamide biosynthetic pathway is universally conserved in all archaea and eukaryotes. Indeed, while not strictly essential, loss of diphthamide biosynthesis has been shown to result in growth defects in yeast (Kimata and Kohno 1994; Ortiz et al. 2006) and some archaea (Blaby et al. 2010), and is either lethal or causes severe developmental abnormalities in mammals (Liu et al. 2006; Webb et al. 2008; Yu et al. 2014).
In the current study, we explore the evolution and function of EF-2 and of diphthamide biosynthesis genes using genomic data from novel major archaeal lineages that were recently discovered using metagenomics and single-cell genomics approaches (Hug, Baker, et al. 2016; Adam et al. 2017; Spang, Caceres, and Ettema 2017). In particular, we report the presence of EF-2 paralogs in many archaeal genomes belonging to the Asgard archaea, Korarchaeota and Bathyarchaeota (Meng et al. 2014; Evans et al. 2015; Spang et al. 2015; He et al. 2016; Lazar et al. 2016; Zaremba-Niedzwiedzka et al. 2017) and the unexpected absence of diphthamide biosynthesis genes in several archaea and in parabasalid eukaryotes. Our findings reveal a complex evolutionary history of EF-2 and diphthamide biosynthesis genes, and point to novel mechanisms of translational regulation in several archaeal lineages. Finally, our results are compatible with scenarios in which eukaryotes evolved from an Asgard-related ancestor (Spang et al. 2015; Zaremba-Niedzwiedzka et al. 2017) and suggest the presence of a diphthamidylated EF-2 in this lineage.
Materials And Methods
Sampling and sequencing of ABR Loki- and Thorarchaeota.
Sampling, DNA extraction, library preparation and sequencing were performed as described in Zaremba-Niedzwiedzka et al. (2017). We chose the four deepest samples, at 125 and 175 cm below sea-floor (MM3/PM3 and MM4/PM4, respectively), as they showed the highest lokiarchaeal diversity in a maximum likelihood phylogeny of 5 to 15 ribosomal proteins (RP15) encoded on the same contig (Zaremba-Niedzwiedzka et al. 2017). Adapters and low-quality bases were trimmed using Trimmomatic version 0.32 with the following parameters: PE -phred33 ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:1:true LEADINGS TRAILING:6 SLIDINGWINDOW:4:15 MINLEN:36 (Bolger, Lohse, and Usadel 2014).
Assembly of ABR Loki- and Thorarchaeota.
Samples from the same depth were assembled together using IDBA-UD (Peng et al. 2012) (version 1.1.1-384, --maxk 124 -r), producing four different assemblies (S1:MM1/PM1, S2:MM2/PM2, S3:MM3/PM3, S4:MM4/PM4). Assemblies S3 and S4 were particularly interesting as they showed the highest lokiarchaeal diversity. However, some lokiarchaeal members showed highly fragmented contigs, probably due to the low abundances of these organisms. In an attempt to produce longer contigs, we coassembled the reads originating from Asgard archaea in samples MM3, PM3, MM4 and PM4. Asgard archaea reads were identified using Clark (version 1.2.3, -m 0) (Ounit et al. 2015) and Bowtie2 (version 2.2.4, default parameters) (Langmead and Salzberg 2012) against a customized Asgard archaea database. Classified reads were extracted and coassembled using SPAdes (version v.3.9.0, --careful) (Bankevich et al. 2012).
In brief, the Asgard database was composed of Asgard genomes publicly available as of February 2017. Clark does not perform well when organisms present in the samples of interest are not highly similar to the ones present in the provided database. To increase the classification sensitivity, we included in our database low-quality Asgard MAGs (with highly fragmented contigs) generated from assemblies S3 and S4 using CONCOCT (Alneberg et al. 2014). Coverage profiles required by CONCOCT were estimated using kallisto (version 0.43.0, quant --plaintext) (Bray et al. 2016). All available samples from the same location (MM1, PM1, MM2, PM2, MM3, PM3, MM4, PM4) were used and mapped independently against the assemblies S3 and S4. For each assembly, MAGs were reconstructed using two different minimum contig length thresholds (2000 and 3000 bp). We used the number of contigs containing clusters of ribosomal proteins (ribocontigs) as a proxy to estimate the microbial diversity present in the community. The maximum number of clusters (-c option in CONCOCT) was set to approximately 2.5 times the estimated number of species in the sample (Johannes Alneberg, personal communication), resulting in 900 and 600 for S3 and S4, respectively. Potential Asgard archaea bins were identified based on the presence of ribocontigs classified as Asgard archaea and were included in the database.
Binning of ABR Loki- and Thorarchaeota.
Several binning tools with different settings were run independently. CONCOCT_2000: version 0.4.0, --read_length 200 and a minimum contig length of 2000. CONCOCT_3000: version 0.4.0, --read_length 200 and a minimum contig length of 3000. In both cases, coverage files were created by mapping all 8 samples against the co-assembly using kallisto. MaxBin2: version 2.2.1, -min_contig_length 2000 -markerset 40 -plotmarker (Wu, Simmons, and Singer 2016). The 8 samples were mapped against the co-assembly using Bowtie2, and coverage was estimated using the getabund.pl script provided. MyCC_4mer: 4mer -t 2000 (Lin and Liao 2016). MyCC_56mer: 56mer -t 2000. For both MyCC runs, coverage profiles were obtained as described in the authors' manual.
The results of these 5 binning methods were combined into a consensus: contigs were assigned to bins if they had been classified as the same organism by at least 3 out of 5 methods. The resulting bins were manually inspected and cleaned further using mmgenome (Albertsen et al. 2013). Completeness and redundancy were computed using CheckM (Parks et al. 2015).
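The consensus rule described above can be illustrated with a minimal sketch. It assumes that the bins produced by each method have already been matched to shared organism labels, a step not detailed in the text; the function name, the labels and the toy assignments are hypothetical and are not taken from the study.

```python
from collections import Counter

def consensus_bins(assignments, min_votes=3):
    """Assign each contig to an organism label supported by >= min_votes methods.

    assignments: dict mapping method name -> dict mapping contig -> organism label.
    Contigs without sufficient agreement are left unassigned.
    """
    contigs = set()
    for per_method in assignments.values():
        contigs.update(per_method)

    consensus = {}
    for contig in sorted(contigs):
        # Count how many methods placed this contig under each label.
        votes = Counter(
            per_method[contig]
            for per_method in assignments.values()
            if contig in per_method
        )
        label, count = votes.most_common(1)[0]
        if count >= min_votes:
            consensus[contig] = label
    return consensus

# Hypothetical toy input: five methods, three contigs.
example = {
    "CONCOCT_2000": {"c1": "Loki_A", "c2": "Thor_A", "c3": "Loki_A"},
    "CONCOCT_3000": {"c1": "Loki_A", "c2": "Thor_A"},
    "MaxBin2": {"c1": "Loki_A", "c2": "Loki_A", "c3": "Thor_A"},
    "MyCC_4mer": {"c1": "Thor_A", "c2": "Thor_A", "c3": "Loki_A"},
    "MyCC_56mer": {"c1": "Loki_A", "c3": "Thor_A"},
}
result = consensus_bins(example)
```

With five methods and a threshold of three, at most one label can reach the threshold for a given contig, so no tie-breaking is required; contig c3 in the toy input is split 2/2 and therefore remains unbinned.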
Sampling and sequencing of OWC Thorarchaeota.
Eight soil samples were collected from the Old Woman Creek (OWC) National Estuarine Research Reserve and DNA was extracted as described previously (Chapter II; Narrowe et al. 2017). Library preparation and five lanes of Illumina HiSeq 2x125 bp sequencing followed standard operating procedures at the US DOE Joint Genome Institute (GOLD study ID Gs0114821). Sample M3-C4-D3 had replicate extraction, library preparation, and two lanes of sequencing performed, and reads were combined before downstream analysis. For 3 additional samples (M3-C4-D4, O3-C3-D3, O3-C3-D4) one lane of sequencing was performed. For the other 4 samples (M3-C5-D1, M3-C5-D2, M3-C5-D3, M3-C5-D4) DNA was sheared to 300 bp with a Covaris S220, metagenomic sequencing libraries were prepared using the Nugen Ovation Ultralow Prep kit, and all four samples were multiplexed on one lane of Illumina HiSeq 2x125 bp sequencing at the University of Colorado Denver Anschutz Medical Campus Genomics and Microarray Core.
Assembly and binning of OWC Thorarchaeota.
For initial assembly of the 5 full-lane sequencing runs, adapter removal, read filtering and trimming were completed using BBDuk (sourceforge.net/projects/bbmap) with ktrim=r, minlen=40, minlenfraction=0.6, mink=11, tbo, tpe, k=23, hdist=1, hdist2=1, ftm=5, maq=8, maxns=1, minlen=40, minlenfraction=0.6, k=27, hdist=1, trimq=12, qtrim=rl. Filtered reads were assembled using megahit (Li et al. 2015) version 1.0.6 with --k-list 23,43,63,83,103,123. The individual metagenome from the O3-C4-D3 sample was binned using Emergent Self-Organizing Maps (ESOM) (Dick et al. 2009) of tetranucleotide frequency (5 kb contigs, 3 kb windows). BLAST hits of predicted proteins identified a Thorarchaeota population bin. All scaffolds containing a window in this bin were used as a mapping reference, and reads from the 9 OWC libraries were mapped to this bin using bbsplit with default parameters (sourceforge.net/projects/bbmap). The mapped reads were reassembled using SPAdes version 3.9.0 with --careful -k 21,33,55,77,95,105,115,125 (Bankevich et al. 2012). Finally, the reads which were input to the reassembly were mapped to the assembled scaffolds using Bowtie 2 (Langmead and Salzberg 2012) to generate a coverage profile, which was used to manually identify bins using Anvi'o (Eren et al. 2015). Proteins were predicted using prodigal (Hyatt et al. 2010) and searched against UniRef90 release 11-2016 (Suzek et al. 2015), with the taxonomy of best BLAST hits used to validate contigs as probable Thorarchaeota. Contigs having no top hit to the publicly available Thorarchaeota genomes were manually examined and removed if they could be assigned to another genome bin in the larger metagenomic assembly. Genome completeness and contamination were estimated using CheckM (Parks et al. 2015).
Identification of diphthamide biosynthesis genes and EF-2 homologs in eukaryotes and archaea.
The EGGNOG members dataset (available at http://eggnogdb.embl.de/#/app/downloads) was surveyed for sequences corresponding to the following clusters of orthologous groups (COG): EF-2, COG0480; DPH1/DPH2, COG1736; DPH3, COG5216; DPH4, COG0484; DPH5, COG1798; DPH6, COG2102; and DPH7, ENOG4111MMJ. For genomes not represented in EGGNOG, we manually inspected publicly available genomes as indicated by ‘orthology assignment source’ (Supplementary File S1). Similarly, an in-house arCOG dataset, modeled after the publicly available arCOGs from Makarova et al. (Makarova, Wolf, and Koonin 2015), was queried for the corresponding COG distribution in relevant archaeal genomes. Finally, aEF-2 and aEF-2p genes in Thorarchaeota OWC Bins 2, 3 and 5 were identified using HMMER: version 3.1b2, hmmsearch --cut_tc (Eddy 2011) against PFAM models PF00679 (EFG_C) and PF03764 (EFG_IV). Conserved synteny surrounding the Thorarchaeota aEF-2p gene was used to further search for partial aEF-2p genes. In addition, all contigs with matching HMM hits to dph2 and dph5 in the full OWC assembly were manually examined for potential thorarchaeal dph genes; none were identified.
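The orthology survey described above amounts to a presence/absence lookup of COG identifiers per genome, summarized per lineage. The following is a minimal sketch under stated assumptions: the COG identifiers for the three archaeal dph genes are those listed in the text, whereas the function names, the data layout (a set of COG identifiers per genome) and the toy genomes are hypothetical.

```python
# COG identifiers for the three archaeal dph genes, as listed in the text.
ARCHAEAL_DPH = {"dph1/2": "COG1736", "dph5": "COG1798", "dph6": "COG2102"}

def dph_profile(cogs_in_genome):
    """Return presence/absence of each archaeal dph gene for one genome."""
    return {gene: cog in cogs_in_genome for gene, cog in ARCHAEAL_DPH.items()}

def lineage_summary(genome_to_cogs):
    """Summarize, per gene, the fraction of genomes in a lineage with a homologue."""
    n_genomes = len(genome_to_cogs)
    summary = {}
    for gene, cog in ARCHAEAL_DPH.items():
        hits = sum(cog in cogs for cogs in genome_to_cogs.values())
        if hits == 0:
            summary[gene] = "no homologue"
        elif hits / n_genomes >= 0.5:
            summary[gene] = ">=50% of genomes"
        else:
            summary[gene] = "<50% of genomes"
    return summary

# Hypothetical toy lineage with three genomes (not data from this study).
toy_lineage = {
    "genome_1": {"COG0480", "COG1736", "COG1798", "COG2102"},
    "genome_2": {"COG0480", "COG1736"},
    "genome_3": {"COG0480"},
}
profile = dph_profile(toy_lineage["genome_3"])
summary = lineage_summary(toy_lineage)
```

A draft genome lacking a COG in such a table may simply be incomplete, so absence calls of this kind are best read alongside completeness estimates.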
Phylogenetic analyses
Elongation factor 2: EF-2 and EF-2 paralogs of Asgard archaea, Korarchaeota and Bathyarchaeota were aligned with a representative set of archaeal, bacterial EF-2 and eukaryotic EF-2, EFL1 and snRNP homologs using mafft-linsi (Katoh and Standley 2013). Subsequently, poorly aligned ends were removed manually before the alignments were trimmed with trimAl 5% (Capella-Gutierrez et al. 2009), yielding 871 aligned amino acid positions. Maximum likelihood analyses were performed using IQ-tree with the mixture model LG+C60+R4+F, which was selected among the C-series models based on its Bayesian information criterion (BIC) score by the built-in model test implemented in IQ-tree. Branch supports were assessed using ultrafast bootstrap approximation as well as with the single branch test (-alrt option).
Diphthamide biosynthesis proteins Dph1/Dph2 (IPR016435; arCOG04112) and Dph5 (IPR004551; arCOG04161): Both Dph1 and Dph2 as well as Dph5 homologs of a representative set of eukaryotes were aligned with archaeal Dph1/2 and Dph5 homologs, respectively. Several DPANN genomes contain two genes encoding the CTD and NTD of Dph1/2 (Fig. 3.1, Supplementary File S3.1), such that the Dph1/2 homologs of these organisms had to be concatenated prior to aligning Dph1/2 sequences. Alignments were performed using mafft-linsi and trimmed with BMGE (Criscuolo and Gribaldo 2010) using the BLOSUM30 matrix and setting the entropy to 0.55. This resulted in final alignments of 170 (Dph1/2) and 221 (Dph5) positions. Maximum likelihood analyses were performed using IQ-tree (Nguyen et al. 2015) with the mixture models resulting in the lowest BIC: LG+C50+R+F (Dph1/2) and LG+C60+R+F (Dph5), respectively. Branch supports were assessed using ultrafast bootstrap approximation (Hoang et al. 2018) as well as with the single branch test (-alrt flag).
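Model selection by lowest BIC, as used above, rests on the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of alignment sites and L the maximized likelihood. The sketch below only illustrates this criterion; it does not reproduce the IQ-tree model test, and the model names, log-likelihoods and parameter counts are hypothetical placeholders rather than values from this study.

```python
import math

def bic(log_likelihood, n_free_params, n_sites):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_free_params * math.log(n_sites) - 2.0 * log_likelihood

def best_model(candidates, n_sites):
    """Return the name of the candidate model with the lowest BIC.

    candidates: dict mapping model name -> (log-likelihood, free parameter count).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))

# Hypothetical candidates for an alignment of 170 positions.
candidates = {
    "model_A": (-1000.0, 10),  # fewer parameters, slightly worse fit
    "model_B": (-990.0, 30),   # better fit, more heavily penalized
}
chosen = best_model(candidates, n_sites=170)
```

For short alignments such as those used here, the ln(n) penalty is modest per parameter, yet it can still outweigh a small gain in likelihood, as in the toy comparison above.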
Concatenated ribosomal proteins: A phylogenetic tree of co-localized ribosomal proteins was inferred using the RP15 pipeline as described previously (Zaremba-Niedzwiedzka et al. 2017). In brief, archaeal ribosomal proteins encoded in the r-protein gene cluster (requiring a minimum of 11 ribosomal proteins) were aligned with mafft-linsi, trimmed with trimAl using the -gappyout option, concatenated and subjected to maximum likelihood analyses using IQ-tree with the LG+C60+R4+F model chosen based on best BIC score as described above. Branch supports were assessed using ultrafast bootstrap approximation as well as with the single branch test (-alrt option) in IQ-tree.
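The concatenation step can be sketched as follows. This is an illustrative reimplementation, not the RP15 pipeline itself: it assumes that each per-protein alignment is available as a mapping from taxon to aligned sequence, that the minimum of 11 proteins applies per taxon, and that missing proteins are padded with gaps. The function name and toy alignments are hypothetical.

```python
def concatenate_alignments(alignments, min_proteins=11, gap="-"):
    """Concatenate per-protein alignments into one supermatrix.

    alignments: list of dicts mapping taxon -> aligned sequence; within each
    dict all sequences must have the same length. Taxa represented in fewer
    than min_proteins alignments are dropped; proteins missing from a
    retained taxon are padded with gap characters.
    """
    counts = {}
    for aln in alignments:
        for taxon in aln:
            counts[taxon] = counts.get(taxon, 0) + 1
    kept = sorted(t for t, c in counts.items() if c >= min_proteins)

    parts = {t: [] for t in kept}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in an alignment must share one length")
        width = lengths.pop()
        for taxon in kept:
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

# Hypothetical toy input with three short "protein" alignments.
toy = [
    {"A": "MK-", "B": "MKL"},
    {"A": "GG", "C": "GA"},
    {"A": "T", "B": "S"},
]
supermatrix = concatenate_alignments(toy, min_proteins=2)
```

In the toy input, taxon C occurs in only one alignment and is excluded, while taxon B receives gaps for the protein it lacks.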
Structural modeling of EF-2 homologs.
Structural models of a/eEF-2 genes and paralogs were generated using the I-TASSER standalone package version 5.1 (Yang et al. 2015), and visualized and analyzed using UCSF Chimera version 1.11.12 (Pettersen et al. 2004). The best structural hits to the PDB for each sequence's top-scoring model were identified using COFACTOR (Roy, Yang, and Zhang 2012). The Drosophila melanogaster eEF-2 structure in complex with the ribosome (PDB:4V6W) was used as a structural reference to which all models were superimposed (aligned) using Chimera's MatchMaker.
Loop motif logos of EF-2 homologs
e/aEF-2 and paralog sequences which were used to generate the EF-2 tree were clustered at 90% amino acid identity using CD-HIT: version 4.6, -c 0.9 -n 5 (Fu et al. 2012), and the sequence alignment was filtered to retain only cluster centroids. The conserved loop sequences were extracted from the filtered EF-2 alignment using Jalview version 2.10.1 (Waterhouse et al. 2009) and verified by cross-referencing to the structural models, and sequence logos were generated on cluster centroids only using WebLogo: version 2.8.2 (weblogo.berkeley.edu) (Crooks et al. 2004).
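A sequence logo summarizes, for each alignment column, the residue frequencies and the information content of that column. The sketch below computes these two quantities for a set of aligned loop sequences. It is a simplified illustration rather than the WebLogo implementation: the small-sample correction is omitted, gaps are ignored, and the aligned motif fragments are hypothetical.

```python
import math
from collections import Counter

def column_profiles(aligned_motifs, alphabet_size=20):
    """Return (residue frequencies, information content in bits) per column.

    Information content is log2(alphabet_size) minus the Shannon entropy of
    the column, without small-sample correction. Gap characters are ignored.
    """
    if len({len(seq) for seq in aligned_motifs}) != 1:
        raise ValueError("motifs must be aligned to the same length")
    profiles = []
    for column in zip(*aligned_motifs):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        if total == 0:
            profiles.append(({}, 0.0))
            continue
        freqs = {res: n / total for res, n in counts.items()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        profiles.append((freqs, math.log2(alphabet_size) - entropy))
    return profiles

# Hypothetical aligned three-residue fragments from cluster centroids.
motifs = ["HRG", "HTG", "HTS", "HRG"]
profiles = column_profiles(motifs)
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits, for proteins, whereas a column split evenly between two residues carries one bit less.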
Accession Numbers
Taxonomy and accession numbers for all genes analyzed in this study are listed in Supplementary File S3.1.
Results
Most Asgard archaea, Korarchaeota and Geoarchaea as well as parabasalids lack diphthamide synthesis genes
It was previously assumed that EF-2 of all eukaryotes and Archaea was uniquely characterized by the presence of diphthamide. To examine if this assumption is still valid when taking into account recently sequenced genomes, we surveyed 337 archaeal and 168 eukaryotic genomes (File S1) for each of the three known archaeal (De Crecy-Lagard et al.) and seven eukaryotic (Su, Chen, et al. 2012; Su, Lin, et al. 2012; Uthman et al. 2013) dph genes. While most archaeal genomes encode clear dph homologues, we failed to detect the diphthamide biosynthesis genes in a large diversity of metagenome-assembled genomes (MAGs) of uncultured archaea, including newly assembled MAGs analyzed for this study (Fig. 3.1, Supplementary Fig. S3.1, Supplementary File S3.1). In particular, our analyses showed that, as reported for K. cryptofilum (De Crecy-Lagard et al.; Elkins et al. 2008), all Korarchaeota and Geoarchaea as well as nearly all members of the Asgard archaea lack the conserved archaeal diphthamide biosynthesis genes dph1/2, dph5 and dph6. As an exception, Asgard archaea related to the Heimdallarchaeote LC3 clade were found to encode the complete archaeal diphthamide biosynthetic pathway (Fig. 3.1). Genes coding for Dph5 and Dph6 could not be detected in two Bathyarchaeota draft genomes (RBG_13_46_16b and SG8-32-3). However, it is unclear whether these two genomes are in the process of losing dph biosynthesis genes or whether the absence of dph5 and dph6 genes is due to the incompleteness of these draft genomes. We also surveyed 168 eukaryotic genomes and high-quality transcriptomes, including those lineages that have undergone drastic genome reduction, such as microsporidians (Corradi et al. 2010), diplomonads (Morrison et al. 2007), and degenerate nuclei (i.e., nucleomorphs) of secondary plastids in cryptophytes (Lane et al. 2007) (Supplementary File S3.1), for dph gene homologs. We detected dph homologues in all eukaryotic genomes and transcriptomes except for those of parabasalid protists, including animal pathogens such as Trichomonas vaginalis, Tritrichomonas foetus and Dientamoeba fragilis (Supplementary File S3.1). Unless these archaea and parabasalids possess alternative, yet undiscovered diphthamide biosynthesis pathways, these findings suggest that their cognate EF-2 lacks the modified diphthamide residue. As a peculiarity, while the Dph1/2 protein is encoded by a single fusion gene in seemingly all archaea, we found that in several members of the DPANN archaea (Rinke et al. 2013; Castelle et al. 2015) this protein is encoded by two genes that separately code for the N- and C-terminal domains. To our knowledge, this is the first systematic report of the widespread absence of diphthamide biosynthesis in diverse eukaryotes and archaea.
Figure 3.1 - Diphthamide biosynthesis genes are conserved across most eukaryotic and archaeal lineages. Eukaryotic and archaeal orthologues of diphthamide biosynthesis (DPH) genes were retrieved from the publicly available EGGNOG dataset and an in-house archaeal orthologue (arCOG) dataset. The complete list of genomes surveyed can be found in Supplementary File S1, including reduced genomes from nucleomorphs (not shown on figure). The total number of genomes surveyed is shown next to each group. Since Dph4 is a member of the large DNAJ-containing protein family, we could not unequivocally identify this protein based on orthology alone, and it is therefore excluded from the figure. No arCOG is available for DPH3. *All eukaryotic genomes are complete except five deeply-sequenced transcriptomes from Parabasalia; dark and light grey circles indicate whether homologues were detected in more or less than 50% of the genomes surveyed, respectively; yellow circles indicate the absence of a detectable homologue; pink circles indicate lack of conservation of the diphthamide modification motif; half-circles indicate the presence of multiple copies of EF-2 with and without the conserved diphthamide modification motif. 1 - Homologue detected in the original assembly (ABR_125 (Zaremba-Niedzwiedzka et al., 2017)) but not in the reassembly (ABR16 genome); a closer inspection of the contig revealed that it is chimeric and will thus be removed from the final bin; 2 - Homologue detected in only one Lokiarchaeota assembly (AB15); 3 - Several DPANN genomes contain two proteins that encode the CTD and NTD of Dph1/2, respectively.


Various archaeal genomes that lack diphthamide biosynthesis genes encode an EF-2 paralog
To shed light on the implications of the potential lack of diphthamide in members of the Asgard archaea and Korarchaeota, we performed detailed analyses of eukaryotic and archaeal EF-2 homologs (Fig. 3.1). First, we found that the draft genomes of most Asgard archaea, some Korarchaeota (Kor 1 and 3), and a few Bathyarchaeota encode two distantly related EF-2 paralogs. In contrast, the genomes of K. cryptofilum and two novel marine Korarchaeota (Kor 2 and 4) and Heimdallarchaeota LC2 and LC3 as well as Geoarchaea do not encode an EF-2 paralog. Given that the Heimdallarchaeota LC2 genome was estimated to be only 70-79% complete (Zaremba-Niedzwiedzka et al. 2017), and based on phylogenetic analyses (see below), we consider it possible that this genome might encode an as-yet unassembled aEF-2 paralog. The presence of paralogous aEF-2 in most Asgard archaea and some Korarchaeota genomes corresponds with the absence of diphthamide synthesis genes (Fig. 3.1 and 3.2). Yet, even though the genomes of K. cryptofilum, Kor 2, Kor 4, and Geoarchaea as well as of Heimdallarchaeote LC2 lack dph genes, they do not encode an EF-2 paralog. In all other archaeal genomes, including that of Heimdallarchaeote LC3, the absence of an EF-2 paralog correlates with the presence of dph genes.
Figure 3.2 - The evolution of archaeal EF-2 family proteins. Phylogenetic tree of EF-2 family proteins based on maximum likelihood analyses of 871 aligned positions using IQ-tree. EF-2 of Bathyarchaeota grouping in an unexpected position or representing potential aEF-2p are shaded in orange. aEF-2 of Kor- and Asgard archaea are shaded in purple, while their aEF-2p are shaded in green. Highlighted amino acids show the conservation of key residues, and black/white circles reveal the presence/absence of dph biosynthesis genes in the respective organisms/MAGs. Branch support values are based on ultrafast bootstrap approximation as well as single branch tests, respectively, and are represented by differentially colored circles as detailed in the figure panel. Whenever branch support values were below 80 for either of the two methods, values have been removed and branches cannot be considered significantly supported. Scale bar indicates the number of substitutions per site. Abbreviations: snRNP: U5 small nuclear ribonucleoprotein; EFL1: elongation factor-like GTPase; n.c.: not conserved; p.c.: partially conserved; n.d.: not determined.
Archaea with two EF-2 family proteins encode only one bona fide EF-2
We next addressed whether residues and structural motifs shown to be necessary for canonical translocation were conserved in the various EF-2 and EF-2 paralogs. Domain IV of EF-2, representing the anticodon mimicry domain, is critical for facilitating concerted translocation of tRNA and mRNA (Rodnina et al. 1997; Ortiz et al. 2006). This domain includes three loops that extend out from the body of EF-2 and interact with the decoding center of the ribosome. The first of these three loops (HxDxxHRG) (canonical residue positions are numbered according to sequence associated with D. melanogaster structural model PDB 4V6W (Anger et al. 2013)) contains the site of the diphthamide modified histidine, H701, and is highly conserved across archaea and eukaryotes (Ortiz et al. 2006; Y. Zhang et al. 2008). High conservation is also seen in a second adjacent loop (SPHKHN) in the a/eEF-2 domain IV (S581-N586), which contains a lysine residue (K584) that interacts directly with the tRNA at the decoding center, and is itself positioned by a stacking interaction between P582 and H585 (Murray et al. 2016). The third loop appears to stabilize the diphthamide loop, partially via a salt-bridge formed between a nearby glutamate residue (E660) and R702 in the diphthamide loop (Anger et al. 2013). Both of these residues are highly conserved among archaea and eukaryotes.
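As a rough illustration of how the two sequence motifs named above can be screened for, the following sketch tests a protein sequence for the diphthamide loop (HxDxxHRG) and the second loop (SPHKHN) with regular expressions. This is a simplification and not the procedure used in this study, which relied on multiple sequence alignments and structural models; the function name and the two sequence fragments are hypothetical.

```python
import re

# Motifs as given in the text; "x" (any residue) is written as "." in the regex.
DIPHTHAMIDE_LOOP = re.compile(r"H.D..HRG")
SECOND_LOOP = re.compile(r"SPHKHN")

def domain_iv_motifs(sequence):
    """Report whether each canonical domain IV loop motif occurs in a sequence."""
    return {
        "diphthamide_loop": DIPHTHAMIDE_LOOP.search(sequence) is not None,
        "second_loop": SECOND_LOOP.search(sequence) is not None,
    }

# Hypothetical fragments: one with canonical motifs, one with substitutions
# of the kind discussed in the text (R to T in the diphthamide loop, K lost
# in the second loop).
canonical_like = "AASPHKHNRLGGHADAIHRGGGQ"
divergent_like = "AASPHNHNRLGGHADAIHTGGGQ"

canonical_result = domain_iv_motifs(canonical_like)
divergent_result = domain_iv_motifs(divergent_like)
```

An exact-match screen of this kind flags any deviation from the canonical motif, so it cannot distinguish a conservative substitution from loss of the loop; alignment-based inspection remains necessary for that purpose.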
Our analyses reveal that the sequence motifs in these loops are also strictly conserved among the EF-2 family proteins of the Heimdallarchaeote LC3 lineage, Geoarchaea, as well as in those Korarchaeota and Bathyarchaeota that lack an EF-2 paralog (Fig. 3.3, Supplementary Fig. S3.2a). Notably, this conservation is seen irrespective of the presence or absence of dph genes in those genomes. However, most bona fide EF-2 of parabasalids (which lack dph genes) possess a glycine to asparagine mutation at residue 703 (Fig. 3.3, Supplementary Fig. S3.2b, Supplementary Fig. S3.3a), which may compensate for the lack of the diphthamide residue by contributing an amide group (Fig. 3.3, Supplementary Fig. S3.3b).
In contrast, in those Asgard archaea and Korarchaeota (Kor 1/3 clade) that encode two EF-2 family proteins, even within the bona fide EF-2 copy, these domain IV motifs show reduced conservation. In the diphthamide loop, R702 is universally replaced by a threonine residue. In 21 of 22 aEF-2 proteins, there is a correlated mutation of E660 to either arginine or lysine (Supplementary Fig. S3.4). Structural homology modeling suggested that these correlated mutations likely prevent unfavorable electrostatic interactions between domain IV loops, and maintain stabilization of the diphthamide loop (Supplementary Fig. S3.4). While G703 is conserved in most EF-2s of archaea, all Lokiarchaeota (except Lokiarchaeota CR_4) encode either a serine or a glutamine at this site (Fig. 3.3, Supplementary Fig. S3.2a). Furthermore, analysis of the second loop (S581-N586) revealed additional crucial mutations in the EF-2 of these archaea; notably, K584 is not conserved (Fig. 3.3, Supplementary Fig. S3.2a). Despite these modifications, which correlate with the presence of an EF-2 paralog in these archaea, there is still evidence for strong selection pressure maintaining many of the key conserved residues in these domain IV motifs, including H701, the target site of diphthamide modification (Fig. 3.3, Supplementary Fig. S3.2a).
In contrast, our analyses of the multiple sequence alignment and structural models suggest that the paralogous EF-2 (aEF-2p) proteins encoded by these archaea lack conservation in the stabilizing second loop (SPHKHN) as well as the first diphthamide loop (HxDxxHRG), including H701 (Fig. 3.3). Based on predicted fold conservation in domains I and II, and the overall conservation of the five sequence motifs (G1-G5) characterizing GTPase superfamily proteins (Atkinson 2015), aEF-2p likely maintains GTPase activity (Supplementary Fig. S3.5). However, given the apparent lack of conservation in key domain IV loops, it is unlikely that aEF-2p proteins can serve as functional translocases in protein translation.
Figure 3.3 - Predicted structure of Asgard archaea EF-2 and EF-2 paralogs
Panels: DPH+ Eukaryota, canonical eEF-2 (D. melanogaster eEF-2, PDB 4V6W); DPH+ Archaea, canonical aEF-2 (Hadesarchaea DG-33 aEF-2); DPH- Asgard and Korarchaeum, putative EF-2 (Thorarchaeota OWC-2); DPH- Asgard and Korarchaeum, EF-2 paralog (Thorarchaeota OWC-2 EF-2 paralog).
Structural modeling of representative EF-2 genes and paralogs compared to the eukaryotic EF-2 structure shows conservation of overall EF-2 structure regardless of diphthamide synthesis capacity (top). The overall fold of two loops located at the tip of domain IV is conserved, but otherwise highly conserved sequence motifs in these loops are not conserved in DPH- Asgard archaea and Korarchaea or in EF-2 paralogs (middle). Bottom panels show a close-up of the key residues from the motifs, highlighting that these residues are those positioned at the tip of the domain IV loops crucial for interaction with the decoding site in canonical EF-2 structures. The histidine residue that is the site of dph modification is starred.
EF-2 homologs of archaea experienced complex evolutionary history
To resolve the evolutionary history of EF-2, we performed phylogenetic analyses of archaeal EF-2 (aEF-2) and aEF-2p, bacterial EF-G and eukaryotic EF-2 family proteins, i.e. EF-2, Ria1 (or Elongation factor like, EFL1) and Snu114 (or U5 small nuclear ribonucleoprotein, snRNP/U5-116kD) (Fig. 3.2) (Atkinson 2015). First, our analyses revealed that sequences from all non-LC3 Asgard archaea and the Kor-1 and -3 marine Korarchaeota formed two distinct clades, one of which contains canonical aEF-2 proteins (as defined by conservation of the domain IV loop known to interact with the ribosomal decoding center during translocation) while the other comprises aEF-2p (Fig. 3.2). However, the phylogenetic placement of these protein clades relative to each other and within the phylogenetic backbone is not fully resolved due to lack of statistical support. This might be caused by modified (accelerated) evolutionary rates that appear to characterize the evolution of aEF-2 and aEF-2p in lineages that encode a paralog, as indicated by increased relative branch lengths for both the aEF-2 and aEF-2p clades (Fig. 3.2, Supplementary Files S3.2 and S3.3).
Secondly, bathyarchaeal EF-2 homologs were also found to form two separate clades. One of these clades is placed within the TACK superphylum, and includes both canonical bathyarchaeal EF-2s as well as potential paralogs (i.e., RBG_13_46_16b and SG8-32-3). In contrast, the second clade is only comprised of two sequences (i.e., RBG_13_46_16b and AD8-1), and is placed as a sister group of all TACK, Asgard and eukaryotic EF-2 homologs (Fig. 3.2). In spite of this deep placement in the phylogenetic analyses, the second clade is comprised of the canonical EF-2 homologs of Bathyarchaeota genomes RBG_13_46_16b and AD8-1, based on analysis of key domain IV residues. Currently, only the most complete
of the latter two draft genomes, RBG_13_46_16b, contains an aEF-2 paralog. Therefore, the current data is insufficient to resolve the puzzling pattern of EF-2 evolution in the Bathyarchaeota phylum.
Finally, in our analysis, eEF-2, Ria1 and Snu114 were found to form a highly supported monophyletic group that emerged as a sister group to the aEF-2 proteins encoded by the genomes comprising the Heimdallarchaeote LC3 clade (LC3 and B3).
Close inspection of the EF-2 sequence alignment revealed that eukaryotic and LC3 EF-2 homologs share common indels to the exclusion of all other archaeal EF-2 family protein sequences (Supplementary Fig. S3.6, Supplementary Fig. S3.7). Notably, these highly conserved indels were found to be encoded by the genomic bins of two distantly related members of the Heimdallarchaeota LC3 lineage, which were independently assembled and binned from geographically distinct metagenomes (Spang et al. 2015; Zaremba-Niedzwiedzka et al. 2017). This refutes recently raised claims stating that these indels in Heimdallarchaeote LC3 may be the result of contamination from eukaryotes (Da Cunha et al. 2017) while supporting the sister-relationship of eukaryotes and Asgard archaea (Spang et al. 2015; Eme et al. 2017; Zaremba-Niedzwiedzka et al. 2017; Spang et al. 2018). In addition, despite the low sequence identity of 39%, the high-confidence modeled structure of Heimdallarchaeote LC3 EF-2 was highly similar to Drosophila melanogaster eEF-2 (root-mean-square deviation (RMSD) of 1.3 Å across all 796 residues to D. melanogaster structural model PDB 4V6W (Anger et al. 2013); Supplementary File S3.1). By comparison, the Heimdallarchaeote AB-125 model aligns less confidently to the Drosophila EF-2 structure (RMSD 16.4 Å). The observed phylogenetic topology and the presence of the full complement of dph biosynthesis genes in LC3 genomes (Figs. 3.1 and 3.2) support an
evolutionary scenario in which Heimdallarchaeote LC3 and eukaryotes share a common ancestry with EF-2 being vertically inherited from this archaeal ancestor.
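The RMSD values quoted above summarize how closely two structural models overlay. As a minimal sketch, assuming the two coordinate sets have already been superposed and matched residue by residue (the superposition step itself is not shown, and the coordinates below are made up for illustration rather than taken from the models in this chapter), the quantity can be computed as follows:

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) tuples.

    Assumes the structures are already superposed and that position i in
    coords_a corresponds to position i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates: every atom is displaced by exactly 1 unit.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reference = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
print(rmsd(model, reference))
```

A low value indicates that paired atoms sit close together after superposition, whereas a large value indicates that the models do not overlay well.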
Discussion
The use of metagenomic approaches has led to an expansion of genomic data from a large diversity of previously unknown archaeal and bacterial lineages and has changed our perception of the tree of life, microbial metabolic diversity and evolution, as well as the origin of eukaryotes (Brown et al. 2015; Castelle et al. 2015; Spang et al. 2015; Hug, Baker, et al. 2016; Parks et al. 2017; Zaremba-Niedzwiedzka et al. 2017). Since most of what is known about archaeal informational processing machineries is based on a few model organisms, we aimed to use the expansion of genomic data to investigate key elements of the translational machinery - EF-2 and diphthamidylation - across the tree of life.
Our analyses of archaeal EF-2 family proteins and the distribution of diphthamide biosynthesis genes have revealed unusual features of the core translation machinery in several archaeal lineages. These findings negate two long-held assumptions regarding the archaeal and eukaryotic translation machineries, with both functional and evolutionary implications. First, we show that diphthamide modification is not universally conserved across Archaea and eukaryotes. Second, we demonstrate that, much like Bacteria and eukaryotes (Atkinson 2015), the archaeal EF-2 protein family has undergone several gene duplication events, presumably coupled to functional differentiation of EF-2 paralogs, throughout archaeal evolution.
The evolution of archaeal diphthamide biosynthesis and EF-2 is especially intriguing in the context of eukaryogenesis. Recent findings based on comparative genomics indicate that eukaryotes evolved from a symbiosis between an alphaproteobacterium and an archaeal
host that shares a most recent common ancestor with extant members of the Asgard archaea, possibly a Heimdallarchaeota-related lineage (Spang et al. 2015; Zaremba-Niedzwiedzka et al. 2017). Our study adds additional data to support this scenario by revealing close sequence and predicted structural similarity of canonical EF-2 proteins of the Heimdallarchaeote LC3 lineage and eukaryotic EF-2 proteins, including shared indels. Furthermore, phylogenetic analyses of EF-2 family proteins reveal that EF-2 of the Heimdallarchaeote LC3 lineage forms a monophyletic group with EF-2 family proteins of eukaryotes, and therefore suggest that the archaeal ancestor of eukaryotes was equipped with an EF-2 protein similar to the homologs found in this lineage. The subsequent evolution of the eukaryotic EF-2 family appears to have included at least two ancient duplication events leading to Ria1 and Snu114. Importantly, the presence of characteristic eukaryotic indels in EF-2 of all members of the Heimdallarchaeote LC3 lineage further strengthens this hypothesis and underlines that concerns raised about the quality of these genomic bins (Da Cunha et al. 2017) are unjustified (Spang et al. 2018).
In addition, the LC3 clade also represents the sole group within the Asgard archaea that is characterized by the presence of the full complement of archaeal diphthamide biosynthesis pathway genes. However, while phylogenetic analyses of Dph1/2 show weak support for a sister-relationship between Heimdallarchaeota and eukaryotes, eukaryotic Dph5 appears to be most closely related to homologs of Woesearchaeota (Supplementary Fig. S3.8, Supplementary File S3.3), an archaeal lineage belonging to the proposed DPANN superphylum (Rinke et al. 2013; Castelle et al. 2015; Williams et al. 2017), comprising various additional lineages with putative symbiotic and/or parasitic members (reviewed in Spang, Caceres, and Ettema 2017). Notably, a previous study has also revealed
an affiliation of some eukaryotic tRNA synthetases with DPANN archaea (Furukawa et al. 2017). Given that several DPANN lineages infect or closely associate with other archaeal lineages, they may exchange genes with their hosts frequently, as was shown for Nanoarchaeum equitans and its crenarchaeal host Ignicoccus hospitalis (Podar et al. 2008). Following a similar reasoning, the archaeal ancestor of eukaryotes (i.e. a relative of the Asgard archaea) may have acquired genes (e.g. dph5) from an ancestral DPANN/Woesearchaeota symbiont. However, prospective analyses and generation of genomic data from additional members of the Asgard and DPANN archaea are necessary to test this hypothesis and to clarify the evolutionary history of the origin of diphthamide biosynthesis genes in eukaryotes.
Furthermore, our findings have practical implications for studies that involve phylogenetic and metagenomic analyses. Previously, EF-2 has been widely used as a phylogenetic marker, in both single-gene (Iwabe et al. 1989; Baldauf, Palmer, and Doolittle 1996; Hashimoto and Hasegawa 1996; Elkins et al. 2008) and multiple-gene alignments of universal single copy genes [(Williams et al. 2012; Guy, Saw, and Ettema 2014; Raymann, Brochier-Armanet, and Gribaldo 2015), and others] to assess the relationships between Archaea, Bacteria and eukaryotes. However, the presence of paralogs of EF-2 in various Archaea and eukaryotes suggests that EF-2 should be excluded from such datasets. In addition, EF-2, Dph1/2, and Dph5 are part of single-copy marker gene sets regularly used to estimate genome completeness and purity of archaeal metagenomic bins (Wu and Scott 2012; Parks et al. 2015). The presence of duplicated aEF-2 gene families, the absence of dph genes in most Asgard archaea, Geoarchaea and Korarchaeota, and the presence of two split genes
for Dph1/2 in DPANN make these genes unsuitable as markers; they should hence be excluded from marker gene sets used to assess genome completeness.
The observed absence of dph biosynthesis genes in various Archaea as well as parabasalids is surprising given that diphthamide was previously thought to be a conserved feature across Archaea and eukaryotes (Schaffrath et al. 2014), and critical for ensuring translational fidelity (Ortiz et al. 2006). We currently cannot rule out the possibility that dph-lacking archaea and parabasalids perform the multi-step process of diphthamidylation using a set of yet-unknown enzymes, and future proteomics studies will be needed to conclusively rule out the presence of diphthamide in these taxa. Yet, it is more likely that these groups have evolved a different mechanism or mechanisms to fulfill the proposed roles of diphthamide in translation.
Many of the dph-lacking archaeal genomes encode two paralogs of the aEF-2 gene. Despite the apparent absence of diphthamide, our sequence and structural modeling analyses imply that these diphthamide-deficient aEF-2 proteins are likely under strong selective pressure to maintain translocase function. In contrast, analyses of the aEF-2p suggest that, while this paralog is a member of the translational GTPase superfamily, aEF-2p is unlikely to function in the same manner as canonical aEF-2. In fact, the complete lack of sequence conservation in aEF-2p key domain IV loop residues indicates that these paralogs are not likely to act as translocases (Fig. 3.3, Supplementary Fig. S3.2a) (Rodnina et al. 1997; Ortiz et al. 2006) and instead perform alternative roles. For instance, it seems possible that aEF-2p may compensate for the absence of diphthamide in at least some dph-lacking lineages. However, other functions for aEF-2p such as error-correcting back-translocation or ribosome recycling also seem possible, given the observed sub- and neo-functionalizations seen in
eukaryotic and bacterial EF-2/EF-G paralogs (Qin et al. 2006; Tsuboi et al. 2009). Alternatively, given proposed regulation of translation via ADP-ribosylation of diphthamide (Schaffrath et al. 2014) and a role of diphthamide in responding to oxidative stress (Arguelles et al. 2013; Arguelles et al. 2014), aEF-2p could perform another, yet unknown role in translation regulation.
Currently, the consequences of the absence of dph biosynthesis genes in parabasalids and in several Archaea remain unclear. Future studies could gain insight into such questions by studying translation in the genetically tractable parabasalid Trichomonas vaginalis, whose cell biology and metabolism have been extensively studied. In addition, acquisition of additional sequencing data or enrichment cultures from members of the Asgard superphylum, Korarchaeota, and other novel archaeal lineages will lead to a better understanding of the evolution and function of EF-2 family proteins, and the absence of dph biosynthesis genes.
Acknowledgements
We thank Jordan Angle, Kay Stefanik, Rebecca Daly, and Kelly Wrighton for assistance with sampling of OWC sediments, and Felix Homa for computational support. Sequencing of OWC metagenomes was conducted in part by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility that is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Sequencing of Aarhus bay metagenomes was performed by the National Genomics Infrastructure sequencing platforms at the Science for Life Laboratory at Uppsala University, a national infrastructure supported by the Swedish Research Council (VR-RFI) and the Knut and Alice Wallenberg Foundation. We thank the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) at Uppsala University and the Swedish
National Infrastructure for Computing (SNIC) at the PDC Center for High-Performance Computing for providing computational resources. This work was supported by grants of the European Research Council (ERC Starting grant 310039-PUZZLE_CELL), the Swedish Foundation for Strategic Research (SSF-FFL5) and the Swedish Research Council (VR grant 2015-04959) to T.J.G.E. C.W.S. is supported by a European Molecular Biology Organisation long-term fellowship (ALTF-997-2015) and the Natural Sciences and Engineering Research Council of Canada postdoctoral research fellowship (PDF-487174-2016).
Author Contributions
Adrienne Narrowe, Anja Spang, Christopher Miller, and Thijs Ettema designed the bioinformatics and computational experiments. Adrienne Narrowe, Anja Spang, Courtney Stairs, Eva Caceres, and Brett Baker conducted phylogenetic analyses. Adrienne Narrowe, Anja Spang, Courtney Stairs, and Eva Caceres conducted EF2 sequence analyses. Adrienne Narrowe and Christopher Miller performed protein structural analyses. Adrienne Narrowe, Anja Spang, Courtney Stairs, and Eva Caceres conducted sequence analyses. Adrienne Narrowe, Anja Spang, and Christopher Miller wrote the manuscript contained in this chapter, with contributions from Courtney Stairs, Eva Caceres, and Thijs Ettema. The final manuscript was read and approved by all co-authors.


CHAPTER IV
BATHYARCHAEOTA POPULATIONS IN WETLAND SOILS CONTAIN PREVIOUSLY UNKNOWN METABOLIC COMPLEXITY3
Introduction
Initially described in marine sediments as the Miscellaneous Crenarchaeota Group (MCG) (Inagaki et al. 2003), it has become clear that the recently renamed Bathyarchaeota may be the most abundant and broadly distributed archaeal phylum globally (Biddle et al. 2006; Sorensen and Teske 2006; Meng et al. 2009; Kubo et al. 2012; Lloyd et al. 2013; Meng et al. 2014; Fillol, Sanchez-Melsio, et al. 2015; Fillol, Auguet, et al. 2015; Lazar et al. 2015; Xiang et al. 2017). Marker gene studies have shown that the Bathyarchaeota are broadly distributed across a variety of habitats, including marine sediments (Inagaki et al. 2003; Biddle et al. 2006; Sorensen and Teske 2006; Lloyd et al. 2013; He et al. 2016; Yu et al. 2017), estuarine sediments (Lazar et al. 2015; Lazar et al. 2016), wetland soils (Narrowe et al. 2017), aquifers (Anantharaman et al. 2016; Jewell et al. 2017), coal bed methane wells (Evans et al. 2015) and hot springs (McKay et al. 2017).
Despite this broad global distribution, Bathyarchaeota remain poorly characterized at the genomic level. While 16S rRNA gene trees suggest up to 22 family-level subgroups within this phylum (Kubo et al. 2012; Meng et al. 2014; Lazar et al. 2015; McKay et al. 2017; Xiang et al. 2017), there are as yet no isolate cultures or genomes, and only 39 partial to near-complete Bathyarchaeota genomes are found in public databases. Of those 39, the ones including a 16S rRNA gene that can be assigned to a subgroup represent only 8 of the 22 subgroups.
3 This work was a collaborative project with the following authors: Adrienne B. Narrowe, Lindsey M. Solden, Jordan C. Angle, Rebecca A. Daly, Mikaya A. Borton, Kelly C. Wrighton, Christopher S. Miller. Full author contributions are listed at the end of the chapter.
Functional gene predictions from the partial to near-complete Bathyarchaeota genomes suggest that the Bathyarchaeota metabolisms are as diverse as their phylogenetic and habitat distribution. Described partial genomes indicate evidence for heterotrophy (Biddle et al. 2006; Seyler, McGuinness, and Kerkhof 2014) including the potential for degradation of extracellular proteins (Lloyd et al. 2013), aromatic compounds (Meng et al. 2014), and complex carbohydrates (Lazar et al. 2016), as well as autotrophic assimilation of carbon (Evans et al. 2015; He et al. 2016; Lazar et al. 2017). Multiple lines of genomic evidence now support a role for some Bathyarchaeota as sources of acetate, either via fermentation (Lazar et al. 2016) or as homoacetogens (He et al. 2016), and it appears that many Bathyarchaeota are capable of both pathways. On the basis of pathway analysis in two partially assembled genomes, some Bathyarchaeota may be methanogens (Evans et al. 2015). Respiration via reduction of protons may be possible for subgroup 1, but no other respiratory processes have been conclusively inferred for this phylum (Lazar et al. 2016). The large-scale metabolic inferences are thus far based on sampling from only three environments: coal bed methane wells (Evans et al. 2015), marine sediments (He et al. 2016), and estuarine sediments (Lazar et al. 2016), and Bathyarchaeota genomic potential in freshwater environments has yet to be fully characterized (Jewell et al. 2017). Across and within these habitats, biogeographic distributions have suggested subgroup-specific habitat preferences, likely linked to distinct metabolic features of the different Bathyarchaeota subgroups (Fillol, Auguet, et al. 2015; Fillol, Sanchez-Melsio, et al. 2015; Lazar et al. 2015; Xiang et al. 2017). However, such a linkage between genome-inferred metabolism and habitat preference has
been suggested for only 4 partial genomes from a single site/habitat type to date (Lazar et al. 2016).
Our previous work on archaeal community diversity in a methane-emitting freshwater wetland identified that Bathyarchaeota comprised up to 1/3 of all archaeal 16S rRNA gene sequences (Chapter II, Figures 2.3 and 2.5; Narrowe et al. 2017). Sampling across a wetland hydrological gradient and down a soil-core depth gradient provided clear evidence for subgroup-specific habitat preferences linked to hydrologic and geochemical features. To explore the genomic determinants of these preferences in a freshwater habitat, and to genomically characterize previously unsampled Bathyarchaeota subgroups, we performed shotgun metagenomic sequencing of 4 samples predicted to harbor abundant, diverse Bathyarchaeota genomes from our prior 16S rRNA gene study. With these data, we link Bathyarchaeota genome bins to their distributions across the physical and geochemical gradients in the wetland, confirm Bathyarchaeota core metabolic features, and expand the range of predicted metabolic potential for this phylum.
Methods and Materials
Sampling and DNA extraction
Freshwater wetland soil cores were collected from Old Woman Creek, Huron OH, USA (OWC) in October 2013, and total DNA extracted as described previously (Chapter II; Narrowe et al. 2017). Based on the high abundance of Bathyarchaeota and other taxa of interest as identified using 16S rRNA amplicon sequencing (Chapter II; Narrowe et al. 2017), 4 samples were chosen for shotgun metagenomic sequencing (M3-C4-D3, M3-C4-D4, O3-C3-D3, O3-C3-D4) with the goal of producing metagenome-assembled genomes.


Library preparation and Sequencing
Library preparation and five lanes of Illumina HiSeq 2x125 bp sequencing followed standard operating procedures at the US DOE Joint Genome Institute (GOLD study ID Gs0114821). Sample M3-C4-D3 had replicate extraction, library preparation, and two lanes of sequencing performed, and reads were combined before quality trimming and assembly. For 3 additional samples (M3-C4-D4, O3-C3-D3, O3-C3-D4) one lane of sequencing was performed. This study also made use of read coverage profiles from 4 additional lower-read-coverage samples (M3-C5-D1, M3-C5-D2, M3-C5-D3, M3-C5-D4). For these samples, DNA was sheared to 300 bp with a Covaris S220, metagenomic sequencing libraries were prepared using the Nugen Ovation Ultralow Prep kit, and all four samples were multiplexed on one lane of Illumina HiSeq 2x125 sequencing at the University of Colorado Denver Anschutz Medical Campus Genomics and Microarray Core.
Genome Assembly and Binning
The five full-lane sequencing runs (2× M3-C4-D3, M3-C4-D4, O3-C3-D3, and O3-C3-D4) were initially assembled following JGI standard protocols as follows. Adapter removal, read filtering and trimming were completed using BBDuk (sourceforge.net/projects/bbmap) with the parameters ktrim=r, minlen=40, minlenfraction=0.6, mink=11, tbo, tpe, k=23, hdist=1, hdist2=1, ftm=5, maq=8, maxns=1, minlen=40, minlenfraction=0.6, k=27, hdist=1, trimq=12, qtrim=rl. Filtered reads were then assembled using megahit (Li et al. 2015) version 1.0.6 with --k-list 23,43,63,83,103,123.
To generate coverage profiles for use in binning, all 9 read sets were mapped to final scaffolds from each assembly using seal (v. June 27, 2016; https://sourceforge.net/projects/bbmap/) (ambiguous=random threads=8 interleaved=f prealloc=t nzo=false speed=12 minkmerhits=1 k=27). Each assembly was binned individually using the coverage profiles as input to MetaBAT (v. 2.12.1) (--cvExt) (Kang et al. 2015). Bin completion and contamination were estimated using CheckM (Parks et al. 2015) and, for those bins predicted as archaeal, single-copy marker genes (SCGs) were predicted using Amphora2 (Wu and Scott 2012). For the archaeal bins, the set of predicted SCGs was searched against UniRef90 (rel. 112017) using BLASTP (evalue 1e-10) (Altschul 2008). The consensus taxonomy of the best BLAST hit for each bin's SCGs was used to identify genome bins that were likely bathyarchaeotal.
Functional annotation and bin QC
Gene predictions and annotations for each assembly were performed using the JGI Microbial Genome Annotation Pipeline (Huntemann et al. 2015), and for each putative bathyarchaeotal bin, the genes corresponding to the binned scaffolds were extracted and a homology search performed via BLASTP (v. 2.7.1+) (Altschul 2008) against UniRef90 (release 112017). Bins were additionally annotated using GhostKOALA (genus_prokaryotes + family_eukaryotes + viruses) (Kanehisa, Sato, and Morishima 2016) and InterProScan (v.5.24-63.0) (-iprlookup -goterms -pa) (Jones et al. 2014).
To conservatively filter bins, any scaffold not containing at least one protein with a top hit to a UniRef90 sequence annotated as "Bathyarchaeota", "Crenarchaeota", or "MCG" was flagged for removal from the bin. This approach risks discarding legitimate bathyarchaeotal scaffolds with protein families that have not yet been detected in the limited existing bathyarchaeotal genomes, especially for short scaffolds. To guard against discarding true genomic novelty, we made the assumption that a scaffold with legitimate but only novel proteins in one genome bin might have close homologs in a scaffold from a phylogenetically
related bin, and that this second scaffold might contain more informative proteins with reliable homology to existing Bathyarchaeota genomes. Thus, we next checked all scaffolds flagged for removal against a database of predicted proteins retained after the previous filtering step, from bins assigned as belonging to the same subgroup (see below for subgroup assignment). Any scaffolds containing at least 2 BLASTP hits to a retained scaffold from another bin, of at least 90% amino acid identity over 100 amino acids, were restored to the bin. Scaffolds containing genes from key metabolisms discussed below were also manually examined to verify their likely Bathyarchaeota origin. Final bin quality was estimated using CheckM (Parks et al. 2015), which provides a measure of bin completeness and contamination based upon the presence and number of conserved single-copy marker genes in the genome bins. Metabolic pathways were visualized using the KEGG Mapper-Reconstruct Pathway Webserver (Aoki-Kinoshita and Kanehisa 2007).
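The two-stage scaffold filter described above (flag, then rescue) can be sketched as follows. This is an illustration under stated assumptions, not the scripts used in this work: the input structures, function names, and scaffold identifiers are hypothetical, taxonomy matching is done by simple substring search, and qualifying hits are counted per flagged scaffold without tracking which retained scaffold they matched.

```python
TARGET_TAXA = ("Bathyarchaeota", "Crenarchaeota", "MCG")


def flag_scaffolds(top_hit_taxa):
    """Return scaffolds with no protein whose top hit names a target taxon.

    top_hit_taxa: dict mapping scaffold id -> list of top-hit annotation
    strings, one per protein on that scaffold (hypothetical input format).
    """
    flagged = set()
    for scaffold, annotations in top_hit_taxa.items():
        has_target = any(t in a for a in annotations for t in TARGET_TAXA)
        if not has_target:
            flagged.add(scaffold)
    return flagged


def rescue_scaffolds(flagged, hits, min_hits=2, min_identity=90.0, min_length=100):
    """Return flagged scaffolds with enough strong hits to retained proteins.

    hits: iterable of (scaffold id, percent identity, alignment length)
    for BLASTP hits against retained proteins from same-subgroup bins.
    """
    counts = {}
    for scaffold, identity, length in hits:
        if scaffold in flagged and identity >= min_identity and length >= min_length:
            counts[scaffold] = counts.get(scaffold, 0) + 1
    return {s for s, n in counts.items() if n >= min_hits}


# Hypothetical example data.
top_hits = {
    "s1": ["Candidatus Bathyarchaeota archaeon BA1", "Bacteria"],
    "s2": ["Euryarchaeota"],
    "s3": ["Proteobacteria"],
}
blast_hits = [("s2", 95.0, 150), ("s2", 92.5, 120), ("s3", 99.0, 80), ("s3", 91.0, 200)]

flagged = flag_scaffolds(top_hits)
rescued = rescue_scaffolds(flagged, blast_hits)
removed = flagged - rescued
print(sorted(removed))
```

In this toy case scaffold s2 is restored because two hits pass both thresholds, while s3 stays removed because one of its two hits is shorter than 100 amino acids.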
Identification and validation of population clusters
Bin subgroup clusters (roughly corresponding to bins with the same Bathyarchaeota subgroup) were identified using several metrics. FastANI (Jain et al. 2017) was used for an all vs. all comparison of bin-to-bin average nucleotide identity (--minFrag 20). Bin population clusters were further defined using phylogenetic markers including 16S rRNA genes and ribosomal S3 protein genes. For bins that contained multiple copies of any marker gene, the bin was retained if the phylogenetic assignment of all copies agreed at the subgroup level. Where the ANI analysis suggested linkage across subgroup bin clusters, which was inconsistent with other bin pairs within the population bin clusters, we manually examined the linkages. For the linked bins O3D4_bin153 (N=316 scaffolds) and O3D3_bin_323 (N=441 scaffolds), there were only 51-54 BLASTN hits greater than 100 nt in length
satisfying the -evalue 1e-10 parameter. While many of the proteins encoded on these scaffolds are highly conserved, the extremely high ANI across these scaffolds suggests that they were likely misbinned in either the O3D3 or the O3D4 assembly and binning. We conservatively removed these scaffolds from both bins to avoid making metabolic inferences based on scaffolds that we could not assign with confidence.
Ribosomal S3 gene phylogeny and subgroup assignment
Ribosomal S3 genes were identified in bins and in all publicly available Bathyarchaeota genomes/bins using hmmsearch (--cut_tc) (Eddy 2011) against PFAM00189. Sequences were aligned using mafft (L-INS-i) (Katoh et al. 2002), and the alignments trimmed using trimAl (-gappyout) (Capella-Gutierrez et al. 2009). Maximum likelihood gene phylogenies were generated from the trimmed alignments using IQ-TREE using the best model chosen by BIC (LG+F+R5), branch supports were estimated using UFBoot (-m MFP -bb 1000 -alrt 1000) (Nguyen et al. 2015; Hoang et al. 2018), and trees were visualized using iTOL (Letunic and Bork 2007).
Identification of putative Bathyarchaeotal mcrABG genes
To identify possible Bathyarchaeotal mcrABG genes, all predicted protein coding sequences in the assemblies were searched (BLASTP, evalue 1e-10) against a database containing all mcrABG gene sequences from Candidatus Bathyarchaeota BA1 and BA2 (Evans et al. 2015) and Candidatus Syntrophoarchaeum sp. (Laso-Perez et al. 2016), as both these groups encode mcrA genes that are divergent from Euryarchaeotal mcrA genes. Hits greater than 50% amino acid identity and at least 150 amino acids in length were searched against GenBank to exclude sequences with higher identity to known euryarchaeotal methanogen mcrABG sequences. For each of mcrA, mcrB, and mcrG, the remaining putative
Bathyarchaeotal sequences were combined with a set of reference sequences including those of BA1, BA2, and Ca. Syntrophoarchaeum sp., and, in the case of the mcrA gene, with all mcrA sequences from all assemblies, and were aligned using mafft (L-INS-i) (Katoh et al. 2002), and the alignments trimmed using trimAl (-gappyout) (Capella-Gutierrez et al. 2009). Maximum likelihood gene phylogenies were generated from the trimmed alignments using IQ-TREE using the best model chosen by BIC (mcrA: LG+F+R5; mcrB: LG+F+R6; mcrG: LG+R4), branch supports were estimated using UFBoot (-m MFP -bb 1000 -alrt 1000) (Nguyen et al. 2015; Hoang et al. 2018), and trees were visualized using iTOL (Letunic and Bork 2007).
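The first screening step for candidate mcrABG hits (more than 50% amino acid identity over at least 150 amino acids) can be illustrated with a short sketch. It assumes BLASTP output in the standard 12-column tabular layout (query, subject, percent identity, alignment length, ...), which the text does not specify; the function name and sequence identifiers are hypothetical.

```python
def filter_mcr_hits(blast_lines, min_identity=50.0, min_length=150):
    """Keep hits with identity > min_identity and length >= min_length.

    blast_lines: iterable of tab-separated BLAST tabular records, where
    columns 1-4 are query id, subject id, percent identity, and alignment
    length (an assumed layout).
    """
    kept = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if identity > min_identity and length >= min_length:
            kept.append((query, subject, identity, length))
    return kept


# Hypothetical records: only the first passes both thresholds.
records = [
    "prot_A\tBA1_mcrA\t62.5\t410",
    "prot_B\tBA2_mcrB\t48.0\t300",
    "prot_C\tSyn_mcrG\t71.2\t120",
]
candidates = filter_mcr_hits(records)
print(candidates)
```

Hits passing this screen would still need the subsequent GenBank comparison described above to exclude sequences closer to euryarchaeotal methanogens.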
Results and Discussion
Recovery of multiple Bathyarchaeota bins
We recovered 28 partial to near-complete Bathyarchaeota genome bins from the metagenomic assembly of 4 wetland soil samples (depths 13-35cm.). An additional 2 bins were identified from the assembly of 2 shallow depth wetland soil samples (0-12cm) (Table 4.1). These bins ranged from 23-96% complete, and most bins have less than 20% estimated contamination. However, based on the strict filtering criteria we applied to the bins, this estimated contamination likely represents co-binning of contigs from closely related species, and bins with greater than 20% estimated contamination are considered composite bins of closely related genomes within their respective subgroups. For example O3D4-bin201 appears to contain 2 closely related group 5b genomes.
Overall, these genome bins range in size from ~0.24Mbp to 3.5Mbp, with estimated complete genome sizes (calculated from bin length and estimated completeness) ranging from ~0.7Mbp to 2.6Mbp, largely agreeing with previously reported Bathyarchaeota genome
sizes (Evans et al. 2015; He et al. 2016; Lazar et al. 2016). Interestingly, 3 of 5 group 15/17 bins have estimated genome sizes less than 1 Mbp. For two of these bins, O3D3_Bin_340 and C4D4_Bin_16 (estimated sizes 708-805 kbp), the nearest reference genome, Candidatus Bathyarchaeota archaeon RBG 16-48-13 is also estimated to be 84% complete with a bin size of only 0.8 Mbp. However, other genomes in groups 15 and 17 are larger, approaching 1.5-2 Mbp. As more genomes become available, it will become clear if there is in fact a subset of Bathyarchaeota with reduced genomes.
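Estimated complete genome sizes are described above as being calculated from bin length and estimated completeness. A minimal sketch of that calculation, assuming simple division of bin length by the completeness fraction with no correction for contamination (the exact formula used is not stated, and the function name is hypothetical), is:

```python
def estimated_genome_size(bin_length_bp, completeness_pct):
    """Scale the bin length up to the size expected at 100% completeness."""
    if not 0 < completeness_pct <= 100:
        raise ValueError("completeness must be in the interval (0, 100]")
    return bin_length_bp / (completeness_pct / 100.0)


# Values for C4D4_Bin.16 as listed in Table 4.1: 291,902 bp at 36.24% complete.
size_bp = estimated_genome_size(291902, 36.24)
print(round(size_bp / 1000))  # 805 (kbp)
```

Under this assumption the result for C4D4_Bin.16 is about 805 kbp, in line with the upper end of the range quoted in the text. For bins with high estimated contamination this simple scaling would overestimate genome size.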
Table 4.1 - Bathyarchaeota genome bin metrics
Subgroup 6
                    C4D3v1_Bin.79   C4D4_Bin.42     O3D4_Bin.179    O3D3_Bin.323    C4D3v1_Bin.191  C4D3v2_Bin.107
Completeness (%)    71.32           66.16           71.48           78.74           33.25           49.6
Contamination (%)   25              20.85           16.67           23.54           0               18.69
# contigs           344             394             351             395             138             318
N50 (bp)            4302            4220            5525            5117            3963            4212
mean length (bp)    4296            4195            5072            4865            3901            4201
total length (bp)   1477662         1652650         1780108         1921636         538345          1335966
longest contig (bp) 34212           16119           20713           24618           12099           20720

                    C4D4_Bin.111    MUD_14-15_Bin.7 C4D3v1_Bin.193  O3D3_Bin.15     C4D3v1_Bin.125  C4D3v2_Bin.78
Completeness (%)    55.47           80.4            64.49           52.54           32.03           60.14
Contamination (%)   1.46            17.6            5.61            6.61            6.15            3.74
# contigs           173             333             268             200             185             180
N50 (bp)            3954            6538            4430            3963            2944            4414
mean length (bp)    4036            5761            4123            3912            2407            4394
total length (bp)   698217          1918558         1104908         782416          445263          790891
longest contig (bp) 15457           26800           12971           9771            10932           13699

                    O3D3_Bin.225    O3D4_Bin.7      O3D3_Bin.88     OPEN_15_Bin.29  C4D3v2_Bin.212
Completeness (%)    30.06           72.59           72.43           48.86           65.42
Contamination (%)   0               12.37           16.82           10.59           10.28
# contigs           85              308             277             185             343
N50 (bp)            3841            4861            4531            3736            4092
mean length (bp)    3650            4701            4466            3841            4174
total length (bp)   310261          1448028         1237180         710512          1431513
longest contig (bp) 6862            19053           15929           13429           16722


Table 4.1 (continued) - Bathyarchaeota genome bin metrics
Subgroup 5b
                    O3D3_Bin.37     O3D3_Bin.196    O3D4_Bin.147    O3D4_Bin.201    O3D4_Bin.226
Completeness (%)    39.36           68.84           80.15           96.26           70.97
Contamination (%)   5.61            33.33           57.24           129.91          41.34
# contigs           160             494             458             730             280
N50 (bp)            3836            3657            3985            4877            3701
mean length (bp)    3904            3725            4072            4748            3760
total length (bp)   624563          1840066         1864936         3466169         1052777
longest contig (bp) 11474           17579           19385           34912           9645

Subgroups 15 and 17
                    C4D4_Bin.121    O3D3_Bin.153    C4D4_Bin.16     O3D3_Bin.340    O3D4_Bin.9
Completeness (%)    48.29           25.5            36.24           23.3            66.08
Contamination (%)   0.31            0.93            0               0.97            0.93
# contigs           86              107             81              61              264
N50 (bp)            4118            4125            3579            3723            4317
mean length (bp)    3974            3995            3604            3816            4181
total length (bp)   341731          427496          291902          232750          1103847
longest contig (bp) 11550           13202           7594            8461            14353

Subgroup 11
                    O3D3_Bin.168    O3D4_Bin.24     O3D4_Bin.153
Completeness (%)    80.37           44.48           39.13
Contamination (%)   19.91           0.93            1.4
# contigs           327             190             264
N50 (bp)            4658            4941            3719
mean length (bp)    4577            4669            3843
total length (bp)   1496814         887078          1014428
longest contig (bp) 17705           16488           11912


Phylogenetic placement of bins
The Old Woman Creek (OWC) Bathyarchaeota genome bins dramatically increase the genomic sampling of several Bathyarchaeota subgroups, and are the first Bathyarchaeota genomes described from freshwater wetland soils. To phylogenetically place our genome bins in the context of previously described Bathyarchaeota genomes, we constructed a phylogenetic tree using the small ribosomal subunit protein S3 gene, a known single-copy phylogenetic marker gene (Figure 4.1a). While based on only a single marker gene, the phylogeny is well supported and recapitulates previously described relationships among the reference genome bins (He et al. 2016). Analysis of partial and full assembled SSU 16S rRNA genes in the OWC bins and reference genomes indicates that the clades identified in the S3 phylogeny are congruent with 16S rRNA-based bathyarchaeotal subgroup designations (Kubo et al. 2012; Meng et al. 2014; Fillol, Sanchez-Melsio, et al. 2015; Lazar et al. 2015; Xiang et al. 2017). The Bathyarchaeota genome bins we present here include three genome bins within subgroup 11, a subgroup with no prior genomic representatives; five genome bins from within group 5 (5a, 5b, 5bb); and 14 genome bins within subgroup 6, a group previously represented by only a single genome bin (Lazar et al. 2016). An additional five genome bins are placed among representatives from groups 15 and 17; however, these bins and most nearby genomes lack 16S rRNA genes and have relatively long branch lengths, suggesting that additional genomic representatives from this part of the Bathyarchaeota phylum will be needed to more precisely place these genomes (Table 4.1, Figure 4.1a).


[Figure 4.1 image not reproduced. Panel A: phylogenetic tree of Bathyarchaeota reference genomes and OWC genome bins (tree scale 0.1), with stars marking genomes and bins in which a 16S rRNA gene was detected, and clades labeled subgroup 5b, subgroup 6, and subgroup 11. Panel B: within-sample relative abundance (%) of subgroups 5b, 6, 7/17, 11, and 15 along Transect 3, by site (mud, water covered) and depth, on a scale from 0.01 to 10%.]


Figure 4.1 - Expanded genomic sampling of multiple Bathyarchaeota subgroups. A) Maximum likelihood phylogeny of bathyarchaeotal ribosomal S3 genes. Black labels indicate previously available partial-to-near-complete genomes. Colored labels indicate genome bins from this study, with label color indicating sample origin. Orange: mud flat samples; blue: water-covered samples. Darker shades indicate increased soil depth. The lightest shade denotes two additional bins from ~0-12 cm sediments. Red and black stars indicate the presence of a 16S rRNA gene in bins from this study and in previously reported genomes, respectively. Subgroup identification is based in part on placement of 16S rRNA sequences or best BLASTN hits within the ARB Silva (v132) guide tree. Ribosomal S3 phylogeny and placement of 16S rRNA sequences are in agreement for the subgroups shown. Closed circles on branches indicate IQ-TREE ultrafast bootstrap support >95%. B) Expected relative abundance of Bathyarchaeota subgroups in wetland soil samples based on 16S rRNA gene relative abundance (adapted from Narrowe et al. 2017). The subgroup/sample distribution of genome bins agrees with predictions from the 16S rRNA census: metagenomic bins were recovered from samples where subgroups were predicted to be abundant.


Predicted subgroup habitat preferences are supported by metagenomic analysis
The intra-wetland, subgroup-level habitat preferences of Bathyarchaeota that we identified previously (Chapter II; Narrowe et al. 2017) are supported by the distribution of recovered genome bins (Figure 4.1b). In our previous analysis of 16S rRNA genes across the wetland, we found 16S rRNA genes from Bathyarchaeota subgroup 6 in all samples, with a slight increase in abundance in shallow soils. In this study, we recovered subgroup 6 genome bins from all metagenomic samples, consistent with their wetland-wide distribution. In contrast, 16S rRNA genes from subgroups 5b and 11 were found almost exclusively in soils from the Open Water 3 site, increasing in abundance with soil depth (Figure 2.5) (Narrowe et al. 2017). Consistent with those 16S amplicon-measured distributions, the Bathyarchaeota subgroup 11 and subgroup 5b genome bins identified here arise only from the Open Water 3 site samples and were not reconstructed from the mud flat (Figure 4.1). This supports our initial observation that, while subgroups 5b and 11 were declared indicator taxa for freshwater sediments (Fillol, Auguet, et al. 2015), within our freshwater wetland these two groups display a more restricted range, which may be used to further resolve their habitat preferences (Chapter II; Narrowe et al. 2017). Two additional deeply branching pairs of bins, likely belonging to subgroups 15 and 17, each include one bin from Mud flat 3, depth 4 and one from Open Water 3, depth 3, suggesting a common, but as yet undetermined, habitat feature shared by those two samples. The presence of these two groups also agrees with expectations based on our 16S rRNA gene analysis, which predicted the presence of these subgroups across the wetland (Figure 2.5, Figure 4.1b).


ANI analysis to validate bin clusters
The presence of multiple closely related genomes is known to complicate metagenomic assembly, resulting in shorter, more fragmented assemblies for some members of the community (Sharon et al. 2012; Howe et al. 2014; Hug, Thomas, et al. 2016). To guard against the possibility that a misbinned contig containing a marker gene would lead us to assign a bin to the wrong subgroup, we also performed an all-versus-all average nucleotide identity (ANI) analysis including all wetland Bathyarchaeota bins and all publicly available Bathyarchaeota genomes/bins shown in the ribosomal S3 tree (Figure 4.2). ANI has been shown to have a discontinuous distribution, with ANI values at or above 95% indicating genomes from the same species, and more distantly related genomes presenting ANI less than 83% (Jain et al. 2017). Thus, we expected genomes within a subgroup to share higher ANI with one another than with bins outside the subgroup. Our results confirmed the 16S rRNA- and S3 gene-based group assignments and indicated that all OWC genome bins are properly placed within each subgroup. Bins in subgroup 6 have ANI linkages only to other bins in this group, many with species-level ANI values (>95%). ANI linkages between subgroups 5 and 11 were found among four bins; however, the maximum across-group ANI was 83% and involved only 20 of 392 scaffolds, as compared to the subgroup 11 intra-subgroup species-level link of 97.5%, which involved 115 of 342 scaffolds (Table 2). These findings are consistent with the subgroup-level relationships among the bins defined by phylogenetic marker genes, and allow us to use the combined genomic information from these partial genome bins to represent the metabolic potential within these subgroups.
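The interpretation of the two ANI links reported above can be illustrated with a short sketch. It applies the 95% and 83% thresholds cited from Jain et al. (2017) and computes the fraction of scaffolds involved in each link. The function names, the labels, and the "intermediate" category for values between the two thresholds are assumptions made for illustration; this is not the ANI software used in the analysis.

```python
def classify_ani(ani_percent, species_cutoff=95.0, distant_cutoff=83.0):
    """Label a pairwise ANI value using the thresholds cited in the text."""
    if ani_percent >= species_cutoff:
        return "same species"
    if ani_percent < distant_cutoff:
        return "distantly related"
    return "intermediate"  # assumed label for the sparsely populated gap


def scaffold_fraction(scaffolds_involved, scaffolds_total):
    """Fraction of a bin's scaffolds that contribute to an ANI link."""
    return scaffolds_involved / scaffolds_total


# (ANI %, scaffolds involved, total scaffolds), as reported in the text.
links = {
    "subgroup 11, within subgroup": (97.5, 115, 342),
    "subgroups 5 and 11, across subgroups": (83.0, 20, 392),
}

for name, (ani, involved, total) in links.items():
    fraction = scaffold_fraction(involved, total)
    print(f"{name}: ANI {ani}% ({classify_ani(ani)}), "
          f"{fraction:.1%} of scaffolds involved")
```

Under these assumptions, the within-subgroup link is both higher in identity and supported by roughly a third of the scaffolds, whereas the across-subgroup link sits at the lower threshold and rests on about 5% of the scaffolds.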


[Figure 4.2 image not reproduced: Bathyarchaeota reference genomes and OWC genome bins joined by lines whose thickness corresponds to pairwise ANI.]
Figure 4.2 - Average nucleotide identity (ANI) confirms bin placement within subgroups. All-versus-all comparison of bin ANI shows higher identity within than across subgroups. Bin names and subgroup placement are as in Figure 4.1. Only bins with reported matches to other bins are shown. Line thickness corresponds to ANI. The highest (100%) and lowest (76%) ANI values are indicated to show scale.
mcrABG analysis
The recent discoveries of putative methanogenic Bathyarchaeota (Evans et al. 2015) and Verstraetearchaeota (Vanwonterghem et al. 2016) have challenged the longstanding canon that methanogenesis is possible only within the Euryarchaeota. Given our previous findings of Verstraetearchaeota and the abundance and richness of Bathyarchaeota within the methane-emitting Old Woman Creek wetland, we asked whether the OWC Bathyarchaeota might be methanogens. The genes encoding the alpha, beta, and gamma subunits of methyl-coenzyme M reductase (mcrABG) are the hallmark genes for methanogenesis. In addition to their presence in genomes from coal bed methane wells (Candidatus Bathyarchaeota archaeon BA1 and BA2) (Evans et al. 2015), bathyarchaeotal mcrA amplicon sequences were also reported from sediments in Yellowstone hot springs (McKay et al. 2017). The BA1 and BA2 genomes belong within Bathyarchaeota subgroups 3 and 8 (Evans et al. 2015). The Yellowstone mcrA amplicon sequences could not be assigned to subgroups, but occurred in sediments in which subgroups 2, 6, 15, 20, 10, and 14 were abundant based on 16S rRNA gene amplicon sequencing paired with the mcrA gene sequencing (McKay et al. 2017). Within our metagenomic assembly, which includes subgroups 5b, 6, 11, 15, and 17, we identified nine putative bathyarchaeotal mcrA gene sequences, seven bathyarchaeotal mcrB sequences, and four bathyarchaeotal mcrG sequences. These sequences were found in both Open Water 3 soil samples (mcrABG) and in the Mud 3, Depth 4 sample (mcrA only). Despite attempts at reassembly, the contigs containing these sequences remained short, and thus we were not able to assign them to any of the specific Bathyarchaeota genome bins. However, the OWC bathyarchaeotal mcrABG sequences are most closely related to those of Ca. Bathyarchaeota BA1 and BA2 (Evans et al. 2015) and branch near the mcrABG sequences from butane-utilizing Candidatus Syntrophoarchaeum sp. (Laso-Perez et al. 2016) (Figure 4.3, Figure S4.1, Figure S4.2). The long branch separating the Ca. Syntrophoarchaeum and Ca. Bathyarchaeota BA1 and BA2 mcrA sequences from the euryarchaeotal mcrA sequences was suggested to be the result of sequence and structural divergence that permits these mcrA to accommodate larger alkanes, in particular butane (Laso-Perez et al. 2016). In the case of Ca. Syntrophoarchaeum, the mcrA protein subunit has been shown to activate butane for oxidation, analogous to the activation of methane by mcrA during methane oxidation via reverse methanogenesis (Laso-Perez et al. 2016). While we were not able to assign them to genome bins, the presence of these genes across multiple samples in this wetland indicates that the potential for methane production (or possibly butane oxidation) by Bathyarchaeota likely extends more broadly across the phylum than known to date. More critically, this suggests that additional methanogen diversity may be present and yet unaccounted for in our understanding of wetland methane emissions.
Metabolic potential of Bathyarchaeota carbon cycling in wetland soils
To date, inferences of metabolic potential for Bathyarchaeota have been based on only 13 genome bins, representing only 8 of the approximately 22 bathyarchaeotal subgroups and reflecting only 3 environments: coal bed methane wells (Evans et al. 2015), marine sediments (He et al. 2016), and brackish estuarine sediments (Lazar et al. 2016). Thus far, all described Bathyarchaeota genomes encode components of the Wood-Ljungdahl pathway (WLP). This pathway, found in both the bacteria and the archaea and also known as the acetyl-CoA pathway, can be used for carbon fixation or can be run oxidatively (Borrel, Adam, and Gribaldo 2016; Chistoserdova 2016; Adam, Borrel, and Gribaldo 2018). In the archaea, this pathway is often associated with methanogens; however, with the discovery and description of additional archaeal genomes, it has become clear that this pathway is also found in non-methanogenic archaea, and is even absent in the case of the methanogenic Candidatus Methanomassiliicoccus sp. (Borrel, Adam, and Gribaldo 2016).
[Figure 4.3 image not reproduced: mcrA phylogeny (tree scale 0.1).]


Full Text

PAGE 1

GENOME-ENABLED RESOLUTION OF ARCHAEAL DIVERSITY IN METHANE-CYCLING WETLANDS
by
ADRIENNE BETH NARROWE
B.A., Northwestern University, 1991
B.S., Wright State University, 2009
M.S., University of Colorado Denver, 2012
A dissertation submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Integrative and Systems Biology Program
2018

PAGE 2

This dissertation for the Doctor of Philosophy degree by Adrienne Beth Narrowe has been approved for the Integrative and Systems Biology Program by
Michael J. Greene, Chair
Christopher S. Miller, Advisor
Katerina Kechris
Catherine Lozupone
Annika C. Mosier
Timberley M. Roane
Date: May 12, 2018

PAGE 3

Narrowe, Adrienne Beth (Ph.D., Integrative and Systems Biology Program)
Genome-Enabled Resolution Of Archaeal Diversity In Methane-Cycling Wetlands
Dissertation directed by Assistant Professor Christopher S. Miller
ABSTRACT
As the only known organisms that can produce methane, archaea are key players in the global methane cycle. Despite their critical role, archaea are often understudied compared to bacteria. With recent advances in DNA sequencing, we are discovering entire archaeal phyla, new members within known archaeal groups, and unexpected metabolic potential within the archaeal domain. Temperate freshwater wetlands are among the environments needing additional study of the archaeal community. Such mid-latitude, naturally occurring freshwater wetlands are estimated to be large contributors to global methane cycling, but the composition and function of their microbial communities are not well incorporated in global models of methane emissions. Here, using multiple methods of high-throughput DNA sequencing, we describe the microbial community structure and functional potential in Old Woman Creek (OH, USA), a model freshwater wetland, finding genomic, metabolic, and habitat diversity within and among archaeal groups that would not have been predicted based on previous knowledge. First, we developed a new domain-specific sequencing protocol, producing a more highly resolved archaeal community profile than was achievable with existing methods. This deep view of the archaeal community demonstrated that highly diverse assemblages of methane-cycling and non-methane-cycling archaea are present across the wetland, with habitat distributions that suggest environmentally defined niches. Next, using metagenomic sequencing to reconstruct archaeal genomes from the wetland soils, we identified multiple

PAGE 4

phylogenetically conserved evolutionary anomalies in the complement of fundamental information processing genes among the recently described Asgard archaea. Finally, further genomic reconstructions identified metabolic features among the Bathyarchaeota that correspond to the intra-population habitat specificity we identified in the initial community profile. Metabolic reconstructions highlight broad potential for Bathyarchaeota to play key roles in facilitating methane production by providing substrate to the methanogenic archaea, and, surprisingly, members of this group may themselves contribute directly to methane cycling in this system. These findings offer additional insight into the evolution, diversity, and function of multiple archaeal groups within a model freshwater wetland, and provide the foundation for further study to better understand factors controlling methane cycling.
The form and content of this abstract are approved. I recommend its publication.
Approved: Christopher S. Miller

PAGE 5

TABLE OF CONTENTS
CHAPTER
I. INTRODUCTION ... 1
II. HIGH-RESOLUTION SEQUENCING REVEALS UNEXPLORED ARCHAEAL DIVERSITY IN FRESHWATER WETLAND SOILS ... 13
III. COMPLEX EVOLUTIONARY HISTORY OF TRANSLATION ELONGATION FACTOR 2 AND DIPHTHAMIDE BIOSYNTHESIS IN ARCHAEA AND PARABASALIDS ... 50
IV. BATHYARCHAEOTA POPULATIONS IN WETLAND SOILS CONTAIN PREVIOUSLY UNKNOWN METABOLIC COMPLEXITY ... 77
V. CONCLUSIONS AND FUTURE DIRECTIONS ... 106
REFERENCES ... 109
APPENDIX
A. Supplementary Material For Chapter II ... 133
B. Supplementary Material For Chapter III ... 141
C. Supplementary Material For Chapter IV ... 154

PAGE 6

CHAPTER I
INTRODUCTION
Marker gene sequencing-based exploration of microbial diversity
Understanding the composition of a microbial community in the environment is a first step to developing hypotheses regarding the community's role in biogeochemical cycling and the controls on that activity. Currently, this can be best achieved through the use of environmental DNA sequencing. Sequencing-based approaches to environmental microbiology, from marker gene sequencing to metagenomics, have in recent years greatly improved our view of bacterial and archaeal diversity in terrestrial, aquatic, and host-associated systems, and vast swaths of the tree of life are just now being discovered (Pace 1997; Ley et al. 2006; Wrighton et al. 2012; Brown et al. 2015; Castelle et al. 2015; Probst and Moissl-Eichinger 2015; Spang et al. 2015; Anantharaman et al. 2016; Hug, Baker, et al. 2016; Seitz et al. 2016; Zaremba-Niedzwiedzka et al. 2017).
The finding that the sequence of the small ribosomal subunit 16S rRNA gene could be used as a molecular phylogenetic marker to describe the relationships among microorganisms ushered in a new way of understanding the diversity of prokaryotic life on earth, and this gene was used to define the Archaea as a domain of life distinct from the Bacteria (Woese and Fox 1977). Polymerase chain reaction (PCR)-based approaches followed, which allowed for the amplification and subsequent sequencing of these gene sequences directly from environmental DNA samples (Lane et al. 1985; Pace 1997). It became apparent that the number of cultured and culturable microorganisms was a minute fraction of the microbial diversity found in any system (Staley and Konopka 1985). As improvements in DNA sequencing technology balanced massive increases in throughput with

PAGE 7

decreases in read length, accuracy, and cost, it became possible to sequence multiple samples concurrently, and at higher sequencing depth per sample (Caporaso et al. 2011; Caporaso et al. 2012). Greater sequencing depth makes lower-abundance sequences more likely to be detected, and the concurrent increases in sample number opened the door to the adaptation of classical ecological metrics to microbial community analyses (Lozupone and Knight 2007; Caporaso et al. 2010; Lozupone et al. 2011; Martiny et al. 2011; McMurdie and Holmes 2014). With these increases in scale, the known diversity in a multitude of environments exploded (Schloss and Handelsman 2004; Lozupone and Knight 2007; Huttenhower et al. 2012; Schloss et al. 2016), and the 16S rRNA public database SILVA (release 132) now contains almost 700,000 unique high-quality sequences (Quast et al. 2013).
While freed from the biases associated with laboratory culturing, 16S rRNA-based amplicon studies are subject to a new set of biases (Engelbrektson et al. 2010; Degnan and Ochman 2012; Klindworth et al. 2013; Karst et al. 2018). The 16S rRNA is a critical component of the small ribosomal subunit, and as such contains conserved structural elements, which is reflected in regions of especially conserved primary sequence (Woese et al. 1975). The PCR amplification of 16S rRNA gene sequences relies on the use of these highly conserved sequences as priming sites for 'universal' primer sequences (Lane et al. 1985), while the intervening, less conserved sequence is used for phylogenetic analysis. In principle, the highly conserved sequences would be truly universally conserved across archaea and bacteria (both of which contain variants of the 16S rRNA gene).
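The dependence of amplification on primer-site conservation can be made concrete with a minimal sketch that counts mismatches between a degenerate primer and a candidate priming site, using the standard IUPAC nucleotide ambiguity codes. The primer and site sequences below are hypothetical and chosen only for illustration; they are not the primers used in this work, and the sites are written in the same orientation as the primer.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the site base is not allowed by the primer code."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(base not in IUPAC[code] for code, base in zip(primer, site))


# Hypothetical degenerate primer and three hypothetical priming sites.
primer = "GTGYCAGCM"
sites = {
    "site_a": "GTGCCAGCA",  # Y pairs with C, M pairs with A
    "site_b": "GTGTCAGCC",  # Y pairs with T, M pairs with C
    "site_c": "GTGGCAGCA",  # G at the Y position is not covered
}

for name, site in sites.items():
    print(name, count_mismatches(primer, site))
```

In this toy case the degenerate positions absorb the variation in the first two sites, while the third site carries a substitution the primer does not cover, the kind of variation that can leave a 16S rRNA gene underrepresented in an amplicon survey.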
However, these conserved regions are not 100% conserved, and even small variations can impact PCR primer binding and efficiency, rendering certain 16S rRNA genes either invisible or underrepresented in amplicon studies, skewing community composition and abundance

PAGE 8

estimates (Baker, Smith, and Cowan 2003; Frank et al. 2008; Teske and Sørensen 2008; Miller et al. 2011; Degnan and Ochman 2012; Pinto and Raskin 2012; Klindworth et al. 2013; Karst et al. 2018). As 'universal' PCR primer design is based on sequence databases, this bias itself becomes amplified as primer design reflects that which has already been identified (Klindworth et al. 2013). In addition to amplification bias, accurate relative abundance estimates are difficult to obtain using the 16S rRNA gene, because it is not present as a single copy in many bacterial and in some archaeal genomes (Case et al. 2007; Kembel et al. 2012; Sun et al. 2013; Stoddard et al. 2015). In particular, the bacteria typically contain many more copies of the 16S rRNA gene than the archaea, so in environments where the bacteria may already outnumber the archaea (Bates et al. 2011), archaeal 16S rRNA genes may be difficult to detect in amplicon-based 16S rRNA marker gene studies. Despite these limitations, the broad application of 16S rRNA amplicon sequencing to study the distribution of marker gene sequences over time and within and across environments has greatly expanded our view of microbial diversity, microbial community assembly, and correlation of taxa with environmental factors.
Metagenomics for discovery of functional and phylogenetic diversity
16S rRNA amplicon sequencing data cannot describe microbial community function, which is the ultimate goal in understanding the interactions of microorganisms with each other and the environment. Thorough characterization of the metabolic role of a microorganism traditionally entails biochemical assays performed on pure culture, coupled with sequencing and annotation of the organism's isolate genome (Bräuer et al. 2011). However, while most environmental microorganisms remain uncultured, their function can be inferred from genomic analyses. As compared to amplicon sequencing, which targets a

PAGE 9

single gene, or shotgun genome sequencing of a single genome (Venter et al. 2001), shotgun metagenomic sequencing is the random fragmentation and sequencing of the total DNA from an environment (Tyson et al. 2004; Venter et al. 2004). Analysis of shotgun metagenomic data can be gene-centric, inferring the metabolic capacity of a system by cataloging the encoded genes in the entire community (Tringe et al. 2005), or genome-centric, where individual or population genomes are reconstructed (Tyson et al. 2004) and metabolisms described on the organism level rather than on the community level. In both cases, metagenomic assembly and homology-based gene annotations make possible the inference of unculturable microbial function (Tyson et al. 2004; Baker et al. 2010; Hess et al. 2011; Iverson et al. 2012; Wrighton et al. 2012).
In addition to metabolic insight, shotgun metagenomics also offered additional, sometimes surprising phylogenetic insights. Use of 16S rRNA sequencing-based studies had revealed large sections of the microbial world that were known only from that sequencing approach. Then, as these metagenomic assembly and binning methods improved, the limitations of amplicon-based surveys of microbial diversity became more apparent, and it became clear that there are entire sections of the tree of life that were also hidden from view and which we now know only from shotgun metagenomic sequencing (Hug, Baker, et al. 2016). The recent discoveries of entire bacterial and archaeal phyla and superphyla remind us how much may yet remain to be discovered (Wrighton et al. 2012; Brown et al. 2015; Castelle et al. 2015; Spang et al. 2015; Eloe-Fadrosh et al. 2016; Seitz et al. 2016; Zaremba-Niedzwiedzka et al. 2017). That this phylogenetic diversity also encompasses additional functional diversity becomes more apparent with each new discovery (Wrighton et al. 2012; Kantor et al. 2013; Evans et al. 2015; Liu et al. 2018).

PAGE 10

Large-scale functional inference; linking phylogeny and function
Many advances have been made using genome-centric metagenomics, particularly by providing insight into the microbial metabolisms relevant in biogeochemical cycling (Wrighton et al. 2012; Castelle et al. 2013; Baker et al. 2015; Anantharaman et al. 2016; Daly et al. 2016; Hug, Thomas, et al. 2016; Danczak et al. 2017). While the use of metagenomics to resolve unculturable genomes is expanding rapidly and being applied to an increasing number of environments, prokaryotic phylogenetic and metabolic diversity remains vastly undersampled at the genomic level, and for certain taxa, most of the described genomes arise from very few environments (Schulz et al. 2017). This scarcity of complete, environmentally relevant genomes, from either cultured strains or metagenomic reconstruction, means that function for any given detected taxon might be inferred based on relationships to an encompassing phylogenetic clade with a genomic sampling of size N=1. Making matters worse, available genomes are often derived from very different environments and at varying levels of phylogenetic relatedness to genomes in an environment of interest. For nearly all systems, it is unknown across what phylogenetic and environmental scales metabolic potential is conserved among related microorganisms.
Many marker gene surveys of environments assume that the 16S rRNA marker gene and implied phylogeny are a reliable proxy for the full genomic (and thus functional) content of an organism. But the phenotypic variation among organisms that translates to environmental differentiation is not always captured by phylogenetic measures (Martiny, Treseder, and Pusch 2013; Barberán et al. 2014; Martiny et al. 2015; McLaren and Callahan 2018). Even for complex traits that are among the most phylogenetically conserved, such as methanogenesis (Martiny, Treseder, and Pusch 2013), phylogenetically varying patterns of

PAGE 11

gene loss and retention have resulted in the recent discovery of a methanogenic order within an otherwise non-methanogenic cluster (Paul et al. 2012; Borrel et al. 2013) and methanogens identified outside the Euryarchaeota (Evans et al. 2015; Vanwonterghem et al. 2016), previously thought to be the only archaeal phylum to contain methanogens. Now, as more genomes become available, it becomes possible to look at the evolutionary relationship between phylogeny and traits. Using methanogenesis as an example, it has been proposed on the basis of multiple archaeal genomes that the last archaeal common ancestor (LACA) was likely a methanogen (Borrel, Adam, and Gribaldo 2016; Sorokin et al. 2017; Spang and Ettema 2017; Adam, Borrel, and Gribaldo 2018). While many archaea harbor genes characteristic of methanogens, most archaeal taxa are not methanogenic, and the distribution of methanogens across the archaeal tree is much patchier than previously thought (Borrel, Adam, and Gribaldo 2016; Chistoserdova 2016; Adam, Borrel, and Gribaldo 2018). With a complex, multi-gene trait such as methanogenesis found to be variable at multiple phylogenetic levels, it is clear that inference of traits based on phylogenetic relatedness, while convenient, is risky and subject to many qualifications.
Even for apparent species-level similarity, seemingly identical organisms may harbor unexpected genomic and metabolic differences (Chase et al. 2017; McLaren and Callahan 2018). Evidence suggests both that nearly identical organisms as assessed by 16S rRNA sequence similarity can harbor environmentally relevant differences in metabolic potential when compared at the genome level, and conversely that a larger 16S rRNA nucleotide sequence difference can envelop multiple genotypes which are ecologically indistinguishable and may represent a population genomic continuum (Shapiro and Polz 2014; Chase et al. 2017; McLaren and Callahan 2018). Such environmentally relevant functional

PAGE 12

'microdiversity' among closely related organisms has been explored in marine systems, documenting multiple co-existing ecotypes of bacterioplankton including Prochlorococcus and Pelagibacterales (SAR11) (Wilhelm et al. 2007; Kashtan et al. 2014), and has been similarly shown for nitrite-oxidizing Nitrospira (Gruber-Dorninger et al. 2015) within a built environment, for methanogens in estuarine sediments (Youngblut et al. 2015), and for terrestrial bacteria (Chase et al. 2017). In the face of such phylogenetic, functional, and ecological diversity, it is clear that additional sampling is needed, targeting a wide range of environments and taxa, to better constrain the functional range associated with any phylogenetic grouping.
Systematic attempts to address this broad undersampling have begun with efforts specifically targeting such 'microbial dark matter' using single-cell sequencing techniques (Ishoey et al. 2008; Wu et al. 2009; Rinke et al. 2013), and continue as recovery of genomes from metagenomes becomes more tractable and commonplace. Recently, metagenomic assembly and binning of more than 1,500 publicly available shotgun metagenomic sequencing datasets resulted in the deposition of over 8,000 new near-complete metagenome-assembled genomes (MAGs) into public databases (Parks et al. 2017); but while MAGs were assembled, the corresponding metabolic capacity was not described. Other recent studies have produced tens to thousands of MAGs from environments such as estuarine sediments (Baker et al. 2015), aquifers (Anantharaman et al. 2016), cow rumen (Stewart et al. 2018), the subseafloor (Jungbluth, Amend, and Rappé 2017), and ocean waters (Tully, Graham, and Heidelberg 2018), but many environments remain undersampled (including freshwater wetlands, which are the focus of the present study).

PAGE 13

This project: Wetland archaeal community diversity
With this project, we address this sampling gap by characterizing archaeal diversity in a model methane-emitting freshwater wetland at the community, genome, and gene levels. Freshwater wetlands are a key source of atmospheric methane (Bastviken et al. 2011), and microorganisms, especially the archaea, are a critical component of the freshwater wetlands methane cycle, both producing and oxidizing this potent greenhouse gas (Nazaries et al. 2013). Thus, modeling how environmental factors affect the activity of methane-cycling microbial communities, and subsequently impact net methane emissions, needs to take into account both variability in microbial community composition and the variability of microbial metabolic capacity within that community. While environmental methane cycling has long been explored (Jannasch 1975), novel methanogenic and methanotrophic taxa continue to be described (Raghoebarsing et al. 2006; Ettwig et al. 2009; Borrel et al. 2013; Mondav et al. 2014; Evans et al. 2015; Lang et al. 2015; Vanwonterghem et al. 2016; Sorokin et al. 2017), and additional metabolic capabilities identified, such as the direct coupling of anaerobic methane oxidation to the reduction of sulfate, nitrate, and metals (Beal, House, and Orphan 2009; Ettwig et al. 2010; Haroon et al. 2013; Egger et al. 2014; Arshad et al. 2015; Timmers et al. 2015; Ettwig et al. 2016). However, much still remains to be learned regarding the microorganisms engaged in these processes, particularly in situ, including their distribution, habitat preferences, and their potential impact on biogeochemical cycling in specific habitats.
Old Woman Creek (Huron, OH, USA) is a freshwater estuarine wetland adjacent to Lake Erie.
As a research station in the National Oceanic and Atmospheric Administration's (NOAA) National Estuarine Research Reserve System (NERRS), Old Woman Creek (OWC) has been studied for decades (Klarer and Millie 1992; Mitsch and Reeder 1992; Chin et al.
1998; Herdendorf, Klarer, and Herdendorf 2006; Bernal 2008); however, a comprehensive microbial census has not been performed. It is known that OWC is methane emitting (Nahlik and Mitsch 2010), but the identity and substrate utilization capacity of the methanogens present are not known. And while methanogens are directly responsible for the production of methane, the competing or cooperative members of the surrounding community are not known, nor are the ultimate biotic and abiotic controls on methane emissions from this wetland. One approach to understanding function in this wetland is to generate a full microbial community profile and a corresponding metabolic/genomic profile that is specific to this ecosystem. In addition to addressing specific questions about methane cycling, this in-depth profiling can be used to ask whether the archaeal metabolisms in the wetland agree with predictions that would have been made based on phylogenetic relatedness to genomes sampled from other environments.

With the paired community-level census and metagenome-assembled genomes, we examine genomic variation within specific archaeal populations, and also link this variation to observed ecological differentiation within these populations. How do the microbial community as a whole and the individual populations vary with site characteristics, both phylogenetically and functionally? Does observed phylogenetic variability correspond to functional variability? Do the genomically inferred archaeal metabolisms in the wetland agree with predictions made based only on phylogenetic relatedness to genomes sampled from other sites in the wetland?

First, as discussed in Chapter II, we developed and employed a new approach for the targeted deep sequencing of archaeal and bacterial 16S rRNA genes and linked the resulting microbial community profiles to geochemical measures.
A sampling strategy spanning hydrological and soil depth gradients enabled description of the spatial distribution of
community members. We found that the archaea and the bacteria are arrayed along well-defined gradients corresponding to soil depth and geochemical measures. With this novel sequencing and bioinformatics strategy, we were able to describe the archaea at a level of resolution not typically achieved with amplicon sequencing, and we detected an unexpected level of archaeal diversity in the wetland. Multiple methanogenic taxa were identified, including newly described non-Euryarchaeotal phyla, and vast numbers of presumably non-methanogenic archaea were also identified; the role of these organisms within the wetland remains enigmatic. These archaea displayed specific distributions across the wetland gradients, which suggested specific, environmentally defined habitats. Remarkably, due to the resolution achieved with our sequencing approach, we were able to discern these habitat preferences at multiple taxonomic levels, even to the level of individual OTUs (operational taxonomic units; used as a proxy measurement for microbial species).

As the differentiated distributions across the wetland were associated with specific geochemical measures, we hypothesized that the genomic content of the particular taxa associated with a particular site, and hence geochemical regime, would be selected by that habitat, and would differ from the content encoded in the genomes of even closely related taxa with different apparent habitat preferences. To address this question, we selected soil samples for metagenomic sequencing where archaeal taxa of interest were predicted to be abundant based on the amplicon census. These samples also represented two ecosites with sharply differing geochemistry and archaeal community profiles, allowing us to begin to explore intra-population variation associated with site conditions.

Metagenomic assembly indicated the presence of 3 Candidatus Thorarchaeota genomes across 2 samples. The Thorarchaeota candidate phylum falls
within the recently described Asgard superphylum, and members of this superphylum are currently thought to be the closest extant relatives of the ancestral eukaryotic host cell (Zaremba-Niedzwiedzka et al. 2017). Comparative analysis of these Ca. Thorarchaeota genomes in the context of other archaeal and eukaryotic genomes uncovered anomalies in the complement of core information processing genes. These findings are discussed in Chapter III and include both losses and gains of genes associated with the elongation stage of protein synthesis. The apparent loss of diphthamide synthesis genes, otherwise universally present across archaea and eukaryotes, and the gain of a second copy of translation elongation factor 2, a single-copy gene with no known paralogs in the Archaea, raise multiple questions regarding the evolution and conservation of key components of protein synthesis.

Finally, in Chapter IV, we discuss the genomically encoded metabolic versatility found within the Bathyarchaeota in the OWC wetland. The Bathyarchaeota were among the most abundant and numerous archaea that we identified in the initial amplicon census of the wetland. Within this phylum, multiple subgroups have been delineated, and broad habitat preferences described (Fillol, Auguet, et al. 2015; Lazar et al. 2015; Xiang et al. 2017). The frequent detection of Bathyarchaeota (previously known as Miscellaneous Crenarchaeota Group, MCG) from other methane-active sites (Biddle et al. 2006; Kubo et al. 2012; Lloyd et al. 2013; Evans et al. 2015) as well as this wetland led to an initial hypothesis that the Bathyarchaeota were integral members of methanogenic consortia, potentially supplying the methanogens with their required metabolic substrates. More recently, Bathyarchaeota have been found to be capable of acetogenesis, and even methanogenesis (Evans et al. 2015; He et al. 2016; Lazar et al. 2016).
Our findings in the amplicon study that particular Bathyarchaeota subgroups are associated with specific sites within the wetland are reinforced
with metagenomic analysis of these samples. We recovered multiple partial genome bins from several Bathyarchaeota subgroups, which we treat as subgroup-specific population genomes for pathway analysis. These population genomes have distributions that were predicted based on their presence/absence and abundance in the amplicon study. We find that the capacity for acetogenesis is broadly conserved across these newly described Bathyarchaeota subgroups, while differing capacity for carbon assimilation within these population genomes suggests links to the geochemistry at the sites from which they were recovered. Interestingly, we also recovered Bathyarchaeotal mcrABG genes from several metagenomic assemblies. These are the hallmark genes for methanogenesis, and suggest that the methane production capacity within the OWC wetland may be more complex than initially thought.

CHAPTER II

HIGH-RESOLUTION SEQUENCING REVEALS UNEXPLORED ARCHAEAL DIVERSITY IN FRESHWATER WETLAND SOILS¹

¹ Portions of this chapter were previously published in: Narrowe, A.B., Angle, J.C., Daly, R.A., Stefanik, K.C., Wrighton, K.C., and Miller, C.S. (2017) High resolution sequencing reveals unexplored archaeal diversity in freshwater wetland soils. Environ. Microbiol. 19: 2192-2209, and are included with the permission of the copyright holder. Author contributions are listed at the end of the chapter.

Originality Significance Statement

Freshwater wetlands represent significant sources of atmospheric methane, yet we know surprisingly little about the microbial communities in these terrestrial systems. We use a novel domain-specific rRNA amplicon sequencing approach in combination with extensive sampling across a freshwater wetland to provide the first high-resolution view of both the archaeal and bacterial communities in these soils. Our methodology is especially powerful when used to explore the low-abundance archaeal taxa with high phylogenetic resolution. Many of these taxa have previously been shown to play critical roles in carbon cycling in soils and sediments. We uncover new phylogenetic diversity in both methane-cycling and underexplored non-methane-cycling archaeal populations. These populations show habitat preferences that are structured at multiple phylogenetic levels, and which suggest complex interactions governing methane cycling.

Summary

Despite being key contributors to biogeochemical processes, archaea are frequently outnumbered by bacteria, and consequently are underrepresented in combined molecular surveys. Here, we demonstrate an approach to concurrently survey the archaea alongside the bacteria with high-resolution 16S rRNA gene sequencing, linking these community data to
geochemical parameters. We applied this integrated analysis to hydric soils sampled across a model methane-emitting freshwater wetland. Geochemical profiles, archaeal communities, and bacterial communities were independently correlated with soil depth and water cover. Centimeters of soil depth and corresponding geochemical shifts consistently affected microbial community structure more than hundreds of meters of lateral distance. Methanogens with diverse metabolisms were detected across the wetland, but displayed surprising OTU-level partitioning by depth. Candidatus Methanoperedens spp., archaea thought to perform anaerobic oxidation of methane linked to iron reduction, were abundant. Domain-specific sequencing also revealed unexpectedly diverse non-methane-cycling archaeal members. OTUs within the underexplored Woesearchaeota and Bathyarchaeota were prevalent across the wetland, with subgroups and individual OTUs exhibiting distinct occupancy and abundance distributions aligned with environmental gradients. This study adds to our understanding of ecological range for key archaeal taxa in a model freshwater wetland, and links these taxa and individual OTUs to hypotheses about processes governing biogeochemical cycling.

Introduction

Temperate freshwater wetlands are an important source of biogenic atmospheric methane (Bastviken et al. 2011; Bridgham et al. 2013; Kirschke et al. 2013). As a group, these habitats are estimated to account for up to 40% of all methane emissions (Denman et al. 2007), yet can also function as carbon sinks. However, the diverse classification of habitats labeled as freshwater wetlands reflects wide variation in hydrology, plant cover, soil type, nutrient content, and pH (Federal Geographic Data Committee 2013), and includes both natural and constructed wetlands. Microbial communities mediate carbon cycling in
freshwater wetlands, but certain of these habitats are better studied microbiologically (Borrel et al. 2011; Bridgham et al. 2013), particularly rice fields (Conrad 2002; Conrad et al. 2012; Lee et al. 2014), lake and river sediments (Borrel et al. 2012; J. Wang et al. 2012; Bodelier et al. 2013), tidal estuaries (John Parkes et al. 2012; Webster et al. 2015), peatlands (Juottonen et al. 2012; Preston et al. 2012; Sun et al. 2012; Hawkins, Johnson, and Bräuer 2014), and constructed wetlands (Nahlik and Mitsch 2010; Samsó and García 2013; Arroyo, Sáenz de Miera, and Ansola 2015; He et al. 2015). Although some similarity in key functional guilds may be common across wetland types, fundamental geochemical differences drive differences in microbial community structure, and thus inferred metabolic capabilities. Soil pH and redox, which vary by wetland type and within a single wetland, are two such controls on methanogenesis, and are explicitly incorporated as key parameters in predicting methane production in biogeochemical models (Meng et al. 2012). However, geochemistry can also control methane cycling via indirect interactions with non-methanogenic members of the community, and these interactions can be wetland-type specific. For example, in acidic Sphagnum-dominated peatlands, oxygen-dependent breakdown of phenolics by non-methanogenic community members leads to a cascade of increased decomposition and ultimately provides the substrates facilitating methanogenesis (Fenner and Freeman 2011). Yet these complex interactions are probably not relevant outside of peatlands, due to a lower polyphenolic load linked to differing aboveground vegetation. Thus, a full characterization of microbial community structure and membership across the full range of freshwater wetland types is necessary to improve models of carbon cycling in these ecosystems (Riley et al. 2011).

Archaea are expected to be a critical component of the microbial community in freshwater wetlands, in part because all known methanogens and most anaerobic methanotrophs are archaea. Most biogenic methane production in wetlands is thought to be driven by Euryarchaeota performing acetoclastic or hydrogenotrophic methanogenesis, although much of that methane can be immediately consumed by bacterial and archaeal methane oxidizers (Segarra et al. 2015; Cai et al. 2016). Archaeal communities (and thus controls on methane emissions) vary by wetland type. Comparatively less is known about non-methane-cycling archaea in freshwater wetlands. Many microbial diversity surveys have potentially missed archaeal populations present by exclusively targeting methane-cycling microbes using functional marker genes for archaeal methanogenesis or bacterial methane oxidation (e.g. mcrA, pmoA) (Bourne, McDonald, and Murrell 2001; Luton et al. 2002). There is a need to sample non-methane-cycling archaea in freshwater wetlands more extensively across their phylogeny, and to characterize their habitat preferences, in order to develop hypotheses about the roles these organisms play in carbon cycling.

Archaea often constitute only a small fraction of the total microbial community in soils and sediments (Bates et al. 2011; Borrel et al. 2012; Webster et al. 2015), despite their large contributions to ecosystem-wide biogeochemical processes. Unfortunately, many 16S rRNA gene-based studies employ PCR primers that simultaneously amplify both bacterial and archaeal 16S rRNA genes (Caporaso et al. 2012), undersampling the diversity of the less abundant archaeal fraction (Y. Wang et al. 2012; Klindworth et al. 2013). To more deeply sample the archaeal members, primers specific to this domain have been developed (Baker, Smith, and Cowan 2003; Teske and Sørensen 2008; Klindworth et al. 2013), but these often produce longer amplicons, which has largely limited their use to lower throughput
sequencing methods such as Sanger sequencing (Gantner et al. 2011) and pyrosequencing (Lee et al. 2015; Webster et al. 2015). An approach that incorporates archaeal domain specificity and exploits higher-throughput short-read sequencing would both increase access to low-abundance archaeal community members and allow for increased sampling and replication.

One solution is to assemble longer rRNA gene amplicons from short-read shotgun sequencing libraries with the EMIRGE algorithm (Miller et al. 2011; Miller et al. 2013; Ong et al. 2013). EMIRGE uses a large database of candidate rRNA genes to perform a templated assembly of rRNA reads from a sample. In an iterative process, mapped reads are used to correct the candidate rRNA genes to reflect gene sequences found in the sample, and mappings to corrected candidate genes are then used to probabilistically assign each read to candidate genes for relative abundance estimates. Because longer amplicons can be assembled, this method permits domain-specific primer selection independent of sequencing read lengths, exploits the higher throughput afforded by short-read sequencing, and allows for more extensive sampling and replication within a single study. Domain-specific primers which exclusively target the V3-V6 region of the archaeal or bacterial 16S rRNA gene have been described, and increase phylogenetic resolution when compared to shorter amplicons (Gantner et al. 2011; Ong et al. 2013; Singer et al. 2016).

In this study, we investigate the full extent of archaeal diversity within hydric soils of a model temperate, circumneutral freshwater wetland. Soils were sampled from Old Woman Creek (OWC) National Estuarine Research Reserve, a naturally occurring palustrine freshwater emergent wetland adjacent to Lake Erie, Ohio, USA. This wetland has well-characterized macroecology, hydrology, and geochemistry (Klarer and Millie 1992; Mitsch
and Reeder 1992; Chin et al. 1998; Herdendorf, Klarer, and Herdendorf 2006; Bernal 2008), with mean annual methane emissions up to 82 g CH4-C m⁻² (Nahlik and Mitsch 2010). By applying a novel high-throughput, domain-specific V3-V6 16S rRNA sequencing approach, we asked how relative abundances of common and rare archaeal taxa co-vary with bacterial community structure and soil geochemistry across multiple phylogenetic and spatial scales.

Experimental Procedures

Experimental Design and sample collection

Old Woman Creek National Estuarine Research Reserve is a 573-acre freshwater wetland on the southern edge of Lake Erie near Huron, Ohio. Soil cores studied here were collected in October 2013 across two transects (labeled Transect 2, Transect 3). Three ~1 m² sites were sampled from each transect, spanning hydrologic gradients from i) seasonally exposed mud flat (mud, M), to ii) a submerged mud flat which drained during the 24-hour collection period due to storm erosion of the barrier beach (mud transition, MT), to iii) permanently flooded sediments covered by 24" of open water (open water covered, O; Figure 2.1). Samples from a third transect (Transect 1) were collected but were not analyzed here. Plot location was marked with a Mobile Mapper 100 GPS unit. Cores were collected to a depth of approximately 35 cm using a modified Mooring System soil corer (3" diameter Cellulose Acetate Butyrate core liner), and stored on ice for transport. Four soil cores were collected from each of sites M2, M3, and O3, while 3 cores each were collected from sites O2, MT2, and MT3. Soil was extruded from core liners, and each core was sectioned into 4 samples by depth as measured from the core surface: 0-5 cm (D1), 6-12 cm (D2), 13-23 cm (D3), and 24-35 cm (D4), yielding a total of 84 samples. Samples were transferred to sterile Whirl-Pak bags and homogenized by hand. 30 g of soil from each homogenized sample was
removed and stored at 4°C for geochemical analysis, while the remaining soil was stored at -20°C for microbial analysis.

Geochemical measurements and analyses

For 76 of the 84 samples, pH, nitrate, nitrite, total soil carbon, acetate, Fe(II), sulfate, and phosphate were measured, although phosphate was below detection in all samples (Table S1). Cores MT2 core 3 and MT3 core 3 were not processed for geochemical data (8 samples). Fe(II) concentrations were measured by absorbance spectrophotometry at 510 nm using Hach FerroVer Iron Reagent (Loveland, CO). Ion chromatography (Dionex ICS 2100 Ion Chromatography System with an AS18 column) was used to determine concentrations of acetate, nitrite, nitrate, sulfate, and phosphate. For measures below detection, the value was set at half the detection limit for analyses (Table S2.1). 5 g of soil was added to 5 ml DI water in a 15 ml Falcon tube (1:1 v/v) and vortexed to create a soil slurry. Soil slurry pH was measured with an Accumet AB150 pH/mV meter, and the slurry was then passed through a 0.2 µm filter. The filtered liquid was stored in 2 ml microcentrifuge tubes at -20°C until analysis.

Soil samples were dried at 65°C for 24 hours in aluminum tins, ground to a powder using a mortar and pestle, and stored in a dark, room-temperature cabinet until analysis. Total, organic, and inorganic carbon were measured using a Shimadzu 5000A Solid Sample Combustion Unit. Total carbon samples were combusted at 900°C, while inorganic carbon samples were saturated in a 1:2 mix of 85% H3PO4 to Milli-Q water and then combusted at 200°C. Organic carbon was determined by subtracting the inorganic carbon content from the total carbon content of the sample. Because total inorganic carbon (TIC) was below detection in many samples or was only a small component of total carbon (mean 5%), here we report total soil carbon, which approximates total organic carbon (TOC) values. Salinity data were obtained from
the NERRS Centralized Data Management Office (http://cdmo.baruch.sc.edu/get/landing.cfm).

DNA extraction, amplification and 16S rRNA gene amplicon sequencing

For all 84 samples, total genomic DNA was extracted using the MoBio PowerSoil DNA Isolation Kit (Carlsbad, CA) from 0.4 g soil and quantified using Invitrogen Qubit dsDNA HS Assay (Life Technologies, Waltham, MA). The V3-V6 region of the 16S rRNA gene was PCR amplified and sequenced twice for each sample, separately targeting the bacterial and archaeal domains. Bacterial 16S rRNA gene amplification used the primers F338 and 1061R (Ong et al. 2013). Archaeal 16S rRNA gene amplification used the primers F349 and 1041R (Gantner et al. 2011; Klindworth et al. 2013). Full primer names, primer sequences, and reaction conditions are shown in Table S2.5. Triplicate reactions per sample were pooled using Zymo Clean and Concentrator 5 (Irvine, CA). The V3-V6 amplicons (approx. 635 bp for Archaea and 723 bp for Bacteria) were fragmented using the Nextera XT shotgun metagenomic library preparation kit (Illumina, San Diego, CA) to produce a multiplexed sequencing library. To increase insert size, the tagmentation reagents (TD buffer, ATM) were reduced to 90% of specified volumes and 25 µl AMPure XP beads (Beckman Coulter, Indianapolis, IN) were used for the library cleanup step. Libraries for each domain were prepared and sequenced separately using 2x150 bp paired-end reads on the Illumina MiSeq at the University of Colorado Anschutz Medical Campus Genomics and Microarray Core.

EMIRGE reconstruction of 16S rRNA gene amplicons

Following sequencing, reads were preprocessed using SeqPrep (https://github.com/jstjohn/SeqPrep) to remove adapter sequences and to merge overlapping
reads (-m 0.3 -n 0.7 -o 12 -Z 100000 -N 1 -A CTGTCTCTTA -B CTGTCTCTTA). Reads were quality trimmed using sickle (v1.33; -l 100 -q 2) (Joshi and Fass 2011). Merged reads were split at their midpoint in silico and added to non-overlapping reads for downstream analysis. Amplicons were reconstructed for each sample and for each domain using EMIRGE (Miller et al. 2011; Miller et al. 2013). EMIRGE candidate 16S rRNA databases (separate bacterial and archaeal) were produced using the Silva SSU Ref NR99 database (release 119) (Quast et al. 2013), trimmed to the expected V3-V6 amplicons (Werner et al. 2012) with PrimerProspector (Walters et al. 2011), using default parameters. To remove sequences likely to contain errors, artificial duplications, or chimeras, the databases were further length filtered (610 bp ≤ archaeal lengths ≤ 650 bp, 2.1% removed; 675 ≤ bacterial lengths ≤ 775 bp, 0.8% removed). Candidate databases were sorted by length then clustered at 97% identity with usearch (-cluster_smallmem) version 5.2.236 (Edgar 2010), resulting in final candidate databases of 121,114 bacterial and 2,550 archaeal sequences. EMIRGE was parameterized to perform 80 iterations, to merge 100% identical sequences (-j 1.0), and to map all reads regardless of insert size (-i 150, -s 300). EMIRGE-reconstructed sequences for each sample were then clustered using a 97% sequence identity threshold (usearch -cluster_smallmem, -id 0.97), and EMIRGE NormPrior abundances for all members in a cluster were summed.

OTU picking

We retained sequences of both domains as low as 0.02% estimated per-sample relative abundance after applying a minimum expected per-base coverage threshold of 20X. EMIRGE-reconstructed sequences above a 20X expected coverage threshold from all samples were pooled for each domain, removing sequences containing Ns and those
predicted as chimeric using the DECIPHER webtool (Wright, Yilmaz, and Noguera 2012). Study-wide representative OTU sequences and per-sample abundances were identified using the following protocol. First, estimated read counts were computed using the product of the EMIRGE estimated relative abundance and the number of reads mapped per sample. The combined set of sequences from all samples was sorted by per-sample estimated read counts and dereplicated at 100% identity, and read counts were summed using usearch (-cluster_smallmem, -id 1.0, -usersort). The resulting set of unique sequences was sorted by decreasing study-wide summed read counts, and used as input to the cluster_otus step of the UPARSE pipeline (Edgar 2013), using a 97% sequence identity to identify representative OTUs. Finally, all EMIRGE sequences from all samples (those above and below the 20X threshold) were mapped back to the OTUs at 97% identity using usearch_global (Edgar 2010) and estimated read counts were summed from mapped sequences to generate an OTU table.

Taxonomy was assigned using the RDP classifier (Wang et al. 2007) (confidence level 0.8), which was retrained using the Greengenes 13_8 database (DeSantis et al. 2006) trimmed to the V3-V6 region using PrimerProspector as described above. To incorporate the most recent advances in archaeal taxonomy, the archaeal OTUs were also classified using the Silva SSURef NR99 database (version 128), using the same methods. Silva taxonomy labels were manually curated to remove non-taxonomic identifiers such as "uncultured archaeon". Where there was a discrepancy between Greengenes and Silva taxonomy or taxonomy was not assigned below "archaea," OTUs were manually assigned taxonomy based on their position in the Silva tree. Three samples representing a fifth sediment depth were removed from further analyses due to insufficient replication.
The final OTU tables were filtered to retain only those OTUs that appeared in at least 3 of 84 samples.
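
The bookkeeping in the OTU picking protocol (estimated read counts as the product of EMIRGE relative abundance and mapped reads, summation of counts over sequences mapped to the same OTU, and the final filter retaining OTUs seen in at least 3 samples) can be illustrated with a minimal Python sketch. This is not the code used in this study: the function names, the dictionary layout, and the toy numbers are hypothetical and serve only to make the arithmetic concrete.

```python
from collections import defaultdict

def estimated_read_counts(rel_abund, reads_mapped):
    """Estimated reads per sequence = relative abundance * reads mapped in the sample."""
    return {seq: frac * reads_mapped for seq, frac in rel_abund.items()}

def build_otu_table(per_sample_counts, seq_to_otu):
    """Sum estimated read counts of all sequences mapped to the same OTU, per sample."""
    table = defaultdict(dict)  # otu -> {sample: summed count}
    for sample, counts in per_sample_counts.items():
        for seq, count in counts.items():
            otu = seq_to_otu.get(seq)
            if otu is None:
                # Sequence did not map to any OTU at the identity cutoff.
                continue
            table[otu][sample] = table[otu].get(sample, 0.0) + count
    return dict(table)

def filter_by_prevalence(otu_table, min_samples=3):
    """Keep only OTUs with a nonzero count in at least `min_samples` samples."""
    return {
        otu: row
        for otu, row in otu_table.items()
        if sum(1 for c in row.values() if c > 0) >= min_samples
    }

# Toy data (hypothetical): three samples, three reconstructed sequences.
toy_abund = {
    "S1": {"seqA": 0.6, "seqB": 0.4},
    "S2": {"seqA": 0.5, "seqC": 0.5},
    "S3": {"seqA": 1.0},
}
toy_reads = {"S1": 1000, "S2": 2000, "S3": 500}
toy_map = {"seqA": "OTU1", "seqB": "OTU1", "seqC": "OTU2"}

counts = {s: estimated_read_counts(toy_abund[s], toy_reads[s]) for s in toy_abund}
otu_table = build_otu_table(counts, toy_map)
filtered = filter_by_prevalence(otu_table, min_samples=3)
```

In this toy case OTU2 is detected in a single sample and is dropped by the prevalence filter, while OTU1 is retained.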

Simulation of V4 region 16S rRNA amplicons and OTU picking

All EMIRGE-reconstructed sequences were trimmed in silico to the V4 region of the 16S rRNA gene using Primer Prospector v. 1.0.1 with default settings and universal primer sequences F515 (5'-GTGCCAGCMGCCGCGGTAA-3') and 806R (5'-GGACTACHVGGGTWTCTAAT-3'). By using the same sequences that were input to V3-V6 OTU picking, this procedure maintained the same underlying community structure to estimate the effects of shorter amplicons and reduced sequencing depth on archaeal community resolution. Sequences that were predicted to amplify using this primer set were input to the OTU picking protocol as described above. The confidence threshold for the RDP classifier was changed from 0.8 to 0.5 to reflect the shorter sequence length. The final OTU table was converted to relative abundance and all relative abundances were reduced tenfold to simulate an archaeal community comprising only 10% of the sequencing in any sample. To identify which equivalent V3-V6 Woesearchaeota OTUs would have been identified with this protocol, we searched the V3-V6 Woesearchaeota OTUs against a BLASTN database of the V4 Woesearchaeota OTU sequences.

Microbial community analyses

OTU counts were normalized to within-sample relative abundance (total sum scaling, TSS). Microbial community analyses and visualizations were conducted in R (R Core Team 2014) using the phyloseq, vegan, VennDiagram, gplots, and ggplot2 packages (Wickham 2009; Chen and Boutros 2011; McMurdie and Holmes 2013; Oksanen et al. 2013; Warnes et al. 2016), and with QIIME (Caporaso et al. 2010). For analyses of microbial community data in conjunction with geochemical data, only the 76 samples with both microbial and geochemical measures were used. PERMANOVA (Anderson 2001) tests were implemented
using the vegan::adonis function. The bioenv function in vegan was used to identify the subset of geochemical parameters that best correlated with the community membership dissimilarity matrix, and geochemical variables were tested for covariance using the Hmisc package (Harrell and Dupont 2015). Mantel and partial Mantel tests were conducted using vegan. GPS coordinates were converted to intersample distances as 30.78 m for 1 second of latitude and 24.38 m for 1 second of longitude (http://www.usgs.gov/faq/categories/9794/3022). All commands for these analyses are reported in Supplemental file 2.5.

16S rRNA phylogenetic tree construction

All archaeal OTUs were aligned using SINA (Pruesse, Peplies, and Glöckner 2012) and added to the ARB guide tree (SILVA SSURef NR99, v123) using the parsimony add method to retain tree topology (Ludwig et al. 2004). For the Bathyarchaeota tree, published representative Bathyarchaeota sequences (Kubo et al. 2012; Lloyd et al. 2013; Meng et al. 2014; Evans et al. 2015) were similarly added to the guide tree if not already present. The tree was pruned to retain only the representative sequences and the OTUs from this study, and representative sequences were used to identify subgroups. The Euryarchaeota phylogenetic tree was constructed using RAxML (GTRGAMMA, 5 searches, 100 bootstraps) (Stamatakis 2014) in ARB using all OWC Euryarchaeota OTUs and reference sequences, including cultured Euryarchaeota 16S rRNA gene sequences and sequences of the 2 nearest neighbors to the OWC OTUs as identified by the SINA alignment to SILVA SSURef NR99 v126 (Quast et al. 2013). Only the Candidatus Methanoperedens and Methanosaeta genera are shown in Figure 2.4. For the Woesearchaeota tree, all reference sequences and OTUs located within the Woesearchaeota candidate phylum were retained and a maximum
likelihood tree was constructed using RAxML (GTRGAMMA, 1 search, 100 bootstraps) with representative DPANN genomes as the outgroup. The recently published partial 16S rRNA gene from the Woesearchaeota RC-V genome bin reported by Lazar et al. (2017) was added to this tree with the parsimony add function in ARB. Trees were visualized and annotated using iTOL (Letunic and Bork 2007).

Data availability

DNA sequencing data are available at the NCBI Sequence Read Archive under BioProject PRJNA325008.

Results

Site geochemistry

Soil geochemical parameters and corresponding microbiology samples (Table S2.1) were collected across two transects (Transect 2 and Transect 3). Each transect contained three sampling sites defined by hydrology [mud (M), mud transition (MT), open water covered (O)], and multiple cores were pulled from each of the six sites, with each core divided into four depths for sampling (Figure 2.1). Calculated water salinity over the sampling period ranged from 0.2-0.4 parts per thousand. With increasing soil depth, Fe(II) concentrations increased, while sulfate concentrations decreased, likely reflecting oxygen decrease with depth (Figure S2.1). Nitrite, pH, and total carbon were largely invariant across sites and depths within Transect 2, but were more variable across Transect 3. The Transect 3 open water covered site (site O3) was most distinct from all other sites, with significantly lower pH and total soil carbon concentrations (Tukey's HSD p < 0.001) and nitrite below detection in 13 of 16 samples (Figure S2.1, Table S2.1). Across sites, overall sample geochemistry varied more with soil depth than with lateral distance (Figure S2.2). Of the
! %) ! measured geochemical parameters (Table S 2. 1), pH, Fe (II), nitrite, sulfate, and total soil carbon were identified as the subset that best correlated with both archaeal and bacterial community inter sample dissimilarities (bioenv Spearman r= 0.62, 0.57 respectively). Figure 2.1 Site map and scale of wetland soil sampling. Hydric soil samples were collected from the Old W oman Creek freshwater wetland (star) in Ohio, USA (left). Samples were collected along two transects (red lines, transects T2 and T3). Each transect consists of three sites (M, MT, or O) defined by water cover, and 3 or 4 cores (C1 C4) per site were collec ted and sampled at 4 depths (D1 D4; right). A representative sample from the mud site of transect 2, core 4, at depth 3 is identified as M2C4D3. The 84 samples processed for geochemistry and microbial community analysis represent samples spatially separate d at scales from centimeters to hundreds of meters. Domain specific sequencing recovers bacterial and archaeal diversity We employed a domain specific 16S rRNA amplicon sequencing and assembly approach to concurrently characterize the archaeal and bac terial communities in 84 wetland !"#$% &'()#$'*+ ,-#.#!#'*#/# 0+*#1(2+3 45 46 4! 4/ !7#% 89(' %:; ,<3 %:; 2*=-1(2(',<>3 '0+-#?=2+* $'@+*+; ,83 >*=-1+$2#,-.63 A/#1=%0)+1B ##C+'$9+%(12*DE### ##4FG#1+H:+-$(-C !77#% I+2)=-; 80+-#I=2+* J'*+12+; I+2)=-; >*=-1+$2 K=L+#M*(+ >6 5% &(2+ ,-#.#!#0+*# 2 * = -1+$23 N5 N6 N! N/ >!


soil samples across two hydrological transects (Figure 2.1, Transect 2 and Transect 3). 16S rRNA V3–V6 amplicons were assembled from an average of 337,298 ± 167,867 bacterial reads using the primers 338f/1061r (Ong et al. 2013), and 275,955 ± 185,997 archaeal reads using the primers 349f/1041r (Gantner et al. 2011; Klindworth et al. 2013) per sample (± standard deviation; Table S2.2). In silico analyses of the selected PCR primers against an existing 16S rRNA database (Klindworth et al. 2013) predicted 84.4% coverage of the archaea and 95% coverage of the bacteria, with virtually no cross-domain amplification predicted. This domain specificity of the primers was confirmed in the sequencing: on average, 99.87% of the archaeal sequencing reads and 99.99% of the bacterial sequencing reads mapped to sequences of the target domain (Figure S2.3; Table S2.2), and we exclusively assembled sequences of the targeted domain within each sequencing run. The archaeal primers produced some non-specific amplification of protein-coding bacterial DNA, likely as a result of low annealing temperature (Gantner et al. 2011), and EMIRGE identified and discarded these reads.

Our pipeline identified 478 archaeal and 1082 bacterial 97% identity Operational Taxonomic Units (OTUs) distributed broadly across the wetland (Supplemental Files S2.1–S2.4). On average, each sample contained 31% (146 ± 41) of archaeal and 44% (476 ± 111) of bacterial OTUs. Across the wetland, 40% (188/478) of archaeal OTUs and 60% (655/1082) of bacterial OTUs were found at all six sampling sites (Figure S2.4). Hydrology-paired sites from the two transects shared most OTUs (Figure S2.4). For example, 93% of archaeal OTUs found at the Mud 3 site were also found at the Mud 2 site. The most geochemically distinct site (O3) still shared 90% of archaeal and 94% of bacterial OTUs with at least one of the other five sites.
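The site-sharing percentages reported above reduce to set operations on per-site OTU presence lists. The following is a minimal sketch of that arithmetic, not the pipeline used in this study; the function names and the toy presence lists are hypothetical and chosen only for illustration.

```python
def shared_fraction(otus_a, otus_b):
    """Fraction of the OTUs detected at site A that are also detected at site B."""
    a, b = set(otus_a), set(otus_b)
    if not a:
        raise ValueError("site A has no OTUs")
    return len(a & b) / len(a)


def core_otus(site_to_otus):
    """OTUs detected at every site in the mapping."""
    otu_sets = [set(otus) for otus in site_to_otus.values()]
    return set.intersection(*otu_sets) if otu_sets else set()


# Hypothetical presence lists (illustrative only, not data from this study).
sites = {
    "M2": ["otu1", "otu2", "otu3", "otu4"],
    "M3": ["otu1", "otu2", "otu3"],
    "O3": ["otu1", "otu3", "otu5"],
}
m3_in_m2 = shared_fraction(sites["M3"], sites["M2"])  # share of M3 OTUs also at M2
core = core_otus(sites)                               # OTUs present at all sites
```

Note that the shared fraction is asymmetric: the share of Mud 3 OTUs found at Mud 2 need not equal the share of Mud 2 OTUs found at Mud 3, which is why the direction of the comparison is stated in the text.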


Community-level biogeography corresponds to geochemistry

Across samples, archaeal and bacterial beta diversity was strikingly similar. Bray-Curtis inter-sample dissimilarity matrices based on the bacterial and archaeal communities were highly correlated (Mantel R = 0.86, p < 0.0001), and non-metric multidimensional scaling (NMDS) ordinations of these dissimilarities produced significantly similar clustering patterns of samples for both domains (Figure 2.2; Procrustes correlation = 0.90, p < 0.0001). Archaeal and bacterial communities found in individual samples were most similar in composition to communities from samples with similar geochemistry. Ordination of samples based on Euclidean distance of geochemical measurements alone uncovered a similar inter-sample structure to that observed for both the archaeal and bacterial communities (Figure S2.2; Procrustes correlation = 0.65 and 0.66, respectively, p < 0.0001). Inter-sample community dissimilarity was not solely due to increases in spatial distance between samples (Angermeyer, Crosby, and Huber 2016). When controlling for the effects of lateral distance between sampling sites, both the archaeal and bacterial community structures still correlated significantly with measured geochemical parameters (Table S2.3; p < 0.0001; partial Mantel R = 0.58 and 0.54, respectively).
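For readers unfamiliar with the statistics named above, the sketch below illustrates how a Bray-Curtis dissimilarity matrix can be computed and how a permutation-based Mantel test compares two distance matrices. It is an illustration under stated assumptions rather than the analysis code used here: it uses a Pearson correlation of the upper-triangle entries, the function names are hypothetical, and the count and geochemistry tables are invented toy values.

```python
import math
import random


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


def euclidean(x, y):
    """Euclidean distance between two measurement vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distance_matrix(samples, metric):
    """Square matrix of pairwise distances between samples."""
    n = len(samples)
    return [[metric(samples[i], samples[j]) for j in range(n)] for i in range(n)]


def _upper(matrix, order):
    """Upper-triangle entries after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def mantel(d1, d2, permutations=999, seed=0):
    """Mantel statistic and one-sided permutation p-value for two distance matrices."""
    identity = list(range(len(d1)))
    v2 = _upper(d2, identity)
    r_obs = _pearson(_upper(d1, identity), v2)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute sample labels of d1 only
        if _pearson(_upper(d1, order), v2) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Toy inputs (illustrative only): rows are samples.
counts = [[10, 0, 5], [8, 1, 6], [0, 9, 2], [1, 10, 1]]
geochem = [[7.0, 1.0], [6.9, 1.2], [6.0, 3.0], [5.9, 3.3]]
community_d = distance_matrix(counts, bray_curtis)
geochem_d = distance_matrix(geochem, euclidean)
r, p = mantel(community_d, geochem_d)
```

Permuting sample labels jointly across rows and columns preserves the structure of each distance matrix, which is why the Mantel test is preferred over a naive correlation test on the non-independent pairwise entries.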


Figure 2.2 Statistically significant clustering of samples by water cover and soil depth as described by both archaeal and bacterial community composition. NMDS ordination of Bray-Curtis dissimilarity for soil archaeal (A, B) and bacterial (C, D) communities. A single ordination was performed for each domain and is shown twice in color to emphasize the samples from Transect 2 (A, C) and from Transect 3 (B, D) separately, with samples from the opposite transect shown in grey. Measured Bray-Curtis dissimilarities for both archaeal and bacterial communities across the wetland confirmed that soil depth defines microbial community structure. Within Transect 2 (A), samples cluster more by soil depth than by site (PERMANOVA p < 0.001; R² = 0.52 and 0.13, respectively), whereas in Transect 3 community dissimilarity is dominated by the unique O3 site geochemistry, though samples are still primarily organized by soil depth.

Overall, bacterial and archaeal communities were structured at spatial scales that correspond to geochemical gradients. We collected multiple adjacent cores within 1 m² at each site (Figure 2.1). At the meter scale, samples from adjacent cores clustered significantly by depth for five of the six sites (PERMANOVA p < 0.001; Table S2.4). In contrast, samples from the same core but from depths separated by centimeters did not cluster together (Figure S2.5, Table S2.4). Within Transect 2, microbial communities were most similar at equivalent soil depth, despite being collected from hydrologically distinct sites ~10 meters


apart (Figure 2.2; mean Bray-Curtis dissimilarity grouped by depth within site: 0.25, grouped by depth within Transect 2: 0.26, grouped by core within site: 0.38). Across transects (roughly 200 meters apart), the communities from both mud sites (M2, M3) are similar to each other at equivalent depths (Figure 2.2). However, for the water-covered soil samples (O2, O3), community similarity at equivalent depths is found only within and not across transects, in agreement with the geochemical differences in pH, nitrite, sulfate, and total carbon between the O3 site and all other sites (Figure 2.2).

Bacterial community membership and distribution

Bacterial OTUs (Supplemental Files S2.2, S2.4) were most commonly from the Proteobacteria (44% of total bacterial relative abundance), Chloroflexi (17.5%), Bacteroidetes (9.8%), and Nitrospirae (7.6%). The Deltaproteobacteria were the most abundant of the Proteobacteria, followed by the Betaproteobacteria and Gammaproteobacteria. Most Chloroflexi OTUs were found within the Dehalococcoidetes, followed by Anaerolineae. The most abundant bacterial orders were the Bacteroidales and the Nitrospirales (8% each), followed closely by a Deltaproteobacteria order (BPC076) with 7% of bacterial abundance. Like many other bacterial orders, these three orders exhibited abundance distributions suggesting habitat preferences (Figure S2.6), in this case for shallow soils (Bacteroidales), deeper soils (Nitrospirales), and the geochemically distinct O3 site (BPC076). Many OTUs were associated with known sulfate- or sulfur-reducing lineages, including Deltaproteobacteria taxa Desulfarculaceae, Desulfobulbaceae, Desulfobacteraceae (including Desulfococcus), Syntrophobacterales, and Thermodesulfobacteriales. Additional potential for sulfate reduction was represented by 19


Thermodesulfovibrionaceae OTUs. A single high-abundance sulfur-oxidizing Thiobacillus sp. OTU (OWC_b8) was among the most abundant bacterial OTUs site-wide (site-wide mean 1.96%, max 5.2%). In addition to sulfur cycling, we also inferred the potential for iron cycling and bacterial aerobic oxidation of methane (Figure S2.6). Fourteen OTUs were assigned to the metal-reducing genus Geobacter, and 6 OTUs were assigned to the microaerophilic iron-oxidizing genus Gallionella. The bacterial methanotrophic OTUs were all Type I methanotrophs, with the exception of one low-abundance OTU assigned to the Methylosinus genus (Supplemental File S2.2). Potential methanotroph OTUs included 21 OTUs within the order Methylococcales (10 Methylococcaceae OTUs, 6 Crenotrichaceae; Supplemental File S2.2). Most Methylococcales OTUs decreased in abundance with soil depth. However, the most abundant Methylococcales OTU (OWC_b17; 99% identity to Methylobacter tundripaludum) usually appeared in deeper sediment samples than the other Methylococcales OTUs (Supplemental File S2.2).

Methane-cycling archaeal community membership and distribution

Domain-specific sequencing revealed diverse archaeal communities across the wetland. Euryarchaeota contributed 52% of total archaeal relative abundance, followed by the Bathyarchaeota (formerly Miscellaneous Crenarchaeota Group) with 36% (Figure 2.3; Supplemental File S2.1). Multiple individual OTUs from these phyla were found at relative abundances greater than 20% in individual samples. The next most abundant phylum, the Woesearchaeota, comprised 8% of total relative abundance. Multiple other low-abundance archaeal groups, each at a total relative abundance of less than 1% of the archaea, were consistently detected across the wetland (Figure 2.3).


[Figure 2.3 image: heatmap. Column groups: Transect 2 and Transect 3, each with Mud, Mud transition, and Open water sites. Legend: number of OTUs per phylum; phyla shown: Thaumarchaeota, Verstraetearchaeota, Woesearchaeota, Euryarchaeota, Bathyarchaeota, Other.]


Figure 2.3 Archaeal taxa display abundance patterns corresponding to geochemical parameters and soil depth. Heatmap of relative abundance of archaeal OTUs summarized at the deepest informative level at or above genus according to Silva release 128 taxonomy. Only those taxa with a minimum 0.01% mean relative abundance across samples are included. Equivalent-depth samples are grouped together within each site, and presented in order of increasing soil depth left to right.


OTUs were identified from 5 of the 7 known methanogenic Euryarchaeota orders, including Methanobacteriales; Methanomicrobiales (Methanoregula, Methanolinea, Methanospirillaceae); Methanosarcinales (Methanosaeta, Methanosarcina); Methanocellales; and Methanomassiliicoccaceae (Supplemental Files S2.1, S2.3). OTUs representing all known methanogenic metabolisms are present, with acetoclastic Methanosaeta spp. (max relative abundance 47%, mean 21% ± 8%) and hydrogenotrophic Methanoregula spp. (max relative abundance 10%, mean 4% ± 2%) being the most abundant methanogenic OTUs. We also identified 3 OTUs at 96–98% 16S rRNA gene identity to sequences in the newly described methanogenic candidate genera Candidatus Methanomethylicus and Candidatus Methanosuratus within the candidate phylum Verstraetearchaeota (Vanwonterghem et al. 2016). Of the 85 Bathyarchaeota OTUs, none shared more than 92% 16S rRNA gene identity to the 2 recently described methanogenic Bathyarchaeota (Evans et al. 2015). We identified 10 OTUs classified in the SILVA taxonomy within the proposed methanogenic WSA2 class (Nobu et al. 2016). However, all 10 OTUs were within the 20a-9 subclade, and share less than 80% identity with 16S rRNA genes from described methanogenic WSA2 Candidatus Methanofastidiosa genomes.

The most abundant archaeal OTUs across the wetland were related to anaerobic methanotrophic Candidatus Methanoperedens spp. (ANME-2d). The 8 Candidatus Methanoperedens OTUs identified here were related to both the recently described Candidatus Methanoperedens sp. BLZ 1 (Arshad et al. 2015) (93.5%–99.7% ID over the V3–V6 16S rRNA gene region) and Candidatus Methanoperedens nitroreducens (Haroon et al. 2013) (93.8%–97.3% ID over V3–V6, Figure 2.4). The relative abundance of this approximately genus-level group of OTUs (max relative abundance 58%; mean 12% ± 13%; Figure 2.3,


Figure 2.4) approaches the abundance level observed for entire archaeal phyla. These OTUs increased in relative abundance with soil depth, and were particularly enriched at the M3 site due to the increase in a single Candidatus Methanoperedens sp. OTU (OWC_a2). OWC_a2 comprised, on average, 80% of the total ANME-2d abundance observed across all sites, and was the most abundant individual archaeal OTU in this study, reaching 45% of total archaeal abundance in a sample from the M3 site (Figure 2.3, Figure 2.4).

Woesearchaeota, Bathyarchaeota, and other enigmatic archaea display notable diversity across the wetland

Separately targeting the archaea allowed for robust detection of archaeal groups that were composed of numerous low-abundance OTUs. The recently described Woesearchaeota phylum (previously DHVEG-6 and Parvarchaea) (Castelle et al. 2015) is represented by the largest number of archaeal OTUs (157), despite being lower in total relative abundance compared to the Euryarchaeota and Bathyarchaeota. Of the 157 Woesearchaeota OTUs, 145 individual OTUs had a relative abundance reaching 0.1% in at least one sample, though only 17 of those OTUs reached a relative abundance of at least 1% in any sample. We asked whether the high level of phylogenetic diversity within this phylum would have been detected with an amplicon approach using standard pan-domain PCR primers targeting the shorter V4 hypervariable region (Caporaso et al. 2012). In silico simulation using the V4 primer set and the same OTU-picking procedure produced 241 Woesearchaeota OTUs. However, assuming archaea represent approximately 10% of the prokaryotic community (Schwarz, Eckert, and Conrad 2007; Prasse, Baldwin, and Yarwood 2015; Webster et al. 2015; Argiroff et al. 2016), only 11 of these shorter 241 Woesearchaeota OTU sequences would be reported above the 0.1% relative abundance threshold with universal V4


sequencing, and no single OTU reached 1% relative abundance within the archaea (Figure S2.7).

The Bathyarchaeota candidate phylum (Meng et al. 2014) is represented by 85 OTUs that are collectively abundant across the wetland, almost exclusively assigned to phylogenetic subgroups 5b, 6, 7/17, 11, and 15 (Kubo et al. 2012). Of these 85 OTUs, 81 had a relative abundance of at least 0.1% in at least one sample, and 22 reached 1%, whereas with the simulated V4 approach only 20 of 96 OTUs reached a relative abundance of 0.1% in any single sample, and only 3 OTUs reached 1%. Individual Bathyarchaeota subgroups had site-specific distributions, which correlated with underlying geochemistry at the OTU level (Figure 2.5).

Numerous additional underexplored archaeal taxa were represented by multiple OTUs, despite contributing a small component of archaeal relative abundance, and for all these groups we discerned habitat preferences across the wetland. A preference for deeper soils occurs for the Lokiarchaeota (formerly Marine Benthic Group B), certain Thaumarchaeota groups (pSL12, AK59, formerly Marine Benthic Group A), Marine Benthic Group D (DHVEG-1), and Terrestrial Miscellaneous Group (TMEG) OTUs. A preference for the low-nitrite, low-carbon, low-pH O3 site occurs for the Aenigmarchaeota, Thermoplasmatales ASC21, Hadesarchaea, and a clade of Methanomicrobiaceae distinct from described genera within this family (Figure 2.3).
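The comparison with universal V4 sequencing described above for the Woesearchaeota and Bathyarchaeota rests on a simple rescaling: if archaea make up roughly 10% of the prokaryotic community, an OTU's abundance in a pan-domain dataset is about one tenth of its within-archaea abundance. The sketch below illustrates that arithmetic only; the function names and the example abundances are hypothetical, and the 10% archaeal fraction is the assumption stated in the text.

```python
def universal_abundance(within_domain_pct, domain_fraction=0.10):
    """Expected percent abundance in a pan-domain dataset, given the percent
    abundance within one domain and that domain's share of the community."""
    return within_domain_pct * domain_fraction


def count_reported(max_within_domain_pcts, threshold_pct=0.1, domain_fraction=0.10):
    """Number of OTUs whose rescaled maximum abundance meets the reporting threshold."""
    return sum(
        1
        for pct in max_within_domain_pcts
        if universal_abundance(pct, domain_fraction) >= threshold_pct
    )


# Hypothetical per-OTU maximum abundances (% of archaea), for illustration only.
example_maxima = [0.6, 2.0, 5.0, 0.3, 12.0]
n_reported = count_reported(example_maxima)
```

Under this assumption, an OTU needs to reach about 1% of the archaeal community to appear at 0.1% in a universal dataset, so a taxon peaking at 0.6% within the archaea would rescale to roughly 0.06% and fall below that threshold.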


Figure 2.4 Methanosaeta spp. and Ca. Methanoperedens spp. OTUs in OWC soils are abundant and display OTU-level habitat preferences. Selected clades from a Euryarchaeota maximum likelihood 16S rRNA gene tree show individual OTUs within (A) Candidatus Methanoperedens (ANME-2d) and (B) Methanosaeta genera. OTU mean relative abundance for each sampling depth at each site is shown. Bootstrap values > 0.8 are shown with black circles.

[Figure 2.4 image: panels (A) Candidatus Methanoperedens and (B) Methanosaeta; heatmap columns: Mud, Mud transition, and Water covered sites.]




Figure 2.5 Bathyarchaeota subgroups and OTUs display phylogenetically conserved abundance patterns in the OWC wetland, correlating to geochemical measures. a) Bathyarchaeota OTUs in OWC soils are distributed across several subgroups associated globally with freshwater sediments and display subgroup-specific differentiated abundance. The heatmap shows within-sample relative abundance of OTUs within each Bathyarchaeota subgroup. Replicate depth samples are grouped together within each site, and presented in order of increasing soil depth left to right. Reference sequences (Kubo et al. 2012; Lloyd et al. 2013; Meng et al. 2014) were used to place OWC Bathyarchaeota OTUs within subgroups. b) Phylogenetic tree of wetland Bathyarchaeota 16S rRNA gene sequences with reference sequences as listed above. Heatmap shows each OTU's Spearman correlation with geochemical measures. Correlations are phylogenetically conserved among groups, agreeing with their site-specific distributions as shown in panel a.
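The per-OTU Spearman correlations summarized in the figure above are rank correlations between an OTU's relative abundance across samples and a geochemical measurement across the same samples. The following sketch shows one way such a coefficient can be computed, using average ranks for ties; it is not the code used to produce the figure, and the function names and example vectors are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks, with tied values assigned the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical values for one OTU across five samples (illustrative only).
otu_abundance = [0.1, 0.4, 0.9, 1.5, 2.2]   # % relative abundance
fe2 = [5.0, 9.0, 14.0, 30.0, 41.0]          # Fe(II), arbitrary units
ph = [7.4, 7.1, 6.9, 6.4, 6.0]
rho_fe2 = spearman(otu_abundance, fe2)
rho_ph = spearman(otu_abundance, ph)
```

Because the coefficient depends only on ranks, it captures monotonic relationships without assuming that abundance scales linearly with the geochemical measure.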


Discussion

Archaea play crucial roles in freshwater circumneutral wetlands, but studies to date have largely been limited in their ability to detect the full diversity of archaeal communities, or have used methods that focus only on methane-cycling archaea. We applied replicated, domain-specific amplicon sequencing to a model freshwater wetland, deeply characterizing both archaeal and bacterial communities across soil geochemical, depth, and hydrological gradients. With the targeted, increased sequencing depth afforded by this approach, correlated to geochemical data, we have identified patterns of microbial presence and abundance that suggest geochemical controls on community structure, and have provided insights into the lifestyle of both abundant and rare taxa.

Broadly, we infer that geochemical environment, rather than dispersal limitation (Horner-Devine et al. 2004; Martiny et al. 2011), appears to be the controlling factor defining archaeal and bacterial community structure in this wetland. First, the relatively high percentage of OTUs shared across the wetland (Figure S2.4) suggests that there are limited physical barriers to dispersal. Second, microbial communities varied more with geochemical measures changing over centimeters of depth than with lateral distances of hundreds of meters. Finally, these patterns held for both the archaeal and bacterial communities (Figure 2.2), even though these communities were measured independently. To our knowledge, microbial community heterogeneity across spatial scales has not been explored within freshwater wetland soils, and the broad spatial similarity observed here contrasts with the high level of spatial heterogeneity observed within some dry soils (O'Brien et al. 2016).

OWC is a methane-emitting wetland (Nahlik and Mitsch 2010), so we asked if the membership and composition of the methanogenic archaeal community was similar to those


reported in other wetland ecosystems. Consistent with other freshwater wetlands (Borrel et al. 2011; Bridgham et al. 2013), we identified multiple archaea belonging to known methanogenic orders, with the most abundant being acetoclastic Methanosaeta spp., followed by hydrogenotrophic Methanoregula spp. These two genera are frequently identified together in wetland settings, with relative dominance depending on factors such as pH, season, and carbon availability (Kotsyurbenko et al. 2007; Sun et al. 2012; He et al. 2015). Although activity cannot be determined from relative abundance, based on the replicated abundance distributions of these methanogens, we hypothesize that acetoclastic methanogenesis may be more relevant in the wetland during the sampling season. However, our targeted sequencing of the archaea also identified multiple additional methanogenic groups, spanning 5 of the 7 known methanogenic Euryarchaeota orders, including members of the Methanomassiliicoccaceae, as well as newly described methylotrophic methanogens within the phylum Verstraetearchaeota (Vanwonterghem et al. 2016).

It is unlikely that the level of diversity of methanogenic taxa and inferred methanogenic substrates detected here represents some unique capacity of the OWC wetland. Rather, this result is more likely a product of our approach, which resulted in increased archaeal sampling depth and phylogenetic resolution. For example, previously unreported Verstraetearchaeota were recently found at very low relative abundances in shotgun metagenomic sequencing data from freshwater wetland sediments (Vanwonterghem et al. 2016). In our study, the most abundant of the 3 Verstraetearchaeota OTUs has a maximum relative abundance of only 0.6% within the archaeal domain, and would likely have been below detection in a study using more traditional universal primers. However, this OTU is present in 80 of the 84 samples across the wetland, and all three OTUs show a distinct


abundance distribution that suggests a habitat preference for deeper soils, and for the O3 sampling site (Figure 2.3). Additional experiments are needed to determine activity of the low-abundance Verstraetearchaeota in the wetland. Nonetheless, increased detection of low-abundance methanogens has important consequences for interpreting the methanogenic capacity of the wetland beyond what is typically inferred by examining only the most abundant methanogenic taxa.

Unexpectedly, the most abundant methane-cycling archaea within this wetland were not most similar to canonical methanogens, but rather were phylogenetically affiliated with anaerobic methanotrophic Candidatus Methanoperedens spp. (also known as ANME-2d or AAA). In separate enrichment cultures, Candidatus Methanoperedens spp. organisms have been shown to conduct anaerobic oxidation of methane (AOM) using nitrate (Raghoebarsing et al. 2006; Haroon et al. 2013; Arshad et al. 2015) or Fe(III) and Mn(IV) (Ettwig et al. 2016) as electron acceptors. Related sequences have also been linked to freshwater AOM coupled to sulfate reduction (Schubert et al. 2011; Timmers et al. 2015). ANME-2d have been identified in globally distributed freshwater ecosystems (Welte et al. 2016; Vaksmaa et al. 2017) including rice paddies (Lee et al. 2015), aquifer sediments (Flynn et al. 2013; Castelle et al. 2015), lake sediments (Stein et al. 2001; Llirós et al. 2010; Schubert et al. 2011; Kadnikov et al. 2012; Fan and Xing 2016), river sediments (Rastogi et al. 2009), estuarine sediments (Li et al. 2012; Prasse, Baldwin, and Yarwood 2015), minerotrophic fens (Cadillo-Quiroz et al. 2008), mud volcanoes (Wrede et al. 2012), and high-altitude cold wetland sediments (G. Zhang et al. 2008). The broad distribution of this group suggests an important role in global carbon cycling in these habitats.


The wetland contained Candidatus Methanoperedens spp. 16S rRNA sequences similar to sequences from both metal-reducing and nitrate-reducing Candidatus Methanoperedens (Figure 2.4), suggesting that these two processes might both occur in natural freshwater wetland environments. Intriguingly, the dominant Candidatus Methanoperedens sp. OTU in this study (OWC_a2) shares 99.7% nucleotide identity with the metal-reducing Candidatus Methanoperedens sp. BLZ 1 (Ettwig et al. 2016) and is most abundant at a site (M3) that is marked by both a significant increase in the abundance of iron-oxidizing Gallionella spp. OTUs and a significant decrease in abundance of metal-reducing Geobacter spp. OTUs (Figure S2.6). Based on the findings of this study, ongoing research is using pore water dialysis samplers to correlate the abundance and transcripts of Candidatus Methanoperedens spp. populations to local methane flux measurements. However, our results here complement existing soil laboratory enrichment studies of iron-reducing Candidatus Methanoperedens spp. (Ettwig et al. 2016) and provide additional evidence that archaeal AOM linked to iron reduction may be a key part of the methane cycle in some freshwater ecosystems (Sivan et al. 2011; Sivan et al. 2014; Bar-Or et al. 2015).

We also examined our data for prevalence of other known anaerobic methanotrophs. AOM is carried out in other freshwater settings by the nitrite-reducing, methane-oxidizing bacterium Candidatus Methylomirabilis oxyfera and related members of the NC10 phylum (Raghoebarsing et al. 2006; Ettwig et al. 2010; Deutzmann et al. 2014; Hu et al. 2014; Shen et al. 2016). Among the 6 low-abundance NC10 OTUs identified in the OWC wetland, none shared more than 91.8% 16S rRNA gene nucleotide identity with Ca. M. oxyfera.
Further understanding of in situ anaerobic methane oxidation has important implications for modeling redox cycling in wetlands (Smemo and Yavitt 2011), especially as current climate


models assume methane oxidation is aerobic, and constrained by depth- and saturation-dependent O₂ concentration and competition (Riley et al. 2011).

It is well known that the combined effects of depth-correlated geochemical redox gradients (Cadillo-Quiroz et al. 2006; Lazar et al. 2015; Lee et al. 2015; Chu et al. 2016) and water cover (Kotiaho et al. 2010) can be strongly associated with changes in soil microbial communities. However, we also observed OTU-level differences in occupancy and abundance along soil depth gradients, which would not have been predicted based on redox requirements of closely related organisms. For example, although many anaerobic methanogenic Methanosaeta sp. OTUs increase in abundance with depth, one Methanosaeta OTU had the opposite abundance pattern, and was the most abundant archaeal OTU in 0–5 cm depth samples otherwise characterized by an abundance of aerobic bacterial taxa (Figure 2.4). It is possible that this taxon has unique antioxidant strategies for tolerating oxygen fluctuations, as has been suggested for other methanogens (Tholen, Pester, and Brune 2007; Angel et al. 2011; Jasso-Chávez et al. 2015). Similarly, while examining the methanotrophic community, we also identified OTUs from the bacterial order Methylococcales. Interestingly, these aerobic methanotrophic OTUs are persistent throughout the sampled depth profile, including at depths where known anaerobic methanotrophic taxa dominate (Figure S2.6). These apparently paradoxical co-occurrences of organisms representing aerobic and anaerobic processes may be the result of redox micro-sites in the soil (Jakobsen 2007). It is also possible that one or the other of the populations was not active at the time of sampling, due to death or dormancy from unfavorable redox conditions. Activity measurements will be needed to resolve the role of individual methane-cycling OTUs at different depths.


Despite detecting numerous methanogens and methanotrophs, the majority of the archaeal OTUs present in the wetland soils are not currently known to be involved directly in methane cycling. In fact, most of these taxa are from entirely uncultivated lineages. These taxa include the candidate phyla Woesearchaeota and Bathyarchaeota, whose metabolic potential is just beginning to be illuminated by limited metagenomics studies (Castelle et al. 2015; Lazar et al. 2016; Lazar et al. 2017).

Here, Woesearchaeota OTUs display distinct abundance patterns suggesting occupancy of at least two environmental niches. Despite the high phylogenetic diversity represented by 157 OTUs, most of the individual Woesearchaeota OTUs shared the same strong association with shallow soils, and a corresponding positive correlation with soil sulfate and presumably dissolved oxygen concentrations. Yet, in contrast to this broad trend, 33 sequences (21%) displayed an opposite distribution across the wetland, enriched in deeper soils, suggesting divergent lifestyles among the Woesearchaeota warranting further metabolic exploration (Figure S2.7). Despite the strong preference of Woesearchaeota OTUs for shallow soils, the sampled Woesearchaeota genomes to date suggest fermentative metabolisms (Castelle et al. 2015; Lazar et al. 2017) and/or symbiotic lifestyles (Castelle et al. 2015), rather than aerobic metabolisms. The high overall correlation of this group with soil sulfate concentrations in this wetland may represent an environmental determinant of its distribution. Whereas one complete Woesearchaeota genome was found to encode an ATP sulfurylase, the remaining genes necessary for sulfur-cycling metabolisms were absent from this genome (Castelle et al. 2015). These apparent inconsistencies between known genomic potential and the majority of our observed habitat preferences are resolved by phylogenetic comparison of the 16S rRNA


sequences from this study alongside those from the available sequenced genomes (Figure S2.7). Seven of the 11 Woesearchaeota genomic samples derive from a single aquifer groundwater source (Rinke et al. 2013; Castelle et al. 2015; Anantharaman et al. 2016; Lazar et al. 2017), and most available genomes cluster narrowly in our expanded phylogenetic tree. That is, reconstructed genomes thus far account for only a small, non-representative fraction of the Woesearchaeota phylogenetic diversity found in this study (Figure S2.7). This restricted genomic sampling limits inference of Woesearchaeota metabolism for most of our shallow-associated OTUs, and more broadly within wetlands and related freshwater lakes and sediments where Woesearchaeota have been observed (Amaral-Zettler et al. 2008; Ye et al. 2009; Borrel et al. 2012; Ortiz-Alvarez and Casamayor 2016). Future genomic sampling from this phylum should target the full breadth of Woesearchaeota metabolisms suggested by the phylogenetically diverse habitat preferences for shallow soils identified here.

Within the Bathyarchaeota candidate phylum, we also identified environmentally differentiated distribution patterns suggesting subgroup-specific habitat preference. The Bathyarchaeota are among the most abundant organisms reported in marine and freshwater sediments globally (Biddle et al. 2006; Borrel et al. 2012; Kubo et al. 2012; Lloyd et al. 2013; Fillol, Auguet, et al. 2015; Lazar et al. 2015). Approximate family-level subgroups have been shown to display habitat preferences (e.g. salinity, sulfate, and soil depth) that likely reflect conserved underlying metabolic capacities (Kubo et al. 2012; Fillol, Auguet, et al. 2015; Lazar et al. 2015; Lazar et al. 2016). The Bathyarchaeota OTUs in this study are confined almost exclusively to subgroups 5b, 6, 7/17, 11, and 15 and display group-specific abundance distributions that extend previously described habitat preferences (Figure 2.5).


Bathyarchaeota groups 6, 15, and 7/17 are broadly distributed across the OWC wetland, yet display soil depth-linked habitat preferences. Agreeing with findings from more saline estuarine sediments (Lazar et al. 2015), subgroup 6 was generally more abundant in shallower, sulfate-rich soils, extending this habitat preference to freshwater environments. Partial genomes reconstructed from estuarine sediment metagenomes (Lazar et al. 2016) revealed potential fermentative metabolisms for subgroups 6, 15, and 7/17. Acetate was hypothesized to be produced via the breakdown of plant-based carbohydrates, proteins, and amino acids, rather than competing with respiring metabolisms, or potentially via autotrophy, and acetogenesis may be conserved across additional Bathyarchaeota subgroups (He et al. 2016). This metabolic strategy could explain the broad distribution of these subgroups here, and presents a potential link to acetoclastic methanogenesis performed by the Methanosaeta spp. and Methanosarcinales OTUs.

Bathyarchaeota subgroups 5b and 11 have a more restricted distribution across the wetland. Previously, subgroups 5b and 11 were suggested as freshwater indicator taxa, being found almost exclusively in freshwater habitats globally (Fillol, Auguet, et al. 2015), but we do not find these groups broadly distributed across these freshwater wetland soils. Instead, OTUs from subgroups 5b and 11 display preferential association with the water-covered O3 site and with deeper soils (Figure 2.5). Individual OTUs within subgroups 5b and 11 consistently correlate positively with Fe(II) and negatively with pH, sulfate, nitrite, and total soil carbon (Figure 2.5). These distributions suggest more finely partitioned habitat preferences beyond salinity tolerances for these subgroups. Subgroups 5b and 11 still lack representative genomes, but our data hint at very different lifestyles from other freshwater Bathyarchaeota groups.


Conclusions

Environmental differences among freshwater wetland ecosystems likely translate to differences in microbial communities, which in turn differentially impact biogeochemical cycling. By employing a high-resolution, domain-specific sequencing approach, we provide a more complete picture of the complex archaeal community in a model methane-emitting freshwater wetland. We found surprising diversity among known methanogens, and OTU-level habitat preferences suggesting the potential for subtle controls on methane emissions that may not be phylogenetically conserved. We also infer that archaeal anaerobic oxidation of methane performed by Candidatus Methanoperedens spp. could be a currently unknown mechanism regulating net methane emissions in this wetland. In addition to the functional potential represented by these characterized taxa, we detected a diverse set of archaea for which metabolic properties are largely unknown. However, for these groups, as exemplified by the Bathyarchaeota and Woesearchaeota, the phylum-level to OTU-level archaeal habitat preferences we describe serve as important ecological context within which to interpret emerging genomic and functional studies of archaea in similar habitats. Assuming that they are active, the functions of these less characterized taxa will need to be incorporated into a community-level understanding of carbon cycling in freshwater wetland soils.

Acknowledgments

This research was supported in part by the Ohio Water Development Authority (#6835, to KCW). We thank the staff of the Old Woman Creek National Estuarine Research Reserve, in particular Kristi Arend and Frank Lopez, for site access, housing infrastructure, and space and equipment for analysis and sample processing. We thank Elmar Pruesse for assistance with ARB.

PAGE 54

Author Contributions

Field sampling was designed and executed by Kelly Wrighton and Kay Stefanik. The domain-specific sequencing method was designed, tested in silico, and implemented by Adrienne Narrowe and Christopher Miller. Kay Stefanik and Jordan Angle conducted geochemical analyses. Jordan Angle and Rebecca Daly performed DNA extraction and PCR. Adrienne Narrowe performed sequencing library preparation. Adrienne Narrowe and Christopher Miller performed all post-sequencing bioinformatics and data analysis. Adrienne Narrowe and Christopher Miller wrote the manuscript contained in this chapter, which was read and approved by all co-authors.

PAGE 55

CHAPTER III

COMPLEX EVOLUTIONARY HISTORY OF TRANSLATION ELONGATION FACTOR 2 AND DIPHTHAMIDE BIOSYNTHESIS IN ARCHAEA AND PARABASALIDS 2

2 Portions of this chapter were previously published online as: Narrowe AB*, Spang A*, Stairs CW, Caceres EF, Baker BJ, Miller CS, Ettema TJG (2018). Complex evolutionary history of translation Elongation Factor 2 and diphthamide biosynthesis in Archaea and parabasalids. bioRxiv, and are included with the permission of the copyright holder. Star indicates equal contribution to online publication. Full author contributions are listed at the end of the chapter.

Abstract

Diphthamide is a modified histidine residue which is uniquely present in archaeal and eukaryotic elongation factor 2 (EF-2), an essential GTPase responsible for catalyzing the coordinated translocation of tRNA and mRNA through the ribosome. In part due to the role of diphthamide in maintaining translational fidelity, it was previously assumed that diphthamide biosynthesis genes (dph) are conserved across all eukaryotes and archaea. Here, comparative analysis of new and existing genomes reveals that some archaea (i.e., members of the Asgard superphylum, Geoarchaea, and Korarchaeota) and eukaryotes (i.e., parabasalids) lack dph. In addition, while EF-2 was thought to exist as a single copy in archaea, many of these dph-lacking archaeal genomes encode a second EF-2 paralog missing key residues required for diphthamide modification and for normal translocase function, perhaps suggesting functional divergence linked to loss of diphthamide biosynthesis. Interestingly, some Heimdallarchaeota previously suggested to be most closely related to the eukaryotic ancestor maintain dph genes and a single gene encoding canonical EF-2. Our findings reveal that the ability to produce diphthamide, once thought to be a universal feature

PAGE 56

in archaea and eukaryotes, has been lost multiple times during evolution, and suggest that anticipated compensatory mechanisms evolved independently.

Introduction

Elongation factor 2 (EF-2) is a critical component of the translational machinery that interacts with both the small and large ribosomal subunits. EF-2 functions at the decoding center of the ribosome, where it is necessary for the translocation of messenger RNA and associated tRNAs (Spahn et al. 2004). Archaeal and eukaryotic EF-2, as well as the homologous bacterial EF-G, are members of the highly conserved translational GTPase protein superfamily (Atkinson 2015). Gene duplications and subsequent neo-functionalizations have been inferred for eukaryotic EF-2 (eEF-2), with the identification of the spliceosome component Snu114 (Fabrizio et al. 1997), and Ria1, a 60S ribosomal subunit biogenesis factor (Becam et al. 2001). Bacterial EF-G is involved in both translocation and ribosome recycling and has undergone multiple duplications, including sub-functionalizations separating the translocation and ribosome recycling functions (Tsuboi et al. 2009; Suematsu et al. 2010) as well as neo-functionalizations including roles in back-translocation (Qin et al. 2006), translation termination (Freistroffer et al. 1997), regulation (Li et al. 2014) and tetracycline resistance (Donhofer et al. 2012). However, to date, archaea were thought to encode only a single essential protein within this superfamily, i.e., archaeal EF-2 (aEF-2) (Atkinson 2015). Unlike bacterial EF-Gs, archaeal and eukaryotic EF-2s contain a post-translationally modified amino acid which is synthesized upon the addition of a 3-amino-3-carboxypropyl (ACP) group to a conserved histidine residue and its subsequent modification to diphthamide by the concerted action of 3 (in archaea) to 7 (in eukaryotes) enzymes (De Crécy-Lagard et

PAGE 57

al.; Schaffrath et al. 2014). While diphthamide is perhaps best known as the target site of bacterial ADP-ribosylating toxins (Iglewski, Liu, and Kabat 1977; Jorgensen et al. 2008) and as required for sensitivity to the antifungal sordarin (Botet et al. 2008), its exact role remains a subject of investigation. Yeast mutants incapable of synthesizing diphthamide have a higher rate of translational frame shifts, suggesting that this residue plays a critical role in reading frame fidelity during translation (Ortiz et al. 2006). Furthermore, structural studies of eEF-2 using high-resolution cryo-EM have indicated that diphthamide interacts directly with codon-anticodon bases in the translating ribosome, and facilitates translocation by displacing ribosomal decoding bases (Anger et al. 2013; Murray et al. 2016). In addition, diphthamide has been proposed to play a role in the regulation of translation, as it represents a site for reversible endogenous ADP-ribosylation (Schaffrath et al. 2014), and in the selective translation of certain genes in response to cellular stress (Argüelles et al. 2014). Given its anticipated role at the core of the translational machinery, it is not surprising that, with the sole exception of Korarchaeum cryptofilum (De Crécy-Lagard et al.; Elkins et al. 2008), the diphthamide biosynthetic pathway is universally conserved in all archaea and eukaryotes. Indeed, while not strictly essential, loss of diphthamide biosynthesis has been shown to result in growth defects in yeast (Kimata and Kohno 1994; Ortiz et al. 2006) and some archaea (Blaby et al. 2010), and is either lethal or causes severe developmental abnormalities in mammals (Liu et al. 2006; Webb et al. 2008; Yu et al. 2014).
In the current study, we explore the evolution and function of EF-2 and of diphthamide biosynthesis genes using genomic data from novel major archaeal lineages that were recently discovered using metagenomics and single-cell genomics approaches (Hug, Baker, et al. 2016; Adam et al. 2017; Spang, Caceres, and Ettema 2017). In particular, we

PAGE 58

report the presence of EF-2 paralogs in many archaeal genomes belonging to the Asgard archaea, Korarchaeota and Bathyarchaeota (Meng et al. 2014; Evans et al. 2015; Spang et al. 2015; He et al. 2016; Lazar et al. 2016; Zaremba-Niedzwiedzka et al. 2017) and the unexpected absence of diphthamide biosynthesis genes in several archaea and in parabasalid eukaryotes. Our findings reveal a complex evolutionary history of EF-2 and diphthamide biosynthesis genes, and point to novel mechanisms of translational regulation in several archaeal lineages. Finally, our results are compatible with scenarios in which eukaryotes evolved from an Asgard-related ancestor (Spang et al. 2015; Zaremba-Niedzwiedzka et al. 2017) and suggest the presence of a diphthamidylated EF-2 in this lineage.

Materials And Methods

Sampling and sequencing of ABR Loki- and Thorarchaeota. Sampling, DNA extraction, library preparation and sequencing were performed as described in (Zaremba-Niedzwiedzka et al. 2017). We chose the four deepest samples, at 125 and 175 cm below sea floor (MM3/PM3 and MM4/PM4, respectively), as they showed the highest lokiarchaeal diversity in a maximum likelihood phylogeny of 5 to 15 ribosomal proteins (RP15) encoded on the same contig (Zaremba-Niedzwiedzka et al. 2017). Adapters and low-quality bases were trimmed using Trimmomatic version 0.32 with the following parameters: PE -phred33 ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:1:true LEADING:3 TRAILING:6 SLIDINGWINDOW:4:15 MINLEN:36 (Bolger, Lohse, and Usadel 2014).

Assembly of ABR Loki- and Thorarchaeota. Samples from the same depth were assembled together using IDBA-UD (Peng et al. 2012) (version 1.1.1-384, --maxk 124 -r), producing four different assemblies (S1:MM1/PM1, S2:MM2/PM2, S3:MM3/PM3, S4:MM4/PM4). Assemblies S3

PAGE 59

and S4 were particularly interesting as they showed the highest lokiarchaeal diversity. However, some lokiarchaeal members showed highly fragmented contigs, probably due to the low abundances of these organisms. In an attempt to produce longer contigs, we co-assembled those reads coming from Asgard archaea members in the samples MM3, PM3, MM4 and PM4. Asgard archaea reads were identified using Clark (version 1.2.3, -m 0) (Ounit et al. 2015) and Bowtie2 (version 2.2.4, default parameters) (Langmead and Salzberg 2012) against a customized Asgard archaea database. Classified reads were extracted and co-assembled using SPAdes (version v.3.9.0, --careful) (Bankevich et al. 2012). In brief, the Asgard database was composed of Asgard genomes publicly available as of February 2017. Clark does not perform well when organisms present in the samples of interest are not highly similar to the ones present in the provided database. To increase the classification sensitivity, we included in our database low-quality Asgard MAGs (with highly fragmented contigs) generated from assemblies S3 and S4 using CONCOCT (Alneberg et al. 2014). Coverage profiles required by CONCOCT were estimated using kallisto (version 0.43.0, quant --plaintext) (Bray et al. 2016). All available samples from the same location (MM1, PM1, MM2, PM2, MM3, PM3, MM4, PM4) were used and mapped independently against the assemblies S3 and S4. For each assembly, MAGs were reconstructed using two different minimum contig length thresholds (2000 and 3000 bp). We used the number of contigs containing clusters of ribosomal proteins (ribocontigs) as a proxy to estimate the microbial diversity present in the community. The maximum number of clusters (-c option in CONCOCT) was set to approximately 2.5 times the estimated number of species in the sample (Johannes Alneberg, personal communication), resulting in 900 and 600 for S3 and S4, respectively.
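The cluster cap described above is simple arithmetic, and a minimal Python sketch may make it concrete. The species-richness estimates used here (360 and 240) are hypothetical: they are back-calculated from the reported caps of 900 and 600 and are not stated in the text. The function name `max_clusters` is likewise invented for illustration.

```python
def max_clusters(estimated_species: int, factor: float = 2.5) -> int:
    """Upper bound on the number of clusters: about `factor` times the estimated species count."""
    return round(estimated_species * factor)


# Hypothetical richness estimates, back-calculated from the reported caps.
assumed_species = {"S3": 360, "S4": 240}

caps = {name: max_clusters(n) for name, n in assumed_species.items()}
print(caps)
```

The factor of 2.5 is the only value taken from the text; any real run would substitute the ribocontig-based richness estimate for each assembly.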
Potential Asgard archaea bins were identified based on the

PAGE 60

presence of ribocontigs classified as Asgard archaea and were included in the database.

Binning of ABR Loki- and Thorarchaeota. Several binning tools with different settings were run independently:

CONCOCT_2000: version 0.4.0, --read_length 200 and minimum contig length of 2000.
CONCOCT_3000: version 0.4.0, --read_length 200 and minimum contig length of 3000. In both cases, coverage files were created by mapping all 8 samples against the co-assembly using kallisto.
MaxBin2: version 2.2.1, -min_contig_length 2000 -markerset 40 -plotmarker (Wu, Simmons, and Singer 2016). The 8 samples were mapped against the co-assembly using Bowtie2. Coverage was estimated using the getabund.pl script provided.
MyCC_4mer: 4mer -t 2000 (Lin and Liao 2016).
MyCC_56mer: 56mer -t 2000. Both coverage profiles were obtained as the authors described in their manual.

The results of those 5 binning methods were combined into a consensus: contigs were assigned to bins if they had been classified as the same organism by at least 3 out of 5 methods. The resulting bins were manually inspected and cleaned further using mmgenome (Albertsen et al. 2013). Completeness and redundancy were computed using CheckM (Parks et al. 2015).

Sampling and sequencing of OWC Thorarchaeota. Eight soil samples were collected from the Old Woman Creek (OWC) National Estuarine Research Reserve and DNA was extracted as described previously (Chapter II; Narrowe et al. 2017). Library preparation and five lanes of Illumina HiSeq 2x125 bp sequencing followed standard operating procedures at the US DOE Joint Genome Institute (GOLD study ID Gs0114821). Sample M3 C4 D3 had replicate extraction, library preparation, and two lanes of sequencing performed, and reads were combined before

PAGE 61

downstream analysis. For 3 additional samples (M3 C4 D4, O3 C3 D3, O3 C3 D4) one lane of sequencing was performed. For the other 4 samples (M3 C5 D1, M3 C5 D2, M3 C5 D3, M3 C5 D4) DNA was sheared to 300 bp with a Covaris S220, metagenomic sequencing libraries were prepared using the Nugen Ovation Ultralow Prep kit, and all four samples were multiplexed on one lane of Illumina HiSeq 2x125 bp sequencing at the University of Colorado Denver Anschutz Medical Campus Genomics and Microarray Core.

Assembly and binning of OWC Thorarchaeota. For initial assembly of the 5 full-lane sequencing runs, adapter removal, read filtering and trimming were completed using BBDuk (sourceforge.net/projects/bbmap) with ktrim=r, minlen=40, minlenfraction=0.6, mink=11, tbo, tpe, k=23, hdist=1, hdist2=1, ftm=5, maq=8, maxns=1, minlen=40, minlenfraction=0.6, k=27, hdist=1, trimq=12, qtrim=rl. Filtered reads were assembled using MEGAHIT (Li et al. 2015) version 1.0.6 with --k-list 23,43,63,83,103,123. The individual metagenome from the O3 C4 D3 sample was binned using Emergent Self-Organizing Maps (ESOM) (Dick et al. 2009) of tetranucleotide frequency (5 kb contigs, 3 kb windows). BLAST hits of predicted proteins identified a Thorarchaeota population bin. All scaffolds containing a window in this bin were used as a mapping reference, and reads from the 9 OWC libraries were mapped to this bin using bbsplit with default parameters (sourceforge.net/projects/bbmap). The mapped reads were reassembled using SPAdes version 3.9.0 with --careful -k 21,33,55,77,95,105,115,125 (Bankevich et al. 2012). Finally, the reads which were input to the reassembly were mapped to the assembled scaffolds using Bowtie2 (Langmead and Salzberg 2012) to generate a coverage profile, which was used to manually identify bins using Anvi'o (Eren et al. 2015). Proteins were predicted using Prodigal (Hyatt et al. 2010) and searched against UniRef90

PAGE 62

release 11 2016 (Suzek et al. 2015), with the taxonomy of best BLAST hits used to validate contigs as probable Thorarchaeota. Contigs having no top hit to the publicly available Thorarchaeota genomes were manually examined and removed if they could be assigned to another genome bin in the larger metagenomic assembly. Genome completeness and contamination were estimated using CheckM (Parks et al. 2015).

Identification of diphthamide biosynthesis genes and EF-2 homologs in eukaryotes and archaea. The EGGNOG members dataset (available at http://eggnogdb.embl.de/#/app/downloads) was surveyed for sequences corresponding to the following clusters of orthologous groups (COG): EF-2, COG0480; DPH1/DPH2, COG1736; DPH3, COG5216; DPH4, COG0484; DPH5, COG1798; DPH6, COG2102; and DPH7, ENOG4111MMJ. For genomes not represented in EGGNOG, we manually inspected publicly available genomes as indicated by 'orthology assignment source' (Supplementary File S1). Similarly, an in-house arCOG dataset, modeled after the publicly available arCOGs from Makarova et al. (Makarova, Wolf, and Koonin 2015), was queried for the corresponding COG distribution in relevant archaeal genomes. Finally, aEF-2 and aEF-2p genes in Thorarchaeota OWC Bins 2, 3 and 5 were identified using HMMER: version 3.1b2, hmmsearch --cut_tc (Eddy 2011) against PFAM models PF00679 (EFG_C) and PF03764 (EFG_IV). Conserved synteny surrounding the Thorarchaeota aEF-2p gene was used to further search for partial aEF-2p genes. In addition, all contigs with matching HMM hits to dph2 and dph5 in the full OWC assembly were manually examined for potential Thorarchaeal dph genes; none were identified.
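The orthology survey described above amounts to checking each genome's annotated orthologous groups against the marker clusters listed in the text. The following Python sketch illustrates that bookkeeping under stated assumptions: annotations are taken to be already available as a mapping from genome name to a set of cluster identifiers, and the genome names, their annotation sets, and the `survey` function are invented for illustration. It is not the pipeline used in this chapter.

```python
# Marker clusters as listed in the text (COG / ENOG identifiers).
MARKERS = {
    "EF-2": "COG0480",
    "DPH1/DPH2": "COG1736",
    "DPH3": "COG5216",
    "DPH4": "COG0484",
    "DPH5": "COG1798",
    "DPH6": "COG2102",
    "DPH7": "ENOG4111MMJ",
}


def survey(annotations, markers=MARKERS):
    """Return {genome: {marker name: True/False}} for every genome in `annotations`."""
    return {
        genome: {name: cluster in clusters for name, cluster in markers.items()}
        for genome, clusters in annotations.items()
    }


# Hypothetical annotation sets, for illustration only.
example = {
    "genome_A": {"COG0480", "COG1736", "COG1798", "COG2102"},
    "genome_B": {"COG0480"},
}

result = survey(example)
for genome, row in result.items():
    missing = [name for name, present in row.items() if not present]
    print(genome, "missing:", ", ".join(missing) or "none")
```

A table like `result` only records whether a cluster identifier was assigned. As the text notes for draft genomes, an absent marker may reflect incompleteness, so such a table is a starting point for manual inspection and not a final call.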

PAGE 63

Phylogenetic analyses

Elongation factor 2: EF-2 and EF-2 paralogs of Asgard archaea, Korarchaeota and Bathyarchaeota were aligned with a representative set of archaeal EF-2, bacterial EF-G, and eukaryotic EF-2, EFL1 and snRNP homologs using mafft-linsi (Katoh and Standley 2013). Subsequently, poorly aligned ends were removed manually before the alignments were trimmed with trimAl 5% (Capella-Gutierrez et al. 2009), yielding 871 aligned amino acid positions. Maximum likelihood analyses were performed using IQ-TREE with the mixture model LG+C60+R4+F, which was selected among the C-series models based on its Bayesian information criterion (BIC) score by the built-in model test implemented in IQ-TREE. Branch supports were assessed using ultrafast bootstrap approximation as well as with the single branch test (-alrt option).

Diphthamide biosynthesis proteins Dph1/Dph2 (IPR016435; arCOG04112) and Dph5 (IPR004551; arCOG04161): Both Dph1 and Dph2 as well as Dph5 homologs of a representative set of eukaryotes were aligned with archaeal Dph1/2 and Dph5 homologs, respectively. Several DPANN genomes contain two genes encoding the CTD and NTD of Dph1/2 (Fig. 3.1, Supplementary File S3.1), such that Dph1/2 homologs of these organisms had to be concatenated prior to aligning Dph1/2 sequences. Alignments were performed using mafft-linsi and trimmed with BMGE (Criscuolo and Gribaldo 2010) using the BLOSUM30 matrix and setting the entropy to 0.55. This resulted in final alignments of 170 (Dph1/2) and 221 (Dph5) positions. Maximum likelihood analyses were performed using IQ-TREE (Nguyen et al. 2015) with the mixture models resulting in the lowest BIC: LG+C50+R+F (Dph1/2) and LG+C60+R+F (Dph5), respectively. Branch supports were assessed using ultrafast bootstrap approximation (Hoang et al. 2018) as well as with the single branch test (-alrt flag).
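Model choice by lowest BIC, as used above, can be illustrated with the general definition BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of alignment sites, and L the maximized likelihood. The Python sketch below applies that definition to two candidate fits. The model names, log-likelihoods and parameter counts are hypothetical, and the code reflects the textbook formula, not IQ-TREE's internal implementation; only the alignment length of 170 positions is taken from the text.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


# Hypothetical fits: model name -> (log-likelihood, number of free parameters).
fits = {
    "model_A": (-25000.0, 120),
    "model_B": (-24950.0, 135),
}
n_sites = 170  # length of the trimmed Dph1/2 alignment reported in the text

scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in fits.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 1))
```

In this toy case the richer model wins because its gain in log-likelihood outweighs the penalty for 15 extra parameters; with a smaller likelihood gain the simpler model would be preferred.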

PAGE 64

Concatenated ribosomal proteins: A phylogenetic tree of co-localized ribosomal proteins was inferred using the rp15 pipeline as described previously (Zaremba-Niedzwiedzka et al. 2017). In brief, archaeal ribosomal proteins encoded in the r-protein gene cluster (requiring a minimum of 11 ribosomal proteins) were aligned with mafft-linsi, trimmed with trimAl using the gappyout option, concatenated and subjected to maximum likelihood analyses using IQ-TREE with the LG+C60+R4+F model chosen based on best BIC score as described above. Branch supports were assessed using ultrafast bootstrap approximation as well as with the single branch test (-alrt option) in IQ-TREE.

Structural modeling of EF-2 homologs. Structural models of a/eEF-2 genes and paralogs were generated using the I-TASSER standalone package version 5.1 (Yang et al. 2015), and visualized and analyzed using UCSF Chimera version 1.11.12 (Pettersen et al. 2004). The best structural hits to the PDB for each sequence's top-scoring model were identified using COFACTOR (Roy, Yang, and Zhang 2012). The Drosophila melanogaster eEF-2 structure in complex with the ribosome (PDB:4V6W) was used as a structural reference to which all models were superimposed (aligned) using Chimera's MatchMaker.

Loop motif logos of EF-2 homologs. e/aEF-2 and paralog sequences which were used to generate the EF-2 tree were clustered at 90% amino acid identity using CD-HIT: version 4.6, -c 0.9 -n 5 (Fu et al. 2012), and the sequence alignment was filtered to retain only cluster centroids. The conserved loop sequences were extracted from the filtered EF-2 alignment using Jalview version 2.10.1 (Waterhouse et al. 2009), verified by cross-referencing to the structural models, and sequence

PAGE 65

logos generated on cluster centroids only using WebLogo: version 2.8.2 (weblogo.berkeley.edu) (Crooks et al. 2004).

Accession Numbers

Taxonomy and accession numbers for all genes analyzed in this study are listed in Supplementary File S3.1.

Results

Most Asgard archaea, Korarchaeota and Geoarchaea as well as parabasalids lack diphthamide synthesis genes

It was previously assumed that EF-2 of all eukaryotes and Archaea was uniquely characterized by the presence of diphthamide. To examine if this assumption is still valid when taking into account recently sequenced genomes, we surveyed 337 archaeal and 168 eukaryotic genomes (File S1) for each of the three known archaeal (De Crécy-Lagard et al.) and seven eukaryotic (Su, Chen, et al. 2012; Su, Lin, et al. 2012; Uthman et al. 2013) dph genes. While most archaeal genomes encode clear dph homologues, we failed to detect the diphthamide biosynthesis genes in a large diversity of metagenome-assembled genomes (MAGs) of uncultured archaea, including newly assembled MAGs analyzed for this study (Fig. 3.1, Supplementary Fig. S3.1, Supplementary File S3.1). In particular, our analyses showed that, as reported for K. cryptofilum (De Crécy-Lagard et al.; Elkins et al. 2008), all Korarchaeota and Geoarchaea as well as nearly all members of the Asgard archaea lack the conserved archaeal diphthamide biosynthesis genes dph1/2, dph5 and dph6. As an exception, Asgard archaea related to the Heimdallarchaeote LC3 clade were found to encode the complete archaeal diphthamide biosynthetic pathway (Fig. 3.1). Genes coding for Dph5 and Dph6 could not be detected in two Bathyarchaeota draft genomes (RBG_13_46_16b and

PAGE 66

SG8_32_3). However, it is unclear whether these two genomes are in the process of losing dph biosynthesis genes or whether the absence of dph5 and dph6 genes is due to the incompleteness of these draft genomes.

We also surveyed 168 eukaryotic genomes and high-quality transcriptomes, including those lineages that have undergone drastic genome reduction, such as microsporidians (Corradi et al. 2010), diplomonads (Morrison et al. 2007), and degenerate nuclei (i.e., nucleomorphs) of secondary plastids in cryptophytes (Lane et al. 2007) (Supplementary File S3.1) for dph gene homologs. We detected dph homologues in all eukaryotic genomes and transcriptomes except for parabasalid protists, including animal pathogens such as Trichomonas vaginalis, Tritrichomonas foetus and Dientamoeba fragilis (Supplementary File S3.1). Unless these archaea and parabasalids possess alternative, yet undiscovered diphthamide biosynthesis pathways, these findings suggest that their cognate EF-2 lacks the modified diphthamide residue. As a peculiarity, while the Dph1/2 protein is encoded by a single fusion gene in seemingly all archaea, we found that in several members of the DPANN archaea (Rinke et al. 2013; Castelle et al. 2015) this protein is encoded by two genes that separately code for the N- and C-terminal domains. To our knowledge, this is the first systematic report of the widespread absence of diphthamide biosynthesis in diverse eukaryotes and archaea.

PAGE 67

Figure 3.1 Diphthamide biosynthesis genes are conserved across most eukaryotic and archaeal lineages. Eukaryotic and archaeal orthologues of diphthamide biosynthesis (DPH) genes were retrieved from the publicly available EGGNOG and an in-house archaeal orthologues (arCOG) datasets. Complete list of genomes surveyed can be found in Supplementary File S1, including reduced genomes from nucleomorphs (not shown on figure). Total number of genomes surveyed are shown next to each group. Since Dph4 is a member of the large DNAJ-containing protein family, we could not unequivocally identify this protein based on orthology alone, and it is therefore excluded from the figure. Dark and light grey circles indicate whether homologues were detected in more or less than 50% of the genomes surveyed, respectively; yellow circles indicate the absence of a detectable homologue; pink circles indicate lack of conservation of the diphthamide modification motif; half circles indicate the presence of multiple copies of EF-2 with and without the conserved diphthamide modification motif.
+ No arCOG available for DPH3.
* All eukaryotic genomes are complete except five deeply sequenced transcriptomes from Parabasalia.
1 Homologue detected in the original assembly (ABR_125 (Zaremba-Niedzwiedzka et al. 2017)) but not in the reassembly (ABR16 genome); a closer inspection of the contig revealed that it is chimeric and will thus be removed from the final bin.
2 Homologue detected in only one Lokiarchaeota assembly (AB_15).
3 Several DPANN genomes contain two proteins that encode the CTD and NTD of Dph1/2, respectively.

[Figure 3.1 panel labels. Eukaryotic groups (genomes surveyed): Opisthokonta (189), Apusozoa (1), Amoebozoa (5), Excavata (13), Archaeplastida (25), SAR (29), Parabasalia (6*). Archaeal groups shown include: Geoarchaeota, Bathyarchaeota, Crenarchaeota, Heimdallarchaeota, Heimdallarchaeota LC3, Odinarchaeota, Thaumarchaeota, Korarchaeota, Aigarchaeota.]
PAGE 68

Various archaeal genomes that lack diphthamide biosynthesis genes encode an EF-2 paralog

To shed light on the implications of the potential lack of diphthamide in members of the Asgard archaea and Korarchaeota, we performed detailed analyses of eukaryotic and archaeal EF-2 homologs (Fig. 3.1). First, we found that the draft genomes of most Asgard archaea, some Korarchaeota (Kor 1 and 3), and a few Bathyarchaeota encode two distantly related EF-2 paralogs. In contrast, the genomes of K. cryptofilum, two novel marine Korarchaeota (Kor 2 and 4), Heimdallarchaeota LC2 and LC3, and Geoarchaea do not encode an EF-2 paralog. Given that the Heimdallarchaeota LC2 genome was estimated to be only 70-79% complete (Zaremba-Niedzwiedzka et al. 2017), and based on phylogenetic analyses (see below), we consider it possible that this genome might encode an as yet unassembled aEF-2 paralog. The presence of paralogous aEF-2 in most Asgard archaea and some Korarchaeota genomes corresponds with the absence of diphthamide synthesis genes (Figs. 3.1 and 3.2). Yet, even though the genomes of K. cryptofilum, Kor 2, Kor 4, and Geoarchaea as well as of Heimdallarchaeote LC2 lack dph genes, they do not encode an EF-2 paralog. In all other archaeal genomes, including that of Heimdallarchaeote LC3, the absence of an EF-2 paralog correlates with the presence of dph genes.

PAGE 69

Figure 3.2

PAGE 70

Figure 3.2 The evolution of archaeal EF-2 family proteins. Phylogenetic tree of EF-2 family proteins based on maximum likelihood analyses of 871 aligned positions using IQ-TREE. EF-2 of Bathyarchaeota grouping in an unexpected position or representing potential aEF-2p are shaded in orange. aEF-2 of Kor and Asgard archaea are shaded in purple, while their aEF-2p are shaded in green. Highlighted amino acids show the conservation of key residues, and black/white circles reveal the presence/absence of dph biosynthesis genes in the respective organisms/MAGs. Branch support values are based on ultrafast bootstrap approximation as well as single branch tests, respectively, and are represented by differentially colored circles as detailed in the figure panel. Whenever branch support values were below 80 for either of the two methods, values have been removed and branches cannot be considered significantly supported. Scale bar indicates the number of substitutions per site. Abbreviations: snRNP, U5 small nuclear ribonucleoprotein; EFL1, elongation factor-like GTPase; n.c., not conserved; p.c., partially conserved; n.d., not determined.
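The support rule stated in the caption (a branch is treated as supported only when neither value falls below 80) can be written as a small filter. This Python sketch assumes support values are already available as pairs per branch; the branch names, the values, and the function `is_supported` are hypothetical and serve only to illustrate the rule.

```python
def is_supported(ufboot: float, alrt: float, threshold: float = 80.0) -> bool:
    """True only if both the ultrafast bootstrap and single-branch test values reach the threshold."""
    return ufboot >= threshold and alrt >= threshold


# Hypothetical (ultrafast bootstrap, single-branch test) values per branch.
branches = {
    "branch_1": (98.0, 95.5),
    "branch_2": (92.0, 71.0),
    "branch_3": (80.0, 80.0),
}

kept = [name for name, (ufb, alrt) in branches.items() if is_supported(ufb, alrt)]
print(kept)
```

Requiring both values, and not either one, is the conservative reading of the caption: a branch with one high and one low value is left unlabeled.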

PAGE 71

Archaea with two EF-2 family proteins encode only one bona fide EF-2

We next addressed whether residues and structural motifs shown to be necessary for canonical translocation were conserved in the various EF-2 and EF-2 paralogs. Domain IV of EF-2, representing the anticodon mimicry domain, is critical for facilitating concerted translocation of tRNA and mRNA (Rodnina et al. 1997; Ortiz et al. 2006). This domain includes three loops that extend out from the body of EF-2 and interact with the decoding center of the ribosome. The first of these three loops (HxDxxHRG) (canonical residue positions are numbered according to the sequence associated with D. melanogaster structural model PDB 4V6W (Anger et al. 2013)) contains the site of the diphthamide-modified histidine, H701, and is highly conserved across archaea and eukaryotes (Ortiz et al. 2006; Y. Zhang et al. 2008). High conservation is also seen in a second adjacent loop (SPHKHN) in the a/eEF-2 domain IV (S581-N586), which contains a lysine residue (K584) that interacts directly with the tRNA at the decoding center, and is itself positioned by a stacking interaction between P582 and H585 (Murray et al. 2016). The third loop appears to stabilize the diphthamide loop, partially via a salt bridge formed between a nearby glutamate residue (E660) and R702 in the diphthamide loop (Anger et al. 2013). Both of these residues are highly conserved among archaea and eukaryotes.

Our analyses reveal that the sequence motifs in these loops are also strictly conserved among the EF-2 family proteins of the Heimdallarchaeote LC3 lineage, Geoarchaea, as well as in those Korarchaeota and Bathyarchaeota that lack an EF-2 paralog (Fig. 3.3, Supplementary Fig. S3.2a). Notably, this conservation is seen irrespective of the presence or absence of dph genes in those genomes. However, most bona fide EF-2 of parabasalids (which lack dph genes) possess a glycine-to-asparagine mutation at residue 703 (Fig. 3.3,

PAGE 72

Supplementary Fig. S3.2b, Supplementary Fig. S3.3a), which may compensate for the lack of the diphthamide residue by contributing an amide group (Fig. 3.3, Supplementary Fig. S3.3b).

In contrast, in those Asgard archaea and Korarchaeota (Kor 1/3 clade) that encode two EF-2 family proteins, even within the bona fide EF-2 copy, these domain IV motifs show reduced conservation. In the diphthamide loop, R702 is universally replaced by a threonine residue. In 21 of 22 aEF-2 proteins, there is a correlated mutation of E660 to either arginine or lysine (Supplementary Fig. S3.4). Structural homology modeling suggested that these correlated mutations likely prevent unfavorable electrostatic interactions between domain IV loops, and maintain stabilization of the diphthamide loop (Supplementary Fig. S3.4). While G703 is conserved in most EF-2s of archaea, all Lokiarchaeota (except Lokiarchaeota CR_4) encode either a serine or a glutamine at this site (Fig. 3.3, Supplementary Fig. S3.2a). Furthermore, analysis of the second loop (S581-N586) revealed additional crucial mutations in the EF-2 of these archaea; notably, K584 is not conserved (Fig. 3.3, Supplementary Fig. S3.2a). Despite these modifications, which correlate with the presence of an EF-2 paralog in these archaea, there is still evidence for strong selection pressure maintaining many of the key conserved residues in these domain IV motifs, including H701, the target site of diphthamide modification (Fig. 3.3, Supplementary Fig. S3.2a).

In contrast, our analyses of the multiple sequence alignment and structural models suggest that the paralogous EF-2 (aEF-2p) proteins encoded by these archaea lack conservation in the stabilizing second loop (SPHKHN) as well as the first diphthamide loop (HxDxxHRG), including H701 (Fig. 3.3). Based on predicted fold conservation in domains I and II, and the overall conservation of the five sequence motifs (G1-G5) characterizing

PAGE 73

GTPase superfamily proteins (Atkinson 2015), aEF-2p likely maintains GTPase activity (Supplementary Fig. S3.5). However, given the apparent lack of conservation in key domain IV loops, it is unlikely that aEF-2p proteins can serve as functional translocases in protein translation.

Figure 3.3 Predicted structure of Asgard archaea EF-2 and EF-2 paralogs. Structural modeling of representative EF-2 genes and paralogs compared to eukaryotic EF-2 structure shows conservation of overall EF-2 structure regardless of diphthamide synthesis capacity (top). The overall fold of two loops located at the tip of domain IV is conserved, but otherwise highly conserved sequence motifs in these loops are not conserved in DPH- Asgard archaea and Korarchaea or in EF-2 paralogs (middle). Bottom panels show a close-up of the key residues from the motifs, highlighting that these residues are those positioned at the tip of the domain IV loops crucial for interaction with the decoding site in canonical EF-2 structures. The histidine residue that is the site of dph modification is starred.

[Figure 3.3 panels: Thorarchaeota OWC-2 EF-2 paralog (DPH- Asgard and Korarchaeum EF-2 paralog); Thorarchaeota OWC-2 putative EF-2 (DPH- Asgard and Korarchaeum putative EF-2); D. melanogaster eEF-2, PDB 4V6W:Az (DPH+ Eukaryota canonical eEF-2); Hadesarchaea DG-33 aEF-2 (DPH+ Archaea canonical aEF-2). Each panel shows sequence logos (y-axis: bits) for the diphthamide loop and the CDK-recognition loop.]
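The two domain IV loop motifs discussed in this section, HxDxxHRG (the diphthamide loop) and SPHKHN (the second loop), can be screened for with simple pattern matching. The Python sketch below is an illustration only: the sequence fragments are invented, the function `loop_report` is hypothetical, and the analyses in this chapter relied on curated alignments and structural models, not on regular expressions.

```python
import re

# 'x' in the motif notation means any residue, written '.' in a regular expression.
DIPHTHAMIDE_LOOP = re.compile(r"H.D..HRG")  # HxDxxHRG
SECOND_LOOP = re.compile(r"SPHKHN")


def loop_report(seq: str) -> dict:
    """Report whether each canonical domain IV loop motif occurs in `seq`."""
    hit = DIPHTHAMIDE_LOOP.search(seq)
    return {
        "diphthamide_loop": hit is not None,
        # 0-based index of the histidine that would carry the modification
        # (the second H of the motif, five residues after its start).
        "modified_his_index": hit.start() + 5 if hit else None,
        "second_loop": SECOND_LOOP.search(seq) is not None,
    }


# Hypothetical sequence fragments, for illustration only.
canonical = "GASPHKHNRLTAHADAIHRGGGQ"
variant = "GASPNSHNRLTAHADAIHTGGGQ"  # R-to-T change in the first loop, K absent from the second

canonical_report = loop_report(canonical)
variant_report = loop_report(variant)
print(canonical_report)
print(variant_report)
```

An exact-match screen like this flags any deviation from the canonical motif, including the single substitutions described above (for example, threonine in place of R702), so a negative result indicates divergence from the canonical pattern and says nothing by itself about function.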

EF-2 homologs of archaea experienced complex evolutionary history

To resolve the evolutionary history of EF-2, we performed phylogenetic analyses of archaeal EF-2 (aEF-2) and aEF-2p, bacterial EF-G and eukaryotic EF-2 family proteins, i.e. EF-2, Ria1 (or Elongation factor-like, EFL1) and Snu114 (or U5 small nuclear ribonucleoprotein, snRNP/U5-116kD) (Fig. 2) (Atkinson 2015). First, our analyses revealed that sequences from all non-LC3 Asgard archaea and the Kor-1 and -3 marine Korarchaeota formed two distinct clades, one of which contains canonical aEF-2 proteins (as defined by conservation of the domain IV loop known to interact with the ribosomal decoding center during translocation) while the other comprises aEF-2p (Fig. 3.2). However, the phylogenetic placement of these protein clades relative to each other and within the phylogenetic backbone is not fully resolved due to lack of statistical support. This might be caused by modified (accelerated) evolutionary rates that appear to characterize the evolution of aEF-2 and aEF-2p in lineages that encode a paralog, as indicated by increased relative branch lengths for both the aEF-2 and aEF-2p clades (Fig. 3.2, Supplementary Files S3.2 and S3.3). Secondly, bathyarchaeal EF-2 homologs were also found to form two separate clades. One of these clades is placed within the TACK superphylum, and includes both canonical bathyarchaeal EF-2s as well as potential paralogs (i.e., RBG_13_46_16b and SG8-32-3). In contrast, the second clade comprises only two sequences (i.e., RBG_13_46_16b and AD8-1), and is placed as a sister group of all TACK, Asgard and eukaryotic EF-2 homologs (Fig. 3.2). In spite of this deep placement in the phylogenetic analyses, the second clade consists of the canonical EF-2 homologs of Bathyarchaeota genomes RBG_13_46_16b and AD8-1, based on analysis of key domain IV residues. Currently, only the most complete

of the latter two draft genomes, RBG_13_46_16b, contains an aEF-2 paralog. Therefore, the current data are insufficient to resolve the puzzling pattern of EF-2 evolution in the Bathyarchaeota phylum. Finally, in our analysis, eEF-2, Ria1 and Snu114 were found to form a highly supported monophyletic group that emerged as a sister group to the aEF-2 proteins encoded by the genomes comprising the Heimdallarchaeote LC3 clade (LC3 and B3). Close inspection of the EF-2 sequence alignment revealed that eukaryotic and LC3 EF-2 homologs share common indels to the exclusion of all other archaeal EF-2 family protein sequences (Supplementary Fig. S3.6, Supplementary Fig. S3.7). Notably, these highly conserved indels were found to be encoded by the genomic bins of two distantly related members of the Heimdallarchaeota LC3 lineage, which were independently assembled and binned from geographically distinct metagenomes (Spang et al. 2015; Zaremba-Niedzwiedzka et al. 2017). This refutes recent claims that these indels in Heimdallarchaeote LC3 may be the result of contamination from eukaryotes (Da Cunha et al. 2017), while supporting the sister relationship of eukaryotes and Asgard archaea (Spang et al. 2015; Eme et al. 2017; Zaremba-Niedzwiedzka et al. 2017; Spang et al. 2018). In addition, despite the low sequence identity of 39%, the high-confidence modeled structure of Heimdallarchaeote LC3 EF-2 was highly similar to Drosophila melanogaster eEF-2 (RMSD (root mean square deviation) 1.3 across all 796 residues to D. melanogaster structural model PDB 4V6W (Anger et al. 2013); Supplementary File S3.1). By comparison, the Heimdallarchaeote AB 125 model aligns less confidently to the Drosophila EF-2 structure (RMSD 16.4). The observed phylogenetic topology and the presence of the full complement of dph biosynthesis genes in LC3 genomes (Figs. 3.1 and 3.2) support an

evolutionary scenario in which Heimdallarchaeote LC3 and eukaryotes share a common ancestry with EF-2 being vertically inherited from this archaeal ancestor.

Discussion

The use of metagenomic approaches has led to an expansion of genomic data from a large diversity of previously unknown archaeal and bacterial lineages and has changed our perception of the tree of life, microbial metabolic diversity and evolution, as well as the origin of eukaryotes (Brown et al. 2015; Castelle et al. 2015; Spang et al. 2015; Hug, Baker, et al. 2016; Parks et al. 2017; Zaremba-Niedzwiedzka et al. 2017). Since most of what is known about archaeal informational processing machineries is based on a few model organisms, we aimed to use the expansion of genomic data to investigate key elements of the translational machinery, EF-2 and diphthamidylation, across the tree of life. Our analyses of archaeal EF-2 family proteins and the distribution of diphthamide biosynthesis genes have revealed unusual features of the core translation machinery in several archaeal lineages. These findings negate two long-held assumptions regarding the archaeal and eukaryotic translation machineries, with both functional and evolutionary implications. First, we show that diphthamide modification is not universally conserved across Archaea and eukaryotes. Second, we demonstrate that, as in Bacteria and eukaryotes (Atkinson 2015), the archaeal EF-2 protein family has undergone several gene duplication events, presumably coupled to functional differentiation of EF-2 paralogs, throughout archaeal evolution. The evolution of archaeal diphthamide biosynthesis and EF-2 is especially intriguing in the context of eukaryogenesis. Recent findings based on comparative genomics indicate that eukaryotes evolved from a symbiosis between an alphaproteobacterium and an archaeal

host that shares a most recent common ancestor with extant members of the Asgard archaea, possibly a Heimdallarchaeota-related lineage (Spang et al. 2015; Zaremba-Niedzwiedzka et al. 2017). Our study adds data to support this scenario by revealing close sequence and predicted structural similarity between canonical EF-2 proteins of the Heimdallarchaeote LC3 lineage and eukaryotic EF-2 proteins, including shared indels. Furthermore, phylogenetic analyses of EF-2 family proteins reveal that EF-2 of the Heimdallarchaeote LC3 lineage forms a monophyletic group with EF-2 family proteins of eukaryotes, which suggests that the archaeal ancestor of eukaryotes was equipped with an EF-2 protein similar to the homologs found in this lineage. The subsequent evolution of the eukaryotic EF-2 family appears to have included at least two ancient duplication events leading to Ria1 and Snu114. Importantly, the presence of characteristic eukaryotic indels in EF-2 of all members of the Heimdallarchaeote LC3 lineage further strengthens this hypothesis and underlines that concerns raised about the quality of these genomic bins (Da Cunha et al. 2017) are unjustified (Spang et al. 2018). In addition, the LC3 clade also represents the sole group within the Asgard archaea that is characterized by the presence of the full complement of archaeal diphthamide biosynthesis pathway genes. However, while phylogenetic analyses of Dph1/2 show weak support for a sister relationship between Heimdallarchaeota and eukaryotes, eukaryotic Dph5 appears to be most closely related to homologs of Woesearchaeota (Supplementary Fig. S3.8, Supplementary File S3.3), an archaeal lineage belonging to the proposed DPANN superphylum (Rinke et al. 2013; Castelle et al. 2015; Williams et al. 2017), which comprises various additional lineages with putative symbiotic and/or parasitic members (reviewed in Spang, Caceres, and Ettema 2017).
Notably, a previous study has also revealed

an affiliation of some eukaryotic tRNA synthetases with DPANN archaea (Furukawa et al. 2017). Given that several DPANN lineages infect or closely associate with other archaeal lineages, they may exchange genes with their hosts frequently, as was shown for Nanoarchaeum equitans and its crenarchaeal host Ignicoccus hospitalis (Podar et al. 2008). Following similar reasoning, the archaeal ancestor of eukaryotes (i.e. a relative of the Asgard archaea) may have acquired genes (e.g. dph5) from an ancestral DPANN/Woesearchaeota symbiont. However, prospective analyses and generation of genomic data from additional members of the Asgard and DPANN archaea are necessary to test this hypothesis and to clarify the evolutionary origin of diphthamide biosynthesis genes in eukaryotes.

Furthermore, our findings have practical implications for studies that involve phylogenetic and metagenomic analyses. Previously, EF-2 has been widely used as a phylogenetic marker, in both single gene alignments (Iwabe et al. 1989; Baldauf, Palmer, and Doolittle 1996; Hashimoto and Hasegawa 1996; Elkins et al. 2008) and multiple gene alignments of universal single copy genes (Williams et al. 2012; Guy, Saw, and Ettema 2014; Raymann, Brochier-Armanet, and Gribaldo 2015; and others), to assess the relationships between Archaea, Bacteria and eukaryotes. However, the presence of paralogs of EF-2 in various Archaea and eukaryotes suggests that EF-2 should be excluded from such datasets. In addition, EF-2, Dph1/2, and Dph5 are part of single copy marker gene sets regularly used to estimate genome completeness and purity of archaeal metagenomic bins (Wu and Scott 2012; Parks et al. 2015). The presence of duplicated aEF-2 gene families, the absence of dph genes in most Asgard archaea, Geoarchaea and Korarchaeota, and the presence of two split genes

for Dph1/2 in DPANN make these genes unsuited as marker genes, which should hence be excluded from marker gene sets used to assess genome completeness.

The observed absence of dph biosynthesis genes in various Archaea as well as parabasalids is surprising given that diphthamide was previously thought to be a conserved feature across Archaea and eukaryotes (Schaffrath et al. 2014), and critical for ensuring translational fidelity (Ortiz et al. 2006). We currently cannot rule out the possibility that dph-lacking archaea and parabasalids perform the multi-step process of diphthamidylation using a set of as yet unknown enzymes; future proteomics studies will be needed to conclusively rule out the presence of diphthamide in these taxa. Yet, it is more likely that these groups have evolved a different mechanism or mechanisms to fulfill the proposed roles of diphthamide in translation. Many of the dph-lacking archaeal genomes encode two paralogs of the aEF-2 gene. Despite the apparent absence of diphthamide, our sequence and structural modeling analyses imply that these diphthamide-deficient aEF-2 proteins are likely under strong selective pressure to maintain translocase function. In contrast, analyses of aEF-2p suggest that, while this paralog is a member of the translational GTPase superfamily, it is unlikely to function in the same manner as canonical aEF-2. In fact, the complete lack of sequence conservation in key domain IV loop residues of aEF-2p indicates that these paralogs are not likely to act as translocases (Fig. 3.3, Supplementary Fig. S3.2a) (Rodnina et al. 1997; Ortiz et al. 2006) and instead perform alternative roles. For instance, it seems possible that aEF-2p may compensate for the absence of diphthamide in at least some dph-lacking lineages. However, other functions for aEF-2p such as error-correcting back-translocation or ribosome recycling also seem possible, given the observed sub- and neo-functionalizations seen in

eukaryotic and bacterial EF-2/EF-G paralogs (Qin et al. 2006; Tsuboi et al. 2009). Alternatively, given the proposed regulation of translation via ADP-ribosylation of diphthamide (Schaffrath et al. 2014) and a role of diphthamide in responding to oxidative stress (Argüelles et al. 2013; Argüelles et al. 2014), aEF-2p could perform another, as yet unknown role in translation regulation. Currently, the consequences of the absence of dph biosynthesis genes in parabasalids and in several Archaea remain unclear. Future studies could gain insight into such questions by studying translation in the genetically tractable parabasalid Trichomonas vaginalis, whose cell biology and metabolism have been extensively studied. In addition, acquisition of additional sequencing data or enrichment cultures from members of the Asgard superphylum, Korarchaeota, and other novel archaeal lineages will lead to a better understanding of the evolution and function of EF-2 family proteins, and of the absence of dph biosynthesis genes.

Acknowledgements

We thank Jordan Angle, Kay Stefanik, Rebecca Daly, and Kelly Wrighton for assistance with sampling of OWC sediments, and Felix Homa for computational support. Sequencing of OWC metagenomes was conducted in part by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility that is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Sequencing of Aarhus Bay metagenomes was performed by the National Genomics Infrastructure sequencing platforms at the Science for Life Laboratory at Uppsala University, a national infrastructure supported by the Swedish Research Council (VR-RFI) and the Knut and Alice Wallenberg Foundation. We thank the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) at Uppsala University and the Swedish

National Infrastructure for Computing (SNIC) at the PDC Center for High Performance Computing for providing computational resources. This work was supported by grants of the European Research Council (ERC Starting grant 310039-PUZZLE_CELL), the Swedish Foundation for Strategic Research (SSF-FFL5) and the Swedish Research Council (VR grant 2015-04959) to T.J.G.E. C.W.S. is supported by a European Molecular Biology Organisation long-term fellowship (ALTF-997-2015) and the Natural Sciences and Engineering Research Council of Canada postdoctoral research fellowship (PDF-487174-2016).

Author Contributions

Adrienne Narrowe, Anja Spang, Christopher Miller, and Thijs Ettema designed the bioinformatics and computational experiments. Adrienne Narrowe, Anja Spang, Courtney Stairs, Eva Caceres, and Brett Baker conducted phylogenetic analyses. Adrienne Narrowe, Anja Spang, Courtney Stairs, and Eva Caceres conducted EF-2 sequence analyses. Adrienne Narrowe and Christopher Miller performed protein structural analyses. Adrienne Narrowe, Anja Spang, Courtney Stairs, and Eva Caceres conducted sequence analyses. Adrienne Narrowe, Anja Spang, and Christopher Miller wrote the manuscript contained in this chapter, with contributions from Courtney Stairs, Eva Caceres, and Thijs Ettema. The final manuscript was read and approved by all co-authors.

CHAPTER IV

BATHYARCHAEOTA POPULATIONS IN WETLAND SOILS CONTAIN PREVIOUSLY UNKNOWN METABOLIC COMPLEXITY 3

3 This work was a collaborative project with the following authors: Adrienne B. Narrowe, Lindsey M. Solden, Jordan C. Angle, Rebecca A. Daly, Mikaya A. Borton, Kelly C. Wrighton, Christopher S. Miller. Full author contributions are listed at the end of the chapter.

Introduction

Since their initial description in marine sediments as the Miscellaneous Crenarchaeota Group (MCG) (Inagaki et al. 2003), it has become clear that the recently renamed Bathyarchaeota may be the most abundant and broadly distributed archaeal phylum globally (Biddle et al. 2006; Sørensen and Teske 2006; Meng et al. 2009; Kubo et al. 2012; Lloyd et al. 2013; Meng et al. 2014; Fillol, Sanchez-Melsio, et al. 2015; Fillol, Auguet, et al. 2015; Lazar et al. 2015; Xiang et al. 2017). Marker gene studies have shown that the Bathyarchaeota are broadly distributed across a variety of habitats, including marine sediments (Inagaki et al. 2003; Biddle et al. 2006; Sørensen and Teske 2006; Lloyd et al. 2013; He et al. 2016; Yu et al. 2017), estuarine sediments (Lazar et al. 2015; Lazar et al. 2016), wetland soils (Narrowe et al. 2017), aquifers (Anantharaman et al. 2016; Jewell et al. 2017), coal bed methane wells (Evans et al. 2015) and hot springs (McKay et al. 2017). Despite this global distribution, Bathyarchaeota remain poorly characterized at the genomic level. While 16S rRNA gene trees suggest up to 22 family-level subgroups within this phylum (Kubo et al. 2012; Meng et al. 2014; Lazar et al. 2015; McKay et al. 2017; Xiang et al. 2017), there are as yet no isolate cultures or genomes, and only 39 partial to near-complete Bathyarchaeota genomes found in

public databases. Of those 39, the ones including a 16S rRNA gene that can be assigned to a subgroup represent only 8 of the 22 subgroups. Functional gene predictions from the partial to near-complete Bathyarchaeota genomes suggest that Bathyarchaeota metabolisms are as diverse as their phylogenetic and habitat distribution. Described partial genomes provide evidence for heterotrophy (Biddle et al. 2006; Seyler, McGuinness, and Kerkhof 2014), including the potential for degradation of extracellular proteins (Lloyd et al. 2013), aromatic compounds (Meng et al. 2014), and complex carbohydrates (Lazar et al. 2016), as well as autotrophic assimilation of carbon (Evans et al. 2015; He et al. 2016; Lazar et al. 2017). Multiple lines of genomic evidence now support a role for some Bathyarchaeota as sources of acetate, either via fermentation (Lazar et al. 2016) or as homoacetogens (He et al. 2016), and it appears that many Bathyarchaeota are capable of both pathways. On the basis of pathway analysis in two partially assembled genomes, some Bathyarchaeota may be methanogens (Evans et al. 2015). Respiration via reduction of protons may be possible for subgroup 1, but no other respiratory processes have been conclusively inferred for this phylum (Lazar et al. 2016). The large-scale metabolic inferences are thus far based on sampling from only three environments: coal bed methane wells (Evans et al. 2015), marine sediments (He et al. 2016), and estuarine sediments (Lazar et al. 2016), and Bathyarchaeota genomic potential in freshwater environments has yet to be fully characterized (Jewell et al. 2017). Across and within these habitats, biogeographic distributions have suggested subgroup-specific habitat preferences, likely linked to distinct metabolic features of the different Bathyarchaeota subgroups (Fillol, Auguet, et al. 2015; Fillol, Sanchez-Melsio, et al. 2015; Lazar et al. 2015; Xiang et al. 2017).
However, such a linkage between genome inferred metabolism and habitat preference has

been suggested for only 4 partial genomes from a single site/habitat type to date (Lazar et al. 2016). Our previous work on archaeal community diversity in a methane-emitting freshwater wetland showed that Bathyarchaeota comprised up to one-third of all archaeal 16S rRNA gene sequences (Chapter II, Figures 2.3 and 2.5; Narrowe et al. 2017). Sampling across a wetland hydrological gradient and down a soil core depth gradient provided clear evidence for subgroup-specific habitat preferences linked to hydrologic and geochemical features. To explore the genomic determinants of these preferences in a freshwater habitat, and to genomically characterize previously unsampled Bathyarchaeota subgroups, we performed shotgun metagenomic sequencing of 4 samples predicted, from our prior 16S rRNA gene study, to harbor abundant, diverse Bathyarchaeota genomes. With these data, we link Bathyarchaeota genome bins to their distributions across the physical and geochemical gradients in the wetland, confirm Bathyarchaeota core metabolic features, and expand the range of predicted metabolic potential for this phylum.

Methods and Materials

Sampling and DNA extraction

Freshwater wetland soil cores were collected from Old Woman Creek, Huron OH, USA (OWC) in October 2013, and total DNA was extracted as described previously (Chapter II; Narrowe et al. 2017). Based on the high abundance of Bathyarchaeota and other taxa of interest as identified using 16S rRNA amplicon sequencing (Chapter II; Narrowe et al. 2017), 4 samples were chosen for shotgun metagenomic sequencing (M3-C4-D3, M3-C4-D4, O3-C3-D3, O3-C3-D4) with the goal of producing metagenome-assembled genomes.

Library preparation and Sequencing

Library preparation and five lanes of Illumina HiSeq 2x125 bp sequencing followed standard operating procedures at the US DOE Joint Genome Institute (GOLD study ID Gs0114821). Sample M3-C4-D3 had replicate extraction, library preparation, and two lanes of sequencing performed, and reads were combined before quality trimming and assembly. For 3 additional samples (M3-C4-D4, O3-C3-D3, O3-C3-D4) one lane of sequencing was performed. This study also made use of read coverage profiles from 4 additional lower read coverage samples (M3-C5-D1, M3-C5-D2, M3-C5-D3, M3-C5-D4). For these samples, DNA was sheared to 300 bp with a Covaris S220, metagenomic sequencing libraries were prepared using the Nugen Ovation Ultralow Prep kit, and all four samples were multiplexed on one lane of Illumina HiSeq 2x125 bp sequencing at the University of Colorado Denver Anschutz Medical Campus Genomics and Microarray Core.

Genome Assembly and Binning

The five full-lane sequencing runs (2X M3-C4-D3, M3-C4-D4, O3-C3-D3, and O3-C3-D4) were initially assembled following JGI standard protocols as follows. Adapter removal, read filtering and trimming were completed using BBDuk (sourceforge.net/projects/bbmap) with ktrim=r, minlen=40, minlenfraction=0.6, mink=11, tbo, tpe, k=23, hdist=1, hdist2=1, ftm=5, maq=8, maxns=1, minlen=40, minlenfraction=0.6, k=27, hdist=1, trimq=12, qtrim=rl. Filtered reads were then assembled using megahit (Li et al., 2015) version 1.0.6 with --k-list 23,43,63,83,103,123. To generate coverage profiles for use in binning, all 9 read sets were mapped to final scaffolds from each assembly using seal (v. June 27, 2016; https://sourceforge.net/projects/bbmap/) (ambiguous=random threads=8 interleaved=f

prealloc=t nzo=false speed=12 minkmerhits=1 k=27). Each assembly was binned individually using the coverage profiles as input to MetaBAT (v. 2.12.1) (--cvExt) (Kang et al. 2015). Bin completion and contamination were estimated using CheckM (Parks et al. 2015), and for those bins predicted as archaeal, single copy marker genes (SCGs) were predicted using Amphora2 (Wu and Scott 2012). For the archaeal bins, the set of predicted SCGs was searched against UniRef90 (rel. 11_2017) using BLASTP (evalue 1e-10) (Altschul 2008). The consensus taxonomy of the best BLAST hit for each bin's SCGs was used to identify genome bins that were likely bathyarchaeotal.

Functional annotation and bin QC

Gene predictions and annotations for each assembly were performed using the JGI Microbial Genome Annotation Pipeline (Huntemann et al. 2015); and for each putative bathyarchaeotal bin, the genes corresponding to the binned scaffolds were extracted and a homology search performed via BLASTP (v. 2.7.1+) (Altschul 2008) against UniRef90 (release 11_2017). Bins were additionally annotated using GhostKOALA (genus_prokaryotes + family_eukaryotes + viruses) (Kanehisa, Sato, and Morishima 2016) and InterProScan (v. 5.24-63.0) (-iprlookup -goterms -pa) (Jones et al. 2014). To conservatively filter bins, any scaffold not containing at least one protein with a top hit to a UniRef90 sequence annotated as "Bathyarchaeota", "Crenarchaeota", or "MCG" was flagged for removal from the bin. This approach risks discarding legitimate bathyarchaeotal scaffolds with protein families that have not yet been detected in the limited existing bathyarchaeotal genomes, especially for short scaffolds. To guard against discarding true genomic novelty, we made the assumption that a scaffold with legitimate but only novel proteins in one genome bin might have close homologs in a scaffold from a phylogenetically

related bin, and that this second scaffold might contain more informative proteins with reliable homology to existing Bathyarchaeota genomes. Thus, we next checked all scaffolds flagged for removal against a database of predicted proteins retained after the previous filtering step, from bins assigned as belonging to the same subgroup (see below for subgroup assignment). Any scaffolds containing at least 2 BLASTP hits to a retained scaffold from another bin, of at least 90% amino acid identity over 100 amino acids, were restored to the bin. Scaffolds containing genes from key metabolisms discussed below were also manually examined to verify their likely Bathyarchaeota origin. Final bin quality was estimated using CheckM (Parks et al. 2015), which provides a measure of bin completeness and contamination based upon the presence and number of conserved single copy marker genes in the genome bins. Metabolic pathways were visualized using the KEGG Mapper Reconstruct Pathway webserver (Aoki-Kinoshita and Kanehisa 2007).

Identification and validation of population clusters

Bin subgroup clusters (roughly corresponding to bins with the same Bathyarchaeota subgroup) were identified using several metrics. FastANI (Jain et al. 2017) was used for an all vs. all comparison of bin-to-bin average nucleotide identity (--minFrag 20). Bin population clusters were further defined using phylogenetic markers including 16S rRNA genes and ribosomal S3 protein genes. For bins that contained multiple copies of any marker gene, the bin was retained if the phylogenetic assignment of all copies agreed at the subgroup level. Where the ANI analysis suggested linkage across subgroup bin clusters that was inconsistent with other bin pairs within the population bin clusters, we manually examined the linkages. For the linked bins O3D4_bin153 (N=316 scaffolds) and O3D3_bin_323 (N=441 scaffolds), there were only 51-54 BLASTN hits greater than 100 nt in length

satisfying the evalue 1e-10 parameter. While many of the proteins encoded on these scaffolds are highly conserved, the extremely high ANI across these scaffolds suggests that they were likely misbinned in either the O3D3 or the O3D4 assembly and binning. We conservatively removed these scaffolds from both bins to avoid making metabolic inferences based on scaffolds that we could not assign with confidence.

Ribosomal S3 gene phylogeny and subgroup assignment

Ribosomal S3 genes were identified in bins and in all publicly available Bathyarchaeota genomes/bins using hmmsearch (--cut_tc) (Eddy 2011) against PFAM00189. Sequences were aligned using mafft (L-INS-i) (Katoh et al. 2002), and the alignments were trimmed using trimAl (-gappyout) (Capella-Gutierrez et al. 2009). Maximum likelihood gene phylogenies were generated from the trimmed alignments using IQ-TREE with the best model chosen by BIC (LG+F+R5), branch supports were estimated using UFBoot (-m MFP -bb 1000 -alrt 1000) (Nguyen et al. 2015; Hoang et al. 2018), and trees were visualized using iTOL (Letunic and Bork 2007).

Identification of putative Bathyarchaeotal mcrABG genes

To identify possible Bathyarchaeotal mcrABG genes, all predicted protein coding sequences in the assemblies were searched (BLASTP, evalue 1e-10) against a database containing all mcrABG gene sequences from Candidatus Bathyarchaeota BA1 and BA2 (Evans et al. 2015) and Candidatus Syntrophoarchaeum sp. (Laso-Pérez et al. 2016), as both these groups encode mcrA genes that are divergent from Euryarchaeotal mcrA genes. Hits with greater than 50% amino acid identity and at least 150 amino acids in length were searched against GenBank to exclude sequences with higher identity to known euryarchaeotal methanogen mcrABG sequences. For each of mcrA, mcrB, and mcrG, the remaining putative

Bathyarchaeotal sequences were combined with a set of reference sequences including those of BA1, BA2, and Ca. Syntrophoarchaeum sp., and, in the case of the mcrA gene, with all mcrA sequences from all assemblies. Sequences were aligned using mafft (L-INS-i) (Katoh et al. 2002), and the alignments were trimmed using trimAl (-gappyout) (Capella-Gutierrez et al. 2009). Maximum likelihood gene phylogenies were generated from the trimmed alignments using IQ-TREE with the best model chosen by BIC (mcrA: LG+F+R5; mcrB: LG+F+R6; mcrG: LG+R4), branch supports were estimated using UFBoot (-m MFP -bb 1000 -alrt 1000) (Nguyen et al. 2015; Hoang et al. 2018), and trees were visualized using iTOL (Letunic and Bork 2007).

Results and Discussion

Recovery of multiple Bathyarchaeota bins

We recovered 28 partial to near-complete Bathyarchaeota genome bins from the metagenomic assembly of 4 wetland soil samples (depths 13-35 cm). An additional 2 bins were identified from the assembly of 2 shallow-depth wetland soil samples (0-12 cm) (Table 4.1). These bins ranged from 23-96% complete, and most bins have less than 20% estimated contamination. However, based on the strict filtering criteria we applied to the bins, this estimated contamination likely represents co-binning of contigs from closely related species, and bins with greater than 20% estimated contamination are considered composite bins of closely related genomes within their respective subgroups. For example, O3D4 bin201 appears to contain 2 closely related group 5b genomes. Overall, these genome bins range in size from ~0.24 Mbp to 3.5 Mbp, with estimated complete genome sizes (calculated from bin length and estimated completeness) ranging from ~0.7 Mbp to 2.6 Mbp, largely agreeing with previously reported Bathyarchaeota genome

sizes (Evans et al. 2015; He et al. 2016; Lazar et al. 2016). Interestingly, 3 of 5 group 15/17 bins have estimated genome sizes less than 1 Mbp. For two of these bins, O3D3_Bin_340 and C4D4_Bin_16 (estimated sizes 708-805 kbp), the nearest reference genome, Candidatus Bathyarchaeota archaeon RBG 16 48 13, is also small: it is estimated to be 84% complete with a bin size of only 0.8 Mbp. However, other genomes in groups 15 and 17 are larger, approaching 1.5-2 Mbp. As more genomes become available, it will become clear whether there is in fact a subset of Bathyarchaeota with reduced genomes.

Table 4.1 Bathyarchaeota genome bin metrics

[Table 4.1 body is not recoverable from the source text. For each genome bin, the table reports completeness, contamination, number of contigs, N50, mean length, total length, and longest contig.]
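The estimated complete genome sizes discussed with Table 4.1 are described as being calculated from bin length and estimated completeness. A minimal sketch of that arithmetic follows, assuming the estimate is simply bin length divided by fractional completeness, with no correction for contamination. The text does not state the exact formula, and the function name is hypothetical.

```python
def estimated_genome_size(bin_length_bp: float, completeness_pct: float) -> float:
    """Scale a bin's total length up to a full-genome estimate.

    Assumption (not stated in the text): size = bin length / fractional
    completeness, ignoring any contribution from contaminating contigs.
    """
    if not 0 < completeness_pct <= 100:
        raise ValueError("completeness must be in the range (0, 100]")
    return bin_length_bp / (completeness_pct / 100.0)


if __name__ == "__main__":
    # Reference bin quoted in the text: 0.8 Mbp at 84% estimated completeness.
    size_bp = estimated_genome_size(800_000, 84.0)
    print(f"Estimated complete genome size: {size_bp / 1e6:.2f} Mbp")
```

For the reference bin quoted in the text (0.8 Mbp at 84% completeness), this gives roughly 0.95 Mbp, consistent with the sub-1 Mbp sizes noted for some group 15/17 bins. Bins with substantial estimated contamination would be overestimated by this simple scaling.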


Phylogenetic placement of bins

The Old Woman Creek (OWC) Bathyarchaeota genome bins dramatically increase the genomic sampling of several Bathyarchaeota subgroups, and are the first Bathyarchaeota genomes described from freshwater wetland soils. To phylogenetically place our genome bins in the context of previously described Bathyarchaeota genomes, we constructed a phylogenetic tree using the small ribosomal subunit protein S3 gene, a known single-copy phylogenetic marker gene (Figure 4.1a). While based on only a single marker gene, the phylogeny is well supported and recapitulates previously described relationships among the reference genome bins (He et al. 2016). Analysis of partial and full assembled SSU 16S rRNA genes in the OWC bins and reference genomes indicates that the clades identified in the S3 phylogeny are congruent with 16S rRNA-based bathyarchaeotal subgroup designations (Kubo et al. 2012; Meng et al. 2014; Fillol, Sanchez-Melsio, et al. 2015; Lazar et al. 2015; Xiang et al. 2017). The Bathyarchaeota genome bins we present here include three genome bins within subgroup 11, a subgroup with no prior genomic representatives; five genome bins from within group 5 (5a, 5b, 5bb); and 14 genome bins within subgroup 6, a group previously represented by only a single genome bin (Lazar et al. 2016). An additional five genome bins are placed among representatives from groups 15 and 17; however, these bins and most nearby genomes lack 16S rRNA genes and have relatively long branch lengths, suggesting that additional genomic representatives from this part of the Bathyarchaeota phylum will be needed to place these genomes more precisely (Table 4.1, Figure 4.1a).




Figure 4.1 Expanded genomic sampling of multiple Bathyarchaeota subgroups. A) Maximum likelihood phylogeny of bathyarchaeotal ribosomal S3 genes. Black labels indicate previously available partial to near-complete genomes. Colored labels indicate genome bins from this study, and label colors indicate sample origin. Orange: mud flat samples; Blue: water-covered samples. Darker shade indicates increased soil depth. The lightest shade denotes two additional bins from 0-12 cm sediments. Red and black stars indicate the presence of a 16S rRNA gene in bins from this study and previously reported genomes, respectively. Subgroup identification is based in part on placement of 16S rRNA sequences or best BLASTN hits within the ARB-Silva (v132) guide tree. Ribosomal S3 phylogeny and placement of 16S rRNA sequences are in agreement for the subgroups shown. Closed circles on branches indicate IQ-TREE ultrafast bootstrap support >95%. B) Expected relative abundance of Bathyarchaeota subgroups in wetland soil samples based on 16S rRNA gene relative abundance (adapted from Narrowe et al. 2017). Subgroup/sample distribution of genome bins agrees with predictions from the 16S rRNA census. Metagenomic bins were recovered from samples where subgroups were predicted to be abundant.


Predicted subgroup habitat preferences are supported by metagenomic analysis

The intra-wetland bathyarchaeotal subgroup-level habitat preferences we identified previously (as discussed in Chapter II) (Narrowe et al. 2017) are supported by the distribution of recovered genome bins (Figure 4.1b). In our previous analysis of 16S rRNA genes across the wetland, we found that 16S rRNA genes from Bathyarchaeota subgroup 6 were present in all samples, with a slight increase in abundance in shallow soils. In this study, we recovered subgroup 6 genome bins from all metagenomic samples, consistent with their wetland-wide distribution. In contrast, 16S rRNA genes from subgroups 5b and 11 were found almost exclusively in soils from the Open Water 3 site, increasing in abundance with soil depth (Figure 2.5) (Narrowe et al. 2017). Consistent with those 16S amplicon-measured distributions, the Bathyarchaeota subgroup 11 and subgroup 5b genome bins identified here arise only from the Open Water 3 site samples and were not reconstructed from the mud flat (Figure 4.1). This supports our initial observation that while subgroups 5b and 11 were declared indicator taxa for freshwater sediments (Fillol, Auguet, et al. 2015), within our freshwater wetland these two groups display a more restricted range, which may be used to further resolve the habitat preferences of these groups (Chapter II; Narrowe et al. 2017). Two additional deeply branching bin pairs, likely belonging to subgroups 15 and 17, suggest a common but as yet undetermined habitat feature in Mud flat 3 depth 4 and Open water 3 depth 3, as each pair is represented in both of those samples. The presence of these two groups also agrees with expectations based on our 16S rRNA gene analysis, which predicted the presence of these subgroups across the wetland (Figure 2.5, Figure 4.1b).


ANI analysis to validate bin clusters:

The presence of multiple closely related genomes is known to complicate metagenomic assembly, resulting in shorter, fragmented assemblies for some members of the community (Sharon et al. 2012; Howe et al. 2014; Hug, Thomas, et al. 2016). To guard against the possibility that a misbinned contig containing a marker gene would assign a bin to the wrong subgroup, we also performed an all-versus-all average nucleotide identity (ANI) analysis including all wetland Bathyarchaeota bins and all publicly available Bathyarchaeota genomes/bins shown in the ribosomal S3 tree (Figure 4.2). ANI has been shown to have a discontinuous distribution, with ANI values at or above 95% indicating genomes from the same species, and more distantly related genomes presenting ANI less than 83% (Jain et al. 2017). Thus, our expectation was that genomes within a subgroup would have higher ANI to one another than to outgroup bins. Our results confirmed the 16S rRNA and S3 gene-based group assignments and indicated that all OWC genome bins are properly placed within each subgroup. Bins in subgroup 6 have ANI linkages only to other bins in this group, many with species-level ANI values (>95%). ANI linkages between subgroups 5 and 11 were found among 4 bins; however, the maximum across-group ANI was 83% and involved only 20 of 392 scaffolds, as compared to the subgroup 11 intra-subgroup species-level link of 97.5%, which involved 115 of 342 scaffolds (Table 2). These findings are consistent with the subgroup-level relationship among the bins defined by phylogenetic marker genes, and allow us to use the combined genomic information from these partial genome bins to represent the metabolic potential from within these subgroups.
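The ANI reasoning above can be sketched in a few lines. The example below applies the two thresholds quoted from Jain et al. (2017) and computes the fraction of scaffolds supporting each link, using the figures reported in the text. The function names and the "intermediate" label for values between the thresholds are illustrative assumptions, not part of the analysis pipeline used in this chapter.

```python
SAME_SPECIES_ANI = 95.0  # "at or above 95%" indicates the same species
DISTANT_ANI = 83.0       # "less than 83%" indicates more distant genomes


def classify_ani(ani_pct: float) -> str:
    """Label an ANI value using the two thresholds quoted in the text.

    Values between the thresholds are labeled "intermediate" here; that
    label is an assumption made for this sketch.
    """
    if ani_pct >= SAME_SPECIES_ANI:
        return "same species"
    if ani_pct < DISTANT_ANI:
        return "distantly related"
    return "intermediate"


def scaffold_fraction(matched: int, total: int) -> float:
    """Fraction of a bin's scaffolds that contributed to an ANI link."""
    if total <= 0 or not 0 <= matched <= total:
        raise ValueError("need 0 <= matched <= total and total > 0")
    return matched / total


# Links reported in the text: (ANI %, matched scaffolds, total scaffolds).
links = {
    "subgroup 11 intra-subgroup": (97.5, 115, 342),
    "subgroup 5 vs. 11 (maximum)": (83.0, 20, 392),
}
for name, (ani, matched, total) in links.items():
    frac = scaffold_fraction(matched, total)
    print(f"{name}: {classify_ani(ani)}, {frac:.1%} of scaffolds")
```

Reporting the scaffold fraction alongside ANI makes the contrast in the text explicit: the cross-subgroup link rests on roughly 5% of scaffolds, whereas the intra-subgroup link rests on about a third.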


Figure 4.2 Average nucleotide identity (ANI) confirms bin placement within subgroups. All vs. all comparison of bin ANI shows higher identity within than across subgroups. Bin names and subgroup placement are as in Figure 4.1. Only bins with reported matches to other bins are shown. ANI corresponds to line thickness. Highest (100%) and lowest (76%) ANI are indicated to show scale.

mcrABG analysis

The recent discoveries of putative methanogenic Bathyarchaeota (Evans et al. 2015) and Verstraetearchaeota (Vanwonterghem et al. 2016) have challenged the longstanding canon of methanogenesis being possible only within the Euryarchaeota. Given our previous findings of Verstraetearchaeota and the abundance and richness of Bathyarchaeota within the methane-emitting Old Woman Creek wetland, we asked whether the OWC Bathyarchaeota might be methanogens. The methyl-coenzyme M reductase subunit alpha, beta, and gamma (mcrABG) genes are the hallmark genes for methanogenesis. In addition to their presence in genomes from coal-bed methane wells (Candidatus Bathyarchaeota archaeon BA1 and BA2) (Evans et al. 2015), bathyarchaeotal mcrA amplicon sequences were also reported from sediments in Yellowstone hot springs (McKay et al. 2017). The BA1 and BA2 genomes belong within Bathyarchaeota subgroups 3 and 8 (Evans et al. 2015). The Yellowstone mcrA amplicon sequences could not be assigned to subgroups, but occurred in sediments in which groups 2, 6, 15, 20, 10, and 14 were abundant based on 16S rRNA gene amplicon sequencing paired to the mcrA gene sequencing (McKay et al. 2017).

Within our metagenomic assembly, which includes subgroups 5b, 6, 11, 15, and 17, we identified nine putative bathyarchaeotal mcrA gene sequences, seven bathyarchaeotal mcrB sequences, and four bathyarchaeotal mcrG sequences. These mcrABG sequences were found in both Open Water 3 soil samples (mcrABG) and also in the Mud 3, Depth 4 sample (mcrA only). Despite attempts at reassembly, the contigs containing these sequences remained short, and thus we were not able to assign them to any of the specific Bathyarchaeota genome bins. However, the OWC bathyarchaeotal mcrABG sequences are most closely related to those of Ca. Bathyarchaeota BA1 and BA2 (Evans et al. 2015) and branch near the mcrABG sequences from butane-utilizing Candidatus Syntrophoarchaeum sp. (Laso-Pérez et al. 2016) (Figure 4.3, Figure S4.1, Figure S4.2). The long branch length separating the Ca. Syntrophoarchaeum and Ca. Bathyarchaeota BA1 and BA2 mcrA sequences from the euryarchaeotal mcrA sequences was suggested to be the result of sequence and structural divergence that permits these mcrA to accommodate larger alkanes, in particular butane (Laso-Pérez et al. 2016). In the case of Ca. Syntrophoarchaeum, the mcrA protein subunit has been shown to activate butane for oxidation, analogous to the activation of methane by mcrA during methane oxidation via reverse methanogenesis (Laso-Pérez et al. 2016). While we were not able to assign them to genome bins, the presence of these genes across multiple samples in this wetland indicates that the potential for methane production (or possibly butane oxidation) by Bathyarchaeota likely extends more broadly across the phylum than known to date. More critically, this suggests that additional methanogen diversity may be present and yet unaccounted for in our understanding of wetland methane emissions.

Metabolic potential of Bathyarchaeota carbon cycling in wetland soils

To date, inferences of metabolic potential for Bathyarchaeota have been made based on only 13 genome bins representing only 8 of the approximately 22 bathyarchaeotal subgroups and reflecting only 3 environments: coal-bed methane wells (Evans et al. 2015), marine sediments (He et al. 2016), and brackish estuarine sediments (Lazar et al. 2016). Thus far, all described Bathyarchaeota genomes encode components of the Wood-Ljungdahl pathway (WLP). This pathway, found in both the bacteria and archaea and also known as the acetyl-CoA pathway, can be used for carbon fixation or can be used oxidatively (Borrel, Adam, and Gribaldo 2016; Chistoserdova 2016; Adam, Borrel, and Gribaldo 2018). In the archaea, this pathway is often associated with methanogens; however, with the discovery and description of additional archaeal genomes, it has become clear that this pathway is also found in non-methanogenic archaea, and is even absent in the case of the methanogenic Candidatus Methanomassiliicoccus sp. (Borrel, Adam, and Gribaldo 2016).




Figure 4.3 mcrA gene phylogeny. Divergent mcrA genes found in OWC metagenomes branch close to those of the putative methanogen Ca. Bathyarchaeota BA1, and near to those of putative butane-oxidizing Ca. Syntrophoarchaeum sp. Maximum likelihood phylogeny of mcrA amino acid sequences includes sequences from publicly available genomes and all predicted mcrA from all OWC metagenomic assemblies. Likely OWC bathyarchaeotal sequences are shown with colored labels (Medium blue: Open water 3, Depth 3; Dark blue: Open water 3, Depth 4; Dark orange: Mud flat 3, Depth 4). Ca. Bathyarchaeota sp. and Ca. Syntrophoarchaeum sp. mcrA sequences are shown in bold.


The subgroups represented by the OWC Bathyarchaeota genomes encode all components of the archaeal WLP, with the exception of subgroup 11, where all genome bins lack methenyl-H4MPT cyclohydrolase (mch). All genome bins (and thus all subgroups) also lack subunits A-G of H4MPT S-methyltransferase (mtr), with only mtrH detected (Figure 4.4). This particular absence of most mtr subunits has been noted previously in the methanogenic Ca. Bathyarchaeota BA1 and BA2 (Evans et al. 2015), as well as in the acetogenic B23, B24, B26-1, B26-2, and B63 genomes (He et al. 2016). The absence of mtr precludes the possibility of methanogenesis from CO2 and H2; however, decoupling the carbon fixation of the WLP from the energy conservation of the mcrABG and hdrABC/mvhABG remains possible for methylotrophic methanogens such as some Candidatus Methanomassiliicoccus sp., which in addition to lacking the methyl branch of the WLP also lack the mtr complex (Borrel et al. 2013; Borrel, Adam, and Gribaldo 2016), and for Ca. Bathyarchaeota BA1, which was inferred to be capable of methylotrophic methanogenesis (Evans et al. 2015). On the assumption that the putatively methanogenic source of the bathyarchaeotal mcrABG genes found in the OWC metagenomes is among the subgroups for which we have identified genome bins, and given that all OWC Bathyarchaeota bins, regardless of subgroup, lack the mtr complex genes, we predict that a methanogenic OWC Bathyarchaeota would likely be a methylotrophic methanogen with a metabolism similar to that described for Ca. Bathyarchaeota BA1.




Figure 4.4 Metabolic pathways represented in Bathyarchaeota subgroups. Selected metabolic pathways present in OWC Bathyarchaeota genome bins are shown as defined by subgroup assignment. Genes were annotated with KEGG identifiers using GhostKOALA. Full circles indicate the presence of a gene in at least one bin in the subgroup. In the case of multi-subunit enzymes, full circles represent the presence of the majority of subunits. Vertical half circles represent less than a majority of the subunits. Missing circles indicate that the genes were not detected. Full circles for irreversible reactions shown with bi-directional arrows indicate the presence of genes for both reactions. Half-filled horizontal circles indicate absence of a gene for one of the reactions. Multiple copies of bathyarchaeotal mcrABG genes were identified in three of the five assemblies but could not be binned, so they are not assigned to a subgroup. Pathways and subunits in grey were not detected in any bins. Genes associated with transporters are shown along the top of the figure, and genes associated with energy conservation using a H+ or Na+ gradient are along the bottom of the figure.
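The symbol rules for multi-subunit enzymes in this caption amount to a small decision rule, sketched below. The function name is hypothetical, and treating an exact half of the subunits as "half" (not a majority) is an assumption, since the caption does not address that case.

```python
def circle_symbol(subunits_detected: int, subunits_total: int) -> str:
    """Return the figure symbol for a multi-subunit enzyme in a subgroup.

    Rules paraphrased from the caption: no subunits detected gives a
    missing circle, a majority gives a full circle, otherwise a half
    circle. An exact half is treated as "half" (assumption).
    """
    if subunits_total <= 0 or not 0 <= subunits_detected <= subunits_total:
        raise ValueError("need 0 <= detected <= total and total > 0")
    if subunits_detected == 0:
        return "missing"
    if 2 * subunits_detected > subunits_total:
        return "full"
    return "half"


# mtr has subunits A through H; the text reports only mtrH detected.
print("mtr:", circle_symbol(1, 8))
# A three-subunit enzyme with two subunits detected (hypothetical counts).
print("example enzyme:", circle_symbol(2, 3))
```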


Overall, the reconstructed metabolisms of all subgroups described here (Figure 4.4) are at their core similar to that described for BA1 (subgroup 3). However, the presence of methyl-utilizing genes that would be necessary for BA1-type methanogenesis suggests subgroup 5b or possibly groups 15/17 as candidates for the source of the bathyarchaeotal mcrABG genes. Of the new genomes described here, group 5b O3D4_Bin_147 contains both the mtaA and mttB genes, and two additional subgroup 5b bins contain mttB. The deeply branching, putative group 15 O3D4_Bin_340 contains the mtmB gene, suggesting that these two groups may be capable of methylotrophic methanogenesis as described for BA1. In the case of BA1, reoxidation of ferredoxin reduced by the hdrABC/mvhABG complex was proposed to be performed by hdrD coupled to glcD, but none of the group 5b or group 15/17 bins encode either hdrD or hdrE. Subgroup 11 here encodes both hdrD and glcD genes, but does not encode any of the methyl-utilizing genes described above. It was proposed that ech coupled to the generation of a proton gradient could perform this reoxidation of ferredoxin (Borrel, Adam, and Gribaldo 2016). Unlike BA1, none of the genome bins reported here encode ech; rather, groups 5b, 11, and 6 instead encode rnfBG, which is used in the aceticlastic methanogen Methanosarcina acetivorans for generation of a Na+ gradient (Welte and Deppenmeier 2014). This could replace the Na+ gradient that is typically provided by the mtr complex. However, BA1, group 5b, and group 11 all lack genes for an ATP synthase. Based on the missing components to conserve energy from BA1 methanogenesis, it was hypothesized that the reoxidation of ferredoxin in this case might be coupled to CO2 fixation via the WLP, which then would lead to further energy conservation via utilization of the acetate end product (Borrel, Adam, and Gribaldo 2016).


It is also possible that the divergent Bathyarchaeota mcrABG genes are used not for methane metabolism but for the oxidation of butane (Laso-Pérez et al. 2016). Of the 4 subgroups we describe here, groups 5b and 11 appear to contain the genes necessary for the beta-oxidation of butyryl-CoA to acetate (Figure 4.4). Interestingly, the genes for this pathway, including etfAB and homologs to the Ca. Syntrophoarchaeum acyl-CoA dehydrogenases, are also found in BA1 and in the publicly available putative subgroup 5b genomes RBG_13_46_16b and CG07_land. In addition to the mtaA methyltransferase noted above, these genes are essential components of the butane-oxidizing metabolism described for Ca. Syntrophoarchaeum sp. (Laso-Pérez et al. 2016). While it is not clear whether the OWC bathyarchaeotal mcrABG indicate methanogenesis similar to BA1 or butane oxidation similar to Ca. Syntrophoarchaeum sp., many of the components for both metabolisms are present within group 5b or group 11 genome bins. We hypothesize that one of these subgroups is the source of the mcrABG genes found in the metagenomic assembly. Further investigation into the energy conservation mechanisms of mcrABG-encoding Bathyarchaeota is needed. Additional experimentation on the OWC Bathyarchaeota conclusively linking the mcrABG genes to one or more subgroups will help to resolve these questions.

It was suggested that the patchy distribution of (apparent) methanogenesis within the Bathyarchaeota may reflect a relatively recent and ongoing loss of methanogenic capacity within this phylum due to the presence of alternate, more favorable metabolic strategies (Borrel, Adam, and Gribaldo 2016). Each Bathyarchaeota subgroup we describe here appears to encode multiple pathways for the acquisition of carbon for biomass generation. Many of the previously described Bathyarchaeota are inferred to use the WLP for carbon fixation leading to the production of acetate (He et al. 2016; Lazar et al. 2016) or oxidatively (Lazar et al. 2016) for the breakdown of acetyl-CoA generated from beta-oxidation and from peptide degradation. It is possible that Bathyarchaeota are facultatively autotrophic, agreeing with previous descriptions of the phylum as broadly distributed, successful generalists (Fillol, Auguet, et al. 2015).

The OWC Bathyarchaeota all contain the necessary genes for autotrophic acetogenesis initiated via the WLP or for acetogenesis via the fermentative breakdown of organic compounds. Unlike Ca. Bathyarchaeota B24, B26-1, and B26-2, which encode a bacterial-type pta-ack pathway for acetate generation from acetyl-CoA (He et al. 2016), the 4 subgroups we describe here appear to maintain the more typical archaeal acd (acetate-CoA ligase (ADP-forming)) gene. No OWC bins contain the pta gene, and only a single subgroup 11 bin (O3D4_Bin_153; 39% complete) appears to encode a copy of the ack gene. While this bin is incomplete, it is worth noting that it does not contain either the acdA or acdB subunits and, like the B24 and B26 Bathyarchaeota, may potentially replace the archaeal acd acetate production pathway with the bacterial-type pta-ack pathway.

In addition to autotrophic acetate production via the WLP, heterotrophic acetate production appears to be possible for these Bathyarchaeota. While the group 6 and group 15/17 genomes encode the complete TCA cycle, groups 5 and 11 lack aconitase and isocitrate dehydrogenase, and group 5 also lacks both citrate synthase and citrate lyase. The remaining partial TCA cycle in these two subgroups may be used either in support of autotrophic biomass generation, or heterotrophic incorporation initiating from fatty acid oxidation. In addition to the genes for the beta-oxidation of butyryl-CoA as noted above, these two subgroups contain the genes for the beta-oxidation of acryloyl-CoA to enter the TCA cycle via succinyl-CoA.

In our previous study, we found that subgroups 5b and 11 were most abundant in the geochemically divergent Open water 3 soils. FTICR-MS analysis indicated that the carbon at this site was characterized by a higher level of fatty acids compared to other sites (data not shown). The shared capacity for beta-oxidation across subgroups 5b and 11, together with their shared distribution within the wetland favoring a site rich in fatty acids, suggests a clear link to their apparent common habitat preference. These varied metabolic capacities agree with previous suggestions that the Bathyarchaeota are successful generalists, facultatively able to fix carbon, potentially through multiple mechanisms, in addition to using fatty acids and peptides (Biddle et al. 2006; Lloyd et al. 2013; Meng et al. 2014; Seyler, McGuinness, and Kerkhof 2014; Evans et al. 2015; Fillol, Auguet, et al. 2015; He et al. 2016; Lazar et al. 2016). Given their broad distribution across the wetland, and globally, the Bathyarchaeota, as acetogens, can be key players supporting methanogenesis. The main active methanogen in the Old Woman Creek wetland is the acetoclastic Methanothrix paradoxum (Angle et al. 2017), so the potential importance of Bathyarchaeota as a source of acetate in the wetland should be explored further.

Finally, with the expanded representation of subgroup 6 genome bins from our study, we add to the metabolic potential previously inferred for this group on the basis of a single genome bin (Lazar et al. 2016). In contrast to previous reports (Lazar et al. 2016), the subgroup 6 genomes here contain genes encoding components of cytochrome c oxidase as well as cytochrome c assembly factors. These suggest a previously undescribed respiratory capacity for this group. In addition to the cytochrome c oxidases, subgroup 6 encodes most subunits of ATP synthase and most subunits of the NADH-quinone oxidoreductase (nuoA-N). However, the remainder of this potential respiratory chain remains unclear.


Conclusions

Here, we have described the first Bathyarchaeota genomes from freshwater wetland soils. With these genomes, we have increased the genomic representation of subgroup 6 from a single genome to 15 genome bins, and we provide the first descriptions of representatives from within subgroup 11. While a few genomes associated with group 5 can be found in public databases, their metabolisms have not yet been described. Group 5b and group 11 appear to be capable of fatty acid oxidation and, in agreement with our prior work, are most abundant at a wetland site where the organic carbon consisted predominantly of fatty acids. We found that the Wood-Ljungdahl pathway is broadly conserved across all Bathyarchaeota subgroups we examined, adding additional support for the presence of this pathway in the common bathyarchaeotal ancestor (Spang and Ettema 2017). Bathyarchaeotal mcrABG genes, associated with methanogenesis/methane oxidation or potentially butane oxidation, are found in these wetland metagenomes, suggesting a broader distribution of these genes than previously described. The genomic content of the group 5b bins presented here makes this group a compelling candidate for the source of these genes, though this phylogenetic placement remains to be confirmed. The abundance and known distribution of this particular subgroup within the OWC wetland offers the opportunity for in situ exploration of non-euryarchaeotal alkane metabolism.

Author Contributions

Adrienne Narrowe, Christopher Miller, and Kelly Wrighton designed the study. Jordan Angle, Rebecca Daly, and Mikayla Borton extracted DNA for sequencing. Adrienne Narrowe performed the bioinformatics analyses (binning, annotation, metabolic reconstruction), with additional genome bins contributed by Lindsey Solden. All other analyses were performed by Adrienne Narrowe, who wrote the chapter in consultation with Christopher Miller.


CHAPTER V

CONCLUSIONS AND FUTURE DIRECTIONS

Local habitat variability within a single wetland, where soils sampled just meters apart can differ significantly in geochemical measures, provided the opportunity to explore the relationship between the structure and function of microbial communities and their environment. Particularly for the archaea, we found unexpected diversity at the community, genome, and gene levels. We developed a method to profile both the bacteria and archaea within this model wetland at high resolution and provide the first microbial census for the Old Woman Creek wetland. Remarkably, though OWC has been studied for many years, the microbial community had not yet been thoroughly characterized, and these microbial community data can ultimately inform ongoing efforts to model wetland-scale carbon cycling at this site.

With this census, we found a high degree of habitat specificity for multiple archaeal and bacterial taxa, which suggests underlying genomic/metabolic variation within these closely related populations. We tested this hypothesis with an in-depth analysis of the Bathyarchaeota subgroups found in the wetland, identifying differences in key metabolic pathways that correspond to the observed habitat preferences for these subgroups, which in turn correspond to geochemical measurements. This demonstrates the value of pairing broad 16S rRNA gene sampling with targeted metagenomic sequencing, a strategy that can continue to be pursued for each of the many other understudied clades present within this wetland. The microbial census provides a link between genome and habitat for hundreds of additional MAGs, which we have already identified. As with the Bathyarchaeota analysis, these genome/habitat linkages can be leveraged to place metabolic inferences into environmental context.

The OWC microbial census provided a starting point for the long-term study of methane emissions in the wetland. A Methanothrix (Methanosaeta) sp. OTU we identified in this census as associated with shallow, presumably oxic sites was found to be the main active methanogen in these soils, and we further identified that this species is globally distributed (Angle et al. 2017). The finding of a methanogen active in surface, transiently oxic soils challenges the existing paradigm of methanogen distribution and activity being limited to anoxic habitats, and is being used to update models of methane emissions. Ongoing work at the OWC site will incorporate measures of methane flux at the centimeter and site-level scale, coupled to marker gene, metagenomic, and metatranscriptomic sequencing.

Especially given their broad distribution beyond the OWC wetland, the possible role of Bathyarchaeota in wetland methane metabolism should be explored further. Soil enrichments targeting Bathyarchaeota from several wetland sites can be used to test hypotheses regarding Bathyarchaeota fatty acid metabolism and methane production/butane oxidation. With the knowledge of the genomic content and distribution of these associated Bathyarchaeota subgroups across the wetland, these enrichments can be targeted to the subgroups described here, and inferred metabolic potential will guide enrichment strategies.

Given the ancient and conserved nature of the translation elongation proteins, the elongation factor 2 paralog we discovered and the apparent loss of diphthamide synthesis genes across most of the Asgard archaea are intriguing in the context of eukaryotic evolution. Diphthamide itself has implications in cancer via a role in cell cycle regulation, and the finding that a tractable eukaryotic model system (Trichomonas sp.) naturally lacks this modification can be used to clarify the poorly understood cellular and molecular role of diphthamide. Future research into the evolution of EF2 and diphthamide synthesis can use heterologous expression of the EF2 and EF2p genes to address questions regarding the activity and function of these paralogs and their role in regulation of translation. Further exploration of the distribution of the paralogous EF2 with the discovery of additional genomes will shed light on the evolution of this gene family.


REFERENCES

Adam PS, Borrel G, Brochier-Armanet C, Gribaldo S. 2017. The growing tree of Archaea: new perspectives on their diversity, evolution and ecology. ISME J 11:2407-2425.

Adam PS, Borrel G, Gribaldo S. 2018. Evolutionary history of carbon monoxide dehydrogenase/acetyl-CoA synthase, one of the oldest enzymatic complexes. Proc. Natl. Acad. Sci. U. S. A. 115:E1166-E1173.

Albertsen M, Hugenholtz P, Skarshewski A, Nielsen KL, Tyson GW, Nielsen PH. 2013. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotechnol 31:533-538.

Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J, Ijaz UZ, Lahti L, Loman NJ, Andersson AF, Quince C. 2014. Binning metagenomic contigs by coverage and composition. Nat Methods 11:1144-1146.

Altschul SF. 2008. BLAST Basic Local Alignment Search Tool. Distribution 1:4-5.

Amaral-Zettler LA, Rocca JD, Lamontagne MG, Dennett MR, Gast RJ. 2008. Changes in microbial community structure in the wake of Hurricanes Katrina and Rita. Environ. Sci. Technol. 42:9072-8.

Anantharaman K, Brown CT, Hug LA, Sharon I, Castelle CJ, Probst AJ, Thomas BC, Singh A, Wilkins MJ, Karaoz U, et al. 2016. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat. Commun. 7:13219.

Anderson MJ. 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecol. 26:32-46.

Angel R, Matthies D, Conrad R, Denman K, Brasseur G, Chidthaisong A, Clais P, Cox R, Shindell D, Faluvegi G, et al. 2011. Activation of methanogenesis in arid biological soil crusts despite the presence of oxygen. Gilbert JA, editor. PLoS One 6:e20453.

Anger AM, Armache JP, Berninghausen O, Habeck M, Subklewe M, Wilson DN, Beckmann R. 2013. Structures of the human and Drosophila 80S ribosome. Nature 497:80-85.

Angermeyer A, Crosby SC, Huber JA. 2016. Decoupled distance-decay patterns between dsrA and 16S rRNA genes among salt marsh sulfate-reducing bacteria. Environ. Microbiol. 18:75-86.

Angle JC, Morin TH, Solden LM, Narrowe AB, Smith GJ, Borton MA, Rey-Sanchez C, Daly RA, Mirfenderesgi G, Hoyt DW, et al. 2017. Methanogenesis in oxygenated soils is a substantial fraction of wetland methane emissions. Nat. Commun. 8:1567.

Aoki-Kinoshita KF, Kanehisa M. 2007. Gene annotation and pathway mapping in KEGG. Methods Mol. Biol. 396:71-91.


Argiroff WA, Zak DR, Lanser CM, Wiley MJ. 2016. Microbial Community Functional Potential and Composition Are Shaped by Hydrologic Connectivity in Riverine Floodplain Soils. Microb. Ecol.:1-15.

Argüelles S, Camandola S, Cutler RG, Ayala A, Mattson MP. 2014. Elongation factor 2 diphthamide is critical for translation of two IRES-dependent protein targets, XIAP and FGF2, under oxidative stress conditions. Free Radic Biol Med 67:131-138.

Argüelles S, Camandola S, Hutchison ER, Cutler RG, Ayala A, Mattson MP. 2013. Molecular control of the amount, subcellular location, and activity state of translation elongation factor 2 in neurons experiencing stress. Free Radic Biol Med 61:61-71.

Arroyo P, Sáenz de Miera LE, Ansola G. 2015. Influence of environmental variables on the structure and composition of soil bacterial communities in natural and constructed wetlands. Sci. Total Environ. 506:380-390.

Arshad A, Speth DR, de Graaf RM, Op den Camp HJM, Jetten MSM, Welte CU. 2015. A Metagenomics-Based Metabolic Model of Nitrate-Dependent Anaerobic Oxidation of Methane by Methanoperedens-Like Archaea. Front. Microbiol. 6:1423.

Atkinson GC. 2015. The evolutionary and functional diversity of classical and lesser-known cytoplasmic and organellar translational GTPases across the tree of life. BMC Genomics 16:78.

Baker BJ, Comolli LR, Dick GJ, Hauser LJ, Hyatt D, Dill BD, Land ML, Verberkmoes NC, Hettich RL, Banfield JF. 2010. Enigmatic, ultrasmall, uncultivated Archaea. Proc. Natl. Acad. Sci. U. S. A. 107:8806-11.

Baker BJ, Lazar CS, Teske AP, Dick GJ. 2015. Genomic resolution of linkages in carbon, nitrogen, and sulfur cycling among widespread estuary sediment bacteria. Microbiome 3:14.

Baker GC, Smith JJ, Cowan DA. 2003. Review and re-analysis of domain-specific 16S primers. J. Microbiol. Methods 55:541-555.

Baldauf SL, Palmer JD, Doolittle WF. 1996. The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny. Proc Natl Acad Sci U S A 93:7749-7754.

Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, et al. 2012. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. J. Comput. Biol. 19:455-477.

Bar-Or I, Ben-Dov E, Kushmaro A, Eckert W, Sivan O. 2015. Methane-related changes in prokaryotes along geochemical profiles in sediments of Lake Kinneret (Israel). Biogeosciences 12:2847-2860.

Barberán A, Ramirez KS, Leff JW, Bradford MA, Wall DH, Fierer N. 2014. Why are some microbes more ubiquitous than others? Predicting the habitat breadth of soil bacteria. Ecol. Lett. 17:794-802.

Bastviken D, Tranvik LJ, Downing JA, Crill PM, Enrich-Prast A. 2011. Freshwater methane emissions offset the continental carbon sink. Science 331:50.
Bates ST, Berg-Lyons D, Caporaso JG, Walters WA, Knight R, Fierer N. 2011. Examining the global distribution of dominant archaeal populations in soil. ISME J. 5:908–17.
Beal EJ, House CH, Orphan VJ. 2009. Manganese- and iron-dependent marine methane oxidation. Science 325:184–7.
Becam AM, Nasr F, Racki WJ, Zagulski M, Herbert CJ. 2001. Ria1p (Ynl163c), a protein similar to elongation factors 2, is involved in the biogenesis of the 60S subunit of the ribosome in Saccharomyces cerevisiae. Mol Genet Genomics 266:454–462.
Bernal B. 2008. A comparison of soil carbon pools and profiles in wetlands in Costa Rica and Ohio. Ecol. Eng. 34:311–323.
Biddle JF, Lipp JS, Lever MA, Lloyd KG, Sørensen KB, Anderson R, Fredricks HF, Elvert M, Kelly TJ, Schrag DP, et al. 2006. Heterotrophic Archaea dominate sedimentary subsurface ecosystems off Peru. Proc. Natl. Acad. Sci. U. S. A. 103:3846–3851.
Blaby IK, Phillips G, Blaby-Haas CE, Gulig KS, El Yacoubi B, de Crecy-Lagard V. 2010. Towards a systems approach in the genetic analysis of archaea: Accelerating mutant construction and phenotypic analysis in Haloferax volcanii. Archaea 2010:426239.
Bodelier PLE, Meima-Franke M, Hordijk CA, Steenbergh AK, Hefting MM, Bodrossy L, von Bergen M, Seifert J. 2013. Microbial minorities modulate methane consumption through niche partitioning. ISME J. 7:2214–2228.
Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120.
Borrel G, Adam PS, Gribaldo S. 2016. Methanogenesis and the Wood-Ljungdahl Pathway: An Ancient, Versatile, and Fragile Association. Genome Biol. Evol. 8:1706–1711.
Borrel G, Jézéquel D, Biderre-Petit C, Morel-Desrosiers N, Morel J-P, Peyret P, Fonty G, Lehours A-C. 2011. Production and consumption of methane in freshwater lake ecosystems. Res. Microbiol. 162:832–47.
Borrel G, Lehours A-C, Crouzet O, Jézéquel D, Rockne K, Kulczak A, Duffaud E, Joblin K, Fonty G. 2012. Stratification of Archaea in the deep sediments of a freshwater meromictic lake: vertical shift from methanogenic to uncultured archaeal lineages. Neufeld J, editor. PLoS One 7:e43346.
Borrel G, O'Toole PW, Harris HMB, Peyret P, Brugère JF, Gribaldo S. 2013. Phylogenomic data support a seventh order of methylotrophic methanogens and provide insights into the evolution of methanogenesis. Genome Biol. Evol. 5:1769–1780.

Botet J, Rodriguez-Mateos M, Ballesta JP, Revuelta JL, Remacha M. 2008. A chemical genomic screen in Saccharomyces cerevisiae reveals a role for diphthamidation of translation elongation factor 2 in inhibition of protein synthesis by sordarin. Antimicrob Agents Chemother 52:1623–1629.
Bourne DG, McDonald IR, Murrell JC. 2001. Comparison of pmoA PCR primer sets as tools for investigating methanotroph diversity in three Danish soils. Appl. Environ. Microbiol. 67:3802–9.
Bräuer SL, Cadillo-Quiroz H, Ward RJ, Yavitt JB, Zinder SH. 2011. Methanoregula boonei gen. nov., sp. nov., an acidiphilic methanogen isolated from an acidic peat bog. Int. J. Syst. Evol. Microbiol. 61:45–52.
Bray NL, Pimentel H, Melsted P, Pachter L. 2016. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol 34:525–527.
Bridgham SD, Cadillo-Quiroz H, Keller JK, Zhuang Q. 2013. Methane emissions from wetlands: biogeochemical, microbial, and modeling perspectives from local to global scales. Glob. Chang. Biol. 19:1325–46.
Brown CT, Hug LA, Thomas BC, Sharon I, Castelle CJ, Singh A, Wilkins MJ, Wrighton KC, Williams KH, Banfield JF. 2015. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature 523:208–211.
Cadillo-Quiroz H, Bräuer S, Yashiro E, Sun C, Yavitt J, Zinder S. 2006. Vertical profiles of methanogenesis and methanogens in two contrasting acidic peatlands in central New York State, USA. Environ. Microbiol. 8:1428–40.
Cadillo-Quiroz H, Yashiro E, Yavitt JB, Zinder SH. 2008. Characterization of the archaeal community in a minerotrophic fen and terminal restriction fragment length polymorphism-directed isolation of a novel hydrogenotrophic methanogen. Appl. Environ. Microbiol. 74:2059–68.
Cai Y, Zheng Y, Bodelier PLE, Conrad R, Jia Z. 2016. Conventional methanotrophs are responsible for atmospheric methane oxidation in paddy soils. Nat. Commun. 7:11728.
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973.
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Peña AG, Goodrich JK, Gordon JI, et al. 2010. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7:335–336.
Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Huntley J, Fierer N, Owens SM, Betley J, Fraser L, Bauer M, et al. 2012. Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J. 6:1621–1624.

Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Lozupone CA, Turnbaugh PJ, Fierer N, Knight R. 2011. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc. Natl. Acad. Sci. U. S. A. 108 Suppl:4516–4522.
Case RJ, Boucher Y, Dahllöf I, Holmström C, Doolittle WF, Kjelleberg S. 2007. Use of 16S rRNA and rpoB genes as molecular markers for microbial ecology studies. Appl. Environ. Microbiol. 73:278–88.
Castelle CJ, Hug LA, Wrighton KC, Thomas BC, Williams KH, Wu D, Tringe SG, Singer SW, Eisen JA, Banfield JF. 2013. Extraordinary phylogenetic diversity and metabolic versatility in aquifer sediment. Nat. Commun. 4:2120.
Castelle CJ, Wrighton KC, Thomas BC, Hug LA, Brown CT, Wilkins MJ, Frischkorn KR, Tringe SG, Singh A, Markillie LM, et al. 2015. Genomic expansion of domain archaea highlights roles for organisms from new phyla in anaerobic carbon cycling. Curr. Biol. 25:690–701.
Chase AB, Karaoz U, Brodie EL, Gomez-Lunar Z, Martiny AC, Martiny JBH. 2017. Microdiversity of an Abundant Terrestrial Bacterium Encompasses Extensive Variation in Ecologically Relevant Traits. MBio 8:e01809-17.
Chen H, Boutros PC. 2011. VennDiagram: a package for the generation of highly customizable Venn and Euler diagrams in R. BMC Bioinformatics 12:35.
Chin Y-P, Traina SJ, Swank CR, Backhus D. 1998. Abundance and properties of dissolved organic matter in pore waters of a freshwater wetland. Limnol. Oceanogr. 43:1287–1296.
Chistoserdova L. 2016. Wide Distribution of Genes for Tetrahydromethanopterin/Methanofuran-Linked C1 Transfer Reactions Argues for Their Presence in the Common Ancestor of Bacteria and Archaea. Front. Microbiol. 7:1425.
Chu H, Sun H, Tripathi BM, Adams JM, Huang R, Zhang Y, Shi Y. 2016. Bacterial community dissimilarity between the surface and subsurface soils equals horizontal differences over several kilometers in the western Tibetan Plateau. Environ. Microbiol. 18:1523–1533.
Conrad R. 2002. Control of microbial methane production in wetland rice fields. Nutr. Cycl. Agroecosystems 64:59–69.
Conrad R, Klose M, Lu Y, Chidthaisong A. 2012. Methanogenic Pathway and Archaeal Communities in Three Different Anoxic Soils Amended with Rice Straw and Maize Straw. Front. Microbiol. 3:4.
Corradi N, Pombert JF, Farinelli L, Didier ES, Keeling PJ. 2010. The complete sequence of the smallest known nuclear genome from the microsporidian Encephalitozoon intestinalis. Nat Commun 1:77.

De Crécy-Lagard V, Forouhar F, Brochier-Armanet C, Tong L, Hunt JF. Comparative genomic analysis of the DUF71/COG2102 family predicts roles in diphthamide biosynthesis and B12 salvage.
Criscuolo A, Gribaldo S. 2010. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol 10:210.
Crooks GE, Hon G, Chandonia J-M, Brenner SE. 2004. WebLogo: a sequence logo generator. Genome Res 14:1188–1190.
Da Cunha V, Gaia M, Gadelle D, Nasir A, Forterre P. 2017. Lokiarchaea are close relatives of Euryarchaeota, not bridging the gap between prokaryotes and eukaryotes. PLoS Genet 13:e1006810.
Daly RA, Borton MA, Wilkins MJ, Hoyt DW, Kountz DJ, Wolfe RA, Welch SA, Marcus DN, Trexler RV, MacRae JD, et al. 2016. Microbial metabolisms in a 2.5-km-deep ecosystem created by hydraulic fracturing in shales. Nat. Microbiol. 1:16146.
Danczak RE, Johnston MD, Kenah C, Slattery M, Wrighton KC, Wilkins MJ. 2017. Members of the Candidate Phyla Radiation are functionally differentiated by carbon and nitrogen cycling capabilities. Microbiome 5:112.
Degnan PH, Ochman H. 2012. Illumina-based analysis of microbial community diversity. ISME J. 6:183–94.
Denman KL, Brasseur G, Chidthaisong A, Ciais P, Cox PM, Dickinson RE, Hauglustaine D, Heinze C, Holland E, Jacob D, et al. 2007. Couplings between changes in the climate system and biogeochemistry. In: Solomon S, Qin D, Manning M, Chen Z, Marquis M, Averyt KB, Tignor M, Miller HL, editors. Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge, UK and New York, NY: Cambridge University Press. p. 499–587.
DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL. 2006. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72:5069–5072.
Deutzmann JS, Stief P, Brandes J, Schink B. 2014. Anaerobic methane oxidation coupled to denitrification is the dominant methane sink in a deep lake. Proc. Natl. Acad. Sci. 111:201411617.
Dick GJ, Andersson AF, Baker BJ, Simmons SL, Thomas BC, Yelton AP, Banfield JF. 2009. Community-wide analysis of microbial genome sequence signatures. Genome Biol. 10:R85.

Donhofer A, Franckenberg S, Wickles S, Berninghausen O, Beckmann R, Wilson DN. 2012. Structural basis for TetM-mediated tetracycline resistance. Proc Natl Acad Sci U S A 109:16900–16905.
Eddy SR. 2011. Accelerated Profile HMM Searches. PLoS Comput Biol 7:e1002195.
Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460–2461.
Edgar RC. 2013. UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat. Methods 10:996–8.
Egger M, Rasigraf O, Sapart CJ, Jilbert T, Jetten MSM, Röckmann T, van der Veen C, Banda N, Kartal B, Ettwig KF, et al. 2014. Iron-mediated anaerobic oxidation of methane in brackish coastal sediments. Environ. Sci. Technol.
Elkins JG, Podar M, Graham DE, Makarova KS, Wolf Y, Randau L, Hedlund BP, Brochier-Armanet C, Kunin V, Anderson I, et al. 2008. A korarchaeal genome reveals insights into the evolution of the Archaea. Proc. Natl. Acad. Sci. 105:8102–8107.
Eloe-Fadrosh EA, Paez-Espino D, Jarett J, Dunfield PF, Hedlund BP, Dekas AE, Grasby SE, Brady AL, Dong H, Briggs BR, et al. 2016. Global metagenomic survey reveals a new bacterial candidate phylum in geothermal springs. Nat. Commun. 7:10476.
Eme L, Spang A, Lombard J, Stairs CW, Ettema TJG. 2017. Archaea and the origin of eukaryotes. Nat. Rev. Microbiol. 15:711.
Engelbrektson A, Kunin V, Wrighton KC, Zvenigorodsky N, Chen F, Ochman H, Hugenholtz P. 2010. Experimental factors affecting PCR-based estimates of microbial species richness and evenness. ISME J. 4:642–7.
Eren AM, Esen OC, Quince C, Vineis JH, Morrison HG, Sogin ML, Delmont TO. 2015. Anvi'o: an advanced analysis and visualization platform for 'omics data. PeerJ 3:e1319.
Ettwig KF, van Alen T, van de Pas-Schoonen KT, Jetten MSM, Strous M. 2009. Enrichment and molecular detection of denitrifying methanotrophic bacteria of the NC10 phylum. Appl. Environ. Microbiol. 75:3656–62.
Ettwig KF, Butler MK, Le Paslier D, Pelletier E, Mangenot S, Kuypers MMM, Schreiber F, Dutilh BE, Zedelius J, de Beer D, et al. 2010. Nitrite-driven anaerobic methane oxidation by oxygenic bacteria. Nature 464:543–8.
Ettwig KF, Zhu B, Speth D, Keltjens JT, Jetten MSM, Kartal B. 2016. Archaea catalyze iron-dependent anaerobic oxidation of methane. Proc. Natl. Acad. Sci. 113:12792–12796.
Evans PN, Parks DH, Chadwick GL, Robbins SJ, Orphan VJ, Golding SD, Tyson GW. 2015. Methane metabolism in the archaeal phylum Bathyarchaeota revealed by genome-centric metagenomics. Science 350:434–438.

Fabrizio P, Laggerbauer B, Lauber J, Lane WS, Luhrmann R. 1997. An evolutionarily conserved U5 snRNP-specific protein is a GTP-binding factor closely related to the ribosomal translocase EF-2. EMBO J 16:4092–4106.
Fan X, Xing P. 2016. Differences in the composition of archaeal communities in sediments from contrasting zones of Lake Taihu. Front. Microbiol. 7:1510.
Federal Geographic Data Committee. 2013. Classification of wetlands and deepwater habitats of the United States. FGDC-STD-004-2013. Second. Washington, DC.
Fenner N, Freeman C. 2011. Drought-induced carbon loss in peatlands. Nat. Geosci. 4:895–900.
Fillol M, Auguet J-C, Casamayor EO, Borrego CM. 2015. Insights in the ecology and evolutionary history of the Miscellaneous Crenarchaeotic Group lineage. ISME J. 10:665–677.
Fillol M, Sanchez-Melsio A, Gich F, M. Borrego C. 2015. Diversity of Miscellaneous Crenarchaeotic Group archaea in freshwater karstic lakes and their segregation between planktonic and sediment habitats. FEMS Microbiol. Ecol. 91:fiv020-fiv020.
Flynn TMT, Sanford RRA, Ryu H, Bethke CCM, Levine ADA, Ashbolt NNJ, Santo Domingo JW, Fredrickson J, Balkwill D, Bethke CCM, et al. 2013. Functional microbial diversity explains groundwater chemistry in a pristine aquifer. BMC Microbiol. 13:146.
Frank JA, Reich CI, Sharma S, Weisbaum JS, Wilson BA, Olsen GJ. 2008. Critical evaluation of two primers commonly used for amplification of bacterial 16S rRNA genes. Appl. Environ. Microbiol. 74:2461–2470.
Freistroffer DV, Pavlov MY, MacDougall J, Buckingham RH, Ehrenberg M. 1997. Release factor RF3 in E. coli accelerates the dissociation of release factors RF1 and RF2 from the ribosome in a GTP-dependent manner. EMBO J 16:4126–4133.
Fu L, Niu B, Zhu Z, Wu S, Li W. 2012. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150–3152.
Furukawa R, Nakagawa M, Kuroyanagi T, Yokobori SI, Yamagishi A. 2017. Quest for Ancestors of Eukaryal Cells Based on Phylogenetic Analyses of Aminoacyl-tRNA Synthetases. J Mol Evol 84:51–66.
Gantner S, Andersson AF, Alonso-Sáez L, Bertilsson S. 2011. Novel primers for 16S rRNA-based archaeal community analyses in environmental samples. J. Microbiol. Methods 84:12–8.
Gruber-Dorninger C, Pester M, Kitzinger K, Savio DF, Loy A, Rattei T, Wagner M, Daims H. 2015. Functionally relevant diversity of closely related Nitrospira in activated sludge. ISME J. 9:643–655.

Guy L, Saw JH, Ettema TJ. 2014. The archaeal legacy of eukaryotes: a phylogenomic perspective. Cold Spring Harb Perspect Biol 6:a016022.
Haroon MF, Hu S, Shi Y, Imelfort M, Keller J, Hugenholtz P, Yuan Z, Tyson GW. 2013. Anaerobic oxidation of methane coupled to nitrate reduction in a novel archaeal lineage. Nature 500:567–70.
Harrell FEJ, Dupont C. 2015. Hmisc: Harrell Miscellaneous R package version 3.16-0.
Hashimoto T, Hasegawa M. 1996. Origin and early evolution of eukaryotes inferred from the amino acid sequences of translation elongation factors 1alpha/Tu and 2/G. Adv Biophys 32:73–120.
Hawkins AN, Johnson KW, Bräuer SL. 2014. Southern Appalachian Peatlands Support High Archaeal Diversity. Microb. Ecol. 67:587–602.
He S, Malfatti SA, McFarland JW, Anderson FE, Pati A, Huntemann M, Tremblay J, Glavina del Rio T, Waldrop MP, Windham-Myers L, et al. 2015. Patterns in wetland microbial community composition and functional gene repertoire associated with methane emissions. MBio 6:e00066-15.
He Y, Li M, Perumal V, Feng X, Fang J, Xie J, Sievert SM, Wang F, Kallmeyer J, Pockalny R, et al. 2016. Genomic and enzymatic evidence for acetogenesis among multiple lineages of the archaeal phylum Bathyarchaeota widespread in marine sediments. Nat. Microbiol. 1:16035.
Herdendorf CE, Klarer DM, Herdendorf RC. 2006. The ecology of Old Woman Creek, Ohio: an estuarine and watershed profile. 2nd Ed. Ohio Department of Natural Resources, Division of Wildlife, Columbus, Ohio.
Hess M, Sczyrba A, Egan R, Kim T-W, Chokhawala H, Schroth G, Luo S, Clark DS, Chen F, Zhang T, et al. 2011. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331:463–467.
Hoang DT, Chernomor O, von Haeseler A, Minh BQ, Vinh LS. 2018. UFBoot2: Improving the Ultrafast Bootstrap Approximation. Mol Biol Evol 35:518–522.
Horner-Devine MC, Lage M, Hughes JB, Bohannan BJM. 2004. A taxa-area relationship for bacteria. Nature 432:750–3.
Howe AC, Jansson JK, Malfatti SA, Tringe SG, Tiedje JM, Brown CT. 2014. Tackling soil diversity with the assembly of large, complex metagenomes. Proc. Natl. Acad. Sci. U. S. A. 111:4904–9.
Hu B, Shen L, Lian X, Zhu Q, Liu S, Huang Q, He Z, Geng S, Cheng D, Lou L, et al. 2014. Evidence for nitrite-dependent anaerobic methane oxidation as a previously overlooked microbial methane sink in wetlands. Proc. Natl. Acad. Sci. U. S. A. 111:4495–500.

Hug LA, Baker BJ, Anantharaman K, Brown CT, Probst AJ, Castelle CJ, Butterfield CN, Hernsdorf AW, Amano Y, Ise K, et al. 2016. A new view of the tree of life. Nat. Microbiol. 1:16048.
Hug LA, Thomas BC, Sharon I, Brown CT, Sharma R, Hettich RL, Wilkins MJ, Williams KH, Singh A, Banfield JF. 2016. Critical biogeochemical functions in the subsurface are associated with bacteria from new phyla and little-studied lineages. Environ. Microbiol. 18:159–173.
Huntemann M, Ivanova NN, Mavromatis K, Tripp HJ, Paez-Espino D, Palaniappan K, Szeto E, Pillay M, Chen I-MA, Pati A, et al. 2015. IMG Microbial Genome Annotation Pipeline SOP The Standard Operating Procedure of the DOE JGI Microbial Genome Annotation Pipeline (MGAP v.4).
Huttenhower C, Gevers D, Knight R, Abubucker S, Badger JH, Chinwalla AT, Creasy HH, Earl AM, FitzGerald MG, Fulton RS, et al. 2012. Structure, function and diversity of the healthy human microbiome. Nature 486:207–214.
Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119.
Iglewski BH, Liu PV, Kabat D. 1977. Mechanism of action of Pseudomonas aeruginosa exotoxin A: adenosine diphosphate-ribosylation of mammalian elongation factor 2 in vitro and in vivo. Infect Immun 15:138–144.
Inagaki F, Suzuki M, Takai K, Oida H, Sakamoto T, Aoki K, Nealson KH, Horikoshi K. 2003. Microbial Communities Associated with Geological Horizons in Coastal Subseafloor Sediments from the Sea of Okhotsk. Appl. Environ. Microbiol. 69:7224–7235.
Ishoey T, Woyke T, Stepanauskas R, Novotny M, Lasken RS. 2008. Genomic sequencing of single microbial cells from environmental samples. Curr. Opin. Microbiol. 11:198–204.
Iverson V, Morris RM, Frazar CD, Berthiaume CT, Morales RL, Armbrust EV. 2012. Untangling genomes from metagenomes: Revealing an uncultured class of marine euryarchaeota. Science 335:587–590.
Iwabe N, Kuma K, Hasegawa M, Osawa S, Miyata T. 1989. Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc Natl Acad Sci U S A 86:9355–9359.
Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. 2017. High-throughput ANI Analysis of 90K Prokaryotic Genomes Reveals Clear Species Boundaries. bioRxiv:225342.
Jakobsen R. 2007. Redox microniches in groundwater: A model study on the geometric and kinetic conditions required for concomitant Fe oxide reduction, sulfate reduction, and methanogenesis. Water Resour. Res. 43.

Jannasch HW. 1975. Methane oxidation in Lake Kivu (central Africa). Limnol. Oceanogr. 20:860–864.
Jasso-Chávez R, Santiago-Martínez MG, Lira-Silva E, Pineda E, Zepeda-Rodríguez A, Belmont-Díaz J, Encalada R, Saavedra E, Moreno-Sánchez R, Imlay J, et al. 2015. Air-Adapted Methanosarcina acetivorans Shows High Methane Production and Develops Resistance against Oxygen Stress. Witt SN, editor. PLoS One 10:e0117331.
Jewell TNM, Karaoz U, Bill M, Chakraborty R, Brodie EL, Williams KH, Beller HR. 2017. Metatranscriptomic Analysis Reveals Unexpectedly Diverse Microbial Metabolism in a Biogeochemical Hot Spot in an Alluvial Aquifer. Front. Microbiol. 8:40.
John Parkes R, Brock F, Banning N, Hornibrook ERC, Roussel EG, Weightman AJ, Fry JC. 2012. Changes in methanogenic substrate utilization and communities with depth in a salt marsh, creek sediment in southern England. Estuar. Coast. Shelf Sci. 96:170–178.
Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, et al. 2014. InterProScan 5: Genome-scale protein function classification. Bioinformatics 30:1236–1240.
Jorgensen R, Purdy AE, Fieldhouse RJ, Kimber MS, Bartlett DH, Merrill AR. 2008. Cholix toxin, a novel ADP-ribosylating factor from Vibrio cholerae. J Biol Chem 283:10671–10678.
Joshi NA, Fass JN. 2011. Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files.
Jungbluth SP, Amend JP, Rappé MS. 2017. Metagenome sequencing and 98 microbial genomes from Juan de Fuca Ridge flank subsurface fluids. Sci. data 4:170037.
Juottonen H, Hynninen A, Nieminen M, Tuomivirta TT, Tuittila E-S, Nousiainen H, Kell DK, Yrjälä K, Tervahauta A, Fritze H. 2012. Methane-cycling microbial communities and methane emission in natural and restored peatlands. Appl. Environ. Microbiol. 78:6386–9.
Kadnikov VV, Mardanov AV, Beletsky AV, Shubenkova OV, Pogodaeva TV, Zemskaya TI, Ravin NV, Skryabin KG, KA K, AV M, et al. 2012. Microbial community structure in methane hydrate-bearing sediments of freshwater Lake Baikal. FEMS Microbiol. Ecol. 79:348–358.
Kanehisa M, Sato Y, Morishima K. 2016. BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences. J. Mol. Biol. 428:726–731.
Kang DD, Froula J, Egan R, Wang Z. 2015. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3:e1165.
Kantor RS, Wrighton KC, Handley KM, Sharon I, Hug LA, Castelle CJ, Thomas BC, Banfield JF. 2013. Small genomes and sparse metabolisms of sediment-associated bacteria from four candidate phyla. MBio 4:e00708-13.

Karst SM, Dueholm MS, McIlroy SJ, Kirkegaard RH, Nielsen PH, Albertsen M. 2018. Retrieval of a million high-quality, full-length microbial 16S and 18S rRNA gene sequences without primer bias. Nat. Biotechnol. 36:190–195.
Kashtan N, Roggensack SE, Rodrigue S, Thompson JW, Biller SJ, Coe A, Ding H, Marttinen P, Malmstrom RR, Stocker R, et al. 2014. Single-cell genomics reveals hundreds of coexisting subpopulations in wild Prochlorococcus. Science 344:416–20.
Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30:3059–3066.
Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–780.
Kembel SW, Wu M, Eisen JA, Green JL. 2012. Incorporating 16S gene copy number information improves estimates of microbial diversity and abundance. PLoS Comput. Biol. 8:e1002743.
Kimata Y, Kohno K. 1994. Elongation factor 2 mutants deficient in diphthamide formation show temperature-sensitive cell growth. J Biol Chem 269:13497–13501.
Kirschke S, Bousquet P, Ciais P, Saunois M, Canadell JG, Dlugokencky EJ, Bergamaschi P, Bergmann D, Blake DR, Bruhwiler L, et al. 2013. Three decades of global methane sources and sinks. Nat. Geosci. 6:813–823.
Klarer DM, Millie DF. 1992. Aquatic Macrophytes and Algae at Old Woman Creek Estuary and Other Great Lakes Coastal Wetlands. J. Great Lakes Res. 18:622–633.
Klindworth A, Pruesse E, Schweer T, Peplies J, Quast C, Horn M, Glöckner FO. 2013. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Res. 41:e1.
Kotiaho M, Fritze H, Merilä P, Juottonen H, Leppälä M, Laine J, Laiho R, Yrjälä K, Tuittila E-S. 2010. Methanogen activity in relation to water table level in two boreal fens. Biol. Fertil. Soils 46:567–575.
Kotsyurbenko OR, Friedrich MW, Simankova MV, Nozhevnikova AN, Golyshin PN, Timmis KN, Conrad R. 2007. Shift from acetoclastic to H2-dependent methanogenesis in a west Siberian peat bog at low pH values and isolation of an acidophilic Methanobacterium strain. Appl. Environ. Microbiol. 73:2344–8.
Kubo K, Lloyd KG, Biddle JF, Amann R, Teske A, Knittel K. 2012. Archaea of the Miscellaneous Crenarchaeotal Group are abundant, diverse and widespread in marine sediments. ISME J. 6:1949–1965.

Lane CE, van den Heuvel K, Kozera C, Curtis BA, Parsons BJ, Bowman S, Archibald JM. 2007. Nucleomorph genome of Hemiselmis andersenii reveals complete intron loss and compaction as a driver of protein structure and function. Proc Natl Acad Sci U S A 104:19908–19913.
Lane DJ, Pace B, Olsen GJ, Stahl DA, Sogin ML, Pace NR. 1985. Rapid determination of 16S ribosomal RNA sequences for phylogenetic analyses. Proc. Natl. Acad. Sci. 82:6955–6959.
Lang K, Schuldes J, Klingl A, Poehlein A, Daniel R, Brune A. 2015. New mode of energy metabolism in the seventh order of methanogens as revealed by comparative genome analysis of "Candidatus Methanoplasma termitum." Appl. Environ. Microbiol. 81:1338–1352.
Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359.
Laso-Pérez R, Wegener G, Knittel K, Widdel F, Harding KJ, Krukenberg V, Meier DV, Richter M, Tegetmeyer HE, Riedel D, et al. 2016. Thermophilic archaea activate butane via alkyl-coenzyme M formation. Nature 539:396–401.
Lazar CS, Baker BJ, Seitz K, Hyde AS, Dick GJ, Hinrichs K-U, Teske AP. 2016. Genomic evidence for distinct carbon substrate preferences and ecological niches of Bathyarchaeota in estuarine sediments. Environ. Microbiol. 18:1200–1211.
Lazar CS, Baker BJ, Seitz KW, Teske AP. 2017. Genomic reconstruction of multiple lineages of uncultured benthic archaea suggests distinct biogeochemical roles and ecological niches. ISME J.:10.1038/ismej.2016.189.
Lazar CS, Biddle JF, Meador TB, Blair N, Hinrichs K-U, Teske AP. 2015. Environmental controls on intragroup diversity of the uncultured benthic archaea of the miscellaneous Crenarchaeotal group lineage naturally enriched in anoxic sediments of the White Oak River estuary (North Carolina, USA). Environ. Microbiol. 17:2228–38.
Lee HJ, Jeong SE, Kim PJ, Madsen EL, Jeon CO. 2015. High-resolution depth distribution of Bacteria, Archaea, methanotrophs, and methanogens in the bulk and rhizosphere soils of a flooded rice paddy. Front. Microbiol. 6:639.
Lee HJ, Kim SY, Kim PJ, Madsen EL, Jeon CO. 2014. Methane emission and dynamics of methanotrophic and methanogenic communities in a flooded rice field ecosystem. FEMS Microbiol. Ecol. 88:195–212.
Letunic I, Bork P. 2007. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23:127–8.
Ley RE, Harris JK, Wilcox J, Spear JR, Miller SR, Bebout BM, Maresca JA, Bryant DA, Sogin ML, Pace NR. 2006. Unexpected diversity and complexity of the Guerrero Negro hypersaline microbial mat. Appl. Environ. Microbiol. 72:3685–95.

Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. 2015. MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics:btv033.
Li L, Hong Y, Luan G, Mosel M, Malik M, Drlica K, Zhao X. 2014. Ribosomal elongation factor 4 promotes cell death associated with lethal stress. MBio 5:e01708.
Li Q, Wang F, Chen Z, Yin X, Xiao X. 2012. Stratified active archaeal communities in the sediments of Jiulong River estuary, China. Front. Microbiol. 3:311.
Lin HH, Liao YC. 2016. Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes. Sci Rep 6:24175.
Liu S, Wiggins JF, Sreenath T, Kulkarni AB, Ward JM, Leppla SH. 2006. Dph3, a small protein required for diphthamide biosynthesis, is essential in mouse development. Mol Cell Biol 26:3835–3841.
Liu Y, Zhou Z, Pan J, Baker BJ, Gu J-D, Li M. 2018. Comparative genomic inference suggests mixotrophic lifestyle for Thorarchaeota. ISME J.
Llirós M, Gich F, Plasencia A, Auguet J-C, Darchambeau F, Casamayor EO, Descy J-P, Borrego C. 2010. Vertical distribution of ammonia-oxidizing crenarchaeota and methanogens in the epipelagic waters of Lake Kivu (Rwanda-Democratic Republic of the Congo). Appl. Environ. Microbiol. 76:6853–6863.
Lloyd KG, Schreiber L, Petersen DG, Kjeldsen KU, Lever MA, Steen AD, Stepanauskas R, Richter M, Kleindienst S, Lenk S, et al. 2013. Predominant archaea in marine sediments degrade detrital proteins. Nature 496:215–8.
Lozupone CA, Knight R. 2007. Global patterns in bacterial diversity. Proc. Natl. Acad. Sci. U. S. A. 104:11436–40.
Lozupone C, Lladser ME, Knights D, Stombaugh J, Knight R. 2011. UniFrac: an effective distance metric for microbial community comparison. ISME J. 5:169–172.
Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar, Buchner A, Lai T, Steppi S, Jobb G, et al. 2004. ARB: a software environment for sequence data. Nucleic Acids Res. 32:1363–71.
Luton PE, Wayne JM, Sharp RJ, Riley PW. 2002. The mcrA gene as an alternative to 16S rRNA in the phylogenetic analysis of methanogen populations in landfill. Microbiology 148:3521–3530.
Makarova KS, Wolf YI, Koonin EV. 2015. Archaeal Clusters of Orthologous Genes (arCOGs): An Update and Application for Analysis of Shared Features between Thermococcales, Methanococcales, and Methanobacteriales. Life 5:818–840.

Martiny AC, Treseder K, Pusch G. 2013. Phylogenetic conservatism of functional traits in microorganisms. ISME J. 7:830–8.
Martiny JBH, Eisen JA, Penn K, Allison SD, Horner-Devine MC. 2011. Drivers of bacterial beta diversity depend on spatial scale. Proc. Natl. Acad. Sci. U. S. A. 108:7850–4.
Martiny JBH, Jones SE, Lennon JT, Martiny AC. 2015. Microbiomes in light of traits: A phylogenetic perspective. Science 350:aac9323.
McKay LJ, Hatzenpichler R, Inskeep WP, Fields MW. 2017. Occurrence and expression of novel methyl-coenzyme M reductase gene (mcrA) variants in hot spring sediments. Sci. Rep. 7.
McLaren MR, Callahan BJ. 2018. In Nature, There Is Only Diversity. MBio 9:e02149-17.
McMurdie PJ, Holmes S. 2013. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. Watson M, editor. PLoS One 8:e61217.
McMurdie PJ, Holmes S. 2014. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS Comput. Biol. 10:e1003531.
Meng J, Wang F, Wang F, Zheng Y, Peng X, Zhou H, Xiao X. 2009. An uncultivated crenarchaeota contains functional bacteriochlorophyll a synthase. ISME J. 3:106–116.
Meng J, Xu J, Qin D, He Y, Xiao X, Wang F. 2014. Genetic and functional properties of uncultivated MCG archaea assessed by metagenome and gene expression analyses. ISME J. 8:650–9.
Meng L, Hess PGM, Mahowald NM, Yavitt JB, Riley WJ, Subin ZM, Lawrence DM, Swenson SC, Jauhiainen J, Fuka DR. 2012. Sensitivity of wetland methane emissions to model assumptions: application and model testing against site observations. Biogeosciences 9:2793–2819.
Miller CS, Baker BJ, Thomas BC, Singer SW, Banfield JF. 2011. EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data. Genome Biol. 12:R44.
Miller CS, Handley KM, Wrighton KC, Frischkorn KR, Thomas BC, Banfield JF. 2013. Short-read assembly of full-length 16S amplicons reveals bacterial diversity in subsurface sediments. Gilbert JA, editor. PLoS One 8:e56018.
Mitsch W, Reeder B. 1992. Nutrient and hydrologic budgets of a Great Lakes coastal freshwater wetland during a drought year. Wetl. Ecol. Manag. 1:211–222.
Mondav R, Woodcroft BJ, Kim E-H, McCalley CK, Hodgkins SB, Crill PM, Chanton J, Hurst GB, VerBerkmoes NC, Saleska SR, et al. 2014. Discovery of a novel methanogen prevalent in thawing permafrost. Nat. Commun. 5.

Morrison HG, McArthur AG, Gillin FD, Aley SB, Adam RD, Olsen GJ, Best AA, Cande WZ, Chen F, Cipriano MJ, et al. 2007. Genomic minimalism in the early diverging intestinal parasite Giardia lamblia. Science 317:1921–1926.
Murray J, Savva CG, Shin BS, Dever TE, Ramakrishnan V, Fernandez IS. 2016. Structural characterization of ribosome recruitment and translocation by type IV IRES. Elife 5.
Nahlik AM, Mitsch WJ. 2010. Methane Emissions From Created Riverine Wetlands. Wetlands 30:783–793.
Narrowe AB, Angle JC, Daly RA, Stefanik KC, Wrighton KC, Miller CS. 2017. High-resolution sequencing reveals unexplored archaeal diversity in freshwater wetland soils. Environ. Microbiol. 19:2192–2209.
Nazaries L, Murrell JC, Millard P, Baggs L, Singh BK. 2013. Methane, microbes and models: fundamental understanding of the soil methane cycle for future predictions. Environ. Microbiol. 15:2395–417.
Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32:268–274.
Nobu MK, Narihiro T, Kuroda K, Mei R, Liu W-T. 2016. Chasing the elusive Euryarchaeota class WSA2: Genomes reveal a uniquely fastidious methyl-reducing methanogen. ISME J. 10:2478–2487.
O'Brien SL, Gibbons SM, Owens SM, Hampton-Marcell J, Johnston ER, Jastrow JD, Gilbert JA, Meyer F, Antonopoulos DA. 2016. Spatial scale drives patterns in soil bacterial diversity. Environ. Microbiol. 18:2039–2051.
Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O'Hara RB, Simpson GL, Solymos P, Stevens MHH, Wagner H. 2013. Package "vegan." R Packag. ver. 2.0–8:254.
Ong SH, Kukkillaya VU, Wilm A, Lay C, Ho EXP, Low L, Hibberd ML, Nagarajan N. 2013. Species identification and profiling of complex microbial communities using shotgun Illumina sequencing of 16S rRNA amplicon sequences. PLoS One 8:e60811.
Ortiz-Alvarez R, Casamayor EO. 2016. High occurrence of Pacearchaeota and Woesearchaeota (Archaea superphylum DPANN) in the surface waters of oligotrophic high-altitude lakes. Environ. Microbiol. Rep. 8:210–217.
Ortiz PA, Ulloque R, Kihara GK, Zheng H, Kinzy TG. 2006. Translation elongation factor 2 anticodon mimicry domain mutants affect fidelity and diphtheria toxin resistance. J Biol Chem 281:32639–32648.
Ounit R, Wanamaker S, Close TJ, Lonardi S. 2015. CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genomics 16:236.

Pace NR. 1997. A Molecular View of Microbial Diversity and the Biosphere. Science 276:734–740.
Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–1055.
Parks DH, Rinke C, Chuvochina M, Chaumeil P-A, Woodcroft BJ, Evans PN, Hugenholtz P, Tyson GW. 2017. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat Microbiol 2:1533–1542.
Paul K, Nonoh JO, Mikulski L, Brune A. 2012. "Methanoplasmatales," Thermoplasmatales-related archaea in termite guts and other environments, are the seventh order of methanogens. Appl. Environ. Microbiol. 78:8245–8253.
Peng Y, Leung HCM, Yiu SM, Chin FYL. 2012. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28:1420–8.
Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. 2004. UCSF Chimera - a visualization system for exploratory research and analysis. J Comput Chem 25:1605–1612.
Pinto AJ, Raskin L. 2012. PCR biases distort bacterial and archaeal community structure in pyrosequencing datasets. PLoS One 7:e43093.
Podar M, Anderson I, Makarova KS, Elkins JG, Ivanova N, Wall MA, Lykidis A, Mavromatis K, Sun H, Hudson ME, et al. 2008. A genomic analysis of the archaeal system Ignicoccus hospitalis-Nanoarchaeum equitans. Genome Biol 9:R158.
Prasse CE, Baldwin AH, Yarwood SA. 2015. Site history and edaphic features override the influence of plant species on microbial communities in restored tidal freshwater wetlands. Appl. Environ. Microbiol. 81:3482–91.
Preston MD, Smemo KA, McLaughlin JW, Basiliko N. 2012. Peatland Microbial Communities and Decomposition Processes in the James Bay Lowlands, Canada. Front. Microbiol. 3:70.
Probst A, Moissl-Eichinger C. 2015. "Altiarchaeales": Uncultivated Archaea from the Subsurface. Life 5:1381–1395.
Pruesse E, Peplies J, Glöckner FO. 2012. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics 28:1823–9.
Qin Y, Polacek N, Vesper O, Staub E, Einfeldt E, Wilson DN, Nierhaus KH. 2006. The Highly Conserved LepA Is a Ribosomal Elongation Factor that Back-Translocates the Ribosome. Cell 127:721–733.

Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. 2013. The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. Nucleic Acids Res. 41:D590–D596.
R Core Team. 2014. R: A language and environment for statistical computing.
Raghoebarsing AA, Pol A, van de Pas-Schoonen KT, Smolders AJP, Ettwig KF, Rijpstra WIC, Schouten S, Damsté JSS, Op den Camp HJM, Jetten MSM, et al. 2006. A microbial consortium couples anaerobic methane oxidation to denitrification. Nature 440:918–21.
Rastogi G, Sani RK, Peyton BM, Moberly JG, Ginn TR. 2009. Molecular Studies on the Microbial Diversity Associated with Mining-Impacted Coeur d'Alene River Sediments. Microb. Ecol. 58:129–139.
Raymann K, Brochier-Armanet C, Gribaldo S. 2015. The two-domain tree of life is linked to a new root for the Archaea. Proc Natl Acad Sci U S A 112:6670–6675.
Riley WJ, Subin ZM, Lawrence DM, Swenson SC, Torn MS, Meng L, Mahowald NM, Hess P. 2011. Barriers to predicting changes in global terrestrial methane fluxes: analyses using CLM4Me, a methane biogeochemistry model integrated in CESM. Biogeosciences 8:1925–1953.
Rinke C, Schwientek P, Sczyrba A, Ivanova NN, Anderson IJ, Cheng J-F, Darling AE, Malfatti S, Swan BK, Gies EA, et al. 2013. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499:431–437.
Rodnina MV, Savelsbergh A, Katunin VI, Wintermeyer W. 1997. Hydrolysis of GTP by elongation factor G drives tRNA movement on the ribosome. Nature 385:37–41.
Roy A, Yang J, Zhang Y. 2012. COFACTOR: an accurate comparative algorithm for structure-based protein function annotation. Nucleic Acids Res 40:W471–7.
Samsó R, García J. 2013. Bacteria distribution and dynamics in constructed wetlands based on modelling results. Sci. Total Environ. 461:430–440.
Schaffrath R, Abdel-Fattah W, Klassen R, Stark MJ. 2014. The diphthamide modification pathway from Saccharomyces cerevisiae - revisited. Mol Microbiol 94:1213–1226.
Schloss PD, Girard R, Martin T, Edwards J, Thrash JC, Arbor A, Rouge B. 2016. The status of the microbial census: an update. bioRxiv Prepr.:38646.
Schloss PD, Handelsman J. 2004. Status of the microbial census. Microbiol Mol Biol Rev 68:686–691.
Schubert CJ, Vazquez F, Lösekann-Behrens T, Knittel K, Tonolla M, Boetius A. 2011. Evidence for anaerobic oxidation of methane in sediments of a freshwater system (Lago di Cadagno). FEMS Microbiol. Ecol. 76:26–38.

Schulz F, Eloe-Fadrosh EA, Bowers RM, Jarett J, Nielsen T, Ivanova NN, Kyrpides NC, Woyke T. 2017. Towards a balanced view of the bacterial tree of life. Microbiome 5:140.
Schwarz JIK, Eckert W, Conrad R. 2007. Community structure of Archaea and Bacteria in a profundal lake sediment Lake Kinneret (Israel). Syst. Appl. Microbiol. 30:239–254.
Segarra KEA, Schubotz F, Samarkin V, Yoshinaga MY, Hinrichs K-U, Joye SB. 2015. High rates of anaerobic methane oxidation in freshwater wetlands reduce potential atmospheric methane emissions. Nat. Commun. 6:7477.
Seitz KW, Lazar CS, Hinrichs K-U, Teske AP, Baker BJ. 2016. Genomic reconstruction of a novel, deeply branched sediment archaeal phylum with pathways for acetogenesis and sulfur reduction. ISME J.
Seyler LM, McGuinness LM, Kerkhof LJ. 2014. Crenarchaeal heterotrophy in salt marsh sediments. ISME J. 8:1534–1543.
Shapiro BJ, Polz MF. 2014. Ordering microbial diversity into ecologically and genetically cohesive units. Trends Microbiol. 22:235–247.
Sharon I, Morowitz MJ, Thomas BC, Costello EK, Relman DA, Banfield JF. 2012. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Res. 23:111–120.
Shen L-D, Wu H-S, Gao Z-Q, Liu X, Li J. 2016. Comparison of community structures of Candidatus Methylomirabilis oxyfera-like bacteria of NC10 phylum in different freshwater habitats. Sci. Rep. 6:25647.
Singer E, Bushnell B, Coleman-Derr D, Bowman B, Bowers RM, Levy A, Gies EA, Cheng J-F, Copeland A, Klenk H-P, et al. 2016. High-resolution phylogenetic microbial community profiling. ISME J. 10:2020–2032.
Sivan O, Adler M, Pearson A, Gelman F, Bar-Or I, John SG, Eckert W. 2011. Geochemical evidence for iron-mediated anaerobic oxidation of methane. Limnol. Oceanogr. 56:1536–1544.
Sivan O, Antler G, Turchyn AV, Marlow JJ, Orphan VJ. 2014. Iron oxides stimulate sulfate-driven anaerobic methane oxidation in seeps. Proc. Natl. Acad. Sci. 111:E4139–4147.
Smemo KA, Yavitt JB. 2011. Anaerobic oxidation of methane: an underappreciated aspect of methane cycling in peatland ecosystems? Biogeosciences 8:779–793.
Sørensen KB, Teske A. 2006. Stratified communities of active archaea in deep marine subsurface sediments. Appl. Environ. Microbiol. 72:4596–4603.

Sorokin DY, Makarova KS, Abbas B, Ferrer M, Golyshin PN, Galinski EA, Ciordia S, Mena MC, Merkel AY, Wolf YI, et al. 2017. Discovery of extremely halophilic, methyl-reducing euryarchaea provides insights into the evolutionary origin of methanogenesis. Nat. Microbiol. 2.
Spahn CM, Gomez-Lorenzo MG, Grassucci RA, Jorgensen R, Andersen GR, Beckmann R, Penczek PA, Ballesta JP, Frank J. 2004. Domain movements of elongation factor eEF2 and the eukaryotic 80S ribosome facilitate tRNA translocation. EMBO J 23:1008–1019.
Spang A, Caceres EF, Ettema TJG. 2017. Genomic exploration of the diversity, ecology, and evolution of the archaeal domain of life. Science 357.
Spang A, Eme L, Saw JH, Caceres EF, Zaremba-Niedzwiedzka K, Lombard J, Guy L, Ettema TJG. 2018. Asgard archaea are the closest prokaryotic relatives of eukaryotes. PLOS Genet. 14:e1007080.
Spang A, Ettema TJG. 2017. Archaeal evolution: The methanogenic roots of Archaea. Nat. Microbiol. 2:17109.
Spang A, Saw JH, Jorgensen SL, Zaremba-Niedzwiedzka K, Martijn J, Lind AE, van Eijk R, Schleper C, Guy L, Ettema TJG. 2015. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature 521:173–179.
Staley JT, Konopka A. 1985. Measurement of in Situ Activities of Nonphotosynthetic Microorganisms in Aquatic and Terrestrial Habitats. Annu. Rev. Microbiol. 39:321–346.
Stamatakis A. 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313.
Stein LY, La Duc MT, Grundl TJ, Nealson KH. 2001. Bacterial and archaeal populations associated with freshwater ferromanganous micronodules and sediments. Environ. Microbiol. 3:10–18.
Stewart RD, Auffret MD, Warr A, Wiser AH, Press MO, Langford KW, Liachko I, Snelling TJ, Dewhurst RJ, Walker AW, et al. 2018. Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen. Nat. Commun. 9:870.
Stoddard SF, Smith BJ, Hein R, Roller BRK, Schmidt TM. 2015. rrnDB: Improved tools for interpreting rRNA gene abundance in bacteria and archaea and a new foundation for future development. Nucleic Acids Res. 43:D593–D598.
Su X, Chen W, Lee W, Jiang H, Zhang S, Lin H. 2012. YBR246W is required for the third step of diphthamide biosynthesis. J Am Chem Soc 134:773–776.
Su X, Lin Z, Chen W, Jiang H, Zhang S, Lin H. 2012. Chemogenomic approach identified yeast YLR143W as diphthamide synthetase. Proc Natl Acad Sci U S A 109:19983–19987.

Suematsu T, Yokobori S, Morita H, Yoshinari S, Ueda T, Kita K, Takeuchi N, Watanabe Y. 2010. A bacterial elongation factor G homologue exclusively functions in ribosome recycling in the spirochaete Borrelia burgdorferi. Mol Microbiol 75:1445–1454.
Sun CL, Brauer SL, Cadillo-Quiroz H, Zinder SH, Yavitt JB. 2012. Seasonal changes in methanogenesis and methanogenic community in three peatlands, New York State. Front. Microbiol. 3:81.
Sun D-L, Jiang X, Wu QL, Zhou N-Y. 2013. Intragenomic Heterogeneity of 16S rRNA Genes Causes Overestimation of Prokaryotic Diversity. Appl. Environ. Microbiol. 79:5962–5969.
Suzek BE, Wang Y, Huang H, McGarvey PB, Wu CH, UniProt C. 2015. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31:926–932.
Teske A, Sørensen KB. 2008. Uncultured archaea in deep marine subsurface sediments: have we caught them all? ISME J. 2:3–18.
Tholen A, Pester M, Brune A. 2007. Simultaneous methanogenesis and oxygen reduction by Methanobrevibacter cuticularis at low oxygen fluxes. FEMS Microbiol. Ecol. 62:303–312.
Timmers PH, Suarez-Zuluaga DA, van Rossem M, Diender M, Stams AJ, Plugge CM. 2015. Anaerobic oxidation of methane associated with sulfate reduction in a natural freshwater gas source. ISME J. 10:1400–1412.
Tringe SG, von Mering C, Kobayashi A, Salamov AA, Chen K, Chang HW, Podar M, Short JM, Mathur EJ, Detter JC, et al. 2005. Comparative metagenomics of microbial communities. Science 308:554–7.
Tsuboi M, Morita H, Nozaki Y, Akama K, Ueda T, Ito K, Nierhaus KH, Takeuchi N. 2009. EF-G2mt Is an Exclusive Recycling Factor in Mammalian Mitochondrial Protein Synthesis. Mol. Cell 35:502–510.
Tully BJ, Graham ED, Heidelberg JF. 2018. The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Sci. Data 5.
Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, Richardson PM, Solovyev VV, Rubin EM, Rokhsar DS, Banfield JF. 2004. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428:37–43.
Uthman S, Bär C, Scheidt V, Liu S, ten Have S, Giorgini F, Stark MJR, Schaffrath R. 2013. The Amidation Step of Diphthamide Biosynthesis in Yeast Requires DPH6, a Gene Identified through Mining the DPH1-DPH5 Interaction Network. Andersen GR, editor. PLoS Genet. 9:e1003334.

Vaksmaa A, Jetten MSM, Ettwig KF, Lüke C. 2017. McrA primers for the detection and quantification of the anaerobic archaeal methanotroph "Candidatus Methanoperedens nitroreducens." Appl. Microbiol. Biotechnol. 101:1631–1641.
Vanwonterghem I, Evans PN, Parks DH, Jensen PD, Woodcroft BJ, Hugenholtz P, Tyson GW. 2016. Methylotrophic methanogenesis discovered in the archaeal phylum Verstraetearchaeota. Nat. Microbiol. 1:16170.
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, et al. 2001. The sequence of the human genome. Science 291:1304–51.
Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, et al. 2004. Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science 304:66–74.
Walters WA, Caporaso JG, Lauber CL, Berg-Lyons D, Fierer N, Knight R. 2011. PrimerProspector: De novo design and taxonomic analysis of barcoded polymerase chain reaction primers. Bioinformatics 27:1159–1161.
Wang J, Krause S, Muyzer G, Meima-Franke M, Laanbroek HJ, Bodelier PLE. 2012. Spatial patterns of iron- and methane-oxidizing bacterial communities in an irregularly flooded, riparian wetland. Front. Microbiol. 3:64.
Wang Q, Garrity GM, Tiedje JM, Cole JR. 2007. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol. 73:5261–5267.
Wang Y, Sheng H-F, He Y, Wu J-Y, Jiang Y-X, Tam NF-Y, Zhou H-W. 2012. Comparison of the levels of bacterial diversity in freshwater, intertidal wetland, and marine sediments by using millions of Illumina tags. Appl. Environ. Microbiol. 78:8264–71.
Warnes GR, Bolker B, Bonebakker L, Gentleman R, Huber W, Liaw A, Lumley T, Maechler M, Magnusson A, Moeller S, et al. 2016. gplots: Various R Programming Tools for Plotting Data.
Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ. 2009. Jalview Version 2 - a multiple sequence alignment editor and analysis workbench. Bioinformatics 25:1189–1191.
Webb TR, Cross SH, McKie L, Edgar R, Vizor L, Harrison J, Peters J, Jackson IJ. 2008. Diphthamide modification of eEF2 requires a J-domain protein and is essential for normal development. J Cell Sci 121:3140–3145.
Webster G, O'Sullivan LA, Meng Y, Williams AS, Sass AM, Watkins AJ, Parkes RJ, Weightman AJ. 2015. Archaeal community diversity and abundance changes along a natural salinity gradient in estuarine sediments. FEMS Microbiol. Ecol. 91:1–18.
Welte C, Deppenmeier U. 2014. Bioenergetics and anaerobic respiratory chains of aceticlastic methanogens. Biochim. Biophys. Acta - Bioenerg. 1837:1130–1147.

Welte CU, Rasigraf O, Vaksmaa A, Versantvoort W, Arshad A, Op den Camp HJM, Jetten MSM, Lüke C, Reimann J. 2016. Nitrate- and nitrite-dependent anaerobic oxidation of methane. Environ. Microbiol. Rep. 8:941–955.
Werner JJ, Koren O, Hugenholtz P, DeSantis TZ, Walters WA, Caporaso JG, Angenent LT, Knight R, Ley RE. 2012. Impact of training sets on classification of high-throughput bacterial 16s rRNA gene surveys. ISME J. 6:94–103.
Wickham H. 2009. ggplot2: elegant graphics for data analysis. New York: Springer.
Wilhelm LJ, Tripp HJ, Givan SA, Smith DP, Giovannoni SJ. 2007. Natural variation in SAR11 marine bacterioplankton genomes inferred from metagenomic data. Biol. Direct 2:27.
Williams TA, Foster PG, Nye TM, Cox CJ, Embley TM. 2012. A congruent phylogenomic signal places eukaryotes within the Archaea. Proc Biol Sci 279:4870–4879.
Williams TA, Szollosi GJ, Spang A, Foster PG, Heaps SE, Boussau B, Ettema TJG, Embley TM. 2017. Integrative modeling of gene and genome evolution roots the archaeal tree of life. Proc Natl Acad Sci U S A 114:E4602–E4611.
Woese CR, Fox GE. 1977. Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proc. Natl. Acad. Sci. 74:5088–5090.
Woese CR, Fox GE, Zablen L, Uchida T, Bonen L, Pechman K, Lewis BJ, Stahl D. 1975. Conservation of primary structure in 16S ribosomal RNA. Nature 254:83–86.
Wrede C, Brady S, Rockstroh S, Dreier A, Kokoschka S, Heinzelmann SM, Heller C, Taviani M, Daniel R, Hoppert M. 2012. Aerobic and anaerobic methane oxidation in terrestrial mud volcanoes in the Northern Apennines. Sediment. Geol. 263:210–219.
Wright ES, Yilmaz LS, Noguera DR. 2012. DECIPHER, a search-based approach to chimera identification for 16S rRNA sequences. Appl. Environ. Microbiol. 78:717–25.
Wrighton KC, Thomas BC, Sharon I, Miller CS, Castelle CJ, VerBerkmoes NC, Wilkins MJ, Hettich RL, Lipton MS, Williams KH, et al. 2012. Fermentation, Hydrogen, and Sulfur Metabolism in Multiple Uncultivated Bacterial Phyla. Science 337:1661–1665.
Wu D, Hugenholtz P, Mavromatis K, Pukall R, Dalin E, Ivanova NN, Kunin V, Goodwin L, Wu M, Tindall BJ, et al. 2009. A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature 462:1056–1060.
Wu M, Scott AJ. 2012. Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2. Bioinformatics 28:1033–1034.
Wu YW, Simmons BA, Singer SW. 2016. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32:605–607.

Xiang X, Wang R, Wang H, Gong L, Man B, Xu Y. 2017. Distribution of Bathyarchaeota Communities Across Different Terrestrial Settings and Their Potential Ecological Functions. Sci. Rep. 7.
Yang J, Yan R, Roy A, Xu D, Poisson J, Zhang Y. 2015. The I-TASSER Suite: protein structure and function prediction. Nat Methods 12:7–8.
Ye W, Liu X, Lin S, Tan J, Pan J, Li D, Yang H. 2009. The vertical distribution of bacterial and archaeal communities in the water and sediment of Lake Taihu. FEMS Microbiol. Ecol. 70:263–276.
Youngblut ND, Wirth JS, Henriksen JR, Smith M, Simon H, Metcalf WW, Whitaker RJ. 2015. Genomic and phenotypic differentiation among Methanosarcina mazei populations from Columbia River sediment. ISME J. 9:2191–2205.
Yu T, Liang Q, Niu M, Wang F. 2017. High occurrence of Bathyarchaeota (MCG) in the deep-sea sediments of South China Sea quantified using newly designed PCR primers. Environ. Microbiol. Rep. 9:374–382.
Yu YR, You LR, Yan YT, Chen CM. 2014. Role of OVCA1/DPH1 in craniofacial abnormalities of Miller-Dieker syndrome. Hum Mol Genet 23:5579–5596.
Zaremba-Niedzwiedzka K, Caceres EF, Saw JH, Bäckström D, Juzokaite L, Vancaester E, Seitz KW, Anantharaman K, Starnawski P, Kjeldsen KU, et al. 2017. Asgard archaea illuminate the origin of eukaryotic cellular complexity. Nature 541:353–358.
Zhang G, Tian J, Jiang N, Guo X, Wang Y, Dong X. 2008. Methanogen community in Zoige wetland of Tibetan plateau and phenotypic characterization of a dominant uncultured methanogen cluster ZC-I. Environ. Microbiol. 10:1850–1860.
Zhang Y, Liu S, Lajoie G, Merrill AR. 2008. The role of the diphthamide-containing loop within eukaryotic elongation factor 2 in ADP ribosylation by Pseudomonas aeruginosa exotoxin A. Biochem. J. 413.

APPENDIX A
Supplementary Material For Chapter II

Figure S2.1 Geochemical measures correspond to depth gradient and site trends. Points represent individual sample geochemical measurements, with trendlines connecting mean values for each set of site/depth replicates and showing trends with increasing soil depth (left to right). Symbols are as in Figure 2.

Figure S2.2 Soil samples cluster by geochemical parameters according to water cover and depth. Shown is an NMDS ordination of Euclidean intersample distances of centered geochemical measures. Plots are arranged as in Figure 2, with the same ordination plot shown twice to emphasize the samples from Transect 2 (left panel) and from Transect 3 (right panel). Samples from the opposite transect are shown in gray.

Figure S2.3 Validation of domain specificity of primers. a) In silico amplification of the Silva NR99 (v119) database using pan-domain primers (V4; F515, 806R) produced predicted amplicons with a domain distribution largely matching the template database. Domain-specific V3-V6 primers used in this work produced predicted amplicons almost exclusively of the target domain, permitting deeper sequencing of the archaea in a mixed community than is possible using the V4 primer set. b) Actual sequencing of V3-V6 domain-specific amplicons produced reads which map almost exclusively to the expected domain. The percent of reads mapping to each domain for each sequencing run for 84 samples is plotted, with median and quartile box and whisker plots summarizing the distributions. Note the log10 scale, and the almost exclusive mapping of reads to the expected domain.
[Figure panels: (A) in silico primers, showing percent of sequences by domain (Archaea, Bacteria, Eukaryota) for the unamplified Silva database and each primer set; (B) in vitro primers, showing log10 percent reads mapped to each domain for the archaeal and bacterial sequencing runs.]
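The in silico amplification summarized in panel (a) can be illustrated with a minimal sketch. The function names and the toy primer and template sequences below are hypothetical and are not the primers or software used in this work; the sketch only assumes that a primer matches wherever every position is compatible with its IUPAC degeneracy code, with no mismatches allowed.

```python
# IUPAC nucleotide codes mapped to the bases each code may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also handles degenerate codes (e.g. R <-> Y).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_primer(primer, template, start=0):
    """Return the first index >= start where the primer matches, or -1."""
    for i in range(start, len(template) - len(primer) + 1):
        if all(template[i + j] in IUPAC[p] for j, p in enumerate(primer)):
            return i
    return -1


def predicted_amplicon(forward, reverse, template):
    """Return the predicted amplicon (primers included), or None.

    The reverse primer is given 5'->3' as ordered, so its reverse
    complement is what is searched for on the template strand.
    """
    f = find_primer(forward, template)
    if f < 0:
        return None
    rc = reverse_complement(reverse)
    r = find_primer(rc, template, f + len(forward))
    if r < 0:
        return None
    return template[f:r + len(rc)]


# Toy example with made-up sequences (not real 16S primers).
print(predicted_amplicon("ACGM", "TTWG", "GGACGCTTTTCAAAGG"))
```

Tallying the domain of each database sequence that yields an amplicon would then give the kind of domain distribution plotted in panel (a).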

Figure S2.4 Many bacterial and archaeal OTUs are shared across sites. Occurrence of most OTUs over multiple sites suggests that dispersal limitation is not responsible for differences in community structure. Venn diagram showing shared archaeal (top) and bacterial (bottom) OTUs across the four most spatially distant sites. The most distinct site (O3) harbors a small proportion of unique OTUs. M2, transect 2 mud; O2, transect 2 water covered; M3, transect 3 mud; O3, transect 3 water covered. The total number of OTUs at each site is shown in parentheses.

Figure S2.5 Similarity of microbial communities within replicate cores. Shown is an NMDS ordination of Bray-Curtis dissimilarity for soil archaeal (top) and bacterial (bottom) communities. As in Figure 2, a single ordination is shown repeatedly, with each soil type displayed separately to show the similarity of samples from like depths and across transects.
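For reference, the Bray-Curtis dissimilarity underlying this ordination can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up counts, not the code used for the analysis; it assumes each sample is a vector of non-negative OTU abundances in the same OTU order.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors.

    Defined as sum(|u_i - v_i|) / sum(u_i + v_i); 0 means identical
    composition and 1 means no shared OTUs.
    """
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def dissimilarity_matrix(samples):
    """Square matrix of pairwise Bray-Curtis dissimilarities."""
    return [[bray_curtis(a, b) for b in samples] for a in samples]


# Made-up OTU counts for three samples.
samples = [[10, 0, 5], [5, 5, 5], [0, 20, 0]]
for row in dissimilarity_matrix(samples):
    print([round(d, 3) for d in row])
```

The resulting matrix is the kind of input an NMDS routine takes to place samples in two dimensions.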

Figure S2.6 Bacterial taxa display abundance patterns corresponding to geochemical parameters and sediment depth. Heatmap of relative abundance of bacterial OTUs summarized at order level, according to Greengenes taxonomy. Shown are those taxa with a minimum 0.02% mean relative abundance across samples. Equivalent depth samples are grouped together within each site, and presented in order of increasing soil depth left to right. Names of taxa emphasized in text are shown in bold.
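The abundance cutoff described in this caption can be sketched as below. The table layout, function names, and counts are hypothetical; the sketch assumes relative abundance is computed within each sample before averaging across samples, which is one reasonable reading of "mean relative abundance".

```python
def mean_relative_abundance(table):
    """Mean within-sample proportion of each taxon.

    table maps sample name -> {taxon: count}. A taxon absent from a
    sample contributes a proportion of 0 for that sample.
    """
    taxa = sorted({t for counts in table.values() for t in counts})
    means = {}
    for taxon in taxa:
        props = []
        for counts in table.values():
            total = sum(counts.values())
            props.append(counts.get(taxon, 0) / total if total else 0.0)
        means[taxon] = sum(props) / len(props)
    return means


def filter_taxa(table, min_mean_percent=0.02):
    """Taxa whose mean relative abundance is at least min_mean_percent (%)."""
    means = mean_relative_abundance(table)
    return [t for t, m in means.items() if m * 100 >= min_mean_percent]


# Made-up order-level counts for two samples.
table = {
    "s1": {"A": 9990, "B": 9, "C": 1},
    "s2": {"A": 9999, "B": 0, "C": 1},
}
print(filter_taxa(table))
```

Here taxon C averages 0.01% and is dropped, while B averages 0.045% and is retained despite being absent from one sample.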

Figure S2.7 Habitat conservation despite high phylogenetic diversity among Woesearchaeota. Maximum likelihood tree of Woesearchaeota 16S rRNA gene sequences in the ARB guide tree (Silva SSU Ref NR99 v. 123) and OWC Woesearchaeota OTUs. Available Woesearchaeota genomes are largely restricted to one part of the tree, and all are adjacent to OWC OTUs enriched in deep soils. Black circles mark the OWC OTUs that were estimated to be reported by a simulated V4 sequencing and OTU-picking protocol, which would severely reduce the visible diversity within this phylum. Of the 157 Woesearchaeota V3-V6 OTUs, 26 had a best BLAST hit greater than 97% nucleotide identity to one of 11 V4 OTUs (black dots).
[Figure legend: shallow enriched; deep enriched; sulfate; Fe(II); available genome; AR20 genome; OTU detectable with V4 protocol; abundance fold-change, shallow vs. deep soils (1 to 59); abundance fold-change, deep vs. shallow soils (1 to 137); Spearman correlation (-0.4 to 0.6).]

The following supplementary files are permanently hosted online at: DOI: 10.1111/1462-2920.13703
Narrowe, A.B., Angle, J.C., Daly, R.A., Stefanik, K.C., Wrighton, K.C., and Miller, C.S. (2017) High resolution sequencing reveals unexplored archaeal diversity in freshwater wetland soils. Environ. Microbiol. 19: 2192–2209.

Table S2.1 Geochemical measures and sample metadata
Table S2.2 Summary of sequencing data
Table S2.3 Mantel test results
Table S2.4 PERMANOVA results clustering by depth vs. core for each site
Table S2.5 PCR primer sequences and reaction conditions
Supplemental file 2.1 Archaeal OTU table
Supplemental file 2.2 Bacterial OTU table
Supplemental file 2.3 Archaeal OTUs fasta
Supplemental file 2.4 Bacterial OTUs fasta
Supplemental file 2.5 Commands for analyses in R

APPENDIX B
Supplementary Material For Chapter III

Supplementary Fig. S3.1 Phylogenetic analysis of at least 11 out of 15 concatenated archaeal ribosomal proteins (2416 AA) based on maximum likelihood analyses performed with IQ-TREE. The diverse metagenome-assembled genomes (MAGs) belonging to Asgard archaea and included in our analyses are shaded in colors according to phylum. MAGs that were not part of the initial description of the Asgard superphylum (Zaremba-Niedzwiedzka K, Caceres EF, Saw JH, Bäckström D, Juzokaite L, Vancaester E, Seitz KW, Anantharaman K, Starnawski P, Kjeldsen KU, Stott MB, Nunoura T, Banfield JF, Schramm A, Baker BJ, Spang A, Ettema TJ. Nature 541:353–358, 2017, doi:10.1038/nature21031) are shown in boldface. Naming of the respective archaeal groups is based on a recent suggestion by Adam et al. (ISME J 11:2407–2425, 2017, doi:10.1038/ismej.2017.122). Branch support values are based on ultrafast bootstrap approximation as well as single branch tests, respectively. Scale bar indicates the number of substitutions per site.
[Tree figure: tips include Lokiarchaeota, Thorarchaeota (including Thorarchaeote OWC_Bin2, OWC_Bin3, and OWC_Bin5), Odinarchaeota, and Heimdallarchaeota MAGs alongside reference archaeal groups; branch support values and a scale bar of 0.2 are shown.]

Supplementary Fig. S3.2 Multiple sequence alignment of archaeal and eukaryotic EF-2 and EF-2 paralogs showing domain IV sequence motifs. (a) Multiple sequence alignment of a selected set of EF-2 sequences from representative organisms, showing domain IV sequence motifs as in Figure 3.3. Bona fide EF-2 homologues are shaded in grey. Organisms lacking diphthamide biosynthesis genes are indicated with 'a'. (b) Diphthamide modification motifs are not conserved in parabasalid EF-2. EF-2 sequences were mined and aligned from representative genomes or transcriptomes from each of the major lineages of eukaryotes, and diphthamide-interacting residues are colored. Here, we show a representative subset of eukaryotes; all surveyed genomes can be found in File S3.1. Eukaryotic relationships are shown with a schematic cladogram. Bona fide diphthamidylated EF-2 sequences are shaded in purple. The boxed region indicates the region that is not conserved in most parabasalids. Parabasalid EF-2 paralogs with unsubstituted diphthamide modification motifs are shaded in yellow. None of the parabasalids encode diphthamide biosynthesis genes, as indicated with the 'no DPH' icon. SAR, Stramenopilia, Alveolata, Rhizaria; DPH, diphthamide biosynthesis genes. The Pentatrichomonas hominis and Tetratrichomonas gallinarum sequences were retrieved by assembling the sequencing projects available at the indicated SRA accession numbers. Sequences and assembly are available upon request.

Supplementary Fig. S3.3 EF-2 from Dph-lacking Trichomonas vaginalis shown aligned to the D. melanogaster eEF-2 structure. (a) Panels are as in Figure 3. T. vaginalis EF-2 fits closely to the D. melanogaster structure (RMSD of 1.589 Å across all 830 residues). While overall structure is maintained, certain key residues in domain IV loops are not conserved. (b) Structure of the last three amino acids comprising the diphthamide loop in EF-2 of T. vaginalis compared to canonical eukaryotic EF-2. The amino acids comprising the DRG motif of canonical EF-2 (with D referring to diphthamide) have a backbone highly similar to the HRN motif of T. vaginalis (in which the histidine is not modified to diphthamide). The mutation of the canonical G to N, which provides an amide group, may compensate for the lack of modification of the histidine.
[Figure labels: Trichomonas vaginalis G3 XP_001321791_1 and D. melanogaster eEF-2 (PDB: 4v6w); diphthamide loop and second loop residues are marked on each structure; sequence logos (bits) and chemical structures compare the Trichomonas HRN motif with the canonical EF-2 H(dph)RG motif.]
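As a reminder of what the reported RMSD measures, a minimal sketch is given below. It assumes the two structures have already been superimposed and that atoms (for example, one C-alpha per residue) are paired in the same order; the superposition step itself is not shown, and the function name and coordinates are illustrative only, not taken from the structural analysis in this work.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D coordinates.

    coords_a and coords_b are equal-length sequences of (x, y, z)
    tuples in the same units (e.g. angstroms), already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two made-up atoms, each displaced by 1 unit along z.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(a, b))
```

Because the deviations are squared before averaging, a few poorly fitting loop residues can raise the value even when most of the backbone overlays closely.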


Supplementary Fig. S3.4 A universally conserved EF-2 domain IV salt bridge is replaced by conserved correlated mutations in EF-2p-containing genomes. (a) EF-2 from the D. melanogaster EF-2 cryo-EM structure (Anger AM, Armache JP, Berninghausen O, Habeck M, Subklewe M, Wilson DN, Beckmann R. Nature 497:80-5, 2013, doi:10.1038/nature12104) shows that Glu660 and Arg702, which are universally conserved in all archaeal and eukaryotic genomes lacking aEF-2p (Figure 3), form a salt bridge that stabilizes the diphthamide-containing loop of domain IV. (b) Representative modeled EF-2 structure of Thorarchaeota OWC Bin 2, with correlated mutations to Arg557 and Thr599 highlighted. (c) Thr599 is conserved in all EF-2p-containing genomes, and the correlated mutation at the Arg557 position is almost always positive or polar.
Figure labels: Glu660, Arg702, His701, His585, Arg557, Thr599, His486, His598; distances 3.81 Å and 4.88 Å.


Supplementary Fig. S3.5: Multiple sequence alignment showing conservation of GTP-binding region motifs in Asgard aEF-2 and EF-2 paralogs. Multiple sequence alignment of eEF-2, eEF-2 paralogs, aEF-2, aEF-2 paralogs and bacterial EF-G. Conserved GTP-binding motifs G1-G5 are shown in color in the alignment. Archaeal 60% consensus motif sequences as identified by Atkinson (BMC Genomics 16:78, 2015, doi:10.1186/s12864-015-1289-7) are shown outside the alignment, and residues associated with cation binding are shown in red.


Supplementary Fig. S3.6: Schematic view of the occurrence of indels in archaeal, eukaryotic and bacterial EF-2 and EF-2 paralogs. The cartoon is based on an alignment of EF-2 and EF-2 paralogs from a selected set of representative organisms, mainly comprising Asgard archaea and eukaryotes. Canonical EF-2 sequences are represented by purple bars and EF-2 paralogs by light purple bars. Potential indels are shaded by red bars, and indel positions are highlighted with orange triangles.


Supplementary Fig. S3.7


Supplementary Fig. S3.7: Multiple sequence alignment showing the occurrence of indels in archaeal, eukaryotic, and bacterial EF-2 and EF-2 paralogs. Selected characteristic indel regions derived from the multiple sequence alignment of a representative set of EF-2 homologs, which provided the basis for Fig. S3.6. Indel positions are shaded in light purple.


Supplementary Fig. S3.8: Maximum likelihood phylogenetic analyses of Dph1/2 (a) and Dph5 (b) performed using IQ-TREE. Eukaryotic homologs were collapsed and are represented by dark red triangles, while homologs of Woesearchaeota and Heimdallarchaeota are shaded in light red. Both phylogenetic trees are unrooted. Values on branches refer to support values based on ultrafast bootstrap approximation as well as single branch tests. Support values are not shown on branches where either of the two values was lower than 70. The scale bar indicates the number of substitutions per site.
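The rule used for Supplementary Fig. S3.8, in which branch support is hidden when either of the two values falls below 70, can be illustrated with a short sketch. This is a hypothetical helper, not the procedure used to draw the figure: the function name `filter_support`, the label format (two numbers separated by "/") and the example values are all assumptions made for illustration.

```python
def filter_support(label, threshold=70.0):
    """Return the branch label unchanged, or '' if any support value is below threshold.

    `label` is assumed to look like "98.7/100": numeric support values
    separated by "/". Labels that are not numeric are returned untouched.
    """
    if not label:
        return ""
    try:
        values = [float(part) for part in label.split("/")]
    except ValueError:
        return label  # not a numeric support label; leave it as it is
    # Keep the label only when every support value meets the threshold.
    return label if all(v >= threshold for v in values) else ""

# Made-up labels for illustration.
print(filter_support("98.7/100"))        # both values pass, label is kept
print(repr(filter_support("65.2/91")))   # first value is below 70, label is dropped
```

Requiring both values to pass is the stricter of the two possible readings of a dual-support label, which matches the wording of the legend.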


The following supplementary files are permanently hosted online at: DOI: https://doi.org/10.1101/262600

Supplementary File S3.1
Sheet 1: Distribution of diphthamide biosynthesis genes in archaea. Archaeal homologues and their corresponding accession numbers for each Dph gene are shown as retrieved from an in-house archaeal COG (arCOG) dataset. Total counts for each archaeal group of interest are shown.
Sheet 2: Distribution of diphthamide biosynthesis genes in eukaryotes. Eukaryotic homologues and their corresponding accession numbers for each Dph gene are shown as retrieved from eggNOG or manual inspection. Total counts for each eukaryotic supergroup are indicated in different colours. When appropriate, nucleomorph- or nucleus-encoded sequences are indicated.
Sheet 3: Structural modeling results, including scoring of the top structural model predicted by I-TASSER and the best structural hit to that model from the PDB.

Supplementary File S3.2
Trimmed alignment of EF-2 homologs that was used for phylogenetic analyses. The alignment was generated using MAFFT L-INS-i (Katoh K, Standley DM. Mol Biol Evol. 30:772-80, 2013, doi:10.1093/molbev/mst010) and subjected to trimming with trimAl (5%) (Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T. Bioinformatics 25:1972-3, 2009, doi:10.1093/bioinformatics/btp348) after the manual removal of poorly aligned ends. Please refer to the Methods section for more details.

Supplementary File S3.3a
Newick file of the concatenated ribosomal proteins phylogeny. Please refer to the figure legend of Fig. S3.1 for more details.

Supplementary File S3.3b
Newick file of the EF-2 phylogeny presented in Figure 3.2. Please refer to the figure legend of Fig. 3.2 for more details.

Supplementary File S3.3c
Newick file of the Dph1/2 phylogeny. Please refer to the figure legend of Fig. S3.8a for more details.

Supplementary File S3.3d
Newick file of the Dph5 phylogeny. Please refer to the figure legend of Fig. S3.8b for more details.
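The trimming step mentioned for Supplementary File S3.2 removes poorly populated alignment columns before tree building. The sketch below illustrates the general idea in plain Python and does not call trimAl. Reading the "5%" as a minimum fraction of non-gap residues per column is an assumption, and the function name `trim_gappy_columns`, its parameters and the toy alignment are hypothetical.

```python
def trim_gappy_columns(sequences, min_occupancy=0.05, gap_chars="-."):
    """Drop alignment columns whose fraction of non-gap characters is below min_occupancy.

    `sequences` is a list of equal-length aligned strings. The default of 0.05
    mirrors the "5%" in the text under the assumption stated in the lead-in.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    n = len(sequences)
    # Indices of columns with enough non-gap characters to keep.
    keep = [
        col for col in range(length)
        if sum(seq[col] not in gap_chars for seq in sequences) / n >= min_occupancy
    ]
    return ["".join(seq[col] for col in keep) for seq in sequences]

# Toy alignment (made up): the third column is all gaps and is removed.
aln = ["MK-A", "M--A", "MR-A"]
print(trim_gappy_columns(aln, min_occupancy=0.5))
```

A low occupancy threshold such as this removes only columns that are almost entirely gaps, so most of the informative positions in the alignment are retained.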


APPENDIX C
Supplementary Material For Chapter IV

Figure S4.1 mcrB phylogeny
Maximum likelihood phylogeny of mcrB amino acid sequences from publicly available genomes and putative OWC Bathyarchaeotal mcrB sequences. OWC Bathyarchaeotal mcrB sequences clade with those from Ca. Bathyarchaeota BA1 and BA2 and near to Ca. Syntrophoarchaeum sp. sequences. Colors are as in Figure 4.3.


Figure S4.2 mcrG phylogeny
Maximum likelihood phylogeny of mcrG amino acid sequences from publicly available genomes and putative OWC Bathyarchaeotal mcrG sequences. OWC Bathyarchaeotal mcrG sequences clade with those from Ca. Bathyarchaeota BA1 and BA2 and near to Ca. Syntrophoarchaeum sp. sequences. Colors are as in Figure 4.3.