

ARTICLE

doi:10.1038/nature11450

A metagenome-wide association study of gut microbiota in type 2 diabetes

Junjie Qin1*, Yingrui Li1*, Zhiming Cai2*, Shenghui Li1*, Jianfeng Zhu1*, Fan Zhang3*, Suisha Liang1, Wenwei Zhang1, Yuanlin Guan1, Dongqian Shen1, Yangqing Peng1, Dongya Zhang1, Zhuye Jie1, Wenxian Wu1, Youwen Qin1, Wenbin Xue1, Junhua Li1, Lingchuan Han3, Donghui Lu3, Peixian Wu3, Yali Dai3, Xiaojuan Sun2, Zesong Li2, Aifa Tang2, Shilong Zhong4, Xiaoping Li1, Weineng Chen1, Ran Xu1, Mingbang Wang1, Qiang Feng1, Meihua Gong1, Jing Yu1, Yanyan Zhang1, Ming Zhang1, Torben Hansen5, Gaston Sanchez6, Jeroen Raes7,8, Gwen Falony7,8, Shujiro Okuda7,8, Mathieu Almeida9, Emmanuelle LeChatelier9, Pierre Renault9, Nicolas Pons9, Jean-Michel Batto9, Zhaoxi Zhang1, Hua Chen1, Ruifu Yang1,10, Weimou Zheng1, Songgang Li1, Huanming Yang1, Jian Wang1, S. Dusko Ehrlich9, Rasmus Nielsen6, Oluf Pedersen5,11,12, Karsten Kristiansen1,13 & Jun Wang1,5,13

Assessment and characterization of gut microbiota has become a major research area in human disease, including type 2 diabetes, the most prevalent endocrine disease worldwide. To carry out analysis on gut microbial content in patients with type 2 diabetes, we developed a protocol for a metagenome-wide association study (MGWAS) and undertook a two-stage MGWAS based on deep shotgun sequencing of the gut microbial DNA from 345 Chinese individuals. We identified and validated approximately 60,000 type-2-diabetes-associated markers and established the concept of a metagenomic linkage group, enabling taxonomic species-level analyses. MGWAS analysis showed that patients with type 2 diabetes were characterized by a moderate degree of gut microbial dysbiosis, a decrease in the abundance of some universal butyrate-producing bacteria and an increase in various opportunistic pathogens, as well as an enrichment of other microbial functions conferring sulphate reduction and oxidative stress resistance. An analysis of 23 additional individuals demonstrated that these gut microbial markers might be useful for classifying type 2 diabetes.

Type 2 diabetes (T2D), which is a complex disorder influenced by both genetic and environmental components, has become a major public health issue throughout the world (refs 1, 2). Currently, research to parse the underlying genetic contributors to T2D relies mainly on genome-wide association studies (GWAS) that focus on identifying genetic components in the organism’s genome (refs 3, 4). Recently, research has indicated that the risk of developing T2D may also involve factors from the ‘other genome’, that is, the ‘intestinal microbiome’ (also termed the gut metagenome) (ref. 5).

Previous metagenomic research on the gut metagenome, primarily using 16S ribosomal RNA (ref. 6) and whole-genome shotgun (WGS) sequencing (ref. 7), has provided an overall picture of commensal microbial communities and their functional repertoire. For example, a catalogue of 3.3 million human gut microbial genes was established in 2010 (ref. 8) and, of note, a more extensive catalogue of gut microorganisms and their genes was published later (refs 9, 10). As many studies have reported, recent research on the gut metagenome has changed our understanding of human disease and its potential medical impact. From the perspective of both taxonomic and functional composition, the gut microbiota might be linked to and contribute to many complex diseases (ref. 11). For example, several studies have indicated that obesity is associated with an increase in the phylum Firmicutes and a relatively lower abundance of the phylum Bacteroidetes (refs 7, 12–16). Crohn’s disease research has revealed that patients had a significant reduction in the overall diversity of the gut microbiota (ref. 17) and had changes in microbial composition (ref. 18), and a T2D study showed that the proportion of the phylum Firmicutes and the class Clostridia in the gut of patients was significantly reduced (ref. 19). However, more work is required to gain detailed information about gut microbial compositional changes and their impact on these types of diseases, and additional tools are required to determine the associated changes easily and rapidly.

To reach these initial goals, we devised and carried out a two-stage case-control metagenome-wide association study (MGWAS) based on deep next-generation shotgun sequencing of DNA extracted from the stool samples from a total of 345 Chinese T2D patients and non-diabetic controls. From this we pinpointed specific genetic and functional components of the gut metagenome associated with T2D (Supplementary Fig. 1). Our data provide insight into the characteristics of the gut metagenome related to T2D risk, a paradigm for future studies of the pathophysiological role of the gut metagenome in other relevant disorders, and the potential usefulness of a gut-microbiota-based approach for assessment of individuals at risk of such disorders.

Construction of a gut metagenome reference

To identify metagenomic markers associated with T2D, we first developed a comprehensive metagenome reference gene set that included genetic information from Chinese individuals and T2D-specific gut microbiota, as the currently available metagenomic reference (the MetaHIT gene catalogue) did not include such data. We

*These authors contributed equally to this work.

1 BGI-Shenzhen, Shenzhen 518083, China.
2 Shenzhen Second People’s Hospital, The First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China.
3 Peking University Shenzhen Hospital, Shenzhen 518036, China.
4 Medical Research Center of Guangdong General Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China.
5 The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health Sciences, University of Copenhagen, DK-2100 Copenhagen, Denmark.
6 Department of Integrative Biology and Department of Statistics, University of California Berkeley, Berkeley, CA 94820, USA.
7 Department of Structural Biology, VIB, 1050 Brussels, Belgium.
8 Department of Applied Biological Sciences (DBIT), Vrije Universiteit Brussel, 1050 Brussels, Belgium.
9 Institut National de la Recherche Agronomique, 78350 Jouy en Josas, France.
10 State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China.
11 Institute of Biomedical Sciences, University of Copenhagen & Faculty of Health Science, University of Aarhus, DK-8000 Aarhus, Denmark.
12 Hagedorn Research Institute, DK-2820 Gentofte, Denmark.
13 Department of Biology, University of Copenhagen, DK-2200 Copenhagen, Denmark.

00 MONTH 2012 | VOL 000 | NATURE | 1

carried out WGS sequencing on individual faecal DNA samples from 145 Chinese individuals (71 cases and 74 controls, Supplementary Table 1) and obtained an average of 2.61 gigabases (Gb) (15.8 million) paired-end reads for each, totalling 378.4 Gb of high-quality data that was free of human DNA and adaptor contaminants (Supplementary Table 2). We then performed de novo assembly and metagenomic gene prediction for all 145 samples. We integrated these data with the MetaHIT gene catalogue, which contained 3.3 million genes that were predicted from the gut metagenomes of individuals of European descent, and obtained an updated gene catalogue with 4,267,985 predicted genes. A total of 1,090,889 of these genes were uniquely assembled from our Chinese samples, which contributed 10.8% additional coverage of sequencing reads when comparing our data against that from the MetaHIT gene catalogue alone (Supplementary Fig. 2).

Having a more complete gene reference, we carried out taxonomic assignment and functional annotation for the updated gene catalogue using 2,890 reference genomes (IMG v3.4; Supplementary Table 3), KEGG (Release 59.0) and eggNOG databases (v3). Here, 21.3% of the genes in the updated catalogue could be robustly assigned to a genus, which covered 26.4%–90.6% (61.2% on average) of the sequencing reads in the 145 samples (Supplementary Methods); the remaining genes were likely to be from currently undefined microbial species. For assessment at a functional level, we identified 6,313 KEGG orthologues and 38,641 eggNOG orthologue groups in the updated gene catalogue, which covered 47.1% and 60.9%, respectively, of the genes in the catalogue. In addition, 14.0% of genes that were not mapped to eggNOG orthologue groups could be clustered into 7,042 novel gene families; these do not yet have any functional annotation, but were still included (as in-house eggNOG orthologue groups) in our analyses. For each metagenomic sample, on average, 48.7% and 68.8% of sequencing reads were covered, respectively, by genes annotated with these KEGG orthologues and eggNOG orthologue groups.

Marker identification using a two-stage MGWAS

To define T2D-associated metagenomic markers, we devised and carried out a two-stage MGWAS strategy. Using a sequence-based profiling method, we quantified the gut microbiota in the 145 samples for use in stage I. On average, with the requirement that there should be ≥90% identity, we could uniquely map 77.4 ± 0.6% (mean ± s.e.m.; n = 145) of paired-end reads to the updated gene catalogue (Supplementary Fig. 2 and Supplementary Table 2). To normalize the sequencing coverage, we used relative abundance instead of the raw read count to quantify the gut microbial genes (Supplementary Methods). With nearly 16 million sequencing reads on average per sample, our sequence-based profiling method could reliably detect very low-abundance genes. For example, given a gene with a real relative abundance of 1 × 10^-6, the detected value ranged from 0.7 × 10^-6 to 1.5 × 10^-6 based on a theoretical estimation (Supplementary Fig. 3).

To facilitate the subsequent statistical analyses at both genetic and functional levels, we further defined and prepared three types of profiles using the quantified gene results: (1) a gene profile; (2) a KEGG orthologues profile; and (3) an eggNOG orthologue groups profile (Supplementary Methods).
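The normalization step described above can be sketched as follows. The exact formula is given only in the Supplementary Methods, so this is a minimal illustration that assumes a common two-step scheme: divide each gene's mapped read count by the gene length, then divide by the sample total. The function name `relative_abundance` and the toy numbers are hypothetical.

```python
def relative_abundance(read_counts, gene_lengths):
    """Length-normalised relative abundance of each gene in one sample.

    Assumed scheme (not confirmed by the main text): reads per base for
    each gene, divided by the sum of reads per base over all genes.
    """
    if len(read_counts) != len(gene_lengths):
        raise ValueError("read_counts and gene_lengths must have the same length")
    # Copy-number proxy: reads mapped to a gene divided by the gene length.
    per_base = [count / length for count, length in zip(read_counts, gene_lengths)]
    total = sum(per_base)
    if total == 0:
        # No mapped reads at all: every gene has zero abundance.
        return [0.0] * len(per_base)
    return [value / total for value in per_base]


if __name__ == "__main__":
    counts = [120, 30, 0, 450]        # toy mapped read counts
    lengths = [1200, 600, 900, 1500]  # toy gene lengths in base pairs
    print(relative_abundance(counts, lengths))
```

Because the values in each sample sum to one, samples sequenced to different depths become comparable, which is the stated purpose of using relative abundance rather than raw read counts.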

We investigated the subpopulations of the 145 samples in these different profiles. Applying the same identification method as used in the MetaHIT study (ref. 20), we identified three enterotypes in our Chinese samples (Supplementary Figs 4 and 5). A principal component analysis (PCA) showed that these three enterotypes were primarily made up of several highly abundant genera, including Bacteroides, Prevotella, Bifidobacterium and Ruminococcus (Fig. 1a). However, we found no significant relationship between enterotype and T2D disease status (P = 0.29, Fisher’s exact test). We examined the top five principal components (P value in Tracy–Widom test < 0.05 and contribution > 3%): the first and second principal components were significantly correlated with enterotype (P < 0.001, Kruskal–Wallis test), and the fifth principal component was significantly correlated with T2D (P < 0.001, Wilcoxon rank-sum test; Supplementary Fig. 5d), indicating that T2D, in addition to enterotype, was a determining factor in explaining the gut microbial differences in our samples. The third and fourth principal components, however, did not correlate with any known factors.

We then corrected for population stratification, which might be related to the non-T2D-related factors. For this we analysed our data using a modified EIGENSTRAT method (ref. 21); however, unlike what is done in a GWAS subpopulation correction, we applied this analysis to microbial abundance rather than to genotype. For the gene profile, after adjustment, we found that the effects that correlated with non-T2D-related factors disappeared (Supplementary Table 4). A Wilcoxon rank-sum test was done on the adjusted gene profile to identify differential metagenomic gene content between the T2D patients and controls. The outcome of our analyses showed a substantial enrichment of a set of microbial genes that had very small P values, as compared with the expected distribution under the null hypothesis (Fig. 1b), indicating that these genes were true T2D-associated gut microbial genes.

To validate the significant associations identified in stage I, we carried out the stage II analysis using an additional 200 Chinese individuals (one of these samples had a very low within-sample diversity, which was probably owing to the presence of a high fraction of Escherichia and Klebsiella, and was therefore excluded in later analyses; Supplementary Tables 1 and 2). We also used WGS sequencing in stage II and generated a total of 830.8 Gb of sequence data with 23.6 million paired-end reads on average per sample. We then assessed the 278,167 stage I genes that had P values < 0.05 and found that the majority of these genes still correlated with T2D in these stage II study samples (Supplementary Fig. 6). We next controlled for the false discovery rate (FDR) in the stage II analysis, and defined a total of 52,484 T2D-associated gene markers from these genes corresponding to an FDR of 2.5% (stage II P value < 0.01; Fig. 1c, Supplementary Fig. 7 and Supplementary Table 5).
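The two-stage filtering logic above (a stage I P-value cutoff followed by FDR control in stage II) can be illustrated with a short sketch. The main text does not state how the FDR was estimated, so the Benjamini-Hochberg adjustment used here is a stand-in and an assumption, as are the function names and the toy P values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def two_stage_markers(stage1_p, stage2_p, stage1_cutoff=0.05, fdr=0.025):
    """Keep genes passing stage I, then control the FDR among them in stage II.

    stage1_p and stage2_p map gene id -> p-value. The default cutoffs mirror
    the values quoted in the text (P < 0.05 in stage I, 2.5% FDR in stage II).
    """
    candidates = [gene for gene, p in stage1_p.items() if p < stage1_cutoff]
    q_values = benjamini_hochberg([stage2_p[gene] for gene in candidates])
    return [gene for gene, q in zip(candidates, q_values) if q <= fdr]


if __name__ == "__main__":
    stage1 = {"g1": 0.01, "g2": 0.2, "g3": 0.03, "g4": 0.001}
    stage2 = {"g1": 0.001, "g2": 0.0001, "g3": 0.5, "g4": 0.004}
    print(two_stage_markers(stage1, stage2))
```

Note that only genes surviving stage I are tested in stage II, so the multiple-testing burden in the validation stage depends on the stage I candidate set (278,167 genes in the study) rather than on the full catalogue.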

We applied the same two-stage analysis using the KEGG orthologues and eggNOG orthologue groups profiles and identified a total of 1,345 KEGG orthologues markers (stage II P < 0.05 and 4.5% FDR) and 5,612 eggNOG orthologue groups markers (stage II P < 0.05 and 6.6% FDR) that were associated with T2D (Supplementary Tables 6 and 7).

Development of a metagenomic linkage group

To reduce and structurally organize the abundant metagenomic data and to enable us to make a taxonomic description, we devised the

Figure 1 | Identification of T2D-associated markers from gut metagenome. (Figure not reproduced; recoverable axis labels for the histogram panels: P values, Density.) a, The T2D patients (n = 71) and controls (n = 74) from stage I were plotted on the first two principal components of the genus profile. Lines connect individuals determined to have the same enterotype (using the PAM clustering method of refs 20, 36), and coloured circles cover the individuals near the centre of gravity for each cluster (< 1.5σ). The top four genera as the main contributors to these clusters were determined and plotted by their loadings in these two components. b, Density histogram showing the P-value distribution of all genes tested in stage I. The horizontal line represents the distribution of P values under the null hypothesis. c, Density histogram showing the P-value distribution of genes in stage II, which were identified from stage I. The blue and red curves denote the estimated statistical power and false discovery rate (FDR), respectively, for a particular P value.


generalized concept of metagenomic linkage group (MLG) in lieu of a species concept for a metagenome. Here an MLG is defined as a group of genetic material in a metagenome that is probably physically linked as a unit rather than being independently distributed; this allowed us to avoid the need to completely determine the specific microbial species present in the metagenome, which is important given that there are a large number of unknown organisms and that there is frequent lateral gene transfer (LGT) between bacteria. Using our gene profile, we defined and identified an MLG as a group of genes that co-exists among different individual samples and has a consistent abundance level and taxonomic assignment (Supplementary Methods).
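The co-abundance idea behind an MLG can be illustrated with a deliberately simplified sketch. The actual identification procedure is described in the Supplementary Methods and also uses taxonomic assignment, which this example ignores. The greedy strategy, the Pearson correlation criterion, the `min_corr` threshold and all names below are assumptions made for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return cov / sqrt(var_x * var_y)


def group_coabundant_genes(profiles, min_corr=0.9, min_genes=2):
    """Greedy grouping: a gene joins the first group whose seed it tracks.

    profiles maps gene id -> abundance across samples. Groups smaller than
    min_genes are dropped (the study used a minimum of 100 genes per MLG;
    the small default here only suits toy data).
    """
    groups = []  # each entry: (seed profile, list of gene ids)
    for gene, profile in profiles.items():
        for seed, members in groups:
            if pearson(seed, profile) >= min_corr:
                members.append(gene)
                break
        else:
            groups.append((profile, [gene]))
    return [members for _, members in groups if len(members) >= min_genes]


if __name__ == "__main__":
    toy = {
        "a": [1, 2, 3, 4],
        "b": [2, 4, 6, 8.1],
        "c": [4, 3, 2, 1],
        "d": [8, 6, 4, 2],
        "e": [1, 5, 1, 5],
    }
    print(group_coabundant_genes(toy))
```

Genes that rise and fall together across individuals are treated as probably belonging to the same physical unit, which is why no reference genome is needed to form a group.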

To assess the reliability of our MLG identification method, we first constructed a subset of bacterial genes from the updated metagenome gene catalogue (n = 130,605) that were independently derived from 50 known gut bacterial species (Supplementary Methods). We used a threshold for the minimum gene number for an MLG of 100, above which all 50 bacterial species could be identified with an average genome coverage of 83.0% and with an accuracy in the taxonomic classification of genes in the constructed subset of 99.8% (Supplementary Fig. 8 and Supplementary Table 8).

We identified 47 MLGs in the T2D-associated gene markers, which covered 84.4% of these markers (Supplementary Table 9). Of these, 17 MLGs could be assigned to known bacterial species on the basis of strong alignment sequence similarity with sequenced bacterial genomes at the nucleotide level (Table 1). Using the taxonomic characterization from these MLGs, we found that almost all of the MLGs enriched in the control samples were from various butyrate-producing bacteria, including Clostridiales sp. SS3/4, Eubacterium rectale, Faecalibacterium prausnitzii, Roseburia intestinalis and Roseburia inulinivorans. By contrast, most of the T2D-enriched MLGs were from opportunistic pathogens, such as Bacteroides caccae, Clostridium hathewayi, Clostridium ramosum, Clostridium symbiosum, Eggerthella lenta and Escherichia coli, which have previously been reported to cause or underlie human infections such as bacteraemia and intra-abdominal infections (refs 22–25). Of interest, the known mucin-degrading species Akkermansia muciniphila and sulphate-reducing species Desulfovibrio sp. 3_1_syn3 were also enriched in T2D samples. The MLGs that were of unknown species origin will be of interest for isolation and analysis in future studies to obtain information on their relevant taxonomy.

A co-occurrence network on these MLGs was generated to assess potential relationships between the T2D-associated gut bacteria (Fig. 2a and Supplementary Methods). In this result, some types of butyrate-producers, from clostridial cluster XIVa and IV, showed a positive correlation with one another and were negatively correlated with a group of the T2D-enriched bacteria from Clostridium, which may indicate an antagonistic relationship between these different clostridial clusters. Another interesting finding was the presence of a small MLG from Haemophilus parainfluenzae, which is not a butyrate-producer but was significantly enriched in the control samples, even in an independent analysis comparing the coverage of its sequenced bacterial genome (the highest genome coverage in all samples was 94.5%; P < 0.001 between case and control groups, Student’s t-test). In the co-occurrence network, this MLG was clearly separate from the cluster of butyrate producers, and may have an unknown antagonistic relationship with a T2D-enriched bacterium that is unknown but appears closely related to the Subdoligranulum genus. These data presented various patterns indicating relationships between the T2D-associated gut bacteria and suggested it may be important to determine, in a case-by-case manner, the different roles gut bacteria may have in maintaining or interacting with their environment.

Functional characterization related to T2D

Using the T2D-associated KEGG orthologues and eggNOG orthologue groups markers, we assessed the potential microbial functional roles in the gut microbiota of T2D patients. In general, T2D-enriched markers were typically involved in the KEGG categories of membrane transport (P < 0.001, Fisher’s exact test). This result is consistent with

Table 1 | The list of T2D-associated MLGs that could be assigned to previously known phylotypes

MLG ID    No. of genes  Stage I P*      Stage II P*     Odds ratio (95% CI)†  Taxonomy assignment (level)    Percentage similarity‡

T2D-enriched
T2D-154   337           0.0014          2.54 × 10^-4    1.52 (1.05, 2.19)     Akkermansia muciniphila        98.2
T2D-140   148           3.97 × 10^-4    0.0029          1.50 (1.15, 1.97)     Bacteroides intestinalis       98.2
T2D-139   3,386         0.0013          2.11 × 10^-4    1.66 (1.26, 2.20)     Bacteroides sp. 20_3           99.3
T2D-11    5,113         4.16 × 10^-8    7.58 × 10^-5    5.89 (1.39, 25.0)     Clostridium bolteae            99.4
T2D-5     2,378         4.21 × 10^-5    1.97 × 10^-6    23.1 (2.08, 257)      Clostridium hathewayi          99.3
T2D-80    2,381         1.30 × 10^-4    1.41 × 10^-5    1.68 (0.97, 2.89)     Clostridium ramosum            99.8
T2D-57    821           4.00 × 10^-7    2.21 × 10^-5    2.62 (1.14, 6.03)     Clostridium sp. HGF2           99.6
T2D-15    2,492         4.74 × 10^-5    2.97 × 10^-4    1.13 (0.88, 1.44)     Clostridium symbiosum          99.6
T2D-1     949           6.01 × 10^-4    0.0036          1.41 (0.93, 2.13)     Desulfovibrio sp. 3_1_syn3     98.0
T2D-7     1,056         6.01 × 10^-4    2.80 × 10^-4    1.57 (0.95, 2.58)     Eggerthella lenta              99.6
T2D-137   425           6.71 × 10^-7    0.0012          1.72 (1.16, 2.57)     Escherichia coli               99.0
T2D-165   131           0.0096          0.0017          1.46 (1.07, 1.99)     Alistipes (genus)              99.5§
T2D-12    364           4.52 × 10^-6    8.04 × 10^-8    2.22 (1.12, 4.40)     Clostridium (genus)            91.0
T2D-8     5,272         7.08 × 10^-10   9.95 × 10^-6    1.12 (0.86, 1.45)     Clostridium (genus)            88.8
T2D-93    1,590         2.01 × 10^-4    0.0020          1.84 (1.03, 3.29)     Parabacteroides (genus)        80.5§
T2D-62    2,584         7.63 × 10^-6    6.88 × 10^-4    2.41 (1.43, 4.08)     Subdoligranulum (genus)        98.7§
T2D-2     2,430         3.14 × 10^-5    0.0019          4.06 (1.28, 12.9)     Lachnospiraceae (family)       97.3§

Control-enriched
Con-107   1,677         1.12 × 10^-7    0.0018          1.44 (1.13, 1.84)     Clostridiales sp. SS3/4        98.0
Con-112   232           0.0064          1.99 × 10^-4    1.51 (1.13, 2.03)     Eubacterium rectale            97.6
Con-129   1,440         0.0033          0.0010          1.55 (1.19, 2.00)     Faecalibacterium prausnitzii   98.2
Con-166   273           3.80 × 10^-5    1.94 × 10^-4    1.25 (0.93, 1.69)     Haemophilus parainfluenzae     94.8
Con-121   3,507         6.11 × 10^-5    4.90 × 10^-6    3.10 (1.92, 5.03)     Roseburia intestinalis         98.9
Con-113   345           2.85 × 10^-4    9.72 × 10^-4    1.45 (1.11, 1.89)     Roseburia inulinivorans        98.2
Con-120   116           1.90 × 10^-4    5.41 × 10^-4    1.55 (1.17, 2.06)     Eubacterium (genus)            89.0
Con-130   670           0.0134          0.0018          1.59 (1.21, 2.08)     Faecalibacterium (genus)       89.4
Con-131   202           8.99 × 10^-4    0.0017          1.58 (1.16, 2.15)     Faecalibacterium (genus)       96.9
Con-133   1,555         3.43 × 10^-5    0.0015          1.52 (1.15, 2.01)     Erysipelotrichaceae (family)   66.9§
Con-109   378           0.0135          1.67 × 10^-4    1.41 (1.09, 1.83)     Clostridiales (order)          87.0

* The stage I P value was calculated after adjustment for population structure; the stage II P value was one-sided.
† Calculated by logistic model.
‡ Similarity at the nucleic acid level or, when marked with §, at the protein level.


the previous findings in studies of inflammatory bowel disease and obese patients (ref. 26). By contrast, control-enriched markers were frequently involved in cell motility and metabolism of cofactors and vitamins (P < 0.002; Supplementary Fig. 9).

At the module or pathway level, the gut microbiota of T2D patients was functionally characterized with our T2D-associated markers and showed enrichment in membrane transport of sugars, branched-chain amino acid (BCAA) transport, methane metabolism, xenobiotics degradation and metabolism, and sulphate reduction. By contrast, there was a decrease in the level of bacterial chemotaxis, flagellar assembly, butyrate biosynthesis and metabolism of cofactors and vitamins (Fig. 2b and Supplementary Table 10; see Supplementary Fig. 10 for the detailed information on butyrate-CoA transferase). Some important functions, including butyrate biosynthesis and sulphate reduction, coincided with the T2D-associated bacteria identified in the MLG analysis. The butyrate-producing bacteria seemed to be the primary contributors to the cell motility functions (Supplementary Table 11), potentially indicating that some functional enrichment might reflect the enrichment of specific species.

We found that seven of the T2D-enriched KEGG orthologues markers were related to oxidative stress resistance, including catalase (K03781), peroxiredoxin (K03386), Mn-containing catalase (K07217), glutathione reductase (NADPH) (K00383), nitric oxide reductase (K02448), putative iron-dependent peroxidase (K07223), and cytochrome c peroxidase (K00428), but none of the identified control-enriched KEGG orthologues markers had similar types of function. This may indicate that the gut environment of a T2D patient is one that stimulates bacterial defence mechanisms against oxidative stress (Supplementary Table 10). Similarly, we found 14 KEGG orthologues markers related to drug resistance that were greatly enriched in T2D patients, further supporting the view that T2D patients may have a more hostile gut environment, and the medical histories of these patients may reflect this (Supplementary Table 10).

T2D-related dysbiosis in gut microbiota

In light of the above MGWAS result and an additional PERMANOVA (permutational multivariate analysis of variance; ref. 27) analysis that clearly showed that T2D was a significant factor for explaining the variation in the examined gut microbial samples (Supplementary Table 12), we deduced that the gut microbiota in T2D patients featured dysbiosis, which is a state where the balance of the normal microbiota has been disturbed. However, the degree of this T2D-related dysbiosis was moderate, because only 3.8 ± 0.2% (mean ± s.e.m.; n = 344) of the gut microbial genes (at the relative abundance level) were associated with T2D in an individual. Additionally, we did not observe a significant difference in the within-sample diversity between T2D and control groups (Fig. 3a). Specifically, the degree of gut microbiota change in T2D was not as substantial as that seen in inflammatory bowel disease (from the MetaHIT samples, ref. 8; see Fig. 3a) or enterotypes (Supplementary Fig. 11). A similar result using the eggNOG orthologue groups profile supported the same conclusion (Supplementary Fig. 12).
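The within-sample diversity compared above is, according to the Fig. 3 legend, calculated with the Shannon index. A minimal sketch of that index on a gene-abundance vector follows; the use of the natural logarithm and the function name `shannon_index` are assumptions, since the main text does not specify the log base.

```python
from math import log


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln p_i) over non-zero abundances.

    Input values may be raw counts or relative abundances; they are
    renormalised to proportions before the index is computed.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must have a positive sum")
    h = 0.0
    for value in abundances:
        if value > 0:
            p = value / total
            h -= p * log(p)
    return h


if __name__ == "__main__":
    even = [0.25, 0.25, 0.25, 0.25]    # evenly distributed community
    skewed = [0.97, 0.01, 0.01, 0.01]  # dominated by one gene or taxon
    print(shannon_index(even), shannon_index(skewed))
```

A sample dominated by a few highly abundant organisms yields a low index, which is consistent with the stage II sample that was excluded for very low within-sample diversity.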

Figure 2 | Taxonomic and functional characterization of gut microbiota in T2D. (Figure not reproduced; recoverable panel b label: Host tissues.) a, A co-occurrence network was deduced from 47 MLGs that were identified from 52,484 gene markers. Nodes depict MLGs with their ID displayed in the centre. The size of the nodes indicates gene number within the MLG. The colour of the nodes indicates their taxonomic assignment. Connecting lines represent Spearman correlation coefficient values above 0.4 (blue) or below −0.4 (red). b, A schematic diagram showing the main functions of the gut microbes that had a predicted T2D association. Red text denotes enriched functions in T2D patients; blue text denotes depleted functions in T2D patients; black text denotes an uncertain functional role relative to T2D. The dashed line arrows point to the inference that was not detected directly but reported by previous studies.
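The edge rule stated in the Fig. 2a legend (Spearman correlation above 0.4 or below −0.4) can be sketched as follows. How the per-MLG abundance was summarized is described only in the Supplementary Methods, so the input format, the function names and the toy MLG identifiers here are hypothetical.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # rank shared by the tied run i..j
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def cooccurrence_edges(abundance, threshold=0.4):
    """Return (node_a, node_b, sign, rho) for every pair beyond the threshold."""
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if rho > threshold:
            edges.append((a, b, "positive", rho))
        elif rho < -threshold:
            edges.append((a, b, "negative", rho))
    return edges


if __name__ == "__main__":
    toy = {
        "Con-1": [1, 2, 3, 4, 5],
        "Con-2": [2, 3, 5, 7, 11],
        "T2D-1": [9, 7, 5, 3, 1],
    }
    for edge in cooccurrence_edges(toy):
        print(edge)
```

Rank-based correlation is a reasonable fit for abundance data because it is insensitive to the heavy skew typical of metagenomic profiles.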


To characterize ecologically the gut bacteria involved in the T2D-related dysbiosis, we compared, in all individual samples, the distribution of the occurrence rate of both T2D-associated gene and function markers, and these showed the same pattern, which was that the control-enriched markers had a higher occurrence rate on average than the T2D-enriched markers (Fig. 3b and Supplementary Figs 13–15). This may be because the beneficial bacteria lost in the T2D gut were universally present, whereas some of the harmful bacteria that appeared in the T2D gut were diverse, and thus had less overall abundance within the human population.
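The occurrence rate compared above can be read as the fraction of samples in which a marker is detected. The sketch below assumes that reading and uses the detection threshold of at least two mapped reads given in the Fig. 3b legend; the function name and toy counts are hypothetical.

```python
def occurrence_rate(read_counts_per_sample, min_reads=2):
    """Fraction of samples in which a gene has at least min_reads mapped reads."""
    if not read_counts_per_sample:
        raise ValueError("at least one sample is required")
    detected = sum(1 for count in read_counts_per_sample if count >= min_reads)
    return detected / len(read_counts_per_sample)


if __name__ == "__main__":
    control_marker = [12, 7, 3, 9, 2, 5]  # present in nearly every sample
    t2d_marker = [0, 40, 0, 1, 0, 25]     # abundant in a few samples only
    print(occurrence_rate(control_marker), occurrence_rate(t2d_marker))
```

In this toy case the control-enriched marker occurs in every sample while the T2D-enriched marker occurs in a third of them, mirroring the pattern reported in the text.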

Gut-microbiota-based T2D classification

To exploit the potential ability of T2D classification by gut microbiota, we developed a T2D classifier system based on the 50 gene markers that we defined as an optimal gene set by a minimum redundancy–maximum relevance (mRMR) feature selection method (Supplementary Fig. 16 and Supplementary Table 13). For intuitive evaluation of the risk of T2D disease based on these 50 gut microbial gene markers, we computed a T2D index (Supplementary Methods), which correlated well with the ratio of T2D patients in our population (Fig. 4a), and the area under the receiver operating characteristic (ROC) curve was 0.81 (95% confidence interval 0.76–0.85) (Fig. 4b), indicating the gut-microbiota-based T2D index could be used to classify T2D individuals accurately.
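The following sketch illustrates how an abundance-based index and its ROC area might be computed. The actual T2D index is defined in the Supplementary Methods; the form used here (mean abundance of T2D-enriched markers minus mean abundance of control-enriched markers, multiplied by a scale factor of 10^6) is an assumption, as are the function names. The AUC is computed with the standard pairwise-ranking definition.

```python
def t2d_index(abundance, t2d_markers, control_markers, scale=1e6):
    """Illustrative index: scaled difference of mean marker abundances.

    abundance maps gene id -> relative abundance in one sample. The formula
    and the scale factor are assumptions, not the paper's stated definition.
    """
    t2d_mean = sum(abundance.get(g, 0.0) for g in t2d_markers) / len(t2d_markers)
    con_mean = sum(abundance.get(g, 0.0) for g in control_markers) / len(control_markers)
    return (t2d_mean - con_mean) * scale


def roc_auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    labels use 1 for cases and 0 for controls; ties count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("both cases and controls are required")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    sample = {"a": 2e-6, "b": 4e-6, "c": 1e-6}
    print(t2d_index(sample, ["a", "b"], ["c"]))
    print(roc_auc([3, 1, 2, 0], [1, 1, 0, 0]))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported value of 0.81.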

We validated the discriminatory power of our T2D classifier using an independent study group: 11 T2D patients and 12 non-diabetic controls. In this assessment analysis, the top eight samples with the highest T2D index were all T2D patients (Fig. 4c and Supplementary Table 14); the average T2D index between case and control was significantly different (P = 0.004, Student’s t-test). Overall, our cross-sectional study in overt T2D indicated that it would be worthwhile to test gut-microbiota-based classifiers more extensively in future longitudinal studies for their ability to identify subsets of the population that are at high risk for progressing to clinically defined T2D.

Discussion

T2D is a heterogeneous and multifactorial disease, influenced by a number of different genetic and environmental factors. By applying the standard two-stage GWAS strategy to design and carry out an MGWAS to identify disease-associated metagenomic markers, the present study highlights how the gut microbial composition, traditionally considered to be a factor of environmental origin (ref. 12), differs between T2D patients and non-diabetic control subjects in a Chinese population.

We first established an updated human microbial gene reference set, adding information from both a new ethnicity and from T2D patients, which will be a useful resource for future metagenomic analyses. We also developed the concept of an MLG, which provided various types of taxonomic information from whole-genome shotgun data, including bacterial species-specific regions on a chromosome, and mobile genetic elements, such as plasmids and bacteriophages. Thus, an MLG can provide metagenomic species-level information even for unknown species, instead of requiring traditional taxonomic classification approaches based on sequence composition or similarity (refs 28, 29). The use of species-level information allows assessment of the relationships between the T2D-associated bacteria. For example, we identified what appears to be an antagonistic relationship between beneficial bacteria and harmful bacteria, highlighted by the large populations of clostridial clusters. These species-level analyses also showed various patterns: for example, the MLG from Haemophilus parainfluenzae in the control samples could be inferred, under these circumstances, to be beneficial; however, on the basis of relationship patterns, it was quite distinct from the other inferred beneficial bacteria, indicating that H. parainfluenzae may have a different type of impact in this specific biological context (Fig. 2a).

Our findings indicated that T2D patients had only a moderate degree of gut bacterial dysbiosis; however, functional annotation analyses indicated a decline in butyrate-producing bacteria, which may be metabolically beneficial, and an increase in several opportunistic

Figure 3 | Gut microbiota of T2D patients show a moderate degree of dysbiosis. (Figure not reproduced; recoverable labels: Abundance sum and Within-sample diversity for panel a, with groups Controls, IBD patients, Controls, T2D patients; Occurrence rate (n = 344) for panel b, with series T2D-enriched markers and Control-enriched markers.) a, An ecological comparison between T2D patients (n = 170) and controls (n = 174) in all samples, as well as inflammatory bowel disease (IBD) patients (n = 25) and controls (n = 99) from published MetaHIT samples (ref. 8). The upward bars denote the gross relative abundance of the T2D-associated gene markers for each sample and the same value computed on the inflammatory-bowel-disease-associated gene markers (see Supplementary Methods). The downward bars denote the within-sample diversity (calculated using the Shannon index) in each group. For an individual sample, a lower proportion of gut microbiota was implicated in T2D disease and there was no significant difference in the within-sample diversity between the T2D patients and controls as compared with the distinct difference seen in the inflammatory bowel disease analysis. **P < 0.01; ***P < 0.001 (Student’s t-test); NS, not significant; the error bars denote standard error. b, A density histogram showing a comparison of the occurrence rate distribution between T2D-enriched gene markers and control-enriched gene markers in all samples (n = 344). The threshold of mapped read number for gene identification is ≥2.

Figure 4 | A trial classification of T2D using gut microbial gene markers. (Figure not reproduced; recoverable axis labels: T2D index, Number of individuals and Percentage of T2D patients for panel a; 1 – specificity and Sensitivity for panel b; Controls, T2D patients and T2D index for panel c.) a, A classifier to identify T2D individuals was constructed using 50 gene markers selected by mRMR, and then, for each individual, a T2D index was calculated to evaluate the risk of T2D. The histogram shows the distribution of T2D indices for all individuals, in which values less than −1.5 and values greater than 3.5 were grouped. For each bin, the black dots show the proportion of T2D patients in the population of that bin (y axis on the right). b, The area under the ROC curve (AUC) of gut-microbiota-based T2D classification. The black bars denote the 95% confidence interval (CI) and the area between the two outside curves represents the 95% CI shape. c, The T2D index was computed for an additional 11 Chinese T2D samples and 12 non-diabetic controls. The box depicts the interquartile range (IQR) between the first and third quartiles (25th and 75th percentiles, respectively) and the line inside denotes the median, whereas the points represent the T2D index in each sample.


pathogens. Importantly, the abundance of these categories of opportunistic pathogens seemed to be quite diverse among our Chinese study participants. Such changes in intestinal bacterial composition have recently been reported for colorectal cancer patients (ref. 30) and ageing populations (ref. 31). Thus, a general picture is emerging in which butyrate-producing bacteria seem to have a protective role against several types of disease. Additionally, our finding of a general dysbiosis in T2D patients raises the possibility that there is a 'functional dysbiosis', rather than a specific microbial species that has a direct association with T2D pathophysiology. Furthermore, given that other intestinal diseases show a loss of butyrate-producing bacteria with a commensurate increase in opportunistic pathogens, it is possible that dysbiosis that results in a disordered, rather than directional, alteration of gut microbial composition may itself have a role in increasing susceptibility to a variety of diseases.

Our analysis of bacterial gene functions, which indicated an increase in functions relating to the gut oxidative stress response, is also of interest, given that previous studies have shown that a high oxidative stress level is related to a predisposition for diabetic complications (ref. 32). Finally, our finding that gut metagenomic markers are able to differentiate between T2D cases and controls with a higher level of specificity than similar analyses based on human genome variation (ref. 33) raises the possibility of a mode of monitoring gut health and a complementary approach for risk assessment of this common disorder.

METHODS SUMMARY

Sample collection and DNA extraction. Faecal samples were obtained from 368 volunteers (345 samples for MGWAS and 23 additional samples for T2D classification), each of whom had signed an informed consent form. The sampling procedure was approved by the Ethical Committee for Clinical Research from the Peking University Shenzhen Hospital, Shenzhen Second People's Hospital and Medical Research Center of Guangdong General Hospital. The individuals had not received any antibiotic treatment within 2 months before sample collection. The samples were frozen immediately and underwent DNA extraction using standard methods (ref. 34).

Sequencing and data processing. Illumina GAIIx and HiSeq 2000 were used to sequence the samples. We constructed a paired-end library with an insert size of approximately 350 base pairs for every sample. Adaptor contamination and low-quality reads were discarded from the raw reads, and the remaining reads were filtered to eliminate human host DNA based on the human genome reference (hg18).

Full Methods and associated references are available in the Supplementary Information.

Received 30 August 2011; accepted 27 July 2012.

Published online 26 September 2012.

1. Wellen, K. E. & Hotamisligil, G. S. Inflammation, stress, and diabetes. J. Clin. Invest. 115, 1111–1119 (2005).

2. Risérus, U., Willett, W. C. & Hu, F. B. Dietary fats and prevention of type 2 diabetes. Prog. Lipid Res. 48, 44–51 (2009).

3. The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).

4. Scott, L. J. et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316, 1341–1345 (2007).

5. Musso, G., Gambino, R. & Cassader, M. Interactions between gut microbiota and host metabolism predisposing to obesity and diabetes. Annu. Rev. Med. 62, 361–380 (2011).

6. Eckburg, P. B. et al. Diversity of the human intestinal microbial flora. Science 308, 1635–1638 (2005).

7. Turnbaugh, P. J. et al. A core gut microbiome in obese and lean twins. Nature 457, 480–484 (2009).

8. Qin, J. et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59–65 (2010).

9. The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).

10. The Human Microbiome Project Consortium. A framework for human microbiome research. Nature 486, 215–221 (2012).

11. Vijay-Kumar, M. et al. Metabolic syndrome and altered gut microbiota in mice lacking Toll-like receptor 5. Science 328, 228–231 (2010).

12. Bäckhed, F. et al. The gut microbiota as an environmental factor that regulates fat storage. Proc. Natl Acad. Sci. USA 101, 15718–15723 (2004).

13. Ley, R. E. et al. Obesity alters gut microbial ecology. Proc. Natl Acad. Sci. USA 102, 11070–11075 (2005).

14. Zhang, H. et al. Human gut microbiota in obesity and after gastric bypass. Proc. Natl Acad. Sci. USA 106, 2365–2370 (2009).

15. Bäckhed, F., Manchester, J. K., Semenkovich, C. F. & Gordon, J. I. Mechanisms underlying the resistance to diet-induced obesity in germ-free mice. Proc. Natl Acad. Sci. USA 104, 979–984 (2007).

16. Turnbaugh, P. J. et al. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 444, 1027–1031 (2006).

17. Manichanh, C. et al. Reduced diversity of faecal microbiota in Crohn's disease revealed by a metagenomic approach. Gut 55, 205–211 (2006).

18. Joossens, M. et al. Dysbiosis of the faecal microbiota in patients with Crohn's disease and their unaffected relatives. Gut 60, 631–637 (2011).

19. Larsen, N. et al. Gut microbiota in human adults with type 2 diabetes differs from non-diabetic adults. PLoS ONE 5, e9085 (2010).

20. Arumugam, M. et al. Enterotypes of the human gut microbiome. Nature 473, 174–180 (2011).

21. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genet. 38, 904–909 (2006).

22. Woo, P. C. Y. et al. Bacteremia due to Clostridium hathewayi in a patient with acute appendicitis. J. Clin. Microbiol. 42, 5947–5949 (2004).

23. Elsayed, S. & Zhang, K. Bacteremia caused by Clostridium symbiosum. J. Clin. Microbiol. 42, 4390–4392 (2004).

24. McClean, K. L., Sheehan, G. J. & Harding, G. K. Intraabdominal infection: a review. Clin. Inf. Dis. 19, 100–116 (1994).

25. Brook, I. Clostridial infection in children. J. Med. Microbiol. 42, 78–82 (1995).

26. Greenblum, S., Turnbaugh, P. J. & Borenstein, E. Metagenomic systems biology of the human gut microbiome reveals topological shifts associated with obesity and inflammatory bowel disease. Proc. Natl Acad. Sci. USA 109, 594–599 (2012).

27. McArdle, B. H. & Anderson, M. J. Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82, 290–297 (2001).

28. Yang, B. et al. Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers. BMC Bioinformatics 11 (suppl. 2), S5 (2010).

29. Krause, L. et al. Phylogenetic classification of short environmental DNA fragments. Nucleic Acids Res. 36, 2230–2239 (2008).

30. Wang, T. et al. Structural segregation of gut microbiota between colorectal cancer patients and healthy volunteers. ISME J. 6, 320–329 (2012).

31. Biagi, E. et al. Through ageing, and beyond: gut microbiota and inflammatory status in seniors and centenarians. PLoS ONE 5, e10667 (2010).

32. Kashyap, P. & Farrugia, G. Oxidative stress: key player in gastrointestinal complications of diabetes. Neurogastroenterol. Motil. 23, 111–114 (2011).

33. Lyssenko, V. et al. Clinical risk factors, DNA variants, and the development of type 2 diabetes. N. Engl. J. Med. 359, 2220–2232 (2008).

34. Godon, J. J., Zumstein, E., Dabert, P., Habouzit, F. & Moletta, R. Molecular microbial diversity of an anaerobic digestor as determined by small-subunit rDNA sequence analysis. Appl. Environ. Microbiol. 63, 2802–2813 (1997).

35. Li, S. et al. Type 2 diabetes gut metagenome (microbiome) data from 368 Chinese samples. GigaScience http://dx.doi.org/10.5524/100036 (2012).

36. Wu, G. D. et al. Linking long-term dietary patterns with gut microbial enterotypes. Science 334, 105–108 (2011).

Supplementary Information is available in the online version of the paper.

Acknowledgements We thank L. Goodman for editing the manuscript and providing comments. This research was supported by the Ministry of Science and Technology of China, 863 program (2012AA02A201), the National Natural Science Foundation of China (30890032, 30725008, 30811130531, 31161130357), the Shenzhen Municipal Government of China (ZYC200903240080A, BGI20100001, CXB201108250096A, CXB201108250098A), the Danish Strategic Research Council grant (2106-07-0021), the Ole Rømer grant from Danish Natural Science Research Council, the Solexa project (272-07-0196), and the European Commission FP7 grant HEALTH-F4-2007-201052. The Lundbeck Foundation Centre for Applied Medical Genomics in Personalised Disease Prediction, Prevention and Care (LuCamp). The Novo Nordisk Foundation Center for Basic Metabolic Research is an independent Research Center at the University of Copenhagen partially funded by an unrestricted donation from the Novo Nordisk Foundation (http://www.metabol.ku.dk). We are also indebted to many additional faculty and staff of BGI-Shenzhen who contributed to this work.

Author Contributions The project idea was conceived and the project was designed by Ju.W., K.K., O.P., R.N. and S.D.E.; J.Q., Y.L., Sh.L. and Ju.W. managed the project. F.Z., Z.C., R.X., Su.L., L.H., D.L., P.W., Y.D., X.S., Z.L., A.T., S.Z., M.W., Q.F. and T.H. performed sample collection and clinical study. Wen.Z., M.G., J.Y., Y.Z. and W.X. performed DNA experiments. Ju.W., K.K., O.P., R.N., S.D.E., J.Q., Y.L., Sh.L. and J.Z. designed the analysis. J.Q., Y.L., Sh.L., J.Z., Su.L., Y.G., Y.P., D.S., X.L., W.C., D.Z., Y.Q., M.Z., Z.Z., Z.J., G.S., J.L., J.R., S.O., H.C. and W.W. performed the data analysis. J.Q., Sh.L., J.Z., Y.G., Y.P., M.A., E.L., P.R., N.P. and J.-M.B. worked on the metagenomic linkage group method. J.Q., D.S., Su.L., Y.Q., J.R., G.F. and S.O. did the functional annotation analyses. J.Q., Sh.L., D.S., J.Z., Y.P. and Y.L. wrote the paper. Ju.W., O.P., K.K., R.N., S.D.E., Ji.W., H.Y., So.L., Wei.Z. and R.Y. revised the paper.

Author Information The raw Illumina read data of all 368 samples have been deposited in the NCBI Sequence Read Archive under accession numbers SRA045646 and SRA050230. The assembly data, updated metagenome gene catalogue, annotation information, and MGLs are published in the GigaScience database, GigaDB (ref. 35). Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. Readers are welcome to comment on the online version of the paper. Correspondence and requests for materials should be addressed to Ju.W.
