[Updated 21-06-2011] Enough information is now emerging from the frantic work of the DNA sequencing teams and the intense crowdsourced detective work of bioinformatics experts (all mentioned in several previous posts at this site) to assemble a broad picture of how the German outbreak germ has evolved in recent years in terms of changes to its overall chromosome structure.
The bottom line is this: the strain is roughly 75% conservative genes and 25% radical genes. It’s the newly arrived radicals that do the damage.
Kat Holt has presented, in very visual terms, her analysis of all the DNA decoding made public by several life-science workers and various corporations in Germany, USA, China, France and Britain. She has just made this available at her bacpathgenomics blog. Her latest work summarises DNA decoding data obtained on four different bacterial strains. It is similar to the picture sketched out by the BGI sequencing group when they released their fully analysed DNA data recently, but much easier for most people to understand because of the colourful graphics. One of her graphics is displayed above, and there are more detailed, larger scale pictures of the individual chromosomes at her blog.
The set of compared germs includes two strains analysed in Germany, one strain analysed in Britain, and one done by the American-Chinese consortium BGI.
This heavy duty computerised detective work on DNA reveals that the main germ chromosome is huge at 5.2 million base pairs. It possesses about 1 million base pairs of extra information not present in harmless, non-pathogenic strains such as the laboratory strain E. coli K-12. This shows the huge amount of horizontal gene movement going on in E. coli. This is not a surprise. The first pathogenic E. coli sequence (by Perna and colleagues in 2001) carried similar extra genetic baggage. Only about 80% of the germ’s main chromosome DNA is common heritage for all the divergent strains, but the DNA that is retained in all strains doesn’t change much in its coding information during strain evolution.
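As a rough check on those figures, here is a minimal Python sketch that uses only the two rounded sizes quoted above. The variable names are mine, and plasmid DNA is left out because its size is not given in this post, so this says nothing about the whole-genome 75%/25% split.

```python
# Rounded sizes quoted in the post (main chromosome only, plasmids excluded).
chromosome_bp = 5_200_000   # total length of the main chromosome
extra_bp = 1_000_000        # DNA with no counterpart in E. coli K-12

accessory_fraction = extra_bp / chromosome_bp   # share that is "extra baggage"
shared_fraction = 1 - accessory_fraction        # share in common with K-12

print(f"extra DNA:  {accessory_fraction:.1%}")
print(f"shared DNA: {shared_fraction:.1%}")
```

With these rounded inputs the shared share comes out close to the "about 80%" quoted for the main chromosome.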
In addition to the main chromosome there are three smaller mini-chromosomes (plasmids). These are completely lacking from most other E. coli strains. Two important ones are (i) a plasmid carrying a broad spectrum antibiotic inactivating enzyme (called a beta-lactamase) and (ii) another plasmid that is somewhat similar to a plasmid found in an ancestral strain of the enteroaggregative E. coli bacteria. This ancestral plasmid is called plasmid AA, usually abbreviated to pAA.
From the comparison of different strains done by Kat Holt, it seems the small pAA plasmid has undergone extensive recent genetic change. When its variant is present in the German outbreak strains, it carries a novel gene called aag, most likely involved in the attachment of aggregated microbial cells to the gut wall. The broad spectrum antibiotic resistance plasmid has a known means of mini-chromosome replication and regulation shared by many other plasmids. This method of replication/regulation is referred to as IncI, so we will refer to this plasmid here as the IncI plasmid. This plasmid does not seem to have undergone much recent genetic change.
The remaining plasmid is a small selfish DNA that seems to possess only the means to replicate itself inside the bacterial cells. Possibly this selfish plasmid does no harm to humans. Selfishness is not a mortal sin.
In addition to these plasmids, there are two extra identifiable major additions of new mobile DNA in the main chromosome.
These extra genes are DNA blocks from a bacterial virus that is similar to one (called VT2) that carries the Shiga toxin gene in EHEC bacteria. They appear in red in the diagram above from Kat Holt. Their presence is not a surprise either. These Shiga toxin virus DNAs come and go in many different pathogenic E. coli strains.
There is extremely high similarity of the great bulk of the main E. coli chromosome between all analysed genomes including the more divergent African derived strain. These retained genes can be called the clonal framework or alternatively the housekeeping chromosomal backbone. Thus all E. coli strains have the same backbone. This is well established from many other investigations on E. coli genetic structure.
But because of the Shiga toxin related block of DNA represented by the virus insertions, the outbreak strain is starting to be called Shiga toxin producing enteroaggregative E. coli, or STpEAEC. This acronym recognises that the strain is an EAEC type E. coli with the ability to produce Shiga toxin.
Evolution of the germ chromosomes
One more thing can be said from this extra information summarised by Kat. The four outbreak strains that have been completely decoded are very similar to one another in their main chromosomes. They have not had a chance to undergo much evolution. But they are consistently different from the African EAEC strain Ec55989 at about seven positions at least, in addition to those that have been brought in by the Shiga toxin phage. These gaps in the similarity maps shown in the diagram indicate insertion of a new block of DNA in the outbreak strains that has no corresponding DNA in the African isolate. This indicates that the outbreak isolates have a common source that has had a significant time to evolve from the African isolate called Ec55989.
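The idea of reading inserted DNA off the gaps in a similarity map can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the 5,000 base pair threshold and the coordinates are all invented, the chromosome is treated as linear although the real one is circular, and it is not the method Kat Holt used.

```python
from typing import List, Tuple

# Half-open interval [start, end) in outbreak-genome coordinates.
Interval = Tuple[int, int]

def find_unmatched_regions(aligned: List[Interval],
                           genome_length: int,
                           min_size: int = 5000) -> List[Interval]:
    """Return stretches of the outbreak genome that no aligned block covers.

    `aligned` lists the parts of the outbreak genome that match the
    comparison strain. Uncovered stretches of at least `min_size` bases
    are reported as candidate insertions.
    """
    gaps = []
    cursor = 0  # end of the region covered so far
    for start, end in sorted(aligned):
        if start - cursor >= min_size:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # overlapping blocks are merged implicitly
    if genome_length - cursor >= min_size:
        gaps.append((cursor, genome_length))
    return gaps

# Toy coordinates, invented purely for illustration.
toy_alignments = [(0, 40_000), (35_000, 90_000),
                  (140_000, 200_000), (201_000, 260_000)]
candidate_inserts = find_unmatched_regions(toy_alignments, genome_length=300_000)
print(candidate_inserts)  # [(90000, 140000), (260000, 300000)]
```

The 1,000 base gap between the last two toy blocks is ignored because it falls under the threshold; only the long unmatched stretches are kept as candidate insertions.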
Thanks to a GitHub wiki we can now follow further progress on characterizing these various DNA additions. [Updated section 21-06-2011
Peter Slickers has set up a wiki there to collect comments on regions of difference.
Quoting from that wiki:
The term “Regions of difference” (ROD) refers to loci within a bacterial genome which are found only in some members of a species. These loci are typically prophages, transposons, plasmids integrated into the chromosome, and remnants of all these mobile elements. If their evolutionary origin cannot be deduced, they are just called genomic islands. Chromosomally encoded virulence factors and antibiotic resistance determinants are most often harboured in ROD. In an excellent publication ChaudhuriRR-SebaihiaM-2010 have used the concept of ROD to analyze the full genome of the prototypical enteroaggregative Escherichia coli strain 042.
This page is intended to create an inventory of all ROD in the genome of the 2011 O104:H4 outbreak strain in a collaborative fashion in the style of Wikipedia. Please feel free to contribute to this page and its subpages in order to get a complete and thorough analysis and annotation of all ROD.
| id | size / nt | integration site | type | name | payload | note |
|---|---|---|---|---|---|---|
| RD0002 | 86630 | ycdU and tRNAserX | O island | Tellurite resistance- and adherence-conferring island | iha, mchBCF, terCDE, yeeVW | adhesin, microcin |
| RD0004 | 48778 | tRNAselC and setC | composite island | | merDA, tetRA, flu, yeeVW | |
- it is amazing to see how all the prominent virulence and resistance factors cluster together at only a few sites.
- the yeeVW gene cluster is found on the chromosome at 3 locations. Is this an artefact of assembly or an example of elevated gene dosage?
- tRNA genes are hot-spots for integration of mobile elements in O104:H4 (which has long been known)
The Pundit notes that “natural” mechanisms for insertion of 4/4 of these inserts are suggested by the DNA sequence data itself. These are phage and tRNA sites which are established hot-spots for new gene insertion.
End of updated section from Peter Slickers 21-06-2011]
The historic 2001 strain of the germ that was found in Germany.
It will be fascinating when the University of Muenster-Life sciences Corporation joint collaboration releases their genome sequence information about the historical 2001 isolate of pathogenic EAEC E. coli that was present in Germany some years before this outbreak. Comparison of its decoded genome with that of the outbreak strains will open up many leads to understanding why the germ is so dangerous and perhaps even provide clues to treating infections.
Because of the extremely prudent use of gene sequencing to characterise that strain of E. coli, genome scientists were able to work out that it is very closely related to the current outbreak strain. I gather Mike the mad scientist was one of the first to realise this. His discovery nicely illustrates how genetic sequence information enables communication about outbreak causes across the Internet through quick, easy and certain cross-referencing between different disease events. [His blog might also suggest this strain could even date back to Jan-91.]
MLST typing is essential to any serious Public Health work.
This is one of the reasons why gene sequencing of outbreak strains must become a routine part of public health work worldwide to manage and eliminate foodborne disease outbreaks. The technique of using gene sequences to fingerprint strains of germs is referred to as MLST (multilocus sequence typing) and was cleverly developed more than 13 years ago by a consortium of European academics, including Dominique Caugant, Ian Feavers, Martin Maiden, Brian Spratt, Mark Achtman and others, who were involved in studying the natural evolution of germs. It will be nice to see a public tribute to their work when the general public realises the importance of their leadership and foresight.
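For readers who like to see the mechanics, here is a minimal Python sketch of how an MLST fingerprint is built and compared. The seven locus names are the housekeeping genes commonly used for E. coli typing. Everything else is an assumption made for illustration: the short sequences, the allele numbers, the "toy-ST" labels and the function names are invented, and a real scheme looks up full gene fragments in a curated public database.

```python
from typing import Dict, Optional, Tuple

# Seven housekeeping loci commonly used for E. coli MLST.
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")

# Hypothetical mini-database: sequence -> allele number, per locus.
# Real alleles are gene fragments of several hundred bases.
TOY_ALLELES: Dict[str, Dict[str, int]] = {
    "adk":  {"ATGCGT": 1, "ATGCGA": 2},
    "fumC": {"GGTACC": 1, "GGTACA": 2},
    "gyrB": {"TTGACA": 1, "TTGACG": 2},
    "icd":  {"CCATGG": 1, "CCATGA": 2},
    "mdh":  {"AAGCTT": 1, "AAGCTC": 2},
    "purA": {"GAATTC": 1, "GAATTA": 2},
    "recA": {"CTGCAG": 1, "CTGCAT": 2},
}

# Hypothetical profile table: seven allele numbers -> sequence type label.
TOY_PROFILES: Dict[Tuple[int, ...], str] = {
    (1, 1, 1, 1, 1, 1, 1): "toy-ST-1",
    (1, 1, 2, 1, 1, 1, 1): "toy-ST-2",
}

Profile = Tuple[Optional[int], ...]

def allele_profile(locus_sequences: Dict[str, str]) -> Profile:
    """Turn one isolate's seven locus sequences into allele numbers.

    A sequence that is not in the database yields None (a novel allele).
    """
    return tuple(TOY_ALLELES[locus].get(locus_sequences[locus].upper())
                 for locus in LOCI)

def sequence_type(profile: Profile) -> Optional[str]:
    """Look up the sequence type, or None if the profile is unknown."""
    if None in profile:
        return None
    return TOY_PROFILES.get(profile)

# Two toy isolates from different years with identical locus sequences.
isolate_2001 = {"adk": "ATGCGT", "fumC": "GGTACC", "gyrB": "TTGACG",
                "icd": "CCATGG", "mdh": "AAGCTT", "purA": "GAATTC",
                "recA": "CTGCAG"}
isolate_2011 = dict(isolate_2001)

st_2001 = sequence_type(allele_profile(isolate_2001))
st_2011 = sequence_type(allele_profile(isolate_2011))
print(st_2001, st_2011, st_2001 == st_2011)
```

Because the fingerprint is just seven small integers, two laboratories can compare isolates by exchanging a single line of text, which is what makes the quick cross-referencing between disease events described above possible.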
Terrorism and conspiracy theories? Huh!
It may be sobering for those who worry about the possible deliberate introduction of this germ into Germany, or the possible import of contamination on seeds imported from outside the country, to realise that a very similar germ has been around the country for a considerable period, since at least 2001 and maybe dating back to 1991. It is really important that people understand that the germ causing the outbreak has a reservoir for further transmission in the human alimentary canal and is transmitted by human contact, for example by the unwashed hands of food handlers. It may also be sobering to contemplate any other information that emerges about the widespread distribution of this germ already in the environment in locations widely disconnected from the implicated sprout farm in northern Germany (such as news like “Deadly E. coli found in a stream near Frankfurt”).