Hi all, hope you’re enjoying the holiday break. I’m back with news of a new plant genome publication!
Today’s plant is the woodland strawberry (Fragaria vesca). These aren’t the strawberries you probably see at your local grocery store; those are garden strawberries (Fragaria x ananassa). Woodland strawberries were the predominant strawberries grown throughout Europe until around 250 years ago, when they were displaced by the new garden strawberries, which were created when a strawberry species brought from North America crossed with another species from Chile while the two were grown next to each other in France. The new hybrid species bore larger fruit than the woodland strawberry.
Sequencing the genome of the garden strawberry directly would be a real mess, as the genome of that species is made up of four closely related genome-copies*. With modern DNA-sequencing technology, generating the raw sequence data that makes up a genome is (relatively) cheap and easy, but afterwards you are left with a lot of small pieces of DNA sequence, and putting those pieces together (like putting together a puzzle with millions of pieces) remains challenging. Mix in pieces from four closely related puzzles with no way to tell them apart and the project becomes even more challenging.
Fortunately the woodland strawberry side-steps that problem, being a normal diploid plant without any of the whole genome duplications that would make sequencing garden strawberries such a terrible mess. It also has a pleasingly small genome of 206 million base pairs spread over seven chromosomes, making it only slightly larger than the genome of the first plant to be sequenced (Arabidopsis, at 157 million base pairs and five chromosomes). Small genomes are easier to put together, with fewer total pieces to go around.
The research consortium that sequenced and assembled the strawberry genome first assembled overlapping pieces of sequenced DNA into larger pieces called contigs and then used genetic map data to line those contigs up into seven pseudomolecules, each of which represents a whole strawberry chromosome. The strawberry genome itself wasn’t released prior to the publication of the paper, so I haven’t had a chance to look at it myself, but both the fact that they’ve been able to assemble all the way to the chromosome level and the fact that they developed and used genetic map data argue for a well-done assembly.
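To make the "overlapping pieces into contigs" step a little more concrete, here is a toy sketch in Python. It is not the method the consortium used (real assemblers are far more sophisticated and have to cope with sequencing errors, repeats, and both DNA strands); it just greedily merges whichever two sequences share the longest exact overlap. The function names, the minimum overlap of 3, and the example reads are all made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if the best match is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of sequences until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        # Join the pair, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads: three overlap end to end, one stands alone.
reads = ["ATGGCGT", "GCGTACC", "TACCTTA", "GGGAAAC"]
contigs = greedy_assemble(reads)
print(contigs)
```

The three overlapping reads collapse into a single contig, while the unrelated read is left as its own piece. Lining leftover pieces like that up along chromosomes is where something like genetic map data comes in.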
Speaking of assembly, here are all the vital genome stats that I normally would have to hunt around for after reading a “new genome sequenced!” story in the popular press (some of these I’ve already mentioned above):
- Strawberries have a haploid chromosome number of 7 and a genome size of 206 Mb
- The average base pair in the strawberry genome was sequenced 39 times using second generation technology (a label that includes Illumina, 454, and SOLiD sequencers; in this case a mixture of all three technologies was employed)
- 34,809 predicted genes were identified across the strawberry genome.
- The authors found no evidence of the whole genome duplications found in other rosids (I’m assuming this means the most recent whole genome duplication in the ancestors of strawberries was the pre-rosid hexaploidy.)
- The paper describing the genome will shortly be available from Nature Genetics. The title is “The genome of the woodland strawberry (Fragaria vesca)” and the last name of the first author is Shulaev. UPDATE: Here’s the link to the genome paper.
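For a rough sense of what "sequenced 39 times" means in raw data, coverage is just the total number of sequenced bases divided by the genome size. Here is a back-of-the-envelope sketch using the two numbers from the list above. The 100 bp read length is purely a hypothetical round number of mine (the actual read lengths varied by technology and aren't given here), so treat the read count as illustrative only.

```python
import math

GENOME_SIZE_BP = 206_000_000  # genome size quoted above
MEAN_COVERAGE = 39            # average times each base was sequenced


def total_bases(genome_size_bp: int, coverage: float) -> float:
    """Total sequenced bases implied by a mean coverage: coverage * genome size."""
    return genome_size_bp * coverage


def reads_needed(genome_size_bp: int, coverage: float, read_length_bp: int) -> int:
    """Number of reads of a given length needed to reach that coverage."""
    return math.ceil(total_bases(genome_size_bp, coverage) / read_length_bp)


bases = total_bases(GENOME_SIZE_BP, MEAN_COVERAGE)
print(f"Total sequenced bases: {bases:,.0f}")

# Hypothetical read length, for illustration only.
print(f"Reads at 100 bp each: {reads_needed(GENOME_SIZE_BP, MEAN_COVERAGE, 100):,}")
```

In other words, 39-fold coverage of a 206 million base pair genome works out to roughly 8 billion sequenced bases before any assembly happens.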
Aside from the enjoyment I always feel when a new genome goes live, I’m particularly happy to see the strawberry genome come out for two reasons.
The first is that there was no “strawberry genome” grant. Funding for sequencing the genome came from a number of sources. I take this as a sign that in addition to the rapidly declining cost of sequencing itself, the cost and difficulty of assembling and annotating the genome of a new plant species are also continuing to decline at a rapid pace.
The second reason is that I once before announced the sequencing of the strawberry genome on this site. It was almost a year ago, after a reporter misunderstood a presentation at PAG and posted a “new genome sequenced” story online that was rapidly picked up by a number of websites including my own. It was a bad break for the folks working hard to sequence, assemble, and annotate the real strawberry genome, and I’m very glad to see them get the moment in the spotlight they so richly deserve. The people who sequence genomes make the work of so many other researchers, including myself, possible.
Links to other coverage (updated as I find them):
- Scientific American
- EurekAlert (focusing on the genome assembly at Oregon State)
- Nature Blog (focuses more on the chocolate genome that was published at the same time in Nature Genetics; also, I’m not sure two genomes qualify as a “Smorgasbord”)
- MyGenomix (in Italian)
- The story behind how the strawberry genome came to be published (written by one of the authors)
*Either the result of duplicated copies of a single genome that have since evolved independently (autopolyploidy), or hybrids that merged the genomes of closely related strawberry species together in a single plant (allopolyploidy).