I’m going to stand on a soapbox for a second and talk about life. In particular, I’m going to talk about evolution. In a lot of common parlance, and a lot of media, we refer to evolution as something like a ladder — think the terrible image I used as my featured image. Humans are usually at the top.
Even when we’re being really careful about it, we use a metaphor like a tree — all these different branches reaching towards the sun.
My soapbox is this: if we were striving for accuracy, we might want to use the analogy of, say, a slime mold.
The Mold of Life
Let’s examine the common tree-structure of many phylogenetic studies. It looks like this:
This is a pretty good way of imagining things, but it has one crucial weakness: any node can be flipped around to produce exactly the same tree with a very different appearance, depending on which groups end up in which places. So here’s the same tree, rotated slightly:
Note that the data hasn’t changed, but the appearance has.
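To make that concrete, here is a minimal sketch in Python. The five-taxon tree and the names A through E are made up for illustration, not taken from the figure. It flips every node, then checks that the left-to-right order of the tips changes while the set of groups defined by the branching does not.

```python
# A toy tree as nested tuples; the taxon names are hypothetical.
tree = ((("A", "B"), "C"), ("D", "E"))

def leaves(node):
    """Left-to-right tip order, i.e. what the drawing looks like."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def clades(node, found=None):
    """The set of groups defined by the branching, ignoring drawing order."""
    if found is None:
        found = set()
    if not isinstance(node, str):
        found.add(frozenset(leaves(node)))
        for child in node:
            clades(child, found)
    return found

def rotate_all(node):
    """Flip every internal node of the tree."""
    if isinstance(node, str):
        return node
    return tuple(rotate_all(child) for child in reversed(node))

rotated = rotate_all(tree)
print(leaves(tree))                      # tip order in the original drawing
print(leaves(rotated))                   # tip order after flipping every node
print(clades(tree) == clades(rotated))   # same groups either way
```

The tip order comes out reversed, but the comparison of clades is true: the relationships are identical, and only the picture differs.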
For that reason, it’s actually probably better to use something more like Darwin’s famous sketch:
The reason is that in this one, if you rotate any individual node, it doesn’t change the appearance of the entire tree nearly as much — because the things that are next to each other are next to each other due to relatedness, not happenstance.
But that doesn’t look much like a tree, because trees grow in one direction: up. They don’t shoot branches towards the ground, like that one does. So if it’s a tree, it’s this kind of crazy tree with branches sticking out in all kinds of directions. Maybe it’s more some kind of a slime mold?
The Dangers of Imagery
Once again, I’m tempted to fall back on “this is a minor issue and it doesn’t have much of an effect”. But the fact is that the common perception of evolution is far more ladder-like than mold-like, or even tree-like. We talk about evolution as something that can be climbed, and something that we are particularly good at climbing. People are shocked to learn that humans are still evolving, as if we should have reached the end of that road already, instead of it being OBVIOUSLY something that is ongoing at every step. What’s more, we talk about things living today as if they are “more” or “less” evolved, which doesn’t make much sense when you think about it: if everything came from a common ancestor, then everything alive today has been evolving for the same amount of time. This feeds into arguments like “if humans evolved from monkeys, why are there still monkeys?” and terms like “living fossils”.
And I’m not just talking about laypeople making these mistakes. A recent paper posted to arXiv and picked up by MIT Technology Review is based on just this sort of thinking.
The paper uses Moore’s law to argue that life must have an extraterrestrial origin: that the billions of years since the Earth was born were not sufficient to explain our big genomes, based on their perceived rate of growth.
This is the only figure they use to bolster their argument:
Now, as a biologist, my immediate reaction was to look for holes in their data set. Why do all prokaryotes get one dot to average all their diversity, whereas worms (presumably not all worms but rather C. elegans), a much smaller clade, get a dot all to themselves? Or mammals: which genomes are represented there? Mice? Rats? Humans? Chimps? Why aren’t Mammals, Fish, and Worms, all Eukaryotes, included in the Eukaryote dot? What about plants, which have some of the biggest genomes on the planet (where’s a tree dot)? There is a wealth of sequenced genomes to choose from: a huge number of prokaryotic genomes, which are smaller and therefore require less effort, but also the 47 unique mammalian genomes catalogued in UCSC’s genome browser, from human to tree shrew to pika. So why did the study authors choose just those five dots?
The answer is that using specific species, forcing specificity in this case, tanks their entire argument.
We don’t have any genomes from individuals that weren’t alive right at the end of their chart — we can’t get genomes from anything that actually was alive at the time that mammals arose, even. The very longest DNA could plausibly be preserved is about a million years, not long enough to make a DENT in that chart of theirs.
What that means is that they’re using a specific, currently-extant, bacterium as their representative for “prokaryotes” — as their proxy for the prokaryote-like thing that presumably emerged approximately 3.5 billion years ago. Or maybe an average of many different prokaryotes (in which case, WHERE ARE YOUR ERROR BARS?).
Is that appropriate? Given the generation time of most prokaryotes (on the order of an hour) how many generations have passed in the past 3.5 billion years? (I am not going to do that calculation. It is a very large number.) Is there any reason to expect that the genome of currently extant bacteria shares anything in common with the early prokaryotes, except perhaps its ability to drive self-replication?
It seems to me that, instead, there is every reason to expect that the bacteria which exist today have had just as much time to grow their genomes as everything else alive today, and that any cap on genome size we see reflects selective pressure for small, relatively easy-to-replicate genomes (which would facilitate a fast division time; for comparison, mammalian cells take about a day to divide).
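For anyone who does want the generation count, here is a rough back-of-envelope sketch. It assumes a constant one-hour generation time for the full 3.5 billion years, which is certainly wrong in detail (real generation times vary enormously with species and conditions), so treat the result as an order of magnitude and nothing more.

```python
# Back-of-envelope only: every constant here is a simplifying assumption.
YEARS_ELAPSED = 3.5e9           # approximate age of the earliest prokaryote-like life
HOURS_PER_YEAR = 24 * 365.25    # 8766 hours
GENERATION_HOURS = 1.0          # assumed constant generation time

generations = YEARS_ELAPSED * HOURS_PER_YEAR / GENERATION_HOURS
print(f"about {generations:.1e} generations")
```

Under those assumptions the answer lands at roughly thirty trillion generations, which is indeed a very large number.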
Genome Archaeology
Genome science can tell us a lot about life on earth, its interrelatedness and the processes that drove its development. We can, to an extent, extrapolate genomes of common ancestors — for instance, for every difference between humans and chimpanzees, we can look to gorillas, orangutans, and other closely related species to suggest which state is “ancestral”. It gets more difficult the more time has passed in between speciation and the present day, and it is almost always necessary to use an outgroup — in this case, other great apes as a contrast to humans and chimps — to guide the analysis.
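The outgroup idea can be sketched in a few lines of Python. This is a deliberately crude stand-in for real ancestral reconstruction: the sequences are invented, the function name `infer_ancestral` is hypothetical, and a simple vote among outgroups replaces the parsimony or likelihood methods an actual analysis would use.

```python
def infer_ancestral(human, chimp, outgroups):
    """Guess the ancestral base at one aligned site.

    If human and chimp agree, that base is taken as ancestral. If they
    differ, whichever of the two states more outgroups share wins.
    Returns "?" when the outgroups cannot break the tie.
    """
    if human == chimp:
        return human
    votes_human = sum(1 for base in outgroups if base == human)
    votes_chimp = sum(1 for base in outgroups if base == chimp)
    if votes_human > votes_chimp:
        return human
    if votes_chimp > votes_human:
        return chimp
    return "?"

# Toy aligned sequences, made up for illustration.
human     = "ACGTA"
chimp     = "ACTTG"
gorilla   = "ACGTG"
orangutan = "ACGTC"

ancestor = "".join(
    infer_ancestral(h, c, [g, o])
    for h, c, g, o in zip(human, chimp, gorilla, orangutan)
)
print(ancestor)
```

At the third site the outgroups side with human, so the chimp base is read as the derived change. At the last site only gorilla matches either state, so the inference rests on a single vote, which is the kind of weak support that gets worse the further back you try to look.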
So we can’t directly use that method to figure out what the oldest genomes looked like. Almost all we can do is ask “what does every life form on earth share?” The answer is, it seems, not much — apart from something that will stitch a few nucleotides together.
Does that mean that Moore’s law has no place in biology? Not necessarily. But we’re not going to get a clear doubling pattern when we can only see, at best, less than 0.03% of the chart.
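That percentage follows directly from the two figures already given. A quick check in Python, taking the roughly one-million-year limit on DNA preservation and the 3.5-billion-year span as the only inputs:

```python
# Both numbers are the approximate figures used earlier in the post.
DNA_PRESERVATION_YEARS = 1e6    # about the longest DNA could plausibly survive
CHART_SPAN_YEARS = 3.5e9        # approximate time since early prokaryote-like life

visible_fraction = DNA_PRESERVATION_YEARS / CHART_SPAN_YEARS
print(f"{visible_fraction * 100:.4f}% of the timeline")
```

The result is just under 0.03%, and that sliver sits entirely at the present-day end of the chart.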
Featured image is by José-manuel Benitos (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons
An excellent article, but I’m going to quibble about one bit of it.
Your tree with coloured clades gets messy on rotation because the coloured clades are not all monophyletic. Changing to a picture like Darwin’s doesn’t help this: if his B were your blue clade, and C and D your green clade, you have the exact same problem.
Here’s another laughably wrong phylogenetics paper:
Siddall, M. E. (1998), Success of Parsimony in the Four-Taxon Case: Long-Branch Repulsion by Likelihood in the Farris Zone. Cladistics, 14: 209–220. doi: 10.1111/j.1096-0031.1998.tb00334.x
He uses a method that is known to be biased, analyses only (mostly synthetic) data which conform to the bias, and claims the method is good because it gets the correct results more reliably than an unbiased method. It is like having a policeman who believes only blue-eyed people commit crimes, and evaluating that policeman’s effectiveness by looking at their crime-solving performance only for crimes which were, in fact, committed by a blue-eyed person.
Thank you for your kind words!
I absolutely know that the problem is properly identifying which groups are monophyletic and which groups aren’t. And I think that the real advantage of Darwin’s tree is that it makes it easier to see what’s monophyletic and what isn’t, because it makes the branching structure more obvious. (Case in point: B, C, and D were labelled separately, not lumped together.) It draws more attention to the pattern of the branches, rather than the order of the endpoints. Hope that makes sense!
The extent to which a HUGE portion of phylogenetics (especially that before the genome era kicked off into full steam) was based on “well that looks about right to me” is… yeah. Yeah.