Monday, March 30, 2015

Biological Diversity

In recent conversations I've had with biologists at a plant breeding symposium at UMN and other local events, I've come across people using the term "diversity" in distinctly different ways.



A researcher described how low in diversity soybeans were, as an argument for the fast-neutron mutagenesis project he was working on to develop new useful diversity. I didn't believe soybeans to be a low-diversity crop, as there are so many wild and weedy forms available in the center of diversity for the species in China. He then made some comment about the limited diversity brought to the United States from Asia, which made me wonder why he would limit himself to what is locally available, when strains from the whole world are available with some effort.

The closing keynote speaker later discussed his 34-year career of soybean breeding and started off by describing the great diversity available in soybeans for breeders to work with. I then realized that the first researcher and I had been using different measures of diversity. He was using "diversity" to mean what was commonly available locally (definition #1 below), while the keynote speaker and I were using "diversity" to mean the range of genotypes available worldwide (definition #2 below).
Population genotype structures:
Local: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABB
World: AAAAAAAAAAAAAAAAAAAAAAAAAAAAABBCDEFGHIJKLMN

1. Range of genotypes locally available. The population approximates just genotype A, so under this definition the crop species has a very low diversity.
2. Range of genotypes in a species. The population includes 14 distinct genotypes (ABCDEFGHIJKLMN), so under this definition the crop species has a very high diversity.
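
The gap between definitions #1 and #2 can be made concrete with a small Python sketch. It treats each letter in the two strings above as one individual's genotype and reports two numbers: the count of distinct genotypes (the measure used in the definitions) and the Gini-Simpson index, which is my own addition here and not something either speaker mentioned. The function names are made up for illustration.

```python
from collections import Counter

# Toy populations copied from the example above; one letter per individual.
LOCAL = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABB"
WORLD = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAABBCDEFGHIJKLMN"


def richness(population):
    """Number of distinct genotypes present in the population."""
    return len(set(population))


def gini_simpson(population):
    """Probability that two individuals drawn at random (with replacement)
    carry different genotypes. Assumed here as a second, evenness-aware
    measure; the post itself only counts genotypes."""
    counts = Counter(population)
    total = len(population)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


if __name__ == "__main__":
    for name, population in (("Local", LOCAL), ("World", WORLD)):
        print(name, richness(population), round(gini_simpson(population), 3))
```

Both measures rank the world population as more diverse than the local one. The disagreement in the conversation was not about the formula, but about which population was being measured.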


At another event, a speaker was talking about how some regions of the genome for his study plant species had very low diversity and that this indicated recent introgression from a related species because the rest of the genome had a very high diversity.

I commented that I would have to look into the math behind it to better grasp what he was talking about. He shrugged off the need for this and said the region was determined to have low diversity using a hidden Markov model (HMM) approach.

I pondered his statement. The HMM approach is used to identify transitions between states as you move along the axis of a dataset. In the case of the speaker's research, the HMM would calculate the most probable coordinates in the genome for transitions from high diversity to low diversity and back. Unfortunately, this still doesn't explain what he meant by "diversity". As the speaker was talking about a region of the genome having lower diversity than average for the species (or the individual?), there are two ways I can interpret his use of "diversity".
3. The amount of heterozygosity within an individual. A highly inbred individual will have a very low "diversity", while a highly outbred hybrid individual will have a very high "diversity".
4. The variation in haplotypes across a population. Haplotypes are distinct regions of genetic information that travel together through inheritance. The more haplotypes in a region, the more diverse the population is. You would need to sequence a relatively large number of individuals to get a glimpse at the sort of data this analysis would require.
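
To show how different the inputs for definitions #3 and #4 are, here is a minimal Python sketch. The data are invented for illustration: definition #3 needs the two alleles at each site within one individual, while definition #4 needs the haplotype carried at one region by many individuals. The function names and the toy sequences are hypothetical.

```python
def heterozygosity(genotype_calls):
    """Definition #3: fraction of sites in ONE individual where the two
    alleles differ. Each call is a pair of alleles, e.g. ("A", "T")."""
    het_sites = sum(1 for first, second in genotype_calls if first != second)
    return het_sites / len(genotype_calls)


def haplotype_count(haplotypes):
    """Definition #4: number of distinct haplotypes observed at one region
    across a POPULATION. Each haplotype is a string of linked alleles."""
    return len(set(haplotypes))


# Invented example data.
inbred = [("A", "A"), ("G", "G"), ("T", "T"), ("C", "C")]
hybrid = [("A", "T"), ("G", "G"), ("T", "C"), ("C", "A")]
population_region = ["ACGT", "ACGT", "ACGA", "TCGA", "ACGT"]

if __name__ == "__main__":
    print(heterozygosity(inbred))    # no heterozygous sites
    print(heterozygosity(hybrid))    # three of four sites heterozygous
    print(haplotype_count(population_region))
```

The first measure can be computed from a single sequenced individual; the second cannot, which is why the two readings lead to such different biological interpretations.
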
I have the feeling he was using "diversity" to mean a change in some measure across the genome of an individual (definition #3 above), rather than a measure of the population of the species (definition #4 above).
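
For readers unfamiliar with HMMs, the segmentation step can be sketched in a few lines of Python. This is not the speaker's model, which I never saw. It is a toy two-state Viterbi decoder under assumptions I made up: each site is scored 1 if heterozygous and 0 if homozygous (the definition #3 reading), and the start, transition, and emission probabilities are arbitrary illustrative values.

```python
import math

STATES = ("high", "low")

# Hypothetical parameters, chosen only for illustration.
START = {"high": 0.5, "low": 0.5}
STAY = 0.9                              # probability of staying in the same state
EMIT_HET = {"high": 0.5, "low": 0.05}   # probability a site is heterozygous


def viterbi(observations):
    """Return the most probable state path for a list of 0/1 observations
    (1 = heterozygous site, 0 = homozygous site)."""

    def emit(state, obs):
        p_het = EMIT_HET[state]
        return math.log(p_het if obs == 1 else 1.0 - p_het)

    def trans(prev, cur):
        return math.log(STAY if prev == cur else 1.0 - STAY)

    scores = {s: math.log(START[s]) + emit(s, observations[0]) for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda prev: scores[prev] + trans(prev, cur))
            pointers[cur] = best_prev
            new_scores[cur] = scores[best_prev] + trans(best_prev, cur) + emit(cur, obs)
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Invented data: a homozygous run flanked by mostly heterozygous sites.
sites = [1, 0, 1, 1, 0, 1] + [0] * 12 + [1, 1, 0, 1, 1, 0]

if __name__ == "__main__":
    print("".join("H" if s == "high" else "L" for s in viterbi(sites)))
```

The decoder only reports where the measure changes along the genome. What the measure is, and what a low stretch means biologically, still has to be stated separately.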

If he meant "diversity" like definition #3 above, then I would interpret a region of the genome with a much lower level of heterozygosity to indicate a recent loss-of-heterozygosity (LOH) event, such as a damaged chromosome region being repaired by replacement with sequence from the intact homologous chromosome. If he meant "diversity" like definition #4 above, then I would interpret such a region to indicate a recent selective sweep. In neither case does the data suggest to me that there has been an introgression from an unspecified near relative with a higher propensity to self.

I really wish I had been able to get more clarity from the speaker about what he meant. I also wish he hadn't dismissed my interest in his project so quickly.



The meaning of even commonly used words can drift between different groups of speakers. In science it is very important to be clear in what you mean by the terms you use, even for the words that don't seem to be jargon.


References:
  1. HMM: en.wikipedia.org/wiki/Hidden_Markov_model
  2. Haplotypes: en.wikipedia.org/wiki/Haplotype