The Biologist Is In

Friday, January 20, 2023

The Color of Beans 3

Diagram illustrating the flavone pigment pathway.

Starting section from the top: phenylalanine to cinnamate to 4-coumarate to p-coumaroyl-CoA (+ 3x malonyl-CoA) to naringenin chalcone to naringenin. Naringenin goes left and right to eriodictyol and pentahydroxy flavanone. Eriodictyol goes left to flavan-4-ols and then to phlobaphenes (highlighted in red). Eriodictyol goes right to tricetin. Naringenin goes right to apigenin (highlighted light brown). Pentahydroxy flavanone goes right to luteolin (highlighted pale yellow). Naringenin goes down to dihydrokaempferol. Eriodictyol and dihydrokaempferol go left to dihydroquercetin. Pentahydroxy flavanone and dihydrokaempferol go right to dihydromyricetin. Dihydroquercetin goes right to quercetin (highlighted in yellow). Dihydrokaempferol goes right to kaempferol (highlighted in yellow) and then down to astragalin (highlighted in yellow). Dihydromyricetin goes right to myricetin (highlighted light brown). Dihydroquercetin goes down to leucocyanidin. Dihydrokaempferol goes down to leucopelargonidin. Dihydromyricetin goes down to leucodelphinidin. Leucocyanidin goes down to cyanidin and then cyanin (both highlighted red). Leucopelargonidin goes down to pelargonidin and pelargonin (both highlighted orange). Leucodelphinidin goes down to delphinidin and delphinin (both highlighted blue). Leucocyanidin, leucopelargonidin, and leucodelphinidin go left to 2,3-trans-flavan-3-ols (catechin) (highlighted in a gradient from white to brown). Cyanidin, pelargonidin, and delphinidin go left to 2,3-cis-flavan-3-ols (epicatechin) (highlighted in a gradient from white to brown). Catechin and epicatechin go down to proanthocyanidins (highlighted in a gradient from white to brown). Luteolin, apigenin, and tricetin have a group label 'flavones'. Myricetin, kaempferol, and quercetin have a group label 'flavonols'.

The figure has enzyme labels at most steps.

In the top starting section: PAL, C4H, 4CL, CHS, and CHI. Naringenin to eriodictyol is F3'H. Naringenin to pentahydroxy flavanone is F3'5'H. Eriodictyol, naringenin, and pentahydroxy flavanone to tricetin, apigenin, and luteolin are FNS. Eriodictyol, naringenin, and pentahydroxy flavanone to dihydroquercetin, dihydrokaempferol, and dihydromyricetin are F3H. Eriodictyol to flavan-4-ols is DFR. Dihydrokaempferol to dihydroquercetin is F3'H. Dihydrokaempferol to dihydromyricetin is F3'5'H. Dihydroquercetin, dihydrokaempferol, and dihydromyricetin to quercetin, kaempferol, and myricetin are FLS. Dihydroquercetin, dihydrokaempferol, and dihydromyricetin to leucocyanidin, leucopelargonidin, and leucodelphinidin are DFR. Leucocyanidin, leucopelargonidin, and leucodelphinidin to cyanidin, pelargonidin, and delphinidin are ANS. Cyanidin, pelargonidin, and delphinidin to cyanin, pelargonin, and delphinin are GT. Leucocyanidin, leucopelargonidin, and leucodelphinidin to catechin are LAR. Cyanidin, pelargonidin, and delphinidin to epicatechin are ANR.

A great deal of research has gone into our understanding of how colors are made in plants. I've previously written about the carotenoid pigment pathways in tomatoes [1] and peppers [2], condensing a great deal of published literature in the process. Until recently, I didn't have a solid grasp of the pathway plants use to make a second major category of pigments, the flavonoid pigments. These pigments are responsible for many of the red/purple/blue colors you see in flowers and other plant parts, but I've been learning about them through my focus on the various colors of dry beans.

The carotenoid pigment pathway I discussed in those earlier articles was relatively simple: a single main pathway with a couple of branches. The anthocyanin pathway figure at above-right is a bit more complicated. The figure is a consensus pathway, built from research in a few different species. There are definitely more pieces that could be added, but this amount is a good start. The colored highlights are intended to represent the colors of those chemicals. The lower red, orange, and blue pigments are anthocyanins, the pigments responsible for the color of many flowers (and other plant parts). The white-to-brown gradient highlight is for the proanthocyanidins. They oxidize over time, changing from clear to brown. The red pigments at upper-left (the phlobaphenes) are found in some trees, but I wasn't able to find much information about them. The yellow pigments at right are found in various plants and plant parts, but they're not generally the source for bright yellows in flowers. (The enzyme FGT leading to astragalin at far right is something I made up, since I couldn't find any research naming the enzyme performing that step.)

Diagram illustrating the flavone pigment pathway as found in common dry beans.

Starting section from the top: phenylalanine to cinnamate to 4-coumarate to p-coumaroyl-CoA (+ 3x malonyl-CoA) to naringenin chalcone to naringenin to dihydrokaempferol. Dihydrokaempferol goes left to dihydroquercetin and right to dihydromyricetin. Dihydroquercetin goes right to quercetin (highlighted in yellow). Dihydrokaempferol goes right to kaempferol (highlighted in yellow) and then down to astragalin (highlighted in yellow). Dihydromyricetin goes right to myricetin (highlighted light brown). Dihydroquercetin goes down to leucocyanidin. Dihydromyricetin goes down to leucodelphinidin. Leucocyanidin goes down to cyanidin and then cyanin (both highlighted red). Leucodelphinidin goes down to delphinidin and delphinin (both highlighted blue). Leucocyanidin and leucodelphinidin go left to 2,3-trans-flavan-3-ols (catechin) (highlighted in a gradient from white to brown). Cyanidin and delphinidin go left to 2,3-cis-flavan-3-ols (epicatechin) (highlighted in a gradient from white to brown). Catechin and epicatechin go down to proanthocyanidins (highlighted in a gradient from white to brown). Myricetin, kaempferol, and quercetin have a group label 'flavonols'.

The figure has enzyme labels at most steps.

In the top starting section: PAL, C4H, 4CL, CHS, CHI, and F3H. Dihydrokaempferol to dihydroquercetin is F3'H. Dihydrokaempferol to dihydromyricetin is F3'5'H. Dihydroquercetin, dihydrokaempferol, and dihydromyricetin to quercetin, kaempferol, and myricetin are FLS. Dihydroquercetin and dihydromyricetin to leucocyanidin and leucodelphinidin are DFR. Leucocyanidin and leucodelphinidin to cyanidin and delphinidin are ANS. Cyanidin and delphinidin to cyanin and delphinin are GT. Leucocyanidin and leucodelphinidin to catechin are LAR. Cyanidin and delphinidin to epicatechin are ANR.

At left is a heavily reduced version of the first figure, trimmed to an approximation of what seems to be going on in common beans (Phaseolus vulgaris). Combinations of the yellow, red, blue, and brown pigments seem to be responsible for most of the variations in color that we see in dry beans. I've seen some evidence for a brown pigment derived from the yellow ones here, but I haven't found any research clarifying the chemistry involved. There's the possibility of some green pigments made up from a different metabolic pathway, but I haven't found sufficient research about them to know if they're represented in beans.
Several of the trimmed compounds are also found in common beans, but not in significant amounts. The orange pelargonidin pigments have been reported in some bean varieties, but I've never come across a common bean that has a color dominated by orange pigment. There might be orange examples from P. coccineus, the scarlet runner bean, but I'm still investigating this.

The colors of beans drew attention long before we had any understanding of the physiology of the pigments involved. Much of the early published research into bean colors sought to identify the different genes responsible for the traits. Eventually the gene labels assigned by different authors were correlated with each other, and the set of labels for important color genes became standardized. More recently, there have been efforts to identify the molecular mechanism behind the different classical gene labels. Some gene labels are now associated with specific enzymes or other genes important in the flavonoid pathway.

R [red] : Enzyme F3'H, or more likely a transcription factor driving F3'H in the seed coat. F3'H is important for stress response in plant tissues and so is unlikely to be absent even when the enzyme isn't active in the pathway.
V [violet] : Enzyme F3'5'H. This one isn't as important as F3'H and is entirely absent in many plants.
J : Pretty solidly identified as the enzyme DFR.
P : A transcription factor driving expression of several genes important in the flavonoid pathway. In the figures above, the regulated enzyme targets are drawn in blue.
B : A transcription factor driving expression of chalcone synthase (CHS) and/or chalcone isomerase (CHI).
G : A transcription factor leading to increased levels of astragalin, perhaps by driving expression of FLS and/or FGT. Likely has other impacts, but I haven't found sufficient research.

Tracking down which gene was associated with which step in the pathway was tricky. Many of the older papers had models for what a given gene did, but then those models were overturned by more recent research. The paper identifying V as being the gene for the enzyme F3'5'H was only published in March 2022. Finding that paper got me interested in trying to see how many of the others could also be associated with a specific part of the pathway. The other gene notes above came from the scattered papers linked in the references section, though few were specifically the point of the papers.

My goal was to better understand what the gene labels were doing, so I could better figure out what genes were likely to be involved in the beans I was growing and crossing. I'll write more on that another time.

References

Related blog posts:

https://the-biologist-is-in.blogspot.com/2014/04/the-color-of-tomatoes.html

Carotenoid pigments in tomatoes.

https://the-biologist-is-in.blogspot.com/2015/11/the-color-of-peppers-2.html

Carotenoid pigments in peppers.

https://the-biologist-is-in.blogspot.com/2018/10/the-color-of-beans-1.html

Introduction of my #BlueBeanProject.

https://the-biologist-is-in.blogspot.com/2022/12/the-color-of-beans-2.html

Status update of my #BlueBeanProject.

https://the-biologist-is-in.blogspot.com/2019/11/biology-of-blue.html

Discussions around the chemistry of blue in biology.

Papers related to anthocyanin pathway in bean, cotton, etc:

Friday, January 13, 2023

The Color of Pineapple

Clear glass bowl filled with chunks of cut pink pineapple.

The pineapple we all grew up with is a bright yellow color. Today's pineapples aren't necessarily the same shade. Del Monte is now selling a variety with a distinctly pink flesh, called PinkGlow™ or Rosé pineapple. This is a bio-engineered variety, first conceptualized way back in 2005. A patent for the variety was issued in 2012 and the US FDA deregulated the variety in 2016, deciding the variety was essentially the same as other varieties with regards to safety and regulatory concerns.
The variety started as an extra sweet variety grown in Hawaii called MD2. This pink version shares the extra sweet and low acid traits of that original variety. I think it is a worthwhile product, even though (in my limited experience) most people's reaction to seeing the pink cut pieces at left was to think they looked like pieces of meat.

Figure depicting the carotenoid biosynthesis pathway in plants. Starting at top: Acetyl-CoA -> Isopentenyl pyrophosphate -> Geranyl pyrophosphate -> Farnesyl pyrophosphate -> Geranylgeranyl pyrophosphate -> Phytoene. An arrow also goes from Geranylgeranyl pyrophosphate to Phytol -> Chlorophyll ->->-> Un-colored metabolites. From Phytoene -> Phytofluene -> Zeta-carotene -> Neurosporene -> Prolycopene -> Lycopene -> Delta-carotene -> Alpha-carotene -> Lutein. A second branch from Lycopene -> Gamma-carotene -> Beta-carotene -> Beta-cryptoxanthin -> Zeaxanthin -> Antheraxanthin -> Violaxanthin -> Xanthoxin -> Abscisic Acid aldehyde -> Abscisic acid. A side branch from Gamma-carotene -> Torulene. A side branch from Violaxanthin -> Neoxanthin -> Xanthoxin (already in the pathway described).

The patent is a bit of a pain to read, as they generally are. The "DETAILED DESCRIPTION OF THE INVENTION" section is where they describe the details of the alterations they made.

A figure representing part of the carotenoid pathway described in the previous image. A larger arrow goes from Geranylgeranyl pyrophosphate to Phytoene. A large X covers each arrow leading away from Lycopene.

At left is a sketch of the carotenoid pathway in pineapples. There is limited published information about the specifics of the pathway in pineapple, so this diagram was constructed from more general research in tomatoes, peppers, and other species. At right is a closeup of the pathway altered to illustrate the changes that were made in the pink pineapple, as described in the patent.

The first modification was to introduce a second copy of the phytoene synthase gene, driving increased metabolic flux through the carotenoid pathway. This is represented in the figure by a larger arrow at the top. The added gene was combined with a promoter specific to pineapple fruit flesh, so the rest of the plant doesn't have its carotenoid pathway messed around with.

The second modification was to shut down two enzymes, lycopene beta-cyclase and lycopene epsilon-cyclase, normally responsible for converting lycopene into the next compound in each of the two branches of the carotenoid pathway. The consequence is that all the metabolic flux passing through the pathway stops at lycopene. Shutting down these genes was performed by RNA interference (RNAi), also driven by a copy of the same fruit-flesh-specific promoter. Again, this prevents the modification from interfering with the carotenoid pathway elsewhere in the pineapple plant.

The carotenoid pathway is important for a plant's stress response and other systems. It is likely a pineapple plant would survive more dramatic alterations to the carotenoid pathway that impacted the entire plant, but doing so would throw off the existing balance. The efforts they've taken to limit the pathway tweaks to only happen within the fruit flesh were important to ensure the plants generally are as productive and healthy as the pineapple they started with.

A third modification was attempted, but the way the patent is written indicates they're not exactly sure the alteration worked. Commercial pineapple production relies on precision planning. To get a pineapple crop to mature at a specific planned time, the plants are treated with a hormone which induces flowering. In pineapple, the hormone that triggers flowering is the simple gas ethylene. Either ethylene or the similarly shaped molecule acetylene is used to induce a crop to start blooming at a specific time. The problem is that pineapple plants will initiate blooming all on their own, when the growers may not want the plants to do so. This is called "natural flowering" and interferes with the plans of the growers.

So, to try to reduce the rate of natural flowering, the third modification was to suppress the ACC synthase gene, which is important for normal ethylene biosynthesis. They again used RNAi for this, targeted to growing meristems where the enzyme's activity is important for normal flower induction. I suspect the reason the patent expresses uncertainty about this modification working is that, at the time of patent filing, they didn't have enough experience growing the new pineapple in field conditions to see a reduction in the rate of natural flowering. By now they'll know for sure if it worked.

References:

Marketing piece: https://specialtyproduce.com/produce/Pinkglow_Pineapple_17105.php
Patent: https://patents.google.com/patent/USPP25763
FDA statement: https://www.fda.gov/food/cfsan-constituent-updates/fda-concludes-consultation-pink-flesh-pineapple
Carotenoids in tomatoes: https://the-biologist-is-in.blogspot.com/2014/04/the-color-of-tomatoes.html
Carotenoids in peppers: https://the-biologist-is-in.blogspot.com/2015/11/the-color-of-peppers-2.html
Pineapple flower induction: https://www.echocommunity.org/resources/f0e9cfeb-ba1d-435e-a515-7705ca79b409

Friday, December 30, 2022

The Color of Beans 2

A few years back I wrote a short post to introduce a project I had started to breed up a nicely blue colored dry bean.

https://the-biologist-is-in.blogspot.com/2018/10/the-color-of-beans-1.html

The project has been moving forward nicely since then. This year's crop was very consistently blue in color, the first time I didn't harvest a large fraction of tan/blue seeds as well.

Dry beans in mixed colors. Browns, blues, and dark greys.

The picture at left looks very similar to the one I included in the post linked above, but this photo is from a few days ago. These beans are the extras I had saved from earlier generations, including many from 2018. This tells me the best blue colored seeds are able to maintain their color well in long-term storage.

The other truly blue varieties I have come across all seem to darken towards brown during storage. "San Bernardo Blue" and the rarer "Pragerhof" beans both have a nice blue color at harvest, but that color doesn't last. My blues keeping their color for a few years in storage is a nice improvement.

Over the first several years, I selected the best blue colored seeds from each harvest to plant the following spring. Until this year's harvest, each year I kept finding brown/tan seeds. This tells me the brown color was due to recessive alleles, which means it can be very hard to filter out the brown-seed trait. Any given blue seed could be hiding the recessive brown color allele.

This year I was lucky and the entire harvest had the rich blue color I had been working towards. The recessive allele for brown color could still be hiding among these. I won't be more certain I have finished filtering out that trait for at least a couple more years, but I am hopeful. Because I didn't have to select on color this year, I instead selected for larger seed size and pods (or pod clusters) with more seeds in them.

Right now I am working to figure out how I can distribute this new variety, but it may not happen this year. I have very limited seed stock and any method of selling or distributing them comes with some significant costs.

You can find more about these beans with the tag #BlueBeanProject on various social media systems. I'll also be writing more posts here, so stay tuned.

Eleven pale blue bean seeds, each with a black ring around the hilum.

Five dark blue bean seeds with tan speckles.

I also have a couple new blue lines, unrelated to those above. These samples are F2s from a cross between "Pragerhof" and an unknown black bean.

One blue is darker than my main line and the other is lighter. I don't know for sure what these will become during the several years it will take to stabilize their genetics, but I aim to find out!

References:

https://the-biologist-is-in.blogspot.com/2018/10/the-color-of-beans.html
"San Bernardo Blue" beans: https://store.experimentalfarmnetwork.org/products/nonna-agness-blue-bean
"Pragerhof" beans: https://oroseeds.com/wp-content/uploads/2019/02/download-18.png

No longer offered for sale by OroSeeds, but their photo remains online.

Thursday, May 20, 2021

Viable Interspecific Eggplant Hybrids

The last year has been a mess. I'm fine. My family is fine. Most of my friends are fine. The increased anxiety and stress basically shut off any motivation or ability I had to write posts here. I was still active over on Twitter or Instagram, as those require less focused thought, but I just couldn't will myself to sit down at a computer and type up anything I felt was worthwhile to post.

I'm now fully vaccinated against covid19, but I know there are many people who still have not been able to access a vaccine. Some in my family. Many in the broader community. Covid cases in my community are dropping, but they're still higher than the peak we had in May of last year. I worry about recent CDC guidance and how people broadly seem to think it means the pandemic is over. It is not. Not here, and not elsewhere.

For now, most people locally seem to still be keeping up distancing and masking practices gained over the last year. As always, the next few weeks will be informative.

Even with the persistent writer's block, I routinely thought about writing something. This post is the first something to come of that. It isn't really the long, information- or photo-rich kind of post I like to write, but it is what it is.

My plant breeding projects have continued without interruption. My gardens have provided me with useful exercise and amusement.

Most of my plant breeding projects start with hybrids between divergent varieties within one species. The F1 generally stands out from the two parental lines, so it is fairly easy to have confidence that the cross took. In the F2 generation, there are almost always useful and unexpected traits which segregate out.

Last year I grew out an F2 population of scarlet eggplant. Every plant was different, but two stood out. One was extra productive and ripened fruit far earlier than any others. The other developed fruit that were white when immature, but ripened to the typical red later. This season I have F3 populations from those two plants.

I still haven't figured out how to like eating eggplant, especially the more bitter flavors of the scarlet eggplant, but I like the plants and will continue to try.

Recently I found some references describing successful hybrids between the scarlet eggplant (Solanum aethiopicum) and the more common purple eggplant (S. melongena), made with some significant effort in the lab. This got me thinking about which species one could make hybrids with among the eggplants. Any such hybrids would allow for much more diverse F2 populations, with their higher potential for selection towards interesting new traits.

This led to some discussion about primary (1'), secondary (2'), and tertiary (3') germplasm. 1' germplasm includes plants in the same or related species which can cross readily to your subject species. 2' germplasm includes plants which can cross to your subject species with significant reduction in fertility. 3' germplasm is then plants that can cross with your subject only with intensive laboratory operations such as embryo rescue or induced genome duplication.

In the case of eggplants, there has been much more exploration of 2' and 3' germplasm for the common eggplant. The scarlet eggplant is an important crop for many communities, but it has not attracted as much attention in communities with higher levels of biological research investment. As such, the 2' and 3' germplasm lists below for scarlet eggplant are very much incomplete.

Asian Eggplant (Solanum melongena)

primary: S. incanum and S. insanum.
secondary: S. anguivi, S. dasyphyllum, S. lichtensteinii, S. linnaeanum, S. pyracanthos, S. tomentosum, and S. violaceum.
tertiary: S. elaeagnifolium, S. sisymbriifolium, S. torvum, and S. aethiopicum.

Scarlet Eggplant (Solanum aethiopicum)

primary: S. anguivi, S. macrocarpon, and S. dasyphyllum
secondary:
tertiary: S. melongena.

Professional plant breeders pursue traits from related species like these to improve disease resistance, drought resistance, or other traits important to growing large crops.

Independent plant breeders can afford to use traits from related species (among the 1' and 2' germplasm resources at least) to express their creativity towards developing new varieties. Even if you're not sure what to do with them (and I'm not), they're still lovely plants which might be fun to work with in the garden.

I hope you are and remain well as the pandemic continues.

References

S. melongena germplasm.

https://journals.ashs.org/jashs/view/journals/jashs/141/1/article-p34.xml

S. aethiopicum germplasm

Fertility restoration in S. melongena x S. aethiopicum hybrids.

https://link.springer.com/article/10.1023/B:EUPH.0000003883.39440.6d

Primary, secondary, and tertiary gene pools.

https://www.frontiersin.org/articles/10.3389/fpls.2014.00068/full

Friday, February 21, 2020

Photoshop again.

Online seed vendors vary dramatically from the largely respectable, to the folks selling "peppers" like those below that I found on the Amazon or eBay marketplaces.

To you, the botanically savvy purchaser, these vendors stand out as clearly fraudulent. But not everyone is botanically savvy. People who don't instantly recognize these wonderfully colored peppers as photoshopped versions of a red pepper photo are such vendors' intended "customers".

Pile of cayenne peppers photo edited to look cyan in color.

Pile of cayenne peppers photo edited to look blue in color.

Pile of cayenne peppers photo edited to look purple in color.

Pile of cayenne peppers photo edited to look magenta in color.

Pile of cayenne peppers; original photo the others here were modified from.

Even among the largely respectable vendors, there is a wide range of philosophical or political stances that may impact your decisions of who to buy from. Does the company support white supremacists? Do they sell patented plant varieties? Do they push pseudo-science in their catalogs?

It can take some digging to be certain you agree with the politics behind any given company. It can take significant effort to bring such considerations into your buying decisions, so I understand if you choose not to do so.

But please, don't buy seeds from vendors selling off-hue peppers, blue strawberries, rainbow roses, rainbow onions, or the many other scams that are out there. If something looks too good to be true, at the very least investigate further. These online vendors rely on people clicking "buy" when seeing something interesting. By the time you've grown up the seeds and realize you were scammed, the time to contest the purchase in the marketplaces the vendors work through will have long since expired.

Friday, February 14, 2020

Tomatillo Breeding (4/n)

The last couple posts have looked at simulations for selection of a single gene, for recessive or dominant alleles. Increasing the number of genes actively under selection results in it taking longer and longer for the population to converge.

Plot titled "Multiple recessive traits, large population", illustrating selection for a trait in an out-crossing population.

Plot titled "Multiple dominant traits, large population", illustrating selection for a trait in an out-crossing population. It takes more years for the trait of interest to reach saturation in the population.

The change in code to simulate multiple genetic loci is really simple if we assume the different alleles we're selecting on are found sufficiently distant from each other on the chromosomes. This is referred to as "un-linked" and means the probability calculations for each are independent of the others.
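As a minimal sketch of that independence assumption (the variable names here are illustrative and not taken from the scripts): if a fraction p of the plants is homozygous for the wanted allele at any one locus, then with un-linked loci the fraction homozygous at all n loci at once is simply p^n. This is the same calculation the scripts below perform when they plot `P_aa^i` and `P_AA^i`.

```r
# Sketch: joint probability across un-linked loci.
# Assumes every locus has the same single-locus frequency p_single.
p_single <- 0.25   # e.g. fraction of F2 plants that are "aa" at one locus
n_loci   <- 1:4    # number of un-linked loci under selection

# Independent loci: multiply the per-locus probabilities together.
p_all <- p_single ^ n_loci

print(data.frame(loci = n_loci, fraction = p_all))
```

If the loci were linked, the per-locus probabilities would no longer multiply this cleanly, which is why the assumption matters.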

R Script 5: Multiple recessive traits, large population.

# Multiple un-linked recessive traits, large population.
#     Stabilize progeny for recessive traits via selection.
#     Save seeds from double-recessive plants each generation.
#     Frequencies below track a single locus; the plotting code raises
#     them to the i-th power to cover i un-linked loci.
years <- 10;

# Define F2 population.
P_AA <- 0.25;
P_Aa <- 0.50;
P_aa <- 0.25;

# Save seeds only from aabb plants, unknown pollen donor. Iterate over years.
for(i in 1:years) {
  P_AA <- append(P_AA,   0);
  P_Aa <- append(P_Aa,   P_aa[i]*P_AA[i]*1.00 + P_aa[i]*P_Aa[i]*0.50);
  P_aa <- append(P_aa,   P_aa[i]*P_aa[i]*1.00 + P_aa[i]*P_Aa[i]*0.50);
  P_sum <- P_aa[i+1] + P_Aa[i+1];
  P_Aa[i+1] <- P_Aa[i+1]/P_sum;
  P_aa[i+1] <- P_aa[i+1]/P_sum;
}

# Make figure.
plot(  0:years, P_aa^1, col="red", main="Multiple recessive traits, large population.", xlab="Years", ylab="%aa pollen donors", xlim=c(0,years), ylim=c(0,1), axes=TRUE, frame.plot=TRUE);
lines(0:years, P_aa^1, col="red");
lines(0:years, 1-P_aa^1, col="blue", lty="dashed");
lines(c(0,years),c(0.95,0.95), col="black", lty="dotted");
lines(c(0,years),c(0.99,0.99), col="black", lty="dashed");

for(i in 2:20) {
  lines(0:years, P_aa^i, col="red");
  lines(0:years, 1-P_aa^i, col="blue", lty="dashed");
}

R Script 6: Multiple dominant traits, large population.

# Multiple un-linked dominant traits, large population.
#     Stabilize progeny for dominant traits via selection.
#     Save seeds from dominant plants each generation.
#     Frequencies below track a single locus; the plotting code raises
#     them to the i-th power to cover i un-linked loci.
years <- 10;

# Define F2 population.
P_AA <- 0.25;
P_Aa <- 0.50;
P_aa <- 0.25;

# Save seeds only from (AA and Aa) plants, unknown pollen donor. Iterate over years.
for(i in 1:years) {
  P_AA <- append(P_AA,   P_AA[i]*P_AA[i]*1.00 + P_AA[i]*P_Aa[i]*0.50 + P_Aa[i]*P_Aa[i]*0.25);
  P_Aa <- append(P_Aa,   P_AA[i]*P_aa[i]*1.00 + P_AA[i]*P_Aa[i]*0.50 + P_Aa[i]*P_aa[i]*0.50 + P_Aa[i]*P_Aa[i]*0.50);
  P_aa <- append(P_aa,   0);
  
  P_sum <- P_AA[i+1] + P_Aa[i+1];
  P_AA[i+1] <- P_AA[i+1]/P_sum;
  P_Aa[i+1] <- P_Aa[i+1]/P_sum;
}

# Make figure.
plot(  0:years, P_AA, col="red", main="Multiple dominant traits, large population.", xlab="Years", ylab="%AA pollen donors", xlim=c(0,years), ylim=c(0,1), axes=TRUE, frame.plot=TRUE);
lines(0:years, P_AA, col="red");
lines(0:years, 1-P_AA, col="blue", lty="dashed");
lines(c(0,years),c(0.95,0.95), col="black", lty="dotted");
lines(c(0,years),c(0.99,0.99), col="black", lty="dashed");

for(i in 2:20) {
  lines(0:years, P_AA^i, col="red");
  lines(0:years, 1-P_AA^i, col="blue", lty="dashed");
}

The probability of an F2 plant having two copies of the recessive allele at each of several genes drops off very quickly as we increase the number of genes. In a small population, this low probability means we might not find an F2 with all the recessive alleles stacked up the way we want. All is not lost.
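To put rough numbers on how quickly those odds fall, here is a small sketch. It assumes un-linked genes and a standard F2 (one quarter double-recessive per gene); the function name `p_find` and the planting size of 30 are made up for illustration.

```r
# Sketch: chance of finding at least one F2 plant that is double-recessive
# at every one of n_genes un-linked genes, in a planting of n_plants.
p_find <- function(n_genes, n_plants) {
  p_one <- 0.25 ^ n_genes        # one plant carries all the recessive pairs
  1 - (1 - p_one) ^ n_plants     # at least one such plant among n_plants
}

# Hypothetical garden-scale planting of 30 F2 plants, for 1 to 4 genes.
round(sapply(1:4, p_find, n_plants = 30), 3)
```

With a fixed number of plants, each added gene cuts the chance of success sharply, which is the motivation for collecting the recessive traits one generation at a time.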

With our small F2 population, roughly a quarter would be expected to be in the double-recessive condition for the first gene of interest.

25% AA; 50% Aa; 25% aa

If we were unlucky and couldn't find a single plant that was also double-recessive for the second gene of interest, we can go ahead with plants showing the dominant trait for that second gene. The probability is that two thirds of the plants showing the dominant trait for the second gene will be heterozygous, carrying one copy of the recessive allele.

aaB_ (⅓BB; ⅔Bb)

In the next generation we have pretty good odds of recovering that second recessive trait that we were looking for. This way we can progressively collect multiple recessive traits without finding them in that first F2 generation. With this strategy, we need to keep seeds from prior generations. If we can't recover that next recessive trait in the next year, then we managed to find plants that were not heterozygous for the gene of interest. We need to grow more plants from the previous generation again, to try and find some carrying a copy of the recessive allele.
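A rough sketch of those odds, under assumptions that are mine rather than measured: the saved aaB_ plants are one third BB and two thirds Bb, there are enough of them that they pollinate each other more or less at random, and the names `p_bb` and `p_recover` are just for illustration.

```r
# Sketch: recovering "bb" progeny from selected aaB_ parents.
# Assumes parents are 1/3 BB and 2/3 Bb and pollinate each other at random.
p_BB <- 1/3
p_Bb <- 2/3

q_b  <- p_Bb / 2     # frequency of the recessive "b" allele among parents
p_bb <- q_b ^ 2      # expected fraction of "bb" progeny

# Chance that at least one "bb" plant appears among n_progeny seedlings.
p_recover <- function(n_progeny) 1 - (1 - p_bb) ^ n_progeny

p_bb
p_recover(c(10, 20, 40))
```

In a very small planting the real parents may by chance all be BB, which is the situation described above where you need to go back to the saved seed.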

With plants that typically self-pollinate (like peppers and tomatoes), it can be pretty simple to intentionally remove recessive alleles for genes of interest. If you grow out the seeds produced by a plant and find any double-recessive progeny, you know that plant was heterozygous. If you grow enough seeds and don't find any double-recessive progeny, you can be pretty confident of that plant being homozygous for the dominant allele.
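How many seeds is "enough" can be estimated with a short sketch. It assumes simple Mendelian ratios for a selfed heterozygote (three quarters of the progeny show the dominant trait); the function names are illustrative.

```r
# Sketch: how many selfed progeny to grow before trusting a plant is "AA".
# A heterozygous ("Aa") parent gives 3/4 dominant-looking progeny when selfed,
# so the chance it produces no "aa" seedling among n progeny is 0.75^n.
p_missed <- function(n) 0.75 ^ n

# Smallest n at which an "Aa" parent slips through no more often than alpha.
n_needed <- function(alpha) ceiling(log(alpha) / log(0.75))

n_needed(0.05)
n_needed(0.01)
```

Under these assumptions, roughly a dozen seedlings with no recessive among them already makes a hidden recessive allele unlikely.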

With plants that can't self-pollinate (like tomatillos), it can take more work/time. Lets say we have one plant that is showing the dominant trait. If we cross it with a plant showing the recessive trait, the resulting progeny will tell us if that first plant is "AA" or "Aa". If all the progeny show the dominant trait, then the plant we were testing is "AA". If the progeny show a mix of dominant and recessive traits, then the plant we were testing is "Aa" (and can be discarded). This is called a "test-cross" because it is used to test the genetics of a specific individual, even though we have no interest in using the progeny that result for further breeding work.
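A test-cross never proves a plant is "AA" outright; each all-dominant seedling just makes it more likely. The sketch below applies Bayes' rule to that idea. The default prior of one third assumes the tested plant is a dominant-looking F2 (one third AA, two thirds Aa); that prior and the function name `posterior_AA` are my assumptions for illustration.

```r
# Sketch: updating confidence that a dominant-looking plant is "AA"
# after a test-cross to an "aa" plant yields only dominant progeny.
# prior_AA = 1/3 assumes the tested plant came from an F2 (1/3 AA, 2/3 Aa).
posterior_AA <- function(n_dominant, prior_AA = 1/3) {
  like_AA <- 1                   # AA x aa always gives dominant progeny
  like_Aa <- 0.5 ^ n_dominant    # Aa x aa gives dominant progeny half the time
  prior_AA * like_AA / (prior_AA * like_AA + (1 - prior_AA) * like_Aa)
}

round(posterior_AA(c(0, 3, 5, 8)), 3)
```

A single recessive seedling settles the question the other way immediately, so the calculation only matters while every seedling so far has shown the dominant trait.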

Since tomatillos can be kept alive over several years, you can use such test crosses to progressively collect multiple plants with just the dominant alleles for your genes of interest. Once you have a few such plants, you can then allow them to inter-cross and be confident you won't have the recessive allele turning up in the next generations.

Friday, February 7, 2020

Tomatillo Breeding (3/n)

I've been doing some math to help me think about breeding strategies with tomatillos. Last week I showed some code for calculating how populations of different sizes converge under selection for a single recessive trait. Here I'll show similar code for a single dominant trait.

X-axis, years going from 0 to 10. Y-axis, "%AA pollen donors" going from 0 to 1. Red curve for %AA goes from lower left, rises slowly towards 1, and then smooths out to approach 1. Blue curve descends in a mirror image.

Solid red curve with circles: %AA pollen donors.
Dashed blue curve: %Aa & %aa pollen donors.

Like before, we'll start with an infinite population.

Since we can't tell the difference between plants with one or two copies of the dominant allele ("Aa" or "AA"), we can't tell what the genetic status is of any one plant that we save seeds from. Our goal is a population entirely consisting of "AA" plants, so that is what the code will plot.

The zero year is our F2 population. It takes seven years for the "AA" individuals to represent 95% (dotted horizontal line) of the population. Three years later the level crosses above 99% (dashed horizontal line) of the population.

Because this is the infinite population scenario, there will always be a small percentage of the population carrying the recessive allele.

R Script 3: One dominant trait, infinite population.

# One dominant trait, infinite population.
#     Stabilize progeny for dominant trait via selection.
#     Save seeds from dominant plants each generation.
years <- 10;

# Define F2 population.
P_AA <- vector();
P_Aa <- vector();
P_aa <- vector();
P_AA <- 0.25;
P_Aa <- 0.50;
P_aa <- 0.25;

# Save seeds only from (AA and Aa) plants, unknown pollen donor. Iterate over years.
for(i in 1:years) {
  P_AA <- append(P_AA,   P_AA[i]*P_AA[i]*1.00 + P_AA[i]*P_Aa[i]*0.50 + P_Aa[i]*P_Aa[i]*0.25);
  P_Aa <- append(P_Aa,   P_AA[i]*P_aa[i]*1.00 + P_AA[i]*P_Aa[i]*0.50 + P_Aa[i]*P_aa[i]*0.50 + P_Aa[i]*P_Aa[i]*0.50);
  P_aa <- append(P_aa,   0);
  
  P_sum <- P_AA[i+1] + P_Aa[i+1];
  P_AA[i+1] <- P_AA[i+1]/P_sum;
  P_Aa[i+1] <- P_Aa[i+1]/P_sum;
}

# Make figure.
plot(  0:years, P_AA, col="red", main="One dominant trait, infinite population.", xlab="Years", ylab="%AA pollen donors", xlim=c(0,years), ylim=c(0,1), axes=TRUE, frame.plot=TRUE);
lines(0:years, P_AA, col="red");
lines(0:years, P_Aa+P_aa, col="blue", lty="dashed");
lines(c(0,years),c(0.95,0.95), col="black", lty="dotted");
lines(c(0,years),c(0.99,0.99), col="black", lty="dashed")
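
To read the threshold crossings off the numbers instead of the figure, the same recurrence can be wrapped in a function. This is only a sketch that repeats the update rule from R Script 3; the function names `fraction_AA_by_year` and `first_year_at` are my own additions.

```r
# Fraction of "AA" plants in each year (year 0 is the F2), using the same
# update rule and renormalization as R Script 3.
fraction_AA_by_year <- function(years) {
  P_AA <- 0.25
  P_Aa <- 0.50
  P_aa <- 0.25
  for (i in 1:years) {
    new_AA <- P_AA[i]^2 + P_AA[i]*P_Aa[i]*0.50 + P_Aa[i]^2*0.25
    new_Aa <- P_AA[i]*P_aa[i] + P_AA[i]*P_Aa[i]*0.50 +
              P_Aa[i]*P_aa[i]*0.50 + P_Aa[i]^2*0.50
    total <- new_AA + new_Aa          # "aa" plants are never used for seed
    P_AA <- c(P_AA, new_AA / total)
    P_Aa <- c(P_Aa, new_Aa / total)
    P_aa <- c(P_aa, 0)
  }
  P_AA
}

# First year (counting the F2 as year 0) at which the "AA" fraction reaches
# the threshold; NA if it never does within the simulated years.
first_year_at <- function(P_AA, threshold) which(P_AA >= threshold)[1] - 1

trajectory <- fraction_AA_by_year(10)
print(first_year_at(trajectory, 0.95))
print(first_year_at(trajectory, 0.99))
```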

X-axis, years going from 0 to 10. Y-axis, "%target Pollen Donors" going from 0 to 1. Cyan curve for recessive percentage goes from lower left, rises sharply towards 1, and then smooths out to approach 1. Red curve for dominant percentage goes from lower left, rises slowly towards 1, and then smooths out to approach 1. Yellow curve descends in a mirror image of cyan curve. Blue curve descends in a mirror image of red curve.

Cyan line w/circles: recessive selection.
Red line w/circles: dominant selection.

To compare the trajectory for selection on the recessive allele vs on the dominant allele, I overlaid the two curves in an image editor. I inverted the colors for the recessive curves to better distinguish them from the added dominant curves.

Selection on a dominant trait progresses at a slower rate initially than selection on a recessive trait, but by about ten years the two approaches would be expected to reach a similar degree of completeness.

With smaller population sizes, we'd expect the selected allele (dominant or recessive) to reach complete saturation by about that time point.

With recessive traits, I only had to consider "aa" plants as seed producers. With dominant traits, I have to consider both "AA" and "Aa" plants. This seems like a small difference, but for simulating small populations it adds significant complexity.
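
Part of that complexity is simply drawing a finite population and counting genotypes that may be absent altogether. As a sketch of one compact way to do this (not the approach used in the script further down), `sample()` can draw the genotypes and a factor with fixed levels makes missing genotypes count as 0 rather than NA. The function name `draw_population` is hypothetical.

```r
# Draw a finite population from genotype probabilities and count each genotype.
# Fixing the factor levels means a genotype with no plants is reported as 0.
draw_population <- function(population, P_AA, P_Aa, P_aa) {
  genotypes <- sample(c("AA", "Aa", "aa"), size = population,
                      replace = TRUE, prob = c(P_AA, P_Aa, P_aa))
  table(factor(genotypes, levels = c("AA", "Aa", "aa")))
}

set.seed(1)  # only so the example is repeatable
counts <- draw_population(10, 0.25, 0.50, 0.25)
print(counts)

# Seed is saved from both kinds of dominant-looking plants.
seed_parents <- counts[["AA"]] + counts[["Aa"]]
print(seed_parents)
```

With only ten plants, the counts can stray a long way from the 1:2:1 expectation, which is where the noise in the small-population figures comes from.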

Similar to above figure, but each curve is replaced by a tight cluster of overlapping curves representing individual runs of the simulation.

Population = 1000

Similar to above figure, but each curve is replaced by a very loose cluster of overlapping curves representing individual runs of the simulation.

Population = 50

Similar to above figure, but each curve is replaced by an extremely loose cluster of overlapping curves representing individual runs of the simulation. These curves occupy almost the entire figure.

Population = 10

If you compare these plots to those for the recessive selection scenario (https://the-biologist-is-in.blogspot.com/2020/01/tomatillo-breeding-2n.html), you'll see that this scenario has a much higher level of noise in the trajectories. For the smallest population level, it takes 30 years (not shown in figures) for the majority of the experimental replicates to converge on the targeted "AA" condition.

R Script 4: One dominant trait, small population.

# One dominant trait, small population.
#     Stabilize progeny for dominant trait via selection.
#     Save seeds from dominant plants each generation.
years <- 10;
population <- 1000; # 1000, 50, 10
trials <- 100;

# Initialize an empty figure (type="n" sets up the axes without drawing any points).
plot( c(0,years),c(0,1), type="n", main="One dominant trait, small population.", xlab="Years", ylab="%AA pollen donors", xlim=c(0,years), ylim=c(0,1), axes=TRUE, frame.plot=TRUE);
lines(c(0,years),c(0.95,0.95), col="black", lty="dotted");
lines(c(0,years),c(0.99,0.99), col="black", lty="dashed");

for (ii in 1:trials) {
  # Define F2 population probabilities for selection on AA plants.
  P_AA_1 <- vector();
  P_Aa_1 <- vector();
  P_aa_1 <- vector();
  P_AA_1 <- 0.25;
  P_Aa_1 <- 0.50;
  P_aa_1 <- 0.25;
  
  # Define F2 population probabilities for selection on Aa plants.
  P_AA_2 <- vector();
  P_Aa_2 <- vector();
  P_aa_2 <- vector();
  P_AA_2 <- 0.25;
  P_Aa_2 <- 0.50;
  P_aa_2 <- 0.25;

  # Save seeds only from (AA and Aa) plants, which can't self-pollinate.
  for (i in 1:(years+2)) {
    # Generate actual population.
    rands <- runif(population, 0, 1);
    Genotypes <- vector();
    for (j in 1:population) {
      if (rands[j] < P_AA_1[i]) {
        Genotypes <- append(Genotypes, "AA");
      } else if (rands[j] < P_AA_1[i]+P_Aa_1[i]) {
        Genotypes <- append(Genotypes, "Aa");
      } else {
        Genotypes <- append(Genotypes, "aa");
      }
    }
    Genotype_counts <- table(Genotypes);
    
    # Determine actual genotype probabilities for pollen donors. (Assuming "AA" plant in case 1, "Aa" plant in case 2.)
    if (is.na(Genotype_counts["AA"])) {
      P_AA_1[i] <- 0;
      P_AA_2[i] <- 0;
    } else {
      P_AA_1[i] <- (Genotype_counts["AA"]-1)/(population-1); # The plant we're saving seeds from can't be pollinated by itself.
      P_AA_2[i] <- Genotype_counts["AA"]/(population-1);
    }
    if (is.na(Genotype_counts["Aa"])) {
      P_Aa_1[i] <- 0;
      P_Aa_2[i] <- 0;
    } else {
      P_Aa_1[i] <- Genotype_counts["Aa"]/(population-1);
      P_Aa_2[i] <- (Genotype_counts["Aa"]-1)/(population-1); # The "Aa" plant we're saving seeds from can't be pollinated by itself.
    }
    if (is.na(Genotype_counts["aa"])) {
      P_aa_1[i] <- 0;
      P_aa_2[i] <- 0;
    } else {
      P_aa_1[i] <- Genotype_counts["aa"]/(population-1);
      P_aa_2[i] <- Genotype_counts["aa"]/(population-1);
    }
  
    # Generate new theoretical genotype probabilities.
    P_AA_1 <- append(P_AA_1,   P_AA_1[i]*P_AA_1[i]*1.00 + P_AA_1[i]*P_Aa_1[i]*0.50 + P_Aa_1[i]*P_Aa_1[i]*0.25);
    P_Aa_1 <- append(P_Aa_1,   P_AA_1[i]*P_aa_1[i]*1.00 + P_AA_1[i]*P_Aa_1[i]*0.50 + P_Aa_1[i]*P_aa_1[i]*0.50 + P_Aa_1[i]*P_Aa_1[i]*0.50);
    P_aa_1 <- append(P_aa_1,   0);
    
    P_AA_2 <- append(P_AA_2,   P_AA_2[i]*P_AA_2[i]*1.00 + P_AA_2[i]*P_Aa_2[i]*0.50 + P_Aa_2[i]*P_Aa_2[i]*0.25);
    P_Aa_2 <- append(P_Aa_2,   P_AA_2[i]*P_aa_2[i]*1.00 + P_AA_2[i]*P_Aa_2[i]*0.50 + P_Aa_2[i]*P_aa_2[i]*0.50 + P_Aa_2[i]*P_Aa_2[i]*0.50);
    P_aa_2 <- append(P_aa_2,   0);

    P_sum_1 <- P_AA_1[i+1] + P_Aa_1[i+1];
    P_AA_1[i+1] <- P_AA_1[i+1]/P_sum_1;
    P_Aa_1[i+1] <- P_Aa_1[i+1]/P_sum_1;
    
    P_sum_2 <- P_AA_2[i+1] + P_Aa_2[i+1];
    P_AA_2[i+1] <- P_AA_2[i+1]/P_sum_2;
    P_Aa_2[i+1] <- P_Aa_2[i+1]/P_sum_2;
    
    # Weighted average of the two probability sets by proportion of "AA" vs "Aa" plants.
    #  Only _1 values carry over to next iteration.
    if (is.na(Genotype_counts["AA"])) {
      count_AA <- 0; } else {
      count_AA <- Genotype_counts["AA"];
    }
    if (is.na(Genotype_counts["Aa"])) {
      count_Aa <- 0; } else {
      count_Aa <- Genotype_counts["Aa"];
    }
    weight1 <- count_AA/(count_AA+count_Aa);
    weight2 <- 1-weight1;
    val_AA_1 <- P_AA_1[i+1];
    val_AA_2 <- P_AA_2[i+1];
    val_Aa_1 <- P_Aa_1[i+1];
    val_Aa_2 <- P_Aa_2[i+1];
    P_AA_1[i+1] <- val_AA_1*weight1 + val_AA_2*weight2;
    P_Aa_1[i+1] <- val_Aa_1*weight1 + val_Aa_2*weight2;
    
    if (is.na(P_AA_1[i+1]) == TRUE) {  P_AA_1[i+1] <- 0;  }
    if (is.na(P_Aa_1[i+1]) == TRUE) {  P_Aa_1[i+1] <- 0;  }
    
    if ((P_AA_1[i+1]+P_Aa_1[i+1]) == 0) {
      # End simulation cycle if no "AA" or "Aa" plants.
      for (j in (length(P_aa_1)):years) {
        P_AA_1 <- append(P_AA_1,   0);
        P_Aa_1 <- append(P_Aa_1,   0);
        P_aa_1 <- append(P_aa_1,   0);
      }
      break;
    }
    
    ## Debugging output.
    #message("Iteration ", i);
    #print(Genotypes);
    #message("  ");
  }

  # Add current simulation cycle to figure.
  points(0:years, P_AA_1[1:(years+1)], col="red");
  lines( 0:years, P_AA_1[1:(years+1)], col="red");
  lines( 0:years, 1-P_AA_1[1:(years+1)], col="blue", lty="dashed");
}

Needing something like 30 years to converge at the smallest population size essentially means it isn't practical to selectively breed a dominant trait to complete saturation in a small population just using simple selection.

Unlike in the recessive case, we can't just save a few plants over winter to reset the population with only the exact genetics we want. A similar strategy should allow for more rapid progress towards the goal, however.

I'll explore this topic further next time.