The Biologist Is In: January 2020

Friday, January 31, 2020

Tomatillo Breeding (2/n)

I thought it would take me a week to get back to this, but that didn't happen. Oops. Sorry.



The big difficulty with tomatillo breeding is that they're very strong out-crossers. Unlike tomatoes, peppers, eggplant, beans, etc., you can't just grow one plant from each generation to help control the genetics during the process of making a new variety. If you grow a dozen tomatillo plants and don't like how half of them grew, you can be sure that seeds saved from the plants you liked will have genetics from the ones you didn't.
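
To put a rough number on that, here is a minimal R sketch. It assumes strict self-incompatibility, random mating, and equal pollen output from every plant; the variable names are mine, chosen for illustration.

```r
# Assumptions (mine, for illustration): no selfing, random mating,
# and every plant sheds the same amount of pollen.
n_plants   <- 12   # plants grown
n_liked    <- 6    # plants you would save seeds from
n_disliked <- n_plants - n_liked

# A liked plant cannot pollinate itself, so its possible pollen donors
# are the other liked plants plus all of the disliked ones.
donors <- n_plants - 1
frac_disliked_pollen <- n_disliked / donors

print(frac_disliked_pollen)
```

Under those assumptions, a bit more than half of the seeds on each plant you liked (6 of every 11) were fathered by a plant you didn't.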

I worked out some of the math long-hand, showing how this difficulty plays out over several generations. I faltered when it came to the task of outlining all those calculations via text. It is easy enough to throw a few equations into text, but I didn't want to post pages of derivations for you to read through. (And I'd have most assuredly made silly typos along the way.)

Instead, I wrote up some simulations in R. These can be run using RStudio if you want to play around with them, or you can just look at my summary figures here.



Figure: x-axis, years from 0 to 10; y-axis, "%aa pollen donors" from 0 to 1. The red curve rises rapidly from the lower left and then levels off as it approaches 1; the blue curve descends in a mirror image.
Solid red curve with circles: %aa pollen donors.
Dashed blue curve: %Aa & %AA pollen donors.

We'll start with the simple case of a single recessive trait in an infinite population. (Sometimes infinity makes the math hard to do, other times it makes it very easy.)

If we save seeds only from plants showing the recessive trait, we can rapidly select away the dominant allele. The zero year of this plot is the F2 generation, where traits first start assorting. It takes five years for the recessive trait to be at 95% (dotted horizontal line) of the population and another two for it to be at 99% (dashed horizontal line) of the population.

Because the population is infinite, we can never quite reach 100%. There will always be a small amount of the dominant allele hanging around.
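
The simulation's update rule can also be written in closed form. This is a sketch, assuming the same model as the script (seeds saved only from aa plants, pollen drawn at random from the whole infinite population): after the first year there are no AA plants left and the Aa fraction halves every year, so the aa fraction in year n is 1 - 0.5^n for n >= 1. The function names below are hypothetical.

```r
# Closed form for the aa fraction in year n (n >= 1), assuming seeds are
# saved only from aa plants and pollen comes at random from the population.
aa_fraction <- function(n) 1 - 0.5^n

# First year in which the aa fraction reaches a target level.
years_to_reach <- function(target) ceiling(log(1 - target) / log(0.5))

# Cross-check against the recurrence: each year, Aa_next = AA + Aa / 2.
P_AA <- 0.25
P_Aa <- 0.50
for (n in 1:10) {
  P_Aa <- P_AA + P_Aa / 2
  P_AA <- 0
  stopifnot(isTRUE(all.equal(1 - P_Aa, aa_fraction(n))))
}

print(years_to_reach(c(0.95, 0.99)))
```

Since 0.5^n never reaches zero, the closed form also shows why the dominant allele never fully disappears from an infinite population.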

R Script 1: One recessive trait, infinite population.
# One recessive trait, infinite population.
#     Stabilize progeny for recessive trait via selection.
#     Save seeds from double-recessive plants each generation.
years <- 10;

# Define F2 population (element 1 of each vector is year 0).
P_AA <- 0.25;
P_Aa <- 0.50;
P_aa <- 0.25;

# Save seeds only from aa plants, unknown pollen donor. Iterate over years.
for(i in 1:years) {
  P_AA <- append(P_AA,   0);
  P_Aa <- append(P_Aa,   P_aa[i]*P_AA[i]*1.00 + P_aa[i]*P_Aa[i]*0.50);
  P_aa <- append(P_aa,   P_aa[i]*P_aa[i]*1.00 + P_aa[i]*P_Aa[i]*0.50);
  P_sum <- P_aa[i+1] + P_Aa[i+1];
  P_Aa[i+1] <- P_Aa[i+1]/P_sum;
  P_aa[i+1] <- P_aa[i+1]/P_sum;
}

# Make figure.
plot(  0:years, P_aa, col="red", main="One recessive trait, large population.", xlab="Years", ylab="%aa pollen donors", xlim=c(0,years), ylim=c(0,1), axes=TRUE, frame.plot=TRUE);
lines(0:years, P_aa, col="red");
lines(0:years, P_Aa+P_AA, col="blue", lty="dashed");
lines(c(0,years),c(0.95,0.95), col="black", lty="dotted");
lines(c(0,years),c(0.99,0.99), col="black", lty="dashed")



We can extend this simulation to better model a realistic situation where you can only grow a limited number of plants. Coding this is much more complicated. If we run it with a large population, we see a pattern much like the infinite one above. If we run it with a small population, we get very noisy trajectories that vary a lot from run to run. At small population numbers, it is fairly easy to accidentally end the experiment with no "aa" plants to save seeds from. (In real life, we'd just go back to seeds from the previous generation.)
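
As a rough check on how easy it is to lose the trait outright, the chance that an F2 planting contains no aa plants at all follows directly from the binomial distribution. A minimal sketch, assuming each F2 plant is independently aa with probability 0.25; `p_no_aa` is a name I made up for this example.

```r
# Chance that none of N F2 plants is aa, assuming each plant is
# independently aa with probability 0.25.
p_no_aa <- function(population, p_aa = 0.25) {
  dbinom(0, size = population, prob = p_aa)
}

populations <- c(10, 50, 1000)
print(data.frame(population = populations,
                 p_no_aa    = p_no_aa(populations)))
```

With ten plants the risk is roughly 6% in the very first generation; with fifty or more it is negligible. Later generations of a small population carry their own sampling risk on top of this.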

Figure (Population = 1000): similar to the figure above, but each curve is replaced by a tight cluster of overlapping curves representing individual runs of the simulation.
Figure (Population = 50): similar to the figure above, but each curve is replaced by a loose cluster of overlapping curves representing individual runs of the simulation.
Figure (Population = 10): similar to the figure above, but each curve is replaced by a very loose cluster of overlapping curves representing individual runs of the simulation. The curves are so noisy that the cluster is spread over much of the plot.

The upshot of the simulations is that even if we can only grow a small number of plants each generation, we can eventually eliminate the pesky dominant alleles for the trait of interest. It will take a while, but it is doable if you're willing to wait several years to a decade.

R Script 2: One recessive trait, small population.
# One recessive trait, small population.
#     Stabilize progeny for recessive trait via selection.
#     Save seeds from double-recessive plants each generation.
years <- 10;
population <- 50; # 1000, 50, 10
trials <- 100;

# Initialize an empty figure (type="n" sets up the axes without drawing any points).
plot(  c(0,years),c(0,1), type="n", main="One recessive trait, small population.", xlab="Years", ylab="%aa pollen donors", xlim=c(0,years), ylim=c(0,1), axes=TRUE, frame.plot=TRUE);
lines(c(0,years),c(0.95,0.95), col="black", lty="dotted");
lines(c(0,years),c(0.99,0.99), col="black", lty="dashed");

for (ii in 1:trials) {
  # Define F2 population probabilities (element 1 of each vector is year 0).
  P_AA <- 0.25;
  P_Aa <- 0.50;
  P_aa <- 0.25;

  # Save seeds only from "aa" plants, which can't self-pollinate.
  for (i in 1:(years+2)) {
    # Generate actual population.
    rands <- runif(population, 0, 1);
    Genotypes <- vector();
    for (j in 1:population) {
      if (rands[j] < P_AA[i]) {
        Genotypes <- append(Genotypes, "AA");
      } else if (rands[j] < P_AA[i]+P_Aa[i]) {
        Genotypes <- append(Genotypes, "Aa");
      } else {
        Genotypes <- append(Genotypes, "aa");
      }
    }
    Genotype_counts <- table(Genotypes);
    
    # Determine actual genotype probabilities for pollen donors.
    if (is.na(Genotype_counts["AA"])) {
      P_AA[i] <- 0; } else {
      P_AA[i] <- Genotype_counts["AA"]/(population-1);
    }
    if (is.na(Genotype_counts["Aa"])) {
      P_Aa[i] <- 0; } else {
      P_Aa[i] <- Genotype_counts["Aa"]/(population-1);
    }
    if (is.na(Genotype_counts["aa"])) {
      P_aa[i] <- 0; } else {
      P_aa[i] <- (Genotype_counts["aa"]-1)/(population-1); # The plant we're saving seeds from can't be pollinated by itself.
    }
  
    # Generate new theoretical genotype probabilities.
    P_AA <- append(P_AA,   0);
    P_Aa <- append(P_Aa,   P_aa[i]*P_AA[i]*1.00 + P_aa[i]*P_Aa[i]*0.50);
    P_aa <- append(P_aa,   P_aa[i]*P_aa[i]*1.00 + P_aa[i]*P_Aa[i]*0.50);
    
    P_sum <- P_aa[i+1] + P_Aa[i+1];
    P_Aa[i+1] <- P_Aa[i+1]/P_sum;
    P_aa[i+1] <- P_aa[i+1]/P_sum;
    
    if (is.na(P_Aa[i+1]) == TRUE) {  P_Aa[i+1] <- 0;  }
    if (is.na(P_aa[i+1]) == TRUE) {  P_aa[i+1] <- 0;  }
    
    if (P_aa[i+1] == 0) {
      # End simulation cycle if no "aa" plants.
      for (j in (length(P_aa)):years) {
        P_AA <- append(P_AA,   0);
        P_Aa <- append(P_Aa,   0);
        P_aa <- append(P_aa,   0);
      }
      break;
    }
  }

  # Add current simulation cycle to figure.
  points(0:years, P_aa[1:(years+1)], col="red");
  lines( 0:years, P_aa[1:(years+1)], col="red");
  lines( 0:years, 1-P_aa[1:(years+1)], col="blue", lty="dashed");
}



However, there's a much faster way to complete the process. It might even only take a couple years.

Tomatillos are perennial where it is warm enough for them to survive through winter. They also root easily from cuttings. Combined, these traits mean we can pot up rooted cuttings from each plant at the end of one year and continue growing the selected plants the next year, after we've had a chance to evaluate their fruit characteristics.

You'd have to grow enough plants the first year to be able to find multiple individuals with the recessive traits you're interested in. The next year, you can continue growing only those few plants and allow them to cross-pollinate. Every seed they produce in their second year will contain those recessive traits you selected the parents for. The dominant alleles will be gone from your population.
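
How many plants is "enough"? Here is a hedged sketch: assuming the first-year planting is an F2 in which each plant is independently aa with probability 0.25, the binomial distribution gives the smallest planting that yields at least `k` aa plants with a chosen confidence. The function name, the choice of k = 3, and the 95% confidence level are illustrative assumptions of mine.

```r
# Smallest planting size that gives at least k aa plants with the requested
# confidence, assuming each plant is independently aa with probability p_aa.
plants_needed <- function(k, confidence = 0.95, p_aa = 0.25) {
  n <- k
  # P(at least k aa plants among n) = P(more than k - 1)
  while (pbinom(k - 1, size = n, prob = p_aa, lower.tail = FALSE) < confidence) {
    n <- n + 1
  }
  n
}

print(plants_needed(k = 3))
```

Lowering `p_aa` or raising `k` pushes the required planting size up quickly, so it is worth running the numbers before ordering pots.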

You're done in one year of selection and another for seed production. No waiting around for a decade or more, gambling with the whims of chance. You may have your new tomatillo variety complete and ready to go.



I didn't talk about dominant traits here. They're a bit more involved and I'll have to do another post about that case. I'll also have to do another post talking about the case where you're looking for multiple specific genes (recessive or dominant) at once.

It looks like my planned two-part series is going to be a bit longer in the end.