Tuesday, July 22, 2014

Simulating diversity loss - Mitochondrial Eve

Based on that talk

Not that long time ago scientists have established that we all share one mother (and and one father, then). The interesting thing is that Eve was living 100 thouthands years earlier than Adam. So they could not have sex with each other and even see each other. How is this possible? The answer is that there are competing pieces of genome: some people have one allele in that locus of genome, others carry another. It can happen that one allele evicts all the competitors. So, there was a women which had a mutation in her mitochondrial DNA and this piece of DNA started to spread over the population so that 100 000 years later all women had it (others died without descendents).

You may think that our mother have got very evolutionary advantageous piece of DNA, which have managed to monopolize all the population because it was so efficient. I have considered however what if no allele has evolutionary advantages over the others. That is, they all are equally likely to be selected from the population for reproduction. We start with all parents different. Will any of them monopolize the population, eliminating all competitors?

The simplest case to permit elimination of some genes is to maintain a fixed-size population. N parents produce N offspring every generation. These N kids are parents for the next generation. You do it by sampling the (parent) population N times (with replacements, sinble parent can be picked any number of times; otherwise, population of children will be a copy of the parents). If some parent is not sampled, his allele ceases to exist in the future. You see, it seems like a converging process. You can loose some alleles at every generation so that chance will reduce the diversity the same way as gas molecules tend to fill up all the tank: there is more molecules to move from dense area into the vacuum than vice-versa.

It seems inevitable now that you will loose all diversity over time even if competitors have no advantages over each other. The loss of diversity is however slows down as you get a much more individuals than alleles. It is very likely that one of 100 different parents will not be sampled by 100 random draws. Yet, if you have a population of 50 fathers of one kind and 50 of another kind, and sample the box of them 100 times at random (two companies remains on the market and 100 people order the services from both at random) it will be very unlikely that no company remains without orders and thus disappears. On the other hand, it may be like a random walk - your company reduces a share on the market, it becomes less popular (less parents of that kind), so, it is less likely to be chosen on the next round so on and on and the less popular provider disappears. So, what is the speed of the diversity loss?

Simulator suggests that average extinction time is no longer than 4 population size:


pop
size
samples avg vari-
ance
computer
time (sec)
2 20 3 3 0
2 20 3 2 0
2 20 3 3 0
2 20 2 2 0
2 200 2 2 0
2 200 2 2 0
2 200 3 1 0
2 200 3 1 0
2 200 2 2 0
2 200 2 2 0
2 200 3 1 0
2 200 3 2 0
2 2000 2 2 0
2 2000 3 2 0
2 2000 3 2 0
2 2000 2 3 0
2 2000 3 2 0
2 20000 3 2 1
2 20000 2 2 1
2 20000 3 2 1
2 20000 3 2 1
2 2000 2 3 0
2 2000 2 2 0
2 20000 3 2 1
2 20000 3 2 1
2 20000 2 3 1
2 20000 3 2 1
2 20000 2 2 1
2 20000 3 1 1
2 20000 2 2 1
2 200000 2 2 3
2 200000 2 2 3
20 2 28 72 0
20 2 44 25 0
20 2 40 1201 0
20 2 64 128 0
20 2 38 613 0
20 200 36 460 0
20 200 37 515 0
20 200 36 474 0
20 200 38 444 0
20 200 35 269 0
20 20000 37 425 2
20 20000 38 425 2
20 20000 38 451 2
20 20000 38 424 2
20 20000 38 439 3
200 2 611 167621 0
200 2 462 2048 0
200 2 357 41185 0
200 2 397 265 0
200 2 278 18050 0
200 20 380 26853 0
200 20 376 33459 0
200 20 369 31579 0
200 20 396 114631 0
200 200 415 50068 2
200 200 402 58259 1
200 200 408 55265 1
200 200 395 57118 1
200 200 392 38700 1
200 2000 397 48032 11
200 2000 399 46132 11
200 2000 408 47565 11
200 2000 397 45438 10
200 20000 392 44702 106
200 20000 396 45552 100
200 20000 396 45646 103
200 20000 398 45682 102
200 20000 399 47252 105
200 20000 396 45894 106

Here, avg reports the monopolization time, in generations whereas `samples` stands for the num of games. Every game starts with absolutely diverse population and finishes when monopoly is achieved so that no further change is possible.

You see that monopolization time is proportional to population size. Yet, it looks quite sublinear. However, this can be as deceptive as St. Petersburg paradox, which demonstrates limited score despite it is infinite in fact. There is always a chance that diversity is not lost at this reproduction and game can proceed indefinitely, making average number of generations per loss/monopolization infinite after all.

I therefore made some analysis. If you have only two individual population, ab, it can be changed into aa, ab, ba or bb it the next generation (with probability 1/4). Doubles aa and bb mean that all diversity is lost in the very first reproduction. The probability of that is 1/2. Alternatively, original individuals ab persist and experiment is retried. So, at every generation there is 1/2 chance of terminating and 1/2 of proceeding. This means that distribution is

$ \begin{matrix} Gen: 1 & 2 & 3 & 4 & \ldots & n \ldots\\ Prob: 1/2 & 1/4 & 1/8 & 1/16 & \ldots & 1/2^n \ldots \end{matrix}$

and expected number of generations to monopolize converges to Avg = 1/2 + 2/4 + 3/8 + 4/16 + ... = 2 and Monte-Carlo simulation oscillates around 2 and 3 for pop size 2 in the table above.

The analysis for 3 is already very difficult for me. We can guess that abc can take on 27 different child generations: a[a(a,b,c), b(a,b,c), c(a,b,c)], b[a(a,b,c),b(a,b,c},c(a,b,c))], c[a(a,b,c),b(a,b,c),c(a,b,c)]. We can consider only 1/3 of the cases, when a is first since others are identical. Among the offsprings of a(a(a,b,c),b(a,b,c),c(a,b,c)), monopoisation, aaa, can happen with chance of 1/9. This is a chance of terminating after the first game. We can also get 3 different offsprings in cases abc an acb. That is, in 2/9 of cases, we original game is repeated. In the rest 6/9 = 2/3 of cases, aab, aac, abb, aba, aca, acc, we have diversity (richness, to be precise) reduced to two.

What happens when population of 3 items has richness of 2? How can we select 3 child from 3 parents, whereas two identical parents have weight 2 and one has weight 1? It is a kind of Pascal's triange whose branches have biased weights 2/3 and 1/3 rather than regular 1/2. As in normal Pascal triangle, we yield one child per step and it will be A with probability 2/3 and B with probability 1/3

?, 1A, 2/3B, 1/3AA, 4/9AB, 4/9BB, 1/9AAA, 8/27AAB, 12/27BBB, 1/27BBA, 6/272/3 ** 1/3* 1/3* 1/3* 1/3* 1/3* 1/32/3 *2/3 *2/3 *2/3 *Yeild 3Yeild 0Yeild 1Yeild 2

On the last row you see that, once 3 children are yeilded, they are a monopoly (blue) with a probability of (8+1)/27 = 1/3 or remain a dipole (orange) with p = (12+6)/27 = 2/3.

On the graph below you see how started from 3 distinct individuals (green nodes), we can switch into the evolution mode with richness of 2 (orange, as stated above) and complete monopoly (blue), which terminate the game. Nodes are labelled with roman letters to denote the generation. Ratios stand for the prob to get to the child from the parent. You see familiar probabilities: 1/9 to terminate from richness-3 node, 1/3 from ricness-2 node, transit from richness 3 to richness 2 is p=2/3 and keep richness 3 = 2/9. 0, 1startI, 1/9I, 2/9I, 2/3II, 1/3II, 2/3III, 2/3III, 1/3IV, 2/3IV, 1/3IV, 2/3IV, 1/3IV, 2/3IV, 2/3II, 2/3II, 1/9II, 2/9III, 2/3IV, 2/3V, 2/3...III, 1/3IV, 1/3V, 1/3VI, 1/3...III, 2/9IV, 2/9V, 2/9III, 1/9IV, 1/9V, 1/9VI ...VI, ...Simulating V generationsIII, 2/3IV, 2/3V, 2/3VI, 2/3 ...IV, 2/3V, 2/3III, 2/3III, 2/3V, 1/3IV, 1/3IV, 2/3IV, 2/3IV, 2/3V, 2/3V,... The probability to get to the child from the root is a multiple of all ratios along the path. The probability to terminate at generateion N is the sum of probabilities of blue nodes with generation=N. You see that there is 0 of terminal nodes at gen 0, 1 at gen I, two at gen II and so one -- there is n nodes to terminate at step n. The expected number of generations to terminate for our population of size 3 will be ($p_i$ is the prob to terminate at $i$th generation) $$Gen_{avg}= \sum_{i=1}^{\infty}{i * p_i} = 1 * {1 \over 9} + 2 * ({2\over 3}{1 \over 3} + {2\over 9}{1 \over 9}) + 3 * ({2\over 3}{2\over 3}{1 \over 3} + {2\over 9}({2\over 3}{1 \over 3} + {2\over 9}{1 \over 9})) + $$ $$ + 4 * ({2\over 3}{2\over 3}{2\over 3}{1 \over 3} + {2\over 9}({2\over 3}{2\over 3}{1 \over 3} + {2\over 9}({2\over 3}{1 \over 3} + {2\over 9}({2\over 3}{1 \over 3} + {2\over 9}{1 \over 9}))) + $$ $$ + 5 * (({2\over 3})^4{1 \over 3} + {2\over 9}(({2\over 3})^3{1 \over 3} + {2\over 9}(({2\over 3})^2{1 \over 3} + {2\over 9}({2\over 3}{1 \over 3} + {2\over 9}{1\over 9})))) + \ldots $$ $$ + i * (({2\over 3}^(j-1){1 \over 3} + {2\over 9}(({2\over 3})^(j-2){1 \over 3} + {2\over 9}(\ldots {1 \over 3} + {2\over 9}{1\over 9}\underbrace{(\ldots ))))}_\text{i-1 braces} + \ldots $$

You see a recursion $$\begin{cases} III_n = 2/9\,III_{n-1}\\ II_n = 2/3(II_{n-1} + III_{n-1})\\ I_n = 1/3\,(II_{n-1} + 1/9\,III_{n-1})\\ \end{cases} \Rightarrow\\ I_n = {3\over 4}({2\over 3})^n - {7\over 4}({2\over 9})^n $$ The latter indicates the probability of getting diversity I at step n (it is 1/3 of populations with diversity II and 1/9 of loosing diversity when it is III). We can express the generating functions for the series of I_n: $$I(x) = {3/4\over 1-2/3\, x} - {7/4\over 1-2/9\,x} = I_0 + I_1x + I_2 x^2 +\cdots$$ Now, it is easy to compute expected value, the expected amount of generations to monopoly: $$1I_1 + 2I_2 + 3I_3 + \cdots = \sum_1^\infty nI_n = I'(x) \rvert_{x=1} = {3\over 4}{2/3\over (1-2/3)^2} - {7\over 4}{2/9\over (1-2/9)^2} = 27/7 = 3.85\ldots$$ Simulation gives the same result numerically it converges to 3.8571428571428537 (Simulator shows 4.857 but I guess this is because he considers initial generation 1 rather than 0)

You see that there persists an infinite tail but it seems that average still converges to 3.8 regardless of it. The complexity of analysis for such tiny populations makes larger populations intractable other than by sampling that I started with.

Since even StPB paradox, which have clearly infinite expected win, always results in finite win in all practical realizations, we can be absolutely sure that despite there is always a change of longer runs, we can absolutely be sure that metachondric eves and business enterprises always monopolize market in linear time by individuals who are not better than the others. In reality however new genes emerge looney in front of some population-dominating alleles (we see that diversity very quickly converges if not to complete monopoly but just to a couple of companies on the market) and the question is: should they have strong fenotypical advantages to proliferate or this can happen at random? I'll leave this question open. So, when you copy-paste previous solutions somewhere, it can happen that not the best takes over the world.

The problem should be somehow related with random walk (gambler's ruin). What is the expectation to deviate from center further than half number of steps. How?

No comments: