A road to injustice: Simulating diversity loss

Based on that talk

Not that long time ago scientists have established that we all share one mother (and and one father, then). The interesting thing is that Eve was living 100 thouthands years earlier than Adam. So they could not have sex with each other and even see each other. How is this possible? The answer is that there are competing pieces of genome: some people have one allele in that locus of genome, others carry another. It can happen that one allele evicts all the competitors. So, there was a women which had a mutation in her mitochondrial DNA and this piece of DNA started to spread over the population so that 100 000 years later all women had it (others died without descendents).

You may think that our mother have got very evolutionary advantageous piece of DNA, which have managed to monopolize all the population because it was so efficient. I have considered however what if no allele has evolutionary advantages over the others. That is, they all are equally likely to be selected from the population for reproduction. We start with all parents different. Will any of them monopolize the population, eliminating all competitors?

The simplest case to permit elimination of some genes is to maintain a fixed-size population. N parents produce N offspring every generation. These N kids are parents for the next generation. You do it by sampling the (parent) population N times (with replacements, sinble parent can be picked any number of times; otherwise, population of children will be a copy of the parents). If some parent is not sampled, his allele ceases to exist in the future. You see, it seems like a converging process. You can loose some alleles at every generation so that chance will reduce the diversity the same way as gas molecules tend to fill up all the tank: there is more molecules to move from dense area into the vacuum than vice-versa.

It seems inevitable now that you will loose all diversity over time even if competitors have no advantages over each other. The loss of diversity is however slows down as you get a much more individuals than alleles. It is very likely that one of 100 different parents will not be sampled by 100 random draws. Yet, if you have a population of 50 fathers of one kind and 50 of another kind, and sample the box of them 100 times at random (two companies remains on the market and 100 people order the services from both at random) it will be very unlikely that no company remains without orders and thus disappears. On the other hand, it may be like a random walk - your company reduces a share on the market, it becomes less popular (less parents of that kind), so, it is less likely to be chosen on the next round so on and on and the less popular provider disappears. So, what is the speed of the diversity loss?

Simulator suggests that average extinction time is no longer than 4 population size:

pop size	samples	avg	vari- ance	computer time (sec)
2	20	3	3	0
2	20	3	2	0
2	20	3	3	0
2	20	2	2	0
2	200	2	2	0
2	200	2	2	0
2	200	3	1	0
2	200	3	1	0
2	200	2	2	0
2	200	2	2	0
2	200	3	1	0
2	200	3	2	0
2	2000	2	2	0
2	2000	3	2	0
2	2000	3	2	0
2	2000	2	3	0
2	2000	3	2	0
2	20000	3	2	1
2	20000	2	2	1
2	20000	3	2	1
2	20000	3	2	1
2	2000	2	3	0
2	2000	2	2	0
2	20000	3	2	1
2	20000	3	2	1
2	20000	2	3	1
2	20000	3	2	1
2	20000	2	2	1
2	20000	3	1	1
2	20000	2	2	1
2	200000	2	2	3
2	200000	2	2	3
20	2	28	72	0
20	2	44	25	0
20	2	40	1201	0
20	2	64	128	0
20	2	38	613	0
20	200	36	460	0
20	200	37	515	0
20	200	36	474	0
20	200	38	444	0
20	200	35	269	0
20	20000	37	425	2
20	20000	38	425	2
20	20000	38	451	2
20	20000	38	424	2
20	20000	38	439	3
200	2	611	167621	0
200	2	462	2048	0
200	2	357	41185	0
200	2	397	265	0
200	2	278	18050	0
200	20	380	26853	0
200	20	376	33459	0
200	20	369	31579	0
200	20	396	114631	0
200	200	415	50068	2
200	200	402	58259	1
200	200	408	55265	1
200	200	395	57118	1
200	200	392	38700	1
200	2000	397	48032	11
200	2000	399	46132	11
200	2000	408	47565	11
200	2000	397	45438	10
200	20000	392	44702	106
200	20000	396	45552	100
200	20000	396	45646	103
200	20000	398	45682	102
200	20000	399	47252	105
200	20000	396	45894	106

Here, avg reports the monopolization time, in generations whereas `samples` stands for the num of games. Every game starts with absolutely diverse population and finishes when monopoly is achieved so that no further change is possible.

You see that monopolization time is proportional to population size. Yet, it looks quite sublinear. However, this can be as deceptive as St. Petersburg paradox, which demonstrates limited score despite it is infinite in fact. There is always a chance that diversity is not lost at this reproduction and game can proceed indefinitely, making average number of generations per loss/monopolization infinite after all.

I therefore made some analysis. If you have only two individual population, ab, it can be changed into aa, ab, ba or bb it the next generation (with probability 1/4). Doubles aa and bb mean that all diversity is lost in the very first reproduction. The probability of that is 1/2. Alternatively, original individuals ab persist and experiment is retried. So, at every generation there is 1/2 chance of terminating and 1/2 of proceeding. This means that distribution is

$ \begin{matrix} Gen: 1 & 2 & 3 & 4 & \ldots & n \ldots\\ Prob: 1/2 & 1/4 & 1/8 & 1/16 & \ldots & 1/2^n \ldots \end{matrix}$

and expected number of generations to monopolize converges to Avg = 1/2 + 2/4 + 3/8 + 4/16 + ... = 2 and Monte-Carlo simulation oscillates around 2 and 3 for pop size 2 in the table above.

The analysis for 3 is already very difficult for me. We can guess that abc can take on 27 different child generations: a[a(a,b,c), b(a,b,c), c(a,b,c)], b[a(a,b,c),b(a,b,c},c(a,b,c))], c[a(a,b,c),b(a,b,c),c(a,b,c)]. We can consider only 1/3 of the cases, when a is first since others are identical. Among the offsprings of a(a(a,b,c),b(a,b,c),c(a,b,c)), monopoisation, aaa, can happen with chance of 1/9. This is a chance of terminating after the first game. We can also get 3 different offsprings in cases abc an acb. That is, in 2/9 of cases, we original game is repeated. In the rest 6/9 = 2/3 of cases, aab, aac, abb, aba, aca, acc, we have diversity (richness, to be precise) reduced to two.

What happens when population of 3 items has richness of 2? How can we select 3 child from 3 parents, whereas two identical parents have weight 2 and one has weight 1? It is a kind of Pascal's triange whose branches have biased weights 2/3 and 1/3 rather than regular 1/2. As in normal Pascal triangle, we yield one child per step and it will be A with probability 2/3 and B with probability 1/3

On the last row you see that, once 3 children are yeilded, they are a monopoly (blue) with a probability of (8+1)/27 = 1/3 or remain a dipole (orange) with p = (12+6)/27 = 2/3.

On the graph below you see how started from 3 distinct individuals (green nodes), we can switch into the evolution mode with richness of 2 (orange, as stated above) and complete monopoly (blue), which terminate the game. Nodes are labelled with roman letters to denote the generation. Ratios stand for the prob to get to the child from the parent. You see familiar probabilities: 1/9 to terminate from richness-3 node, 1/3 from ricness-2 node, transit from richness 3 to richness 2 is p=2/3 and keep richness 3 = 2/9. The probability to get to the child from the root is a multiple of all ratios along the path. The probability to terminate at generateion N is the sum of probabilities of blue nodes with generation=N. You see that there is 0 of terminal nodes at gen 0, 1 at gen I, two at gen II and so one -- there is n nodes to terminate at step n. The expected number of generations to terminate for our population of size 3 will be ($p_i$ is the prob to terminate at $i$th generation) $$Gen_{avg}= \sum_{i=1}^{\infty}{i * p_i} = 1 * {1 \over 9} + 2 * ({2\over 3}{1 \over 3} + {2\over 9}{1 \over 9}) + 3 * ({2\over 3}{2\over 3}{1 \over 3} + {2\over 9}({2\over 3}{1 \over 3} + {2\over 9}{1 \over 9})) + $$ $$ + 4 * ({2\over 3}{2\over 3}{2\over 3}{1 \over 3} + {2\over 9}({2\over 3}{2\over 3}{1 \over 3} + {2\over 9}({2\over 3}{1 \over 3} + {2\over 9}({2\over 3}{1 \over 3} + {2\over 9}{1 \over 9}))) + $$ $$ + 5 * (({2\over 3})^4{1 \over 3} + {2\over 9}(({2\over 3})^3{1 \over 3} + {2\over 9}(({2\over 3})^2{1 \over 3} + {2\over 9}({2\over 3}{1 \over 3} + {2\over 9}{1\over 9})))) + \ldots $$ $$ + i * (({2\over 3}^(j-1){1 \over 3} + {2\over 9}(({2\over 3})^(j-2){1 \over 3} + {2\over 9}(\ldots {1 \over 3} + {2\over 9}{1\over 9}\underbrace{(\ldots ))))}_\text{i-1 braces} + \ldots $$

You see a recursion $$\begin{cases} III_n = 2/9\,III_{n-1}\\ II_n = 2/3(II_{n-1} + III_{n-1})\\ I_n = 1/3\,(II_{n-1} + 1/9\,III_{n-1})\\ \end{cases} \Rightarrow\\ I_n = {3\over 4}({2\over 3})^n - {7\over 4}({2\over 9})^n $$ The latter indicates the probability of getting diversity I at step n (it is 1/3 of populations with diversity II and 1/9 of loosing diversity when it is III). We can express the generating functions for the series of I_n: $$I(x) = {3/4\over 1-2/3\, x} - {7/4\over 1-2/9\,x} = I_0 + I_1x + I_2 x^2 +\cdots$$ Now, it is easy to compute expected value, the expected amount of generations to monopoly: $$1I_1 + 2I_2 + 3I_3 + \cdots = \sum_1^\infty nI_n = I'(x) \rvert_{x=1} = {3\over 4}{2/3\over (1-2/3)^2} - {7\over 4}{2/9\over (1-2/9)^2} = 27/7 = 3.85\ldots$$ Simulation gives the same result numerically it converges to 3.8571428571428537 (Simulator shows 4.857 but I guess this is because he considers initial generation 1 rather than 0)

You see that there persists an infinite tail but it seems that average still converges to 3.8 regardless of it. The complexity of analysis for such tiny populations makes larger populations intractable other than by sampling that I started with.

Since even StPB paradox, which have clearly infinite expected win, always results in finite win in all practical realizations, we can be absolutely sure that despite there is always a change of longer runs, we can absolutely be sure that metachondric eves and business enterprises always monopolize market in linear time by individuals who are not better than the others. In reality however new genes emerge looney in front of some population-dominating alleles (we see that diversity very quickly converges if not to complete monopoly but just to a couple of companies on the market) and the question is: should they have strong fenotypical advantages to proliferate or this can happen at random? I'll leave this question open. So, when you copy-paste previous solutions somewhere, it can happen that not the best takes over the world.

The problem should be somehow related with random walk (gambler's ruin). What is the expectation to deviate from center further than half number of steps. How?

A road to injustice

Tuesday, July 22, 2014

Simulating diversity loss - Mitochondrial Eve

No comments: