Showing posts with label statistics. Show all posts

Friday, May 06, 2011

Someone asked me to explain the Price equation today...

I got an e-mail today asking for help understanding the Price equation, prompted partly by the recent RadioLab about George Price. The person who e-mailed me made it sound like he was OK with a long explanation, just so long as it explained the ugliness of the mathematics. Here is my response... (pardon the e-mail-esque formatting... I'm just pasting it rather than re-formatting it)

[ This post can also be found on my web page. ]
You shouldn't believe everything the media tells you about the complexity of the Price equation. I'm always frustrated when I hear someone on the radio read the Price equation out loud as a mathematical statement. It is not meant to be a mathematical statement. It is just a logical justification for something we all think should be true -- traits with higher differential fitness advantage should spread throughout a population (which is a critical aspect of natural selection). Price formalized that statement and then proved that the formalism is a tautology. That's all that's important.

It is a very simple idea, and it has almost nothing to do with statistics (because there are no random variables or data in the Price equation). The Price equation is a theoretical statement about the relationship between two sequential generations of a model population. You can use it to predict how the representation of a particular trait will change over time and eventually settle at some fixed distribution. However, again, numerical applications aside, it really is just a mathematical verification of something which makes intuitive sense.

Just to get comfortable with the notation, consider a trait like "height" across a population of n=100 individuals. Each individual might have a different height. Let's say that in our population, people basically have two different heights (perhaps due to sexual dimorphism). So we have two groups:

z_1 = 5 feet
z_2 = 6 feet

We represent the number of people with each height using the variables:

n_1 = 50
n_2 = 50

That is, there are an equal number of 5' tall people and 6' tall people from our 100 person population (note that n_1 + n_2 = n). Further, we find that both 5' tall and 6' tall people tend to have 1 offspring each. That is, they both have an equivalent "fitness" of 1:

w_1 = 1
w_2 = 1

where w_i is the number of offspring an individual of group i will contribute to the next generation. Let's say we also know that offspring from 5' tall people end up also being 5' tall, and offspring of 6' tall people also end up being 6' tall. Then we have:

z'_1 = 5 feet
z'_2 = 6 feet

So the value of the trait (height) does not change from generation to generation.
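As a quick sketch, the model parameters above can be written out in a few lines of Python (the variable names are mine, chosen to mirror the notation, not anything from Price):

```python
# Model parameters for the two-group "height" population above.
n = [50, 50]         # n_i: number of individuals in group i
z = [5.0, 6.0]       # z_i: trait value (height in feet) of group i
w = [1.0, 1.0]       # w_i: fitness (offspring per individual) of group i
z_next = [5.0, 6.0]  # z'_i: trait value of group i's offspring
```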

Everything above is a parameter of the model. It represents what we know about the "height" of individuals in this generation as well as the relationship between the height of an INDIVIDUAL and its offspring. What the Price equation does is tell us how the distribution of height in the POPULATION will change from this generation to the next. It might be helpful to think of the Price equation as relating the AVERAGE value of a trait (e.g., height) in one generation to the AVERAGE value of the trait in the next generation.

So now let's add on the Price equation stuff. To account for the changes in the average value of the trait (height here), we have to worry about two effects -- "background bias [due to individuals]" (my term) and "differential fitness" (the quantity that drives natural selection):

1.) Imagine that 5' tall parents produced 5' tall offspring (so z'_1=z_1=5 feet, as above), but 6' tall parents produced 10' tall offspring (so z'_2=10 feet in this hypothetical scenario). Then even without worrying about "differential fitness", we might expect an upward shift in AVERAGE height from the parent generation to the offspring generation. This "background bias [due to individuals]" is related to the "E(w_i \delta z_i)" term in the Price equation. It represents the change in a trait at the individual level. I'll give more info about the math later.

2.) Now, instead, assume that z'_1=z_1 and z'_2=z_2 (so offspring height is the same as parent height) as above. It may still be the case that the average height in the offspring generation changes from the parent generation. This would occur if one height had a higher fitness than the other height. Here, we see that w_1=w_2=1. They both have the same fitness, and so we don't expect any differences IN REPRESENTATION from one generation to the other. Note that if w_1=w_2=5, then each individual would produce 5 offspring. Consequently, the TOTAL population would grow, but the DISTRIBUTION of height would stay the same. To make things more interesting, imagine that w_1=1 and w_2=2. Now each 5' tall person produces one 5' tall offspring, but a 6' tall person produces TWO 6' tall offspring. Consequently, the distribution of height would change from parent to offspring generation. The AVERAGE height would shift toward 6' tall people. The "cov(w_i, z_i)" term aggregates this change. It relates the "differential fitness" of one height to its success in growing the representation of that height in the next generation. I'll give more info about the math in a bit. [NOTE that the average fitness represents the average "background" rate of growth from population to population.]

To get ready for an explanation of the actual Price equation, let's get some terminology out of the way.

First, we define the "expectation" or "average" height in the current population with:

E(z_i) = ( n_1 * z_1 + n_2 * z_2 + ... )/n

That is, "E(z_i)" is the average value of the trait (height above). There are n_1 individuals with z_1 value of the trait, and so we have to multiply n_1 * z_1 to get the total contribution of that value of the trait. We do that for each group. We can do the same for other variables too. For example, here's average fitness:

E(w_i) = ( n_1 * w_1 + n_2 * w_2 + ... )/n

The average fitness "E(w_i)" represents the average rate of population growth. If every w_i is 1, then there will be 1-to-1 replacement of parent by offspring and there will be no population growth; likewise, the average "E(w_i)" will be 1, reflecting no growth. However, if every w_i is 5, then "E(w_i)" will also be 5 and the population will grow 5-fold every generation. With some simple arithmetic, it is easy to verify that the total population in the NEXT (i.e., offspring) generation is given by the product of the number of individuals in this generation (n) and the average fitness (E(w_i)).
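Here's a small Python sketch of those two averages and the offspring population size (again, names are just my mirror of the notation):

```python
# Population averages, following E(x_i) = (n_1*x_1 + n_2*x_2 + ...)/n.
def average(n, x):
    """Population-weighted average of per-group values x_i."""
    total = sum(n)
    return sum(ni * xi for ni, xi in zip(n, x)) / total

n = [50, 50]
z = [5.0, 6.0]
w = [1.0, 1.0]

avg_z = average(n, z)  # E(z_i) = 5.5 feet
avg_w = average(n, w)  # E(w_i) = 1, so no population growth

# The offspring population size is n * E(w_i).
next_pop = sum(n) * avg_w  # 100 here: 1-to-1 replacement
```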

We can also find the average value of the trait in the NEXT (i.e., offspring) generation. To do so, we have to scale each value of the trait in the next generation (z'_i) by the number of individuals with that trait in the next generation (n_i w_i), and then we have to divide by the total number of individuals in the next generation (n*E(w_i)). So the average value of the trait in the NEXT (i.e., offspring) generation is:

E(z'_i) = ( n_1 * w_1 * z'_1 + n_2 * w_2 * z'_2 + ... )/(n * E(w_i))
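To see that formula move, here is a sketch that computes E(z'_i) for two fitness scenarios: equal fitness (as in our running example) and the more interesting w_1=1, w_2=2 case from point (2) above:

```python
# E(z'_i) = (n_1*w_1*z'_1 + n_2*w_2*z'_2 + ...)/(n * E(w_i))
def average(n, x):
    total = sum(n)
    return sum(ni * xi for ni, xi in zip(n, x)) / total

def next_gen_average(n, w, z_next):
    """Average trait value in the offspring generation."""
    total_offspring = sum(n) * average(n, w)
    return sum(ni * wi * zi for ni, wi, zi in zip(n, w, z_next)) / total_offspring

n = [50, 50]
z_next = [5.0, 6.0]  # offspring heights match parent heights

# Equal fitness: the average stays at 5.5 feet.
print(next_gen_average(n, [1.0, 1.0], z_next))  # 5.5

# Differential fitness (w_1=1, w_2=2): the average shifts toward 6 feet.
print(next_gen_average(n, [1.0, 2.0], z_next))  # 850/150, about 5.67
```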

For simplicity, let's use symbols "z", "w", and "z'" as a shorthand for those three quantities above. That is:

z = E(z_i)
w = E(w_i)
z' = E(z'_i)

Penultimately, let's define "delta", which gives the difference in a variable from this generation to the next. The difference in the average value of the trait is:

delta(z) = z' - z

that difference may be due either to differential fitness (i.e., when w_i is not the same as w) or to intrinsic height changes at the individual level. Those intrinsic height changes at the individual level are:

delta(z_1) = z'_1 - z_1
delta(z_2) = z'_2 - z_2
...

Finally, let's define this "covariance" formula. For each group i, let's say we have variables A_i and B_i (e.g., z_i and w_i). Let A be the average value of A_i across the population:

A = ( n_1 A_1 + n_2 A_2 + ... )/n

and B be the similarly defined average value of B_i across the population. Then we can define the covariance across the POPULATION in a similar way as we defined average. That is:

cov( A_i, B_i )
=
E( (A_i-A)*(B_i-B) )
=
( n_1*(A_1 - A)*(B_1 - B) + n_2*(A_2 - A)*(B_2 - B) + ... )/n

That is, cov(A_i,B_i) is the AVERAGE value of the product of the difference between each A_i and its average A and the difference between each B_i and its average B. We call this the "covariance" because:

* If A_i doesn't vary across values of i, then A_i=A (no "variance" in A) so there is no "covariance"

* If B_i doesn't vary, then there is similarly no covariance

* If, whenever A_i is far from its average, B_i is close to its average, then there is LOW (i.e., near zero) covariance. That is, both A_i and B_i vary across the population, but they don't vary in the same way.

* If, whenever A_i is far from its average, B_i is also far from its average, then there is HIGH (i.e., far from zero) covariance. Both A_i and B_i vary across the population, and they vary in the same way.

Note that HIGH covariance could be very positive or very negative. In the positive case, A_i and B_i have a similar pattern across values of i. In the negative case, A_i and B_i have mirrored patterns across values of i (i.e., A_i is very positive when B_i is very negative and vice versa). LOW covariance is specifically when the cov() formula is near zero. That indicates that the pattern of A_i has little relationship to the pattern of B_i.
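The covariance formula is easy to sketch in Python using the same population-weighted average as before. Note the w=[1,2] case reproduces the "taller group is fitter" scenario from point (2):

```python
# Population covariance: cov(A_i, B_i) = E((A_i - A)*(B_i - B)).
def average(n, x):
    total = sum(n)
    return sum(ni * xi for ni, xi in zip(n, x)) / total

def cov(n, a, b):
    """Population-weighted covariance of per-group values a_i and b_i."""
    avg_a, avg_b = average(n, a), average(n, b)
    return average(n, [(ai - avg_a) * (bi - avg_b) for ai, bi in zip(a, b)])

n = [50, 50]
z = [5.0, 6.0]

# Equal fitness: zero covariance, so no selection on height.
print(cov(n, [1.0, 1.0], z))  # 0.0

# Taller group is twice as fit: positive covariance.
print(cov(n, [1.0, 2.0], z))  # 0.25
```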

Now, let's look at the Price equation more closely. The left-hand side:

w*delta(z)

is roughly the amount of new trait ADDED to each "average" individual. So if the average trait shifts (e.g., from 5.5' tall to 6.5' tall, corresponding to a delta(z) of 1'), but the population has GROWN as well (i.e., "w>1"), then the amount of height "added" to the parent population to get the offspring population is more than just 1' per person. We scale the 1' per person by the "w" growth rate. Thus, "w delta(z)" captures the effects of population growth (which naturally adds trait to a population) and mean change in representation. Note that if the AVERAGE trait did not change ("delta(z)=0") but the population did grow ("w>1"), then we interpret "w delta(z)=0" to mean that even though the "total amount" of trait increased due to population increase, there was no marginal change in each individual's trait (i.e., individuals aren't getting taller; the population is just getting larger).

Now let's look at the right-hand side:

cov(w_i, z_i) + E(w_i*delta(z_i))

This implies that the amount of new trait added to each average individual is the combination of two components.

To parallel the discussion above, let's consider the E() part first:

E(w_i * delta(z_i))

we can expand this average to be:

( n_1*w_1*(z'_1 - z_1) + n_2*w_2*(z'_2 - z_2) + ... )/n

That is, delta(z_i) gives us the change from AN INDIVIDUAL (z_i) to A SINGLE OFFSPRING (z'_i). The w_i part ACCUMULATES those changes across EACH offspring. For example, if w_1=2, then group 1 parents have 2 offspring each. So the total increase in the trait from group 1 is not delta(z_1) but 2*delta(z_1). So you can see how this is the "BACKGROUND BIAS" component of "w*delta(z)" that we get even without worrying about differential fitness. This represents the change in "w*delta(z)" due just to INDIVIDUALS and POPULATION GROWTH.
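As a sketch, here is the background-bias term computed for the hypothetical scenario from point (1), where 6' parents produce 10' offspring:

```python
# The "background bias" term E(w_i * delta(z_i)) for the scenario in (1):
# 6' parents have 10' offspring, so delta(z_2) = 4 feet.
def average(n, x):
    total = sum(n)
    return sum(ni * xi for ni, xi in zip(n, x)) / total

n = [50, 50]
z = [5.0, 6.0]
z_next = [5.0, 10.0]
w = [1.0, 1.0]  # no differential fitness in this scenario

delta_z = [zn - zi for zn, zi in zip(z_next, z)]  # [0.0, 4.0]
bias = average(n, [wi * d for wi, d in zip(w, delta_z)])
print(bias)  # 2.0 feet of "background" increase per average individual
```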

Next, look at the covariance:

cov(w_i, z_i)

The covariance of w_i and z_i is a measure of how much the DIFFERENTIAL FITNESS contributes to added trait. Recall the formula for cov(w_i,z_i):

E( (w_i-w)*(z_i-z) )

which is equivalent to:

( n_1*(w_1-w)*(z_1-z) + n_2*(w_2-w)*(z_2-z) + ... )/n

Here, the quantity (w_i-w) is the "differential fitness" of group i, and the quantity (z_i-z) represents the location of the trait with respect to the average trait. So:

* if the fitness varies in a similar way as the level of trait across values of i, then the average value of the trait will tend to increase from population to population

* if the fitness varies in exactly the opposite way as the level of the trait across values of i, then the average value of the trait will tend to decrease from population to population

* if the fitness varies differently than the level of the trait, then there will be little change in the average trait from population to population

* if there is no variance in either the fitness or the level of the trait, there will be little change in the average trait

Put in other words:

* if high differential fitness always comes with high values of the trait and low differential fitness always comes with low values of the trait, then there will be selection toward MORE trait

* if high differential fitness always comes with low values of the trait and low differential fitness always comes with high values of the trait, then there will be selection toward LESS trait

* if differential fitness variation has no relationship to trait level variation, then selection will not change the average value of the trait

* if there is no variation in the trait or no variation in the fitness, then selection will not change the average value of the trait

Put in MORE words at a more individual group level:

If a group i has both a high "differential fitness" (w_i-w) AND a high (z_i-z), then its FITNESS w_i is far above the average fitness w and its level of the trait z_i is far above the average value of the trait z. Either one of those alone would be enough to cause the "total amount" of trait to shift upward. On the other hand, if BOTH (w_i-w) and (z_i-z) are NEGATIVE, then group i sits below the average trait value AND the rest of the population has a much higher fitness. Consequently, the motion of the average trait will still be upward, but here upward is AWAY from the trait z_i (because z_i is under the average z). Finally, if (w_i-w) and (z_i-z) have opposite signs, the motion of the average trait z will be negative, which will either be heading toward z_i if w_i>w or away from z_i if w_i<w. The covariance formula takes the average value of (w_i-w)(z_i-z). That average represents the contribution to the amount of trait "added" to each individual due to DIFFERENTIAL FITNESS.

So there you have it. Assuming that "w" (average fitness -- which is a growth rate) is not zero (which just assumes that the population does not die out in one generation), then we can divide everything by "w" to get a less complicated (but equivalent) Price equation:

delta(z) = ( cov(w_i,z_i) + E(w_i*delta(z_i)) )/w
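Since the Price equation is a tautology, we can check it numerically. Here's a sketch with a population that has BOTH differential fitness and individual-level drift (the specific numbers are mine, just for illustration):

```python
# Numerical check of delta(z) = (cov(w_i, z_i) + E(w_i*delta(z_i)))/w.
def average(n, x):
    total = sum(n)
    return sum(ni * xi for ni, xi in zip(n, x)) / total

def cov(n, a, b):
    avg_a, avg_b = average(n, a), average(n, b)
    return average(n, [(ai - avg_a) * (bi - avg_b) for ai, bi in zip(a, b)])

n = [50, 50]
z = [5.0, 6.0]
w = [1.0, 2.0]       # the tall group is twice as fit
z_next = [5.0, 6.5]  # and its offspring are a bit taller still

avg_w = average(n, w)
delta_z = [zn - zi for zn, zi in zip(z_next, z)]

# Left-hand side: change in the population average, computed directly.
z_avg_next = sum(ni * wi * zi for ni, wi, zi in zip(n, w, z_next)) / (sum(n) * avg_w)
lhs = z_avg_next - average(n, z)

# Right-hand side: Price's decomposition into selection + background bias.
rhs = (cov(n, w, z) + average(n, [wi * d for wi, d in zip(w, delta_z)])) / avg_w

print(lhs, rhs)  # both 0.5: the two sides agree, as they must
```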

So now we have an equation representing the average change from parent to offspring population. If you expand all the formulas, you can verify that this statement is equivalent to:

delta(z) = cov(w_i/w, z_i) + E( (w_i/w)*delta(z_i) )

The quotient "w_i/w" is a "fractional fitness." It is a measure comparing the fitness of group i with the average fitness, where high differential fitness corresponds to w_i/w > 1 and low differential fitness corresponds to w_i/w < 1. So let's create a new variable

v_i = w_i/w

to be the fractional fitness. Then we can rewrite Price's equation to be:

delta(z) = cov( v_i, z_i ) + E( v_i*delta(z_i) )
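The fractional-fitness version is just as easy to check numerically. Reusing the same made-up population as before (tall group twice as fit, offspring slightly taller):

```python
# Check delta(z) = cov(v_i, z_i) + E(v_i * delta(z_i)), with v_i = w_i/w.
def average(n, x):
    total = sum(n)
    return sum(ni * xi for ni, xi in zip(n, x)) / total

def cov(n, a, b):
    avg_a, avg_b = average(n, a), average(n, b)
    return average(n, [(ai - avg_a) * (bi - avg_b) for ai, bi in zip(a, b)])

n = [50, 50]
z = [5.0, 6.0]
w = [1.0, 2.0]
z_next = [5.0, 6.5]

avg_w = average(n, w)
v = [wi / avg_w for wi in w]  # fractional fitness, so E(v_i) = 1
delta_z = [zn - zi for zn, zi in zip(z_next, z)]

selection = cov(n, v, z)                                          # differential-fitness part
transmission = average(n, [vi * d for vi, d in zip(v, delta_z)])  # individual-change part

print(selection + transmission)  # about 0.5, the change in average height
```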

This version gets rid of the need to worry about scaling for population growth. If you think about it, v_i is just a normalized version of w_i where you have "factored out" the background growth rate of the population. So now we basically have:

AVERAGE_CHANGE
=
POPULATION_CHANGE_DUE_TO_DIFFERENTIAL_FITNESS
+
POPULATION_CHANGE_DUE_TO_INDIVIDUAL_CHANGES

In other words:

"the change in the average value of the trait is due to two parts:

1. The differential fitness of each value represented in the population

2. The individual change from parent trait level to offspring trait level"

So if you wish to go back to the "height" example...

"The average height increases when:
1. Natural selection favors increases in height
OR
2. Tall people have taller offspring"

You could create other variations that work as well:

"The average height DEcreases when:
1. Natural selection favors DEcreases in height
OR
2. Short people have shorter offspring"

====

"The average height stays the same when:
1. Natural selection has no preference for height
AND
2. Short people have short offspring and tall people have tall offspring"

====

"The average height DEcreases when:
1. Natural selection has no preference for height
AND
2. Short people have short offspring and tall people have short offspring"

====

"The average height INcreases when:
1. Natural selection has no preference for height
AND
2. Short people have tall offspring and tall people have tall offspring"

Friday, February 18, 2011

Dr. Bernoulli gets a job: Mathematics of the Job Search – Faculty Version

I recently found out that the Duke Computer Science department had 404 applicants for the open position in their department. I mentioned that to a CS professor from a different university, and he didn't seem surprised by that number. Moreover, when you think about how many "faculty candidate" lectures there usually are within a CS-like department each hiring season, and you consider that those interviewees are likely a small selection of the total applicants, then 404 starts sounding reasonable.

When there are 404 applicants who each have PhD degrees, publications, and possible post-doctoral or existing faculty appointments, let's also assume that the objective function that each department is maximizing is pretty flat. If you don't like that assumption, then assume we have no prior information, and so we will maximize entropy and assume that each applicant has a 1/404 chance of being picked for the job (in reality, this probability is itself conditioned on whether the state steps in and has a hiring freeze... so the real probability might be closer to 1/1000). So that is a very low number. Can we fight low probability with high volume of applications?

Assume we apply to N schools where the probability of getting an offer is
p = 1/404
at each of them. Then the probability of not getting an offer from each of them is
1 - p = 403/404,
and so the probability of not getting an offer from all of them is
(1 - p)^N = (403/404)^N.
So finally we arrive at the probability of getting an offer from at least one of them, which is
1 - (1 - p)^N = 1 - (403/404)^N.
Hypothetically speaking, let's say you apply to N = 50 such positions. Then you have a
1 - (403/404)^N = 1 - (403/404)^50 ≈ 11.65%
probability of getting an offer. Of course, if you were paying attention, you remember that p (1/404) is very small in this example. Consequently, the (1 - (1 - p)^N) curve looks linear for a wide region around the origin. So even though you remember your fourth-grade math teacher teaching you that you cannot additively accumulate probabilities (i.e., your probability of getting a job is not (N × p)), in this small-p case, it is a pretty decent approximation. In particular, even with our ostensibly large N, it is the case that
N × p = (50)(1/404) ≈ 12.38%,
which is pretty close to our slightly more dismal 11.65%.
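The whole calculation fits in a few lines of Python, if you want to play with other values of N:

```python
# At-least-one-offer probability vs. its small-p linear approximation.
p = 1 / 404  # flat prior over 404 applicants
N = 50       # number of applications submitted

exact = 1 - (1 - p) ** N  # 1 - (403/404)^50, about 11.65%
approx = N * p            # 50/404, about 12.38%

print(exact, approx)
```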

In December, I ran into a woman who had just finished submitting all of her faculty applications. She said she applied to just 10 of them because she was exhausted and figured she was just practicing this round. Setting N = 10 reduces your chances to 2.45%. Having said that, the distribution across the applicant pool is certainly not flat. Her home institution, research, adviser, and other factors make her a very attractive candidate who will likely do well with such a low N... In fact, she was recently interviewed at a university near me (that, again, may have to deal with hiring freezes, etc., in the near future).

Now, in my case... Maybe I should burn my CV and dust off my résumé... I hope I'm not too old and outdated.

Friday, April 16, 2010

Is an SPSS monster like a SAS bunny rabbit?

A friend of mine had a Google Talk status of "Now I'm the SPSS monster" today. Lately, I have picked up the contagious habit of making fun of people who use gooey (GUI) SPSS, and so I responded by e-mail, "Is an SPSS monster like a SAS bunny rabbit?" She responded, "Could be. Or an R-invader." I couldn't resist letting this snowball turn into the avalanche it really could be, and so...
Kick S. Way to JMP on that one and even Z-score. Such a rejoinder makes me want to click away to one of the Minitabs of my browser. Phew, all of this stat talk makes me want to regress back into MATLAB; even if I am still centrally limited there, at least I can feel normal again.

Anyway, I wasn't trying to be mean. If I was, I hope you won't log this transformation and hold it against me later. I'm certain I can transcend and function better in the future; a higher power law need not intervene. Hopefully this hypothesis is correct and you will see some significant change. That should help you restore your confidence.

On a different note, I saw some Monte Carlo tulips at the zoo last weekend; it seems risky to have planted those at this time of the season, but hopefully they will Excel. If they do die, I'm afraid this story will have a heavy tail indeed.

By the way, yesterday for graduate appreciation day, Jessie got a coupon for $1 coffee at the expensive campus Starbucks. With the discount, prices are about normal. I guess there is no such thing as a scale free lunch. Shoot, I'm afraid my coffee has gone cold and is starting to taste a little bit like Poisson.

Well, enough of this. I'm sure if you remove the outlier that is this e-mail thread, you'll find that the remaining e-mails are far less skewed and better fit the distribution you have come to expect.

I hope all of your days are better than average! --
Ted
There are parts of that that I'm not that excited about, but overall I'm pretty proud of myself.

Sunday, February 04, 2007

Do (Reuters) reporters know absolutely nothing about everything?

UPDATE: It was pointed out to me by a post on Nanotechnology Today that the Professor who heads up the research group that built the "demon" said (speaking about J. C. Maxwell):
"As he predicted, the machine does need energy and in our experiment it is powered by light. While light has previously been used to energise tiny particles directly, this is the first time that a system has been devised to trap molecules as they move in a certain direction under their natural motion. Once the molecules are trapped they cannot escape."
Again, what's going on at Reuters?! This quote is EXACTLY the opposite of their summary (quoted below).

It's just silly that the article "1867 nanomachine now reality" has gone the extra mile to be completely worthless. It's also silly that CNN has decided to put this in their "Offbeat news."

The experiment described in the article involves Maxwell's Demon, which is a thought experiment involving a "paradox" of statistical thermodynamics. However, nowhere in the article is this paradox ever mentioned. In fact, they go so far as to say this:
His mechanism traps molecular-sized particles as they move. As Maxwell had predicted long ago, it does not need energy because it is powered by light.

Now, I'm guessing that the scientists involved said that Maxwell's paradox was a paradox because his little demon did not require additional energy. However, this device doesn't cause any paradox because their demon DOES require additional energy IN THE FORM OF LIGHT. As I explain to my fourth graders, light is energy. Nearly all of the things that end in "cycle" in the study of earth and life science are driven by the energy brought from the sun in the form of light.

Anyway, the article completely misses the point and is filled with lots of misunderstandings and statements which could generously be called wrong.

Do they have editors at Reuters? To cut costs did they just fire them all and hire the cast of Who's the Boss? instead?