Disclaimer

Do your homework before you invest. I am not a professional. I just enjoy investing. I am often wrong.

Friday, August 25, 2017

The bias of the null hypothesis

"An infinite number of monkeys sitting at an infinite number of typewriters would eventually reproduce the works of Shakespeare by chance."

Which is true, but it is an absolutely ridiculous and practically meaningless statement, because humans cannot comprehend infinity.  Everything is possible in infinity.  It is the same as saying that there is a remote possibility that the world was created, and continues to be ruled, by one snail.  The probability that a chorus of monkeys types anything noteworthy longer than a page in any realistic amount of time is close to nil.  For an example of how improbable the monkey theory is within a time period humans can comprehend, see this study: https://www.theguardian.com/uk/2003/may/09/science.arts and this funny clip from the Simpsons: https://www.youtube.com/watch?v=no_elVGGgW8.
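To put rough numbers on the monkey problem, here is a minimal sketch in Python. It assumes a simplified typewriter with 27 keys (26 letters plus a space bar, no punctuation or capitals) and that every keystroke is independent and uniformly random. The phrase and the 2,000-character page length are hypothetical choices for illustration.

```python
import math

ALPHABET_SIZE = 27  # assumption: 26 lowercase letters plus the space bar, no punctuation


def log10_odds_against(text: str, alphabet_size: int = ALPHABET_SIZE) -> float:
    """Return log10 of the odds against one random attempt typing `text` exactly.

    Assumes every character of `text` is on the keyboard and every keystroke is
    independent and uniform, so an exact match has probability
    alphabet_size ** -len(text).
    """
    return len(text) * math.log10(alphabet_size)


phrase = "to be or not to be"  # 18 characters
page = "x" * 2000  # stand-in for a hypothetical 2,000-character page

print(f"One phrase: about 1 in 10^{log10_odds_against(phrase):.1f} per attempt")
print(f"One page:   about 1 in 10^{log10_odds_against(page):.0f} per attempt")
```

Under these assumptions the short phrase alone is roughly a 1-in-10^26 shot per attempt, and a page is astronomically worse. Even a million monkeys each making a billion attempts would cover only 10^15 tries.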

This absurd statement demonstrates that infinity is incomprehensible, but the speaker usually misses that point and uses it to argue that if you try a random thing over and over again, good things might end up happening.  Which is usually wrong.  If you try a random thing, good things, bad things, or nothing might end up happening.

In statistics and the scientific method, a study must affirmatively show that its result would be unlikely to occur by chance alone, usually at the 95% or 99% confidence level (a p-value below 0.05 or 0.01), in order for that result to be statistically significant.  If there is no statistical significance, the result is treated as if it does not exist.  The starting assumption is always that a certain result or correlation does not exist; this is called the null hypothesis.  Why is this?  It is elegant, but it is not always right.  The null hypothesis makes sense when testing arbitrary ideas.  For example, the idea I mentioned above, that the world was created by a snail, can only be described as arbitrary.  The null hypothesis is that such an idea is wrong.  The evidence in favor of it is scant, so the null hypothesis wins.
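To make the significance threshold concrete, here is a minimal sketch of an exact two-sided binomial test in Python. It assumes a hypothetical coin that lands heads 60 times in 100 flips; the null hypothesis is that the coin is fair. The function name is illustrative, and computing the two-sided p-value by doubling the smaller tail is one common convention among several.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value, computed by doubling the smaller tail."""

    def pmf(k: int) -> float:
        # Probability of exactly k successes if the null hypothesis is true
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    lower_tail = sum(pmf(k) for k in range(0, successes + 1))
    upper_tail = sum(pmf(k) for k in range(successes, trials + 1))
    return min(1.0, 2 * min(lower_tail, upper_tail))


ALPHA = 0.05  # the conventional 95% threshold

p_value = binomial_two_sided_p(60, 100)  # hypothetical: 60 heads in 100 flips
reject_null = p_value < ALPHA
print(f"p = {p_value:.4f}; reject 'the coin is fair' at {ALPHA}: {reject_null}")
```

The p-value lands just above 0.05, so the test fails to reject the null even though the coin came up heads ten more times than expected. Strictly speaking, a statistician would say the coin's fairness was not disproven, not that it was proven.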

But what about ideas that are not so arbitrary, such as the idea that things are ordered rather than random, or that progress is designed rather than lucky?  By default, statistics says you should not presume a correlation or pattern exists until the evidence clears the 95% confidence threshold, so the ineffable becomes false.  This is the bias of the null hypothesis.  It has dramatic consequences in the real world.
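Statisticians have a name for a real effect that a test fails to detect: a Type II error. The chance of detecting a real effect is the test's power. Here is a minimal sketch, assuming a hypothetical coin that truly lands heads 55% of the time and the same exact two-sided binomial test at the 5% level. The function names are illustrative.

```python
from math import comb


def pmf_table(n: int, p: float) -> list[float]:
    """Binomial probabilities of 0..n successes when the true rate is p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def rejection_region(n: int, p0: float = 0.5, alpha: float = 0.05) -> list[int]:
    """Outcomes whose exact two-sided p-value (doubled smaller tail) is below alpha."""
    null = pmf_table(n, p0)
    region = []
    for k in range(n + 1):
        lower, upper = sum(null[: k + 1]), sum(null[k:])
        if min(1.0, 2 * min(lower, upper)) < alpha:
            region.append(k)
    return region


def power(n: int, p_true: float, p0: float = 0.5, alpha: float = 0.05) -> float:
    """Chance the test rejects the null when the real heads rate is p_true."""
    alt = pmf_table(n, p_true)
    return sum(alt[k] for k in rejection_region(n, p0, alpha))


for flips in (100, 400):
    # Hypothetical coin that truly lands heads 55% of the time
    print(f"{flips} flips: power = {power(flips, 0.55):.2f}")
```

With 100 flips the test misses this real bias most of the time, so the null hypothesis "wins" by default. With 400 flips the bias is caught more often, but still far from every time.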

I find it quite bizarre to argue that the evolution of millions of species of plants and animals, each of which has different and valuable skills that give it the greatest chance of survival using the least amount of evolutionary capital, was completely random.  Evolutionary capital is the degree of variation from the previous species.  It takes energy and effort to move the mean of a set of data, and the farther you move that mean, the more energy it takes.  In animals, the mean is the current gene pool, and each species must adapt by changing its gene pool in order to survive and thrive.  The genes of various animals might try different alterations to add genetic features to their offspring and see what gives them a better chance of survival.  Any genetic alteration must be present in a significant number of the offspring in order to take effect and change an animal population.  Therefore, the genetic alteration that is the smallest change from the parents' genes and gives the children the greatest chance of survival gives you the most bang for your buck and is most likely to occur.  This is random in the sense that it occurs through trial and error.  But it cannot really be RANDOM.

I admit I am no expert on genetics; however, I know there are a staggering number of combinations of genes and alleles that make up the DNA of an animal.  From what I understand, it is similar to a long computer program, and each gene has a different function and impact on the physical body.  A random alteration of such a complex process would rarely be beneficial to the body as a whole.  It would be quite lucky to randomly change a gene and get a positive result.  But that is what many people believe occurs in evolution, because randomness is the null hypothesis.  We are not advanced enough scientifically to find the pattern; therefore we cannot prove it exists; therefore, statistically, it does not exist.  This is a flaw in statistics.  On top of this first alleged randomness, the random gene alteration, is a second layer of randomness: the randomness of life and the survival of the species.  So out of billions of potential gene variations, you randomly find one that is beneficial to the species.  Say you have a fish, and an (allegedly) random gene variation that gives it a bigger dorsal fin allows it to swim 2% faster than its brothers.  How much does swimming 2% faster increase the fish's chance of survival?  The fast fish might be eaten by surprise, catch a disease, or get caught in a net.  Whether the more theoretically fit fish survives is itself a product of chance.  (Again, it is not random, as the fast fish is more likely to survive than its brothers.)  Combine these two layers of chance: the chance of having a good gene mutation, and the chance of that good gene mutation surviving.  If both of those layers of adaptation were truly random, evolution would be extremely inefficient.  But it is not.  Yet we must presume randomness because it is the null hypothesis and we cannot prove anything else.
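The "two layers of chance" argument boils down to multiplying probabilities. Here is a minimal sketch, assuming the two layers are independent and using made-up numbers purely for illustration; they are not measured mutation or survival rates, and the function names are hypothetical.

```python
def joint_chance(p_good_mutation: float, p_mutation_survives: float) -> float:
    """Chance that both independent layers go right in a single trial."""
    return p_good_mutation * p_mutation_survives


def expected_trials(p: float) -> float:
    """Mean number of independent trials up to and including the first success."""
    return 1 / p


# Hypothetical inputs, not measured values:
P_GOOD = 1e-3      # 1 in 1,000 random changes helps
P_SURVIVES = 0.02  # the helped individual's line persists 2% of the time

p = joint_chance(P_GOOD, P_SURVIVES)
print(f"joint chance per trial: {p:.0e}, expected trials: {expected_trials(p):,.0f}")
```

Two modest-sounding odds multiply into a much smaller one, which is the sense in which a purely random two-layer process would be inefficient.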

I did not mean to go into such detail about evolution.  There are more practical aspects of the bias of null hypotheses.  An egregious one is the null hypothesis that the markets are efficient.  In order to prove the markets are inefficient, you must prove that a strategy exists that consistently beats the markets.  The null hypothesis is that each strategy does not beat the markets.  The problem is that once a strategy is known and studied, the market adapts, becoming more efficient and incorporating that strategy into its pricing of companies.  So the market is smart, but not perfect and not efficient.  But the null hypothesis is that markets are efficient, and people fall for that.
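Here is a rough sketch of why "prove a strategy beats the market" is such a high bar. It assumes a hypothetical strategy that earns 0.5% per month above the market with 4% monthly volatility, independent monthly returns, and the normal-approximation cutoff of 1.96 for 95% confidence. The null hypothesis is that the true excess return is zero. The numbers and function names are illustrative only.

```python
import math


def t_statistic(mean_excess: float, stdev: float, months: int) -> float:
    """t-statistic for H0: the strategy's true mean monthly excess return is zero."""
    return mean_excess / (stdev / math.sqrt(months))


def months_needed(mean_excess: float, stdev: float, critical: float = 1.96) -> int:
    """Shortest track record reaching `critical`, if the sample mean and stdev held steady."""
    return math.ceil((critical * stdev / mean_excess) ** 2)


# Hypothetical strategy, in percent per month
MEAN_EXCESS = 0.5
STDEV = 4.0

print(f"t after 3 years: {t_statistic(MEAN_EXCESS, STDEV, 36):.2f}")
n = months_needed(MEAN_EXCESS, STDEV)
print(f"months needed for t >= 1.96: {n} (about {n / 12:.1f} years)")
```

Under these assumptions a genuinely good strategy needs roughly two decades of data before it clears the 95% bar, and if the market adapts in the meantime, the edge may shrink before it can be "proven."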

Plenty of clever sports statisticians fall victim to the bias of the null hypothesis when they confidently proclaim that there is no such thing as choking under pressure in sports, there is no such thing as a clutch performer, and there are no hot or cold streaks.  For each of these tests, the null hypothesis is that the phenomenon does not exist, and that all variation in performance is a random deviation from each player's mean skill level.  But people are not robots, and the null hypotheses are wrong.
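One standard way to test a make/miss sequence for hot or cold streaks is the Wald-Wolfowitz runs test, which counts runs (unbroken stretches of makes or misses) and compares the count with what independent shots would produce. Here is a minimal sketch using the normal approximation, which is rough for sequences this short. The shot sequences are hypothetical, and real hot-hand studies use more elaborate methods.

```python
import math


def runs_test_z(shots: str) -> float:
    """Wald-Wolfowitz runs test z-score for a make/miss string such as "HHMHM".

    Under the null hypothesis (shots are independent), the number of runs has
    mean 2*n1*n2/n + 1 and variance 2*n1*n2*(2*n1*n2 - n) / (n**2 * (n - 1)).
    A strongly negative z means fewer, longer runs than chance predicts (streaks).
    """
    n1 = shots.count("H")
    n2 = shots.count("M")
    n = n1 + n2
    if n1 == 0 or n2 == 0 or n != len(shots):
        raise ValueError("need a string of only 'H' and 'M' with both present")
    runs = 1 + sum(1 for a, b in zip(shots, shots[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    return (runs - mean) / math.sqrt(var)


# Hypothetical shooters, H = make, M = miss (10 makes, 10 misses each)
streaky = "HHHHHMMMMMHHHHHMMMMM"  # 4 runs
mild = "HHHMMMHHMMHHMMMHHMMH"  # 9 runs, versus 11 expected under randomness
for label, seq in [("streaky", streaky), ("mild", mild)]:
    z = runs_test_z(seq)
    print(f"{label}: z = {z:.2f}, randomness rejected toward streaks: {z < -1.96}")
```

The obviously streaky shooter is flagged, but the mildly streaky one is not: over 20 shots a modest real tendency to clump does not clear the 95% bar, so the null hypothesis of "no streaks" stands.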
