The Signal and the Noise
Author: Nate Silver
Gutenberg's invention of the printing press in 1440 was the first information technology revolution. Making information available to the masses led to an explosion of ideas. This had unintended and unpredictable consequences.
Before then, books were a luxury product for nobles. The difficulty and slowness in producing them meant it took tremendous effort even to prevent the amount of recorded knowledge decreasing over time.
30x as many books were produced in the first century after the printing press was invented than in all the time before, allowing a store of human knowledge to accumulate.
The amount of information increases much faster than our ability to differentiate useful from non-useful information or understand what best to do with it.
Our natural shortcut when dealing with more information than our brains can process is to pick out the parts we like and ignore the rest. We befriend the people who make the same choices as we do. This leads to polarisation and isolation in the national, political and religious spheres.
In Shakespeare's time, "prediction" referred to something a fortune teller might tell you, whereas "forecast" was a plan you created under uncertain conditions with prudence, wisdom and industriousness.
The Industrial Revolution largely began in Protestant countries that had a free press. It led to incredible economic growth, far outpacing the rise in population.
There is danger whenever information grows faster than our understanding of how to process it.
The computer age began around 1970, with computers being used in lab and academic settings. Computers were used to produce models of the world. It was the peak time in terms of huge amounts of theory being applied to small amounts of data. Eventually we realised how crude and assumption-laden the models were. Even though computers enabled great precision, the models did not lead to accurate predictions.
There was a "productivity paradox" in the 1970s-1980s, whereby the spread of computers coincided with a temporary reduction in economic and scientific productivity. Productivity improved in the 1990s as we became more realistic in understanding what new technology could do for us.
Stories of prediction are often those of long-term progress but short-term regress.
In an age of Big Data we're generating huge amounts of data, most of which is very recent.
Data-driven predictions can succeed—and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves.
The start of the 21st century was bad for Americans in terms of prediction:
- Failure to see the September 11th attacks coming despite having relevant information.
- Failure to predict the financial crisis.
- Failure to predict recessions.
- Repeated failures to predict political outcomes.
- Predictions of earthquakes that didn't happen and failures to predict those that did.
- Fields like biomedical research were plagued by findings that could not be replicated outside of the original lab.
Humans are wired to detect and respond to patterns without hesitation.
Our brains are necessarily selective in the information they can store and process. Unless we actively work to understand our biases, having access to more information may be useless or even harmful.
The more informed partisans are about global warming the less they agree with each other.
The amount of total information is increasing far faster than the amount of useful information.
Capitalism and the internet create conditions where ideas and information, whether good or bad, spread efficiently.
We like to, and need to, predict things - but we're not naturally good at it.
It isn't possible to make perfectly objective predictions. But a belief that objective truth exists and is desirable to understand is a precondition for improving them.
The fact that many theories that we've tested so far have been wrong or incomplete suggests that many we can't or haven't tested as yet probably also are.
The solution is to change our attitudes in a way more congruent with Bayes' theorem. We should become more comfortable with probability and uncertainty, and more aware of the assumptions and beliefs that we bring along with us when trying to predict something.
We think we want information when we really want knowledge.
A Catastrophic Failure of Prediction
The financial crisis was a prediction failure. The default rates for CDOs were more than 200 times higher than the ratings agency S&P had predicted.
Disastrous failures of predictions tend to come from:
- Focussing on signals that tell us what we want to see, not what's really out there.
- Ignoring risks that are hard to measure, even when they're huge threats.
- Not understanding how inaccurate our approximations and assumptions are.
- Ignoring uncertainty, even when it's intrinsic to our problem.
Human beings have an extraordinary capacity to ignore risks that threaten their livelihood.
"Unknown unknowns" are dangerous, but not as much as risks we know about but incorrectly believe we mitigated.
From Douglas Adams:
The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.
- Risk: something you know the probability of ahead of time so can account for it.
- Uncertainty: a risk that's hard to measure, no way to know if your estimate is off by orders of magnitude.
Ratings agencies had incorrectly modelled mortgage default risks as uncorrelated. They treated uncertainty as risk.
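To see how much the correlation assumption matters, here is a minimal sketch with hypothetical numbers: a pool of five mortgages, each with a 5% chance of default. Neither figure comes from the ratings agencies' actual models. It compares the chance that every mortgage in the pool defaults when defaults are independent with the chance when they are perfectly correlated.

```python
# Hypothetical pool: 5 mortgages, each with a 5% chance of default.
N_MORTGAGES = 5
P_DEFAULT = 0.05

# Independent defaults: all five fail only if each one fails separately.
p_all_independent = P_DEFAULT ** N_MORTGAGES

# Perfectly correlated defaults: either none fail or all fail together.
p_all_correlated = P_DEFAULT

# How badly the independence assumption understates the risk.
ratio = p_all_correlated / p_all_independent

print(f"All default (independent): {p_all_independent:.3e}")
print(f"All default (correlated):  {p_all_correlated:.3e}")
print(f"Underestimate factor:      {ratio:,.0f}x")
```

Under these made-up inputs the two assumptions differ by a factor of about 160,000, which is the sense in which an unmeasured correlation turns a "risk" into "uncertainty".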
We should avoid mistaking confident conclusions for accurate conclusions. Likewise, precise forecasts are often incorrectly assumed to be accurate forecasts.
Sensible buyers avoid trading in a market where the seller knows far more about the product than they do. But this is mitigated when a trusted friend or organisation vouches for the product. The latter is effectively what the ratings agencies did.
Two types of feedback:
- Negative feedback: e.g. supply and demand: when prices go up, sales go down. This is usually good in a market economy.
- Positive feedback: e.g. people estimating the value of houses by comparing to other houses, assuming if one sells for a large price then they all will. Or investors purchasing products at higher prices because everyone else seems confident about them. This is where bubbles come from.
Economists have never been able to predict major economic indicators such as the employment rate very successfully.
Four aspects of prediction failure with regards to the financial crisis include:
- Homeowners and investors thought that rising home values meant that they'd continue to rise, despite historic evidence to the contrary. This created a bubble.
- Ratings agencies and banks didn't understand how risky mortgage-backed securities were. They didn't miss the housing bubble as such, but their forecasts as to the impact a collapse would have were overconfident and had poor assumptions in terms of risk.
- People didn't realise that a housing crisis could trigger a global financial crisis, because of the high leverage involved.
- Just after the crisis happened people failed to predict the extent of the economic problems it'd cause.
For each of the above, people missed an important bit of context when working with their data. The events they were trying to predict were out of sample.
The data ratings agencies used was based on US housing since the 1980s. Home prices had never declined in sync during that period, thus the data couldn't have described what would happen when they did.
Forecasters tend to ignore out-of-sample issues because acknowledging them tends to weaken the relationships in the data and results in less powerful models, which personal and professional incentives discourage.
The amount of knowledge in the world may be increasing, but so is the gap between what we know and what we think we know.
Are you Smarter than a Television Pundit?
Political analysts failed to predict the demise of the USSR because it was necessary to integrate many different arguments originating from different sides of the political spectrum. Most pundits and forecasters show poor performance along with overconfidence when predicting political events, although some do better than others.
Philip Tetlock classifies experts as being like hedgehogs or foxes. The reference is to Archilochus, a Greek poet, who wrote that “The fox knows many little things, but the hedgehog knows one big thing.”
How foxes think:
- Multidisciplinary: incorporate ideas from different disciplines from anywhere on the political spectrum.
- Adaptable: Switch approaches if what you're doing isn't working.
- Self-critical: Acknowledge your mistakes.
- Tolerant of complexity: Realise the world is complicated, perhaps it's impossible to resolve certain problems.
- Cautious: Express predictions in qualified probabilistic terms.
- Empirical: Rely more on observation than theory.
How hedgehogs think:
- Specialised: focus on 1-2 big problems. Be skeptical of the opinions of outsiders.
- Stalwart: Stick to the same approach, using new data to refine the original model.
- Stubborn: Blame mistakes on bad luck.
- Order-seeking: Assume that the world follows simple governing relationships once you eradicate the noise.
- Confident: Don't hedge or change predictions.
- Ideological: Expect that the solutions to most problems are manifestations of a grander theory.
Foxes are better forecasters. They appreciate that data is noisy and know what they don't know. They can separate their ideas of how the world should be from their analysis of how it actually is and will be.
Foxes get better at forecasting with experience. Hedgehogs get worse, possibly because the more facts they have the more they can use them to confirm their biases.
The incentives for public intellectuals who want to attract attention are generally to make bold and confident predictions about dramatic changes.
Partisan ideology makes political predictions particularly hard for hedgehogs. Forecasters who support one party rarely predict more favourable outcomes for their opponents compared to the average consensus.
Politics is often infused with drama, which makes it hard to make good predictions in general. Meaningful political news appears only now and then. But political news is broadcast every day, and most of it consists of trivial stories dressed up to hide how unimportant they are, which increases the noise.
Principles for good forecasting:
- Make probabilistic forecasts. Describe a range of possible outcomes. If you forecast something as being 90% likely to happen you are implicitly forecasting it as 10% likely to not happen. Over time these probabilities will be accurate for a good forecaster.
- Make the best forecast you can today, irrespective of what you forecast in the past. Per Keynes, "When the facts change, I change my mind". Correcting course is not a sign of weakness or cheating. Those offering critiques may be misunderstanding something like politics as being akin to physics, run by deterministic, knowable and predictable universal laws. Political forecasting is more like poker. But if your forecasts are constantly and dramatically changing then consider whether you have a bad model or you're trying to predict something as yet unpredictable.
- Seek consensus. Whilst people like to make dramatic counter-intuitive forecasts, and sometimes it is right to do so, it's much more likely that a forecast in line with the consensus will be more accurate. Group forecasts are typically 15-20% more accurate than individual ones.
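The claim that probabilities "will be accurate for a good forecaster" can be checked by grouping past forecasts by their stated probability and comparing each group with how often the event actually happened. The sketch below is one simple way to do that; the function name and the track record are hypothetical and only there for illustration.

```python
from collections import defaultdict

def calibration_table(forecasts):
    """Group (probability, happened) pairs by stated probability and
    return {stated probability: observed frequency of the event}."""
    buckets = defaultdict(list)
    for prob, happened in forecasts:
        buckets[prob].append(1 if happened else 0)
    return {prob: sum(hits) / len(hits) for prob, hits in sorted(buckets.items())}

# Hypothetical track record: (stated probability, did the event happen?)
history = (
    [(0.9, True)] * 9 + [(0.9, False)] * 1
    + [(0.6, True)] * 6 + [(0.6, False)] * 4
    + [(0.2, True)] * 1 + [(0.2, False)] * 4
)

table = calibration_table(history)
for prob, freq in table.items():
    print(f"forecast {prob:.0%} -> happened {freq:.0%} of the time")
```

This made-up forecaster is well calibrated: events given a 90% chance happened 9 times out of 10, and the 1 miss in 10 is expected rather than a failure.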
Fundamentals-based models are worse predictors of presidential elections than models that combine economic data with polling and other types of data.
Objective isn't a synonym for quantitative. It means to ignore our personal biases and prejudices, which is desirable but unobtainable in reality. For instance, we make our own choices as to what method we use to forecast something. We introduce assumptions into our models.
Our best defence is to recognise what assumptions we're making and question our own decisions.
All I Care About Is W’s and L’s
Baseball has an extremely rich associated dataset - large numbers of variables from the hundreds of games that are played in big leagues each year.
The rules of the game are very orderly; players take turns and are mostly responsible for their own statistics, making it relatively easy to analyse causality.
A good prediction system for baseball must:
- Account for the context of a player's statistics.
- Distinguish skill from luck.
- Understand how a player's performance changes as they age.
Players tend to improve until they're in their late twenties, peaking at 27. Their performance declines when they reach mid-thirties. This is the aging curve, which is also seen in:
- Olympic gymnasts (who peak in teenage years).
- Poets (in their 20s).
- Chess players (30s).
- Applied economists (40s).
- Fortune 500 CEOs (average age is 55).
One common bias statisticians hold is that if something isn't easy to quantify then it doesn't matter much.
Certain findings such as OBP being a more useful statistic than batting average have proven to be correct. Winning these arguments may have given the stats community undue complacency or dismissiveness about areas which are more ambiguous. Baseball statistics get less useful the further you are away from the major leagues and when you're trying to predict things into the further future.
The scouts have traditionally looked at five tools in lieu of statistics: hitting for power, hitting for average, speed, arm strength, and defensive range. Some of those are really statistics though.
Sanders believes that the player's mental toolbox predicts baseball success, incorporating five factors:
- Preparedness and work ethic
- Concentration and focus
- Competitiveness and self-confidence
- Stress management and humility
- Adaptiveness and learning ability
These are often useful traits in other careers too, including for forecasters.
Making a good forecast is about weighing information appropriately, not excluding qualitative information. Often it's possible to translate qualitative information into quantitative information to test within a statistical model.
The lines between qualitative and quantitative information change over time. In baseball the introduction of Pitch f/x technology is changing a type of information previously only available from scouts' observations into a quantitative measure.
We instinctively like to classify information into small numbers of categories. A risk is that when something doesn't neatly fit into a category we ignore or misjudge it.
The information revolution has lived up to its billing in baseball, even though it has been a letdown in so many other fields, because of the sport’s unique combination of rapidly developing technology, well-aligned incentives, tough competition, and rich data.
For Years You’ve Been Telling us that Rain Is Green
The catastrophic effect of Hurricane Katrina on New Orleans had been predicted days before the levees were breached, saving many lives. However, almost 20% of its inhabitants didn't evacuate, and 1,600 died.
Two thirds of the survivors had stayed because they thought the storm wouldn't be as bad as predicted. Others were confused by the way the evacuation order was delivered.
Just because we can predict something doesn't mean we can alter it. For a forecast to be useful someone must take action based on it.
Humans have always tried to predict things in their environment. The original religious idea of predestination was largely replaced by scientific determinism. That idea, summarised metaphorically via Laplace's demon, was basically that if you knew everything about the present state of the world, and you knew all the laws that govern the universe, then you could in theory make perfect predictions.
Against determinism is the idea of probabilism, suggesting that it would be impossible for us to know with certainty the exact conditions of the universe. At first this was mostly epistemological - we simply didn't have the tools to measure things accurately. But when quantum mechanics was discovered some scientists and philosophers argued that the universe itself behaves probabilistically. Heisenberg's uncertainty principle is usually interpreted to mean that perfect predictions are impossible.
Weather happens at a larger, molecular, level though so we don't need quantum physics. Historically it was a lack of computational power that held back our ability to predict it rather than a lack of theoretical understanding.
Computers aren't fast enough to calculate the effect of every molecule in the earth's atmosphere, so they approximate it by breaking down the atmosphere into pixels (aka grids, matrices, lattices).
Whilst computing power has improved exponentially in recent decades, the accuracy of weather forecasts has only risen slowly. This is because the obvious way of improving accuracy - reducing the size of the grid - is a multi-dimensional problem that requires 16x the computation power to reduce the grid size by half. However this isn't the main obstacle.
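The 16x figure is what you get if the problem is treated as four-dimensional: three spatial dimensions plus a time step that has to shrink along with the grid. That reading of "multi-dimensional" is an assumption here, and the function below is only a sketch of the arithmetic.

```python
def cost_multiplier(refinement, dimensions=4):
    """Factor by which the work grows when every dimension is made
    `refinement` times finer (assumed: 3 spatial dimensions plus time)."""
    return refinement ** dimensions

print(cost_multiplier(2))   # halve the grid size once -> 16
print(cost_multiplier(4))   # halve it twice -> 256
```

So even if computing power doubles every couple of years, each halving of the grid uses up four of those doublings.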
Chaos theory applies to systems that:
- are dynamic; the behaviour of the system at one point in time influences its behaviour in the future.
- are non-linear; operating by exponential rather than additive relationships.
Weather is such a system. These factors make it very hard to forecast. A small change in initial conditions can produce a huge and surprising difference in outcomes even though no randomness is involved.
The first problem is that we have inaccuracies in our data or assumptions. A linear operation is quite forgiving - adding 6 to 5 instead of 5 to 5 just gives 11 instead of 10. But exponential operations are much more dangerous. If we raise 5 to the power of 6 instead of 5 we end up with 15,625 instead of 3,125. In dynamic systems where the output at one stage is the input to another this can get out of control quickly.
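A small sketch of that arithmetic, plus what happens when outputs are fed back in as inputs. The staged "add 5" and "multiply by 5" systems are hypothetical stand-ins chosen to keep the numbers easy to follow.

```python
# The one-off operations from the text.
print(5 + 5, 5 + 6)      # 10 vs 11
print(5 ** 5, 5 ** 6)    # 3125 vs 15625

def run_stages(start, step, n_stages):
    """Feed the output of each stage into the next one."""
    value = start
    for _ in range(n_stages):
        value = step(value)
    return value

def add_five(v):
    return v + 5      # additive (linear) stage

def times_five(v):
    return v * 5      # multiplicative stage

# The same measurement error (6 instead of 5) pushed through each system.
for n in (1, 3, 6):
    additive_gap = run_stages(6, add_five, n) - run_stages(5, add_five, n)
    growth_gap = run_stages(6, times_five, n) - run_stages(5, times_five, n)
    print(f"{n} stages: additive error = {additive_gap}, "
          f"multiplicative error = {growth_gap}")
```

In the additive system the error stays at 1 however many stages there are; in the multiplicative one it is multiplied by 5 at every stage.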
For weather, we cannot observe the atmosphere with infinite precision. Modern forecasters usually make several forecasts with very small intentional differences in their original conditions, e.g. a slight increase in pressure in the first model, a slight increase in wind in the second model. They then report the forecast probabilistically. A 40% chance of rain means that 40% of the simulations showed an outcome involving rain.
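The idea of running many slightly perturbed simulations can be sketched with a toy system. The logistic map below is a stand-in for a real weather model, and the "rain" threshold, noise size and seed are all hypothetical; only the counting logic mirrors what the paragraph describes.

```python
import random

def simulate(x, steps=30, rate=3.9):
    """Toy chaotic system (logistic map) standing in for a weather model."""
    for _ in range(steps):
        x = rate * x * (1 - x)
    return x

def ensemble_probability(observed, n_runs=1000, noise=0.001,
                         threshold=0.7, seed=42):
    """Run the model from slightly perturbed starting points and return
    the share of runs that end above the (hypothetical) rain threshold."""
    rng = random.Random(seed)
    rainy = 0
    for _ in range(n_runs):
        start = observed + rng.uniform(-noise, noise)
        if simulate(start) > threshold:
            rainy += 1
    return rainy / n_runs

p_rain = ensemble_probability(observed=0.3)
print(f"Chance of rain: {p_rain:.0%}")
```

No randomness enters the model itself. The spread of outcomes comes entirely from uncertainty about the starting conditions.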
Today's forecasts are a combination of computer and human judgement. Humans look at a graphical visualisation of the computer forecast and adjust for any outliers they see. We're still better at this than computers, which struggle to recognise patterns that aren't exactly the same as each other. Our visual cortex helps us abstract out patterns and organisation through any distortions. Human involvement improves the accuracy of precipitation forecasts by 25% and temperature forecasts by 10%.
In recent times the average forecast of high temperature is out by about 3.5 degrees, a big improvement from recent decades. The average miss in hurricane forecasting is now 100 miles.
In 1940 Americans had a 1 in 400k chance of being killed by lightning. Now it's 1 in 11 million. This is influenced by changes in living patterns, enhanced communication technology and medical care, as well as better weather forecasts.
3 definitions of forecast quality in the weather forecasting community:
- Accuracy: did the weather match the forecast?
- Honesty: irrespective of outcome, was the forecast the best one the forecaster could make at the time based on the data?
- Economic value: did it help people make better decisions?
Ideally we should incentivise both accuracy and honesty, but that's not always the case.
Weather forecasting models must beat two baselines:
- Persistence: assumes tomorrow's weather will be the same as today's.
- Climatology: assumes the weather on a date will match the long term historical average for that date.
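Both baselines are simple enough to write down directly. The sketch below uses hypothetical temperature readings and hypothetical function names; a real comparison would score a model against these baselines over many days, not one.

```python
from statistics import mean

def persistence_forecast(recent_days):
    """Tomorrow will be the same as the most recent day."""
    return recent_days[-1]

def climatology_forecast(same_date_past_years):
    """Tomorrow will match the long-term average for that calendar date."""
    return mean(same_date_past_years)

def mean_abs_error(forecasts, actuals):
    """Average size of the miss, ignoring direction."""
    return mean(abs(f - a) for f, a in zip(forecasts, actuals))

def beats_baselines(model_error, persistence_error, climatology_error):
    """A model only adds value if it beats both naive approaches."""
    return model_error < min(persistence_error, climatology_error)

# Hypothetical high temperatures (degrees), for illustration only.
recent_days = [61, 64, 70]
same_date_past_years = [58, 66, 62, 60, 64]

print(persistence_forecast(recent_days))            # 70
print(climatology_forecast(same_date_past_years))   # 62
```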
Forecasting models are less accurate the further in advance they try to predict the weather. Forecasts more than about a week away are worse than climatology methods. Yet commercial forecasters continue to produce them.
The perception of accuracy, not accuracy itself, is really the primary goal of commercial weather forecasting. Commercial forecasters rarely forecast a 50% chance of rain, as consumers see that result as indecisive. Most are deliberately biased towards forecasting more precipitation than the underlying forecasts show, because consumers notice rain that wasn't forecast far more than forecast rain that never arrives.
Evacuations themselves can be deadly so the decision to evacuate due to forecast weather events is hard. Authorities have to convert a probabilistic decision into a deterministic one.
Having survived one hurricane makes people less likely to evacuate for the next one.
It is forecasting’s original sin to put politics, personal glory, or economic benefit before the truth of the forecast. Sometimes it is done with good intentions, but it always makes the forecast worse.
Desperately Seeking Signal
Earthquakes cause more human deaths than hurricanes, in part because they are hard to predict.
For a seismologist:
- Prediction is a specific and definitive claim about when and where an earthquake will happen.
- Forecast is a probabilistic statement usually over a longer time frame - e.g. there's a 60% chance of an earthquake within this region in the next 5 years.
The USGS website provides tools to forecast earthquakes, but they do not claim to be able to predict them.
The USGS uses the Gutenberg-Richter law in its forecasts. It states that the frequency of earthquakes drops off as their magnitude increases, following a power law distribution. This means you can forecast the number of (rare) big earthquakes from the number of (plentiful) small ones, or vice versa. It works at an individual region level as well as globally.
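The Gutenberg-Richter law is usually written as log10(N) = a - b * M, where N is the number of earthquakes of at least magnitude M and b is typically close to 1. The sketch below uses that form to go from a count of small earthquakes to an expected rate of large ones. The region, its counts and the choice of b = 1 are hypothetical.

```python
from math import log10

def fit_a(count_per_year, magnitude, b=1.0):
    """Back out the constant a from an observed yearly count of
    earthquakes at or above a given (small) magnitude."""
    return log10(count_per_year) + b * magnitude

def expected_annual_count(magnitude, a, b=1.0):
    """Gutenberg-Richter relation: log10(N) = a - b * M."""
    return 10 ** (a - b * magnitude)

# Hypothetical region: 100 earthquakes a year of magnitude 4 or above.
a = fit_a(count_per_year=100, magnitude=4.0)

for m in (5.0, 6.0, 7.0):
    rate = expected_annual_count(m, a)
    print(f"M >= {m}: about {rate:.3g} per year "
          f"(one every {1 / rate:.3g} years)")
```

With b = 1, each extra unit of magnitude makes earthquakes about ten times rarer, so this made-up region would expect a magnitude 7 or above roughly once a decade.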
However it doesn't tell us when the earthquake will strike, just the expected rate over a time frame.
Seismologists ideally want time-dependent forecasts that don't assume the probability of an earthquake is constant over time. At present, if a city is forecast to have an earthquake every 50 years then we have to assume that there's a 1 in 50 chance of it happening every year, irrespective of how long it's been since the last one.
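A short sketch of what the time-independent assumption implies, using the 1-in-50-years city from the paragraph above. The function name is made up; the formula is the standard one for independent yearly trials.

```python
def prob_at_least_one(annual_prob, years):
    """Chance of at least one event in a span of years, assuming every
    year is independent and equally likely (time-independent)."""
    return 1 - (1 - annual_prob) ** years

annual = 1 / 50
for years in (1, 10, 50, 100):
    print(f"{years:>3} years: {prob_at_least_one(annual, years):.1%}")
```

Under this assumption a "once every 50 years" earthquake has only about a 64% chance of arriving within any given 50-year window, and that chance is the same whether the last one was a year ago or a century ago.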
Poorer countries don't have the resources to make "just in case" plans for rare but potentially devastating events. Thus forecasts are not always actionable.
There are some patterns, e.g. aftershocks are most likely to occur very soon after an earthquake, although they're typically less powerful. Half of major earthquakes are preceded by detectable foreshocks. But none of this has proven reliable enough to predict major earthquakes accurately.
When developing forecasting models they should be judged by their forward-looking predictions. Predicting the past isn't a sign of usefulness.
Mistaking noise for signal is statistical overfitting: building a model that provides an overly specific solution to a general problem. The opposite, failing to capture as much signal as you could, is called underfitting, but is seen less often in practice.
Most real world tasks require induction, inferring structure from observed evidence. Overfitting is more likely in cases where data is limited and noisy and we don't have a good theoretical understanding of the mechanisms involved in the event.
Overfitted models often score better on typical statistical tests, but this is irrelevant; we usually need to explain the real world, not fit past noise.
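A minimal sketch of why a good fit to past data proves little. It compares a straight-line fit with a deliberately overfitted "memoriser" that just returns the nearest training point. The data-generating process (y = 2x plus Gaussian noise), the sample sizes and the seed are all hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memoriser(xs, ys):
    """Overfitted model: return the y value of the nearest training point."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def sample(rng, n):
    """Hypothetical process: signal y = 2x plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 3) for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = sample(rng, 30)    # data the models are fitted to
test_x, test_y = sample(rng, 300)     # fresh data from the same process

for name, fit in (("line", fit_line), ("memoriser", fit_memoriser)):
    model = fit(train_x, train_y)
    print(f"{name:>9}: in-sample MSE = {mse(model, train_x, train_y):.2f}, "
          f"out-of-sample MSE = {mse(model, test_x, test_y):.2f}")
```

The memoriser scores a perfect zero on the data it was fitted to because it reproduces the noise exactly. That in-sample score says nothing about fresh data, which is where a forecast actually has to perform.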
The theory of complex systems claims that simple things can interact with each other in mysterious ways. These systems tend to remain static most of the time but will then suddenly fail catastrophically. They're not random, but rather are so complex to model that it may never be possible to predict them.
Scientists have been put on trial for manslaughter for failing to predict earthquakes, which is ridiculous. Forecasters must always prioritise the truth, and not let politics distort their findings.
How to drown in three feet of water
Political polls are often reported with an error margin, whereas economic predictions usually only present a single number, giving us the incorrect impression that they are extremely accurate. In reality, economic forecasts have often failed even to "predict" a recession when we are already in one.
The perception of their accuracy is much higher than their actual accuracy. But some forecasters feel like expressing the appropriate level of uncertainty would threaten their reputation. In reality, probabilistic outcomes are key to scientific forecasts. The same overconfidence has been found in other fields too, including medical research, political science, finance, and psychology.
A prediction interval is a range of the most likely outcomes a forecast is predicting. So a 90% prediction interval should cover 90% of the possible real life outcomes.
The true 90% prediction interval on economists' forecasts is around 6.4 percentage points of GDP (i.e. +/- 3.2 pp). Thus hearing that GDP will grow by 2.5 percent could realistically mean anything from a growth of 5.7% to a loss of 0.7%.
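The interval arithmetic, and a way to check whether stated intervals cover as many outcomes as they claim, can be sketched as follows. The function names and the sample data in the coverage check are hypothetical.

```python
def prediction_interval(point_forecast, half_width):
    """Return (low, high) for a symmetric interval around a point forecast."""
    return point_forecast - half_width, point_forecast + half_width

def coverage(intervals, outcomes):
    """Share of actual outcomes that fell inside their stated interval.
    For honest 90% intervals this should come out near 0.9."""
    inside = sum(lo <= actual <= hi
                 for (lo, hi), actual in zip(intervals, outcomes))
    return inside / len(outcomes)

low, high = prediction_interval(point_forecast=2.5, half_width=3.2)
print(f"90% interval: {low:+.1f}% to {high:+.1f}%")   # -0.7% to +5.7%

# Hypothetical check: overly narrow intervals cover too few outcomes.
narrow = [prediction_interval(2.5, 1.0)] * 4
print(coverage(narrow, [2.0, 3.0, -0.5, 5.0]))        # 0.5
```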
Economic forecasters get large amounts of feedback but don't choose to correct their overconfidence. This is probably because there is little real incentive to make good predictions.
3 challenges for economic forecasters:
- It's hard to determine cause vs effect from economic statistics.
- The economy always changes, so the true explanations of what is happening today might not be the same tomorrow.
- The data they have to build their forecasts from is frequently of bad quality.
It's hard to determine cause vs effect from economic statistics.
Some companies offer to provide up to 4 million different economic indicators. This tempts economists to test all of them in their forecasts, even though the sample of what they want to predict may be tiny, e.g. recessions (the US has had only 11 since the end of World War 2). Over-fitting is rife, as well as breaches of the "correlation does not imply causation" rule. For example, unemployment rate can be both a leading and a lagging indicator.
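To illustrate why testing huge numbers of indicators against a tiny sample is dangerous, the sketch below correlates 2,000 series of pure noise with a target of 11 observations that is also pure noise. The number of indicators, the seed and the use of Pearson correlation are assumptions made for the illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)
N_EVENTS = 11          # as few observations as there are post-war recessions
N_INDICATORS = 2000    # hypothetical; far fewer than 4 million

# Target and indicators are all random noise: no indicator has real signal.
target = [rng.gauss(0, 1) for _ in range(N_EVENTS)]
indicators = [[rng.gauss(0, 1) for _ in range(N_EVENTS)]
              for _ in range(N_INDICATORS)]

best = max(abs(pearson(series, target)) for series in indicators)
print(f"Strongest correlation found among pure noise: {best:.2f}")
```

With so few observations, some of the noise series will correlate strongly with the target by chance alone, and a search over millions of indicators would turn up even more impressive-looking coincidences.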
Vicious circles can emerge such as businesses not hiring due to low consumer demand but the low consumer demand is because businesses aren't hiring so customers can't afford their product. Consumer confidence affects consumer behaviour, causing feedback loops.
Feedback loops between economic forecasts and economic policy are also challenging - if a recession is forecast then the government may change policy to try and avoid it.
- Forecasters therefore have to predict political decisions as well as economic ones.
- As the historical economic data relate to the policy decisions at the time you have to take into account the fiscal and monetary policy that was in place at the time.
Goodhart's Law suggests that once policymakers target a specific variable then it no longer works as an economic indicator. This is similar to the observer effect whereby when you begin to measure something its behaviour changes, whilst statistical models often assume that variables are independent.
The economy always changes
The relationship between different economic variables changes over the course of time.
Don't discard data, particularly when forecasting rare events. Forecasters may have felt a severe recession in the US in 2007 was unlikely because their dataset didn't include years in which there was a severe recession.
Discarding data is sometimes rationalised on the grounds that there has been a real shift in the problem you're analysing. This may be valid, but in reality you usually don't know when the next paradigm shift will happen. An economic model based on the idea that nothing major will ever change is useless, but knowing when the changes will occur is hard.
The source data is bad quality
Like the atmosphere in the context of weather predictions, the economy is a dynamic system where everything affects everything else and there is uncertainty with regards to its initial conditions. But unlike weather forecasting which can utilise relatively simple laws of physics, human behaviour and its impact on feedback loops makes the economy hard to predict.
The "Big Data" inspired sentiment that you don't need theory when you have access to tremendous amounts of information is wrong for forecasting. Statistical inferences based on theory are much stronger.
Like in almost every field it's ever been tried in, aggregating economists' forecasts improves their accuracy; 20% more accurate when forecasting GDP, 10% for unemployment and 30% for inflation.
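One reason aggregation helps is that individual errors in opposite directions partly cancel. The sketch below uses five hypothetical GDP forecasts and a hypothetical outcome; it is not the data behind the percentages quoted above.

```python
from statistics import mean

def abs_error(forecast, actual):
    """Size of the miss, ignoring direction."""
    return abs(forecast - actual)

# Hypothetical GDP growth forecasts (percent) from five economists.
forecasts = [1.0, 2.0, 2.5, 3.5, 4.0]
actual = 2.4

consensus = mean(forecasts)
average_individual_error = mean(abs_error(f, actual) for f in forecasts)
consensus_error = abs_error(consensus, actual)

print(f"Average individual error: {average_individual_error:.2f}")
print(f"Error of the consensus:   {consensus_error:.2f}")
```

The error of the average forecast can never be larger than the average of the individual errors. How much smaller it is depends on how far the individual misses offset each other.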
Most economists rely on judgement as well as statistics. This has been shown to improve the accuracy of forecasts. But it introduces a risk of bias.
The phenomenon of "rational bias" suggests that if you have little reputation to lose then it makes sense to take a big risk when offering up predictions - there's little to lose and you'll gain a lot of credibility if it turns out to be correct. But if you have a good reputation already, you might be biased in the direction of staying in line with other forecasters even when the data disagrees so as not to look foolish if your prediction doesn't happen. These effects likely worsen forecasts.
A bad forecast can make the real economy worse.
2 ideas to address these biases, supply side and demand side:
- Create a market for accurate economic forecasts. Prediction markets where people can bet on outcomes give people a real stake in being accurate vs just looking good to their followers.
- Reduce the demand for over-confident and inaccurate forecasts. As consumers, we have to improve how we consume forecasts.
  - Disregard over-confident forecasters with black box models.
  - Require reporting of estimates with the appropriate margins of error.
  - Recognise that the level of confidence someone expresses in their prediction does not necessarily indicate how accurate it is.
Flu and other infectious diseases are very challenging to predict.
Extrapolation during prediction assumes that the current trend will continue forever. Some of the most problematic prediction failures have come from doing this.
It's most risky in fields such as population growth and disease where the thing you're studying is growing exponentially. Precise predictions are essentially impossible when extrapolating on an exponential scale. What is actually possible often results in too broad a prediction range to be useful.
One of the most useful variables for forecasting the spread of a disease is its R0. That's its basic reproduction number, measuring the number of uninfected people that on average catch the disease from a single infected person. In theory, left to its own devices, a disease with an R0 > 1 will eventually spread to the entire population.
R0 for various diseases:
- Malaria: 150
- Measles: 15
- Smallpox: 6
- HIV/AIDS: 3.5
- SARS: 3.5
- H1N1 (1918): 3
- Ebola (1995): 1.8
- H1N1 (2009): 1.5
- Seasonal flu: 1.3
However, it's hard to estimate R0 accurately until the disease has already swept through the population.
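A rough sketch of why an uncertain R0 matters so much. It assumes the simplest possible picture, in which every case infects exactly R0 others in each generation and nothing slows the spread; the generation counts are arbitrary.

```python
def cases_in_generation(r0, generation):
    """New cases in a given generation if each case infects r0 others and
    nothing (immunity, behaviour change) slows the spread."""
    return r0 ** generation

# Two values from the R0 list above; ten generations is an arbitrary horizon.
for label, r0 in (("Seasonal flu", 1.3), ("H1N1 (1918)", 3.0)):
    cases = cases_in_generation(r0, 10)
    print(f"{label}: about {cases:,.0f} new cases in generation 10")

# A modest error in an early R0 estimate compounds quickly.
low = cases_in_generation(1.3, 20)
high = cases_in_generation(1.5, 20)
print(f"R0 of 1.3 vs 1.5 after 20 generations: {low:,.0f} vs {high:,.0f}")
```

Mistaking an R0 of 1.3 for 1.5 looks like a small error, but after 20 generations the two projections differ by more than a factor of 17.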
The other key variable is the case fatality rate: number of deaths caused by a disease divided by number of cases of that disease. Both sides of that ratio are also hard to assess until after the disease has made its way into a population.
Infectious diseases are another area where the act of prediction alters the way people behave.
- Self fulfilling predictions / prophecies are cases where the prediction can make itself true.
  - For example a business might try and predict consumer preferences, but also later influences them towards matching their prediction via marketing.
  - People and doctors are more likely to identify that they're suffering from a disease when it's currently in the media. There's a very strong correspondence between e.g. diagnoses of autism and the number of articles about autism in US newspapers.
- Self-cancelling predictions are cases where the prediction undermines itself.
- The goal of infectious disease predictions often includes raising awareness to change the behaviour of the public in order to reduce spread. A very effective flu prediction may look to have been wrong because it motivated people to behave in safer ways.
Building statistical models is similar to drawing maps. They must both be detailed and honest enough to represent something of the underlying landscape, but without so much detail that you lose your way.
The most basic mathematical model of infectious disease is the SIR model. Each person is either Susceptible, Infected or Recovered from the disease. A vaccination allows someone to move from S to R without becoming ill. However this model requires many assumptions, some of which are unrealistic. They include:
- Assuming that everyone in a population behaves the same way.
- Assuming that everyone is equally susceptible, equally vaccinated and meets everyone else at random.
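The SIR model described above can be sketched in a few lines. This is a simplified discrete-time version (one step per day) that bakes in exactly the unrealistic assumptions listed: a uniform population mixing at random. The parameter names `beta` and `gamma` and all numeric values are hypothetical, chosen only for illustration.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, vaccinated=0):
    """Discrete-time SIR model with one step per day.

    beta  : hypothetical transmission rate per day
    gamma : hypothetical recovery rate per day (1 / infectious period)
    """
    s = float(population - initial_infected - vaccinated)
    i = float(initial_infected)
    r = float(vaccinated)  # vaccination moves people straight from S to R
    history = [(s, i, r)]
    for _ in range(days):
        # Random mixing: new infections scale with the share of contacts
        # that are between a susceptible and an infected person.
        new_infections = min(s, beta * s * i / population)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# beta / gamma = 1.5 plays the role of R0 in this sketch.
no_vaccine = simulate_sir(1000, 1, beta=0.3, gamma=0.2, days=200)
half_vaccinated = simulate_sir(1000, 1, beta=0.3, gamma=0.2, days=200, vaccinated=500)

print("peak infected, no vaccine:     ", round(max(i for _, i, _ in no_vaccine)))
print("peak infected, half vaccinated:", round(max(i for _, i, _ in half_vaccinated)))
```

In this toy setup, vaccinating half the population leaves too few susceptible people for the outbreak to grow at all.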
But complex models also fail, typically appearing to be more precise without being any more accurate. We need "sophisticated simplicity".
Agent-based modelling - modelling the behaviour at a per-person level - is ambitious and well-resourced, but can be undermined by a lack of detailed data. Agent based models for infectious diseases are also hard to test as major epidemics are rare. And if they work as they should then they might self-cancel their otherwise correct prediction.
If you can’t make a good prediction, it is very often harmful to pretend that you can.
Per the Hippocratic Oath, "First, do no harm."
A good model can be useful even when it fails.
Language can be thought of as an approximating model of the world that we use to communicate to each other. Each language has some words with no direct counterparts in another language, even though they're both trying to explain the same world.
Statistical models are tools to help us comprehend how the universe works, not the universe itself.
Less and Less and Less Wrong
Successful gamblers think of the future as being fields of probability rather than no-lose bets coming from perfect theories. When the probability of an event is out of line from the odds offered by bookmakers, the gambler will bet on it. In the long term, the wins and losses will average out to something with positive expected value.
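A minimal sketch of the expected-value reasoning above, assuming decimal odds (a winning bet pays stake times odds, including the stake). The function name and the numbers are hypothetical.

```python
def expected_profit(stake: float, decimal_odds: float, win_probability: float) -> float:
    """Average profit per bet if the gambler's probability estimate is right."""
    profit_if_win = stake * (decimal_odds - 1)
    loss_if_lose = stake
    return win_probability * profit_if_win - (1 - win_probability) * loss_if_lose


# Bookmaker offers odds of 3.0, which implies a win probability of 1/3.
# The gambler believes the true probability is 40%.
print(expected_profit(stake=10, decimal_odds=3.0, win_probability=0.40))

# If the gambler agrees with the bookmaker there is no edge.
print(expected_profit(stake=10, decimal_odds=3.0, win_probability=1 / 3))
```

Any single bet can still lose; the positive expected value only shows up across many bets.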
Bayes argues that whilst it's possible that the world is full of certainty, the way we learn about it is not. We need to gather more and more evidence to get closer to the truth and reduce our epistemological uncertainty. Rationality is a probabilistic matter.
Bayes' theorem looks at conditional probability; the probability that a theory is true if a certain event happened. It requires knowing or estimating 3 quantities:
- The probability of the event happening if the hypothesis is true.
- The probability of the event happening if the hypothesis is false.
- The "prior probability": the probability you would have assigned to the hypothesis being true before you knew whether the event happened or not.
With those we can estimate the probability we're interested in, the "posterior probability" that the theory is true given what we observed.
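The three quantities combine as follows. This is a small sketch of Bayes' theorem for a single hypothesis and a single observed event; the function and argument names, and the example numbers, are illustrative rather than taken from the book.

```python
def posterior(prior: float, p_event_if_true: float, p_event_if_false: float) -> float:
    """Probability the hypothesis is true, given that the event was observed.

    posterior = prior * P(event | true)
                / (prior * P(event | true) + (1 - prior) * P(event | false))
    """
    evidence_if_true = prior * p_event_if_true
    evidence_if_false = (1 - prior) * p_event_if_false
    return evidence_if_true / (evidence_if_true + evidence_if_false)


# Hypothetical numbers: a 5% prior, and an event that is 9x more likely
# if the hypothesis is true than if it is false.
print(round(posterior(prior=0.05, p_event_if_true=0.9, p_event_if_false=0.1), 3))
```

Even with evidence that strongly favours the hypothesis, a low prior keeps the posterior well below 50% here.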
This approach helps highlight when our gut feeling is likely wrong.
Some priors are strong and resilient, others quickly bow to the observed evidence.
We can repeatedly apply Bayes' theorem as more evidence comes in, using our previous posterior for our new prior.
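Repeated updating can be sketched as a loop in which each posterior becomes the next prior. The sketch assumes that each new piece of evidence is independent of the others given the hypothesis, and it uses made-up probabilities; `update` is a hypothetical helper name.

```python
def update(prior: float, p_event_if_true: float, p_event_if_false: float) -> float:
    """One application of Bayes' theorem."""
    evidence_if_true = prior * p_event_if_true
    evidence_if_false = (1 - prior) * p_event_if_false
    return evidence_if_true / (evidence_if_true + evidence_if_false)


belief = 0.05  # initial prior (hypothetical)

# Three observations, each 9x more likely if the hypothesis is true.
observations = [(0.9, 0.1), (0.9, 0.1), (0.9, 0.1)]

for p_if_true, p_if_false in observations:
    belief = update(belief, p_if_true, p_if_false)  # posterior becomes new prior
    print(round(belief, 3))
```

A sceptical 5% prior gives way once consistent evidence accumulates.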
Bayes can be used to explore why our predictions are more likely to fail in the era of Big Data. There's an exponential increase in the amount of information available to us and potential hypotheses to investigate. But the number of true causal relationships in the data is far smaller and does not increase anything like as fast as the amount of information itself. Even whilst we may be fairly accurate in classifying the truth of each hypothesis, there are so few true hypotheses that Bayes suggests most of our scientific "findings" may be false.
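The arithmetic behind this claim can be sketched with made-up numbers. It assumes that only a small share of tested hypotheses are true and that each test is right 90% of the time either way. All values and names here are hypothetical.

```python
def share_of_findings_that_are_true(n_hypotheses: int, true_fraction: float,
                                    detection_rate: float,
                                    false_positive_rate: float) -> float:
    """Of all hypotheses flagged as 'findings', what share are actually true?"""
    n_true = n_hypotheses * true_fraction
    n_false = n_hypotheses - n_true
    true_findings = n_true * detection_rate        # true hypotheses correctly flagged
    false_findings = n_false * false_positive_rate  # false hypotheses wrongly flagged
    return true_findings / (true_findings + false_findings)


share = share_of_findings_that_are_true(
    n_hypotheses=1000,
    true_fraction=0.05,        # only 50 of 1000 hypotheses are really true
    detection_rate=0.9,        # 90% of true hypotheses are detected
    false_positive_rate=0.1,   # 10% of false hypotheses look true anyway
)
print(f"{share:.0%} of reported findings are true")
```

With these inputs, 45 true findings are outnumbered by 95 false ones, so most reported findings are wrong despite each test being fairly accurate.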
Uncomfortable with the idea of the "subjective" Bayesian prior, Fisher et al developed "frequentism" - a set of statistical tools that use only the data collected from a sample population to test hypotheses.
That school of thought assumes that all the uncertainty in a statistical question comes from the fact we can only collect data from a sample of a population, not the whole population itself. The only error is sampling error. However if your samples are biased in any way this assumption doesn't hold.
Frequentism ignores human error, which is the usual reason why predictions fail. The techniques rely on many assumptions so are not really any more objective.
Frequentist approaches discourage researchers from considering the plausibility of their hypotheses. They acknowledge that correlation doesn't always imply causation, but don't encourage us to consider which correlations are in fact causal.
We all have biases. The Bayesian approach explicitly acknowledges them and provides a framework to understand how we react to new information.
Making predictions is the best way to test our beliefs. Those who make the most accurate predictions are the most objective. This is in line with the idea that the scientific community will eventually converge towards the truth as new evidence is found.
Rage Against the Machines
Technology should be thought of as a labour-saving device, not a replacement for our thinking.
Playing chess is a kind of prediction problem.
Advantages of computers:
- Computers can perform calculations very quickly.
- They won't make errors, unless there are errors in their programming.
- They won't become lazy.
- They won't play emotionally, become over-confident or despondent.
Advantages of humans:
- Human minds are flexible and can change approaches to solutions rather than having to follow code.
- We can imagine.
- We can reason.
- We can learn.
In general computers are now better at playing chess than humans.
Bayes teaches us that prediction is an information-processing activity. We can use heuristics when the fully determined solution to a problem exceeds our capabilities, at the risk of introducing biases. We can't make perfect decisions unless we can process all the available information. Acknowledging this imperfection is important.
Computers find abstract and open-ended problems hard. They can use statistics to learn from historical data.
In general, if your model produces an unexpected result then you should lean towards it being a bug - but this is not always the case.
Garbage In Garbage Out: if you give a computer bad code or data then it won't produce good predictions.
Computers are best at forecasting topics where the relevant system has quite simple and well-understood laws which involve solving equations a large number of times.
The Google search results you see are Google's prediction of which results you'll find most useful. They can only use human evaluators on a certain number of representative queries. They also run 10,000 experiments a year. Employees are asked to come up with a lot of creative ideas, which are then tested with Google's data. Most of the ideas don't work so are discarded, but the best ones will be integrated.
A combination of humans using chess computers may be the best "chess player" at present.
We should neither worship at the altar of technology nor be frightened by it.
Computers are created by humans. Is 'artificial intelligence' really artificial given humans designed the systems?
The Poker Bubble
Playing poker requires the same skill as making predictions in general does: making probabilistic judgements in the midst of uncertainty.
Players try to learn which cards their opponents might hold and how that would affect the decisions they make. This prediction has to be probabilistic, refined as the hand plays out. Certainty only comes at the end of the hand.
Good players start predicting before the game begins, e.g. through looking at opponents' past behaviour, or via stereotypes based on ethnicity, age and sex. Even if the stereotype is only right just over 50% of the time that is still a useful edge.
Thereafter, a Bayesian process is used to learn from their current playing style.
To defend against this, a good player must make their own play unpredictable. Very good and very bad players can appear to behave similarly in this way.
The Pareto Principle of Prediction suggests that the first 20% of your effort can give you 80% of your predictive accuracy. This 20% may largely be based on having the right data, technology and incentives. This implies that the worst forecasters fall much further below the average forecaster than the best forecasters rise above it.
Often it might not be how accurate your predictions are in absolute terms that matters, but rather how accurate they are relative to your competition. It can be easy to make profits by being reasonably good in fields where other people have bad incentives, bad habits, fewer resources than you or are overly adherent to tradition.
Luck and skill are not opposites; both are required in poker, basketball etc.
Overconfidence is a problem for all fields of forecasting.
The US is a results-oriented society. We tend to believe that the rich deserve to be rich. In reality, success comes from a combination of hard work, talent, opportunities and environment; a combination of signal and noise. We emphasise skill and effort more often, except when we ourselves fall short, in which case we attribute it to bad luck (attribution error).
This also goes for predictions - people are given credit for an accurate prediction even if it was a product of luck. Other times we incorrectly blame bad predictions on bad luck. We usually attribute more skill than we should to successful predictions. We can hardly ever be absolutely certain whether a particular prediction's accuracy came from skill or luck.
We should judge predictions more rigorously, which can often be done via measuring how accurate forecasts are on average in the long term. But this is slow or hard to do in some fields, in which case the only solution might be to evaluate the process rather than the outcome. Is the forecaster using methods that we know correlate with accurate forecasts?
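One common way to score probabilistic forecasts over the long run is the Brier score: the mean squared difference between the forecast probability and what happened (0 or 1), where lower is better. The book's notes do not name a specific metric, so this is a swapped-in illustration, and the outcome data below is invented.

```python
def brier_score(forecast_probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    pairs = list(zip(forecast_probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


# Hypothetical record: the event happened 7 times out of 10.
outcomes = [1, 0, 1, 0, 1, 1, 0, 1, 1, 1]

hedged = [0.7] * len(outcomes)         # always says "70% likely"
overconfident = [1.0] * len(outcomes)  # always says "certain"

hedged_score = brier_score(hedged, outcomes)
overconfident_score = brier_score(overconfident, outcomes)
print(round(hedged_score, 2), round(overconfident_score, 2))
```

The overconfident forecaster is "right" 7 times out of 10 yet scores worse, because each confident miss is penalised heavily.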
We must recognise and be comfortable with the fact our world contains both signal and noise.
If You Can't Beat 'Em
The volume of stock trading has increased incredibly rapidly in recent years.
Traditional economics assumes that trades only happen when they make both parties better off. However stock trading is rarely driven by this. Instead, it mainly reflects different predictions as to the future returns of a stock. Fundamentally the stock market is a set of predictions about the future earnings and dividends of a company.
Free-market capitalism and Bayes' theorem come from the same tradition. The "Invisible Hand" could be thought of as a Bayesian process whereby prices are updated in response to changes in supply and demand. Both seek to achieve consensus within crowds.
Using betting markets to forecast economic variables like GDP or whether a certain newsworthy event will happen can be beneficial - they provide an incentive to make accurate forecasts.
The efficient market hypothesis claims that under certain conditions it is impossible to outpredict markets. Recent bubbles and busts have lowered the popularity of this view.
Markets reflect humans' collective judgement and hence are fallible.
Combining multiple forecasts is almost always better than taking just one. However:
- A forecast being better than another doesn't mean that it's actually good.
- This principle holds when the forecasts are being made independently of each other. It does not apply in betting markets where people can react to each other's behaviour.
- The aggregate forecast isn't necessarily better than the best individual forecast (although it may be).
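The caveats above can be checked with a small sketch that averages a set of forecasts and compares squared errors. The forecast values are invented and `compare` is a hypothetical helper name.

```python
from statistics import mean


def compare(forecasts, actual):
    """Squared error of the averaged forecast versus the individual forecasts."""
    individual_errors = [(f - actual) ** 2 for f in forecasts]
    aggregate = mean(forecasts)
    return {
        "aggregate_error": (aggregate - actual) ** 2,
        "average_individual_error": mean(individual_errors),
        "best_individual_error": min(individual_errors),
    }


actual = 100.0

# Errors fall on both sides of the truth and partly cancel out.
spread_out = compare([90.0, 104.0, 112.0], actual)

# Most forecasts err in the same direction, so less cancels out.
one_sided = compare([99.0, 110.0, 115.0], actual)

print(spread_out)
print(one_sided)
```

In both cases the aggregate beats the average individual forecast, but only in the first does it also beat the best individual one.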
The efficient-market hypothesis suggests that the market is unpredictable. Some investors will do better than others in the short term, but over the long term they will be unable to beat the market.
Data on trading performance of an individual is noisy. We mistake luck for skill and fail to take small sample sizes into account. In general you're best off selecting the mutual fund with lowest fees. Buying a market tracker fund guarantees that you will do as well as the average investor.
Forms of the efficient-market hypothesis:
- Weak: stock-market prices cannot be predicted from analysing past statistical patterns in isolation.
- Semi-strong: Analysing the fundamentals of a business (financial statements, business model etc.) cannot produce predictions that will beat the market in the long term.
- Strong: Even private information such as insider secrets is quickly incorporated into prices so will not let you consistently outperform the market.
Qualifications to the theory:
- It is talking about returns on a risk-adjusted basis. The theory allows investors to make an above average return if it's proportionate to a higher level of risk.
- Profits are measured after the cost of trading has been deducted. So a theoretically very slightly profitable trading strategy might not be considered profitable here.
Opposition to the efficient-market hypothesis comes from:
- A demonstration that some investors do consistently beat the market.
- Evidence that returns are predictable, e.g. a sustained increase in stock prices in the late 1990s. If you can predict a bubble in real time then it violates the hypothesis. Many investors did this during the housing bubble.
The price to earnings ratio (P/E) has centred at a value around 15 over time. When the ratio is lower stocks are cheap compared to earnings and vice versa. Historically when the P/E is very high the returns have been negative and vice versa. These patterns are only informative in the long term, saying nothing about the situation next month or year.
Competition between traders may explain why stock prices are so unpredictable in the short term even if they're predictable in the long term. Traders are short-term focussed. If you don't make money within 90 days you may be fired. Most trades are made with someone else's money.
These factors may change the incentives that underlie the efficient-market hypothesis. It's rational for traders to lose money for their firms if it means sticking with the popular opinion, leading to less chance of getting fired. They're optimising for career prospects rather than trading profits.
Herding behaviour comes from psychological reasons. "Follow the crowd" is usually a good heuristic if you don't know better - wisdom of the crowds. But it can be that we trust each other too much and start a bubble of reinforcing each other's opinions rather than correcting each other's mistakes.
These days we share so much information with each other that we're less independent from each other. If a trader has a different opinion to the popular one, it's most often because they've become too confident in their forecasting abilities.
The efficient market hypothesis is self-defeating in that if everyone believed in the theory then no-one would create a market in the first place.
Polls show that the American public is bad at choosing when to buy stocks. Their timing tends to be the opposite of what you would want to do to profit.
Often following the crowd is right. If you want to tread a different path you should require strong evidence as to why you're correct.
Perhaps we can detect market bubbles even if we can't prevent them. The P/E ratio has been a fairly reliable indicator. But regulating against them might prove difficult. Banning short-selling makes it harder to pop bubbles. But we'll never detect bubbles at all if we assume that the markets and their prices are always right.
A Climate of Healthy Skepticism
Thinking about causality can help us make much more accurate and reliable predictions.
The claims about global warming would be much more open to doubt if they had no grounding in a causal mechanism - in this case the "greenhouse effect".
In their 1990 report, the Intergovernmental Panel on Climate Change (IPCC) classified 2 findings as being 100% certain.
- The greenhouse effect exists and causes the Earth to be warmer than it otherwise would be.
- A prediction: human activities are increasing the concentration of greenhouse gases (carbon dioxide, methane, chlorofluorocarbons (CFCs) and nitrous oxide) in the atmosphere causing additional warming. Water vapour will increase in response to this warming and lead to even higher temperatures.
Scientists require extremely high levels of proof in order to conclude that a hypothesis is definitely true.
Since the 1980s the terminology "greenhouse effect" has been phased out with preference given to terms like "global warming" and "climate change". This is because scientists are expanding the implications of the theory. However the lack of cause embedded in the newer terms leads to people forming incorrect beliefs about it.
As Bayesians we should have more confidence in a hypothesis that is backed up by strong and clear causal relationships. New evidence against a theory should reduce our estimate of its likelihood but be weighed against the context of other things we already know about the topic.
CO2 circulates around the planet, thus in order for emissions-reduction targets to be useful they must be applied by all countries.
There are 3 sources of climate change skepticism:
- Self-interest, e.g. from the fossil fuel industry. There's no need to imagine a conspiracy - this is a straightforward rational response by companies that have a financial incentive to keep the status quo. We should not confuse this with attempts to make accurate predictions.
- Contrarianism. In any debate some people will take the "unpopular" view, seeing themselves as persecuted outsiders. Climate change might be especially prone to this as the data involved is noisy and the predictions' implications are not immediately viscerally experienced.
- Science. Some parts of the scientific community have concerns about some parts of the science. In 2008, the large majority of polled climate scientists agreed that climate change is happening and caused by human activity. But there was disagreement and doubt about the accuracy of the models used to forecast climate change. Scientific critiques should be taken seriously.
Armstrong and Green allege that the IPCC forecasts are lacklustre, because:
- Simple agreement among forecasters might reflect bias rather than accuracy.
- Global warming is too complex to forecast reliably.
- The forecasts don't account for uncertainty properly; they are too confident.
People misunderstand consensus. It does not mean a unanimous or simple majority view. It implies "broad agreement after a process of deliberation" - how science should work.
The risks of consensus include herding or groupthink. Some participants may be more influential due to e.g. charisma or status rather than because they have the best ideas.
Studies of consensus-driven predictions have mixed results, whereas when individual members of the group submit independent forecasts that are later aggregated this almost always improves accuracy.
It's important to consider many models; each may have different assumptions and different bugs.
Climate scientists have doubts about their forecasting models - films such as Al Gore’s An Inconvenient Truth do not represent the scientific consensus well.
Meteorologists may be critical of climate science based on their own experience of the difficulties in predicting weather. But climate refers to the long term trends of the planet, whereas weather is about short term deviations.
Like meteorologists, climate change scientists have the advantage of understanding the physical laws governing the system they're trying to forecast. But they do not get fast and frequent feedback in the same way that meteorologists do, making it harder for them to calibrate their predictions.
There are 2 ways to forecast hurricanes.
- Statistical, requiring only a database of past hurricanes' behaviour. Very little meteorological knowledge is required. But there's a limit on how good this approach can be, especially for rare events.
- A simulation of the physical mechanics of the relevant parts of the world. This is harder to do and requires understanding the root causes of what's being forecasted. But it can be more accurate.
Armstrong and Green's criticism of climate forecast comes from their study of forecasts in fields like economics where there are very few physical models and a poor understanding of causal relationships. This is not the case with climate.
Heuristics like Occam's razor are catchy, but can be hard or counterproductive to apply.
The IPCC has developed terminology to describe how certain they are about a finding. For example "likely" means that there's at least a 66% chance of the prediction being correct.
Acknowledging uncertainty isn't the same as estimating it accurately. There's uncertainty about how much uncertainty there is in climate science.
3 components of climatology uncertainty:
- Initial condition uncertainty. These are the short-term factors that compete with the long-term greenhouse effect signal, such as weather. These are less important for longer term predictions.
- Scenario uncertainty. This concerns things like the amount of greenhouse gases in the atmosphere. This is low in the short term but increases over time because emissions are influenced by e.g. future political or economic decisions.
- Structural uncertainty. This is about how accurate our understanding of climate system dynamics is. It can increase a little over time and be subject to self-reinforcing errors.
Combining the above factors, predictions for around 20-25 years into the future are the most certain.
To measure the accuracy of predictions you have to have a way to measure the outcome. For climate change, several organisations provide estimates from thermometer readings. More recently satellite technology has been used. The estimates are usually quite similar to each other.
From 1990-2011 the average temperature increase was a little lower than IPCC estimates. However their forecasts were based on "business as usual", whereas there were some efforts to reduce carbon emissions - an example of scenario uncertainty.
Uncertainty in forecasts does not mean we shouldn't act. It might be that the climate forecast uncertainty was one reason some action was taken. Governments seem happy to dedicate huge resources towards economic or military programs based on far less reliable forecasts of outcomes than those available for the climate.
Critics say that there were once predictions of global cooling. These were based on the theory that sulfur emissions would cool temperatures more than carbon would increase them. However this was not the dominant view in the scientific literature.
Sulfur may be part of the reason that the IPCC's 1990 forecast was too high, due to Mount Pinatubo's eruption in 1991 releasing sulfur.
The media will tend to highlight only the most extreme claims from either side, even when most scientists disagree with them.
Forecasts should find a reasonable baseline case they can use if their more complex modelling doesn't work out. For climate this might be a simple model that just extrapolates from trends of CO2 levels and temperatures. It turns out this would have been quite accurate.
We should be skeptical of forecasters who say science isn't important to their jobs or scientists who say forecasting isn't important to them.
The history of temperature rises includes periods of flat or negative trends in the same way that the stock market tends to go up over time but often has negative periods.
Prefer forecasts that are expressed in probabilistic terms - they more accurately reflect our forecasting abilities.
Thinking about Bayes' theorem shows us that carefully quantifying uncertainty is necessary for scientific progress.
For climate science there is a bad incentive to make overly confident claims in that they can feel more persuasive when trying to mitigate long term disasters with short-term solutions. Today's political and cultural institutions are not good at solving these kinds of problems. Expressing uncertainty is often seen as a bad strategy.
The truth of global warming seems to be mostly on one side of the argument: that the greenhouse effect is real and exacerbated by manmade emissions, and that these will make the planet warmer and lead to mostly unfavourable outcomes.
The idea that if only a few more people could be persuaded of the science we would immediately resolve the political problems on this issue is misguided.
Science may be almost the opposite of politics. Science inevitably tends to move slowly towards a consensus truth if done well. But politics today seems to be polarising ever more away from consensus.
In science we acknowledge that real data is noisy and rarely all points to precisely the same conclusion. But in politics people are not expected to give any credence to the viewpoints of their opponents. Saying something inconvenient, even if it is true, is seen as a mistake. Partisans are supposed to show equal conviction about the set of economic, social and foreign policy issues that align with their party's views, even when those issues have little obvious relationship to each other.
What You Don’t Know Can Hurt You
After the fact, the attack on Pearl Harbour seemed predictable. But at the time, we'd mistaken our ignorance for knowledge, fitting everything we observed into the dominant theory of the time, which was that sabotage from within the country was more likely than an attack from outside.
Wohlstetter defined a signal as something that tells us something useful about the intentions of our enemy. Noise from his point of view is the confusion generated by competing signals.
In this framework, the absence of a signal might be important - e.g. a lack of Japanese radio transmissions signalled that part of the attack on Pearl Harbour was in progress. Too many signals can make it impossible to determine what their overall meaning is.
After an event has happened people look back and feel like the signal was obvious. Some Americans even think that Pearl Harbour and the September 11 attack were so obviously predictable that their government must have been complicit in their execution.
However, it's only easy to select the relevant signals after we know the event has taken place. We need to increase our skill at identifying the relevant signals, not signals in general.
It's often necessary to make a decision on which signals we should focus on, but the risk is we choose them in a biased and self-serving way. We prioritise signals that are in line with our preferred theory of the world or imply better outcomes.
We sometimes confuse something being unfamiliar with it being improbable. We develop "mind-blindness" to things we're not familiar with, a type of anosognosia. The defence is to admit what we do not know. This is a version of the availability heuristic. We over-estimate how likely something is to happen if it's close to us in time and space and vice versa.
If you can spell out that something is dangerous or unpredictable then it's a known unknown. It's rarely a binary - few things are 100% unpredictable; we can make some kind of rough forecast about them. Problems occur when we do not consider a possibility at all: the unknown unknowns.
According to the 9/11 Commission Report, the most important systemic failure behind not understanding the importance of the signals potentially warning us about the 9/11 attack was a failure of imagination. The signals were not in line with how we thought terrorists behaved, so we deprioritised them.
Analysing historical data can help us understand future risk. This is an extra tool rather than a substitute for the signal analysis that the intelligence community does.
The mathematics of terrorist attacks are similar to earthquakes in that they obey a power law. In the same way that magnitude 5 and 6 earthquakes imply the possibility of a magnitude 7 earthquake, the existence of terrorist attacks with smaller casualty counts like the Lockerbie bombing or Oklahoma City imply that something with a bigger number of deaths like 9/11 was possible. 9/11 wasn't an unimaginable outlier but rather an extension of the existing pattern.
Power laws imply that disasters that are far worse than we've experienced so far are very possible, but infrequent. They don't say anything about where the next huge disaster will happen; just that over the long term it is likely to.
Unlike earthquakes, terror attacks can potentially be prevented.
The power law suggests that terrorist attacks with 10,000+ casualties might occur. These would likely involve weapons of mass destruction e.g. nuclear weapons.
Political science Professor Graham Allison thinks that it's likely that we'd see a nuclear attack on the US, not based on statistics, but rather due to:
- Motive: Osama bin Laden already said he wants to kill millions of Americans, which probably requires a nuclear attack.
- Opportunity: It would not be so difficult to smuggle a nuclear weapon into the US.
- Means: It might not be so difficult for terrorists to acquire a nuclear weapon. This is the area he thinks we should focus on to reduce the risk of such an attack.
Michael Levi and others are skeptical. Levi says that groups often have aspirations they don't act on because they don't think they will be successful in doing so.
- A failed attempt would get further unwanted attention to the group, as well as harm their credibility.
- A nuclear attack would be hard to pull off, requiring cooperation from many participants, some with highly specialised technical knowledge, any of whom might defect or be caught.
- The goal of terrorism is to inflict terror, not necessarily kill as many people as possible. Other methods may be more effective.
Terrorist groups are weak and unstable; like new restaurants, 90% of them fail within a year.
Rumsfeld thinks biological attacks may be more likely, requiring less technical expertise and potentially shutting down parts of society for months if it were contagious.
For earthquakes, the Gutenberg–Richter law shows us that the energy released by earthquakes increases exponentially as a function of magnitude. Magnitude 7 earthquakes are 10x rarer than 6, but overall will do more damage than all the magnitude 6 earthquakes combined. Similarly, even if extremely harmful terrorist attacks are very rare, they represent most of the harm, and hence signal detection of these should be prioritised.
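The claim that rarer, larger earthquakes dominate the total damage can be checked with rough arithmetic. This sketch takes the "10x rarer per magnitude step" figure from the text and adds one outside assumption: the standard seismological relation that released energy scales as 10^(1.5 x magnitude), roughly 32x per step. Energy is used here as a stand-in for damage, which is a simplification.

```python
def relative_frequency(magnitude: float, reference: float = 6.0) -> float:
    """How often quakes of this magnitude occur, relative to the reference (10x rarer per step)."""
    return 10 ** -(magnitude - reference)


def relative_energy(magnitude: float, reference: float = 6.0) -> float:
    """Energy of one quake relative to one reference quake (assumed 10^1.5 per step)."""
    return 10 ** (1.5 * (magnitude - reference))


def relative_total_energy(magnitude: float, reference: float = 6.0) -> float:
    """Combined energy of all quakes at this magnitude, relative to all reference quakes."""
    return relative_frequency(magnitude, reference) * relative_energy(magnitude, reference)


for magnitude in (6.0, 7.0, 8.0):
    print(f"magnitude {magnitude}: "
          f"{relative_frequency(magnitude):.2f}x as frequent, "
          f"{relative_energy(magnitude):.1f}x energy each, "
          f"{relative_total_energy(magnitude):.2f}x energy in total")
```

Under these assumptions the magnitude 7 earthquakes together release about three times the energy of all the magnitude 6 ones, even though they are ten times rarer.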
Since 1982, the "broken windows" theory of crime deterrence has suggested that by focussing on low-level crime such as vandalism the police can instill a climate of lawfulness which will prevent bigger crimes. The evidence for this isn't strong, but it was embraced by police departments as it made for much easier goals. Likewise the "security theater" surrounding taking flights is more for show than preventing terrorism.
The power law distribution may exist because of competition between terrorists and counter-terrorism forces, not in spite of it. There's an equilibrium between terrorists and society, balancing freedom and security, which differs in different places and times.
The power law theory suggests that Israel has seen fewer large-scale terrorist attacks than expected, suggesting that their security choices have made a difference.
A Bayesian approach to thinking may be helpful for national security analysis. Its probabilistic approach is helpful for making decisions in conditions of high uncertainty.
Natural laws don't change a lot, so over time we will come to a better understanding of which of nature's signals are important. But there's no reason to expect that human behaviour is becoming more predictable, it may be the opposite. The organisation of society is becoming more complicated and technology such as the internet changes how we relate to each other.
We have access to huge amounts of information, but most of it isn't of any use. We need to become better at distinguishing signal from noise. For this we should start thinking about prediction and probability using Bayes' framework.
Bayes' theorem is based on expressing the likelihood of a real-world event probabilistically. It relies on the idea that your perceptions of the world are only approximations of the truth. Your initial predictions might be of poor quality, but Bayes' theorem allows you to improve them as you gain new information. There's evidence we can learn to be better at this.
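A single Bayesian update can be sketched as follows. This is a minimal illustration rather than anything taken from the book: the function name and all three input probabilities are hypothetical, chosen only to show how a weak prior shifts when new evidence arrives.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) using Bayes' theorem.

    prior               -- belief in the hypothesis before seeing the evidence
    p_evidence_if_true  -- chance of seeing the evidence if the hypothesis holds
    p_evidence_if_false -- chance of seeing the evidence if it does not
    """
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    if denominator == 0:
        raise ValueError("evidence has zero probability under both cases")
    return numerator / denominator


# Hypothetical numbers: a 5% prior, and evidence that is eight times
# more likely if the hypothesis is true than if it is false.
posterior = bayes_update(prior=0.05, p_evidence_if_true=0.8, p_evidence_if_false=0.1)
print(round(posterior, 3))  # prints 0.296
```

Even strong evidence leaves the belief well short of certainty when the starting estimate is low, which is why stating the prior explicitly matters.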
By necessity our brains process information by approximating it. Often this is useful and good. The problem occurs when we mistakenly assume our approximation is the reality.
Thinking probabilistically will make you slow down and consider the quality of your thinking, eventually improving your decision making.
Bayes' theorem forces us to quantify how likely we think an event is to occur before we apply any data to the problem: this is our prior belief. It should be based on our own past experience, and ideally on the collective experience of the rest of society. Even a prior driven solely by common sense is useful to ensure we aren't too credulous of a given statistical model.
We must not pretend we don't have any prior beliefs. Work to reduce your biases, but if you think you have none then you probably have a lot.
Make a lot of forecasts. Update them each time you receive new information. Trial and error is a form of this, used very successfully by big companies.
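Revising a forecast each time new information arrives can be sketched as a loop of Bayesian updates, where each posterior becomes the next prior. The sketch below is an assumption-laden illustration: the function names and the observation values are invented, and each observation is treated as independent of the others.

```python
def update(prior, p_if_true, p_if_false):
    """One Bayesian update: belief after a single piece of evidence."""
    numerator = prior * p_if_true
    return numerator / (numerator + (1.0 - prior) * p_if_false)


def revise_forecast(prior, observations):
    """Apply each observation in turn; return the belief after every step."""
    history = [prior]
    belief = prior
    for p_if_true, p_if_false in observations:
        belief = update(belief, p_if_true, p_if_false)  # posterior becomes the new prior
        history.append(belief)
    return history


# Hypothetical evidence as (P(evidence | true), P(evidence | false)) pairs:
# two observations that support the hypothesis, then one that counts against it.
observations = [(0.7, 0.3), (0.7, 0.3), (0.2, 0.6)]
history = revise_forecast(0.5, observations)
print([round(b, 3) for b in history])
```

The forecast rises with the supporting evidence and falls back when contrary evidence arrives, without ever being thrown away and restarted from scratch.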
If we believe our ideas are valuable then we must be willing to test them via falsifiable hypotheses and prediction.
Distinguishing the signal from the noise requires both scientific knowledge and self-knowledge: the serenity to accept the things we cannot predict, the courage to predict the things we can, and the wisdom to know the difference.
We're biased to think we are better at prediction than we really are.
May we arise from the ashes of these beaten but not bowed, a little more modest about our forecasting abilities, and a little less likely to repeat our mistakes.