Tuesday, July 22, 2008

Data Inaccuracies in Polls and Surveys

From www.dis-order.net.

Salon.com published an interesting article yesterday by Paul Maslin and Jonathan Brown, discussing an inaccuracy in the standard approach to political polling. They say that phone surveys call only landlines, which ignores people who have only cell phones. They have a fairly detailed discussion of why this is the case and how much it can affect polls: essentially, as the number of people who use only cell phones increases, polls can become less and less accurate. This is especially true because a specific demographic (younger, more technical people) tends to own cell phones and avoid landlines, meaning the polls can become quite biased (and thus inaccurate).
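
To make the size of that bias concrete, here is a minimal sketch in Python. All of the numbers are made up for illustration (they are not from the Salon article), and the function name is my own. It assumes the poll measures landline households perfectly and misses cell-only people entirely.

```python
def landline_poll_bias(cell_only_share, support_cell_only, support_landline):
    """Return (poll estimate - true support) for a landline-only poll.

    cell_only_share: fraction of voters with no landline (hypothetical).
    support_cell_only / support_landline: candidate support in each group.
    """
    true_support = (cell_only_share * support_cell_only
                    + (1 - cell_only_share) * support_landline)
    # A landline-only poll only ever sees the landline group.
    poll_estimate = support_landline
    return poll_estimate - true_support


# Hypothetical: 60% support among cell-only voters, 48% among landline voters.
for share in (0.05, 0.15, 0.30):
    bias = landline_poll_bias(share, 0.60, 0.48)
    print(f"cell-only share {share:.0%}: poll is off by {bias:+.1%}")
```

Under these assumptions the error is simply the cell-only share times the gap in support between the two groups, so it grows in step with the number of cell-only voters.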

Alternatives to Political Polling

So the first question that comes up is, "Are there alternatives to phone calls?" Even with the rise of the Internet, political polling is still very dependent on random phone calls. The basic problem is getting a random sample. You can't do that with e-mails or site visits, because the people who respond choose themselves rather than being chosen at random. So never trust those online CNN or Fox News polls.

One alternative is using a prediction market. These act just like stock markets, but people buy and sell shares in a specific event -- you then make a profit if the event takes place, and lose money if it does not.
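
As a rough sketch of the payoff, assume a simplified contract that pays out 1 dollar if the event happens and nothing otherwise. The function names and the prices below are hypothetical, and real markets add fees and other details that are ignored here.

```python
def contract_profit(price, event_happened, payout=1.0):
    """Profit from buying one share at `price` in a simplified market."""
    return (payout if event_happened else 0.0) - price


def expected_profit(price, believed_probability, payout=1.0):
    """Average profit if the buyer's probability estimate is right."""
    win = contract_profit(price, True, payout)
    lose = contract_profit(price, False, payout)
    return believed_probability * win + (1 - believed_probability) * lose


# Hypothetical share price of 62 cents.
print(contract_profit(0.62, True))    # profit if the event takes place
print(contract_profit(0.62, False))   # loss if it does not
print(expected_profit(0.62, 0.70))    # positive: worth buying at a 70% belief
print(expected_profit(0.62, 0.62))    # about zero: price matches belief
```

With a 1 dollar payout, the expected profit works out to the believed probability minus the price, which is why the share price can be read as the market's estimate of the probability.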

What is exciting about prediction markets is that, with enough people participating, they aggregate individuals' knowledge and can provide a reasonably accurate probability for a specific event. In fact, markets have been known to provide better predictions than those of experts. A lot of major companies, such as Google, HP, Best Buy, and many others, are using these now, and one can get a good overview by exploring Google's work and Wolfers and Zitzewitz's paper.

So what do these markets say about the upcoming election? Well, the Iowa Electronic Markets pretty much show a 50-50 split on the 2008 Presidential race, while Intrade.com gives Obama a 2-to-1 lead. Of course, not everyone participates in these markets, and I'm sure it is easy to argue that Obama supporters (read: younger, more technology-friendly people) are more likely to use sites like this.
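
For reference, here is a small sketch of how a figure like "2-to-1" translates into a probability, assuming it is read as odds in favor. The helper name is my own and is not tied to any site's API.

```python
def odds_to_probability(in_favor, against):
    """Convert 'in_favor-to-against' odds into an implied probability."""
    return in_favor / (in_favor + against)


print(f"{odds_to_probability(2, 1):.1%}")  # 66.7%
print(f"{odds_to_probability(1, 1):.1%}")  # 50.0%
```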

Is This Cell Phone Problem An Isolated One?

When reading newspapers or magazines, people often feel more comfortable with numbers than they do with qualitative or subjective discussions. This is a major problem -- yes, numbers do not lie, but the definitions used to get those numbers can often be misleading. The way surveys are designed, and the way "random" samples are chosen, can often bias results quite a bit.

One area where this is a very big problem is poverty measurements. Poverty is often defined with regard to how much of a family's income is spent on food and shelter. International comparisons, however, are murky: the way you define baskets of goods (e.g. nutritional requirements, staple foods, etc.) can change quite a bit between countries. One of the biggest criticisms of surveys focusing on poverty has been that they are household surveys, so people without homes are often missed. Indeed, finding such people can be very tricky in the first place.
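
Here is a minimal sketch of how much the basket definition alone can move the headline number. The prices, baskets, and incomes are all invented for illustration, and setting the poverty line equal to the cost of a basket is a simplifying assumption rather than any agency's official method.

```python
def basket_cost(basket, prices):
    """Cost of a basket given as {item: quantity} at the given prices."""
    return sum(quantity * prices[item] for item, quantity in basket.items())


def headcount_ratio(incomes, poverty_line):
    """Share of people whose income falls below the poverty line."""
    poor = sum(1 for income in incomes if income < poverty_line)
    return poor / len(incomes)


# Hypothetical monthly prices and two competing basket definitions.
prices = {"rice": 1.0, "beans": 2.0, "rent": 50.0}
basket_a = {"rice": 20, "beans": 5, "rent": 1}
basket_b = {"rice": 30, "beans": 10, "rent": 1}

# Hypothetical monthly incomes for the same eight people.
incomes = [60, 75, 85, 90, 95, 120, 150, 200]

for name, basket in (("A", basket_a), ("B", basket_b)):
    line = basket_cost(basket, prices)
    rate = headcount_ratio(incomes, line)
    print(f"basket {name}: line = {line:.0f}, poverty rate = {rate:.1%}")
```

The same eight incomes give a poverty rate of 25% under one basket and 62.5% under the other, with no change in anyone's actual circumstances.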

Oftentimes, running surveys and collecting data are extremely difficult. A great overview of this, in an international development context, is Martin Ravallion's "How Well Can Method Substitute for Data? Five Experiments in Poverty Analysis". Statisticians, mathematicians, and other researchers are constantly trying to find new analytical tools to make models and analysis more accurate, but bad data can rarely be fixed after it has been collected.

In general, the important thing is to critically analyze the definitions and methods used in surveys and polls. The best piece of advice I ever got on this issue was that numbers and methodologies tell stories just like words do, and it is important to read between the lines.

1 comment:

Craig said...

Just thought I would point out that the two sources you cite for prediction market results are showing two different things. The IEM one is for vote share. The Intrade one is Winner-Take-All. If you compare apples to apples, both IEM and Intrade give comparable probabilities of Obama winning (a little over 60% at the moment).