
Google, Unemployment, and the Future of Data

Wednesday, July 08, 2009 at 06:10 PM EDT

Google may eventually solve the problem of finding data on the web. Too bad its first effort reports the wrong numbers for unemployment.

Since leaving public service, I have occasionally pondered whether to start a company / organization to transform the way that data are made available on the web. The data are out there, but they remain a nuisance to find, a nuisance to manipulate, and a nuisance to display. I cringe every time I have to download CSV files, import them into Excel, manipulate the data (in a good sense), make a chart, and fix the dumb formatting choices that Excel makes. All those steps should be much, much easier.

There are good solutions to many of these problems if you have a research assistant or are ready to spend $20,000 on an annual subscription. With ongoing technology advances, however, there ought to be a much cheaper (perhaps even free) way of doing this on the net. With some good programming, some servers, and careful design (both graphic and human factors), it should be possible to disintermediate research assistants and democratize the ability to access and analyze data. At least, that’s my vision.

Many organizations have attacked various pieces of this problem, and a few have even made some headway (FRED deserves special mention in economics). But when you think about it, this is really a problem that Google ought to solve. It has the servers, software expertise, and business model to make this work at large scale. And with its launch of a search service for public data, it has already signaled its interest in this problem.

As a major data consumer, I wish Google every success in this effort. However, I’d also like to use their initial effort, now almost three months old, as a case study in what not to do.

Google’s first offering of economics data is the unemployment rate for the United States (also available for the individual states and various localities). Search for “unemployment rate united states” and Google will give you the following graph:

Your first reaction should be that this is great. With absolutely no muss and no fuss, you have an excellent (albeit sobering) chart of the unemployment rate since 1990. I would add myriad extensions to this – e.g., make it easier to look at shorter time periods, allow users to look at the change in the unemployment rate, rather than the level, etc. – but the basic concept is outstanding.

Unfortunately, there is one major problem: That’s the wrong unemployment rate.

Click over to the Bureau of Labor Statistics, open a newspaper (remember them?), or stay right here on my blog – all of them will tell you that the unemployment rate in June was 9.5%, not 9.7%.

That may not seem like a big difference, but the principle is huge: if Google wants to be a data provider, it would be nice if the data were correct.

So what’s going on here? Well, the answer is a bit technical, but it boils down to the phrase “not seasonally adjusted.” See that on the chart? Google is reporting, absolutely correctly, the unemployment rate in its not-seasonally-adjusted form. However, economists, policymakers, and the punditocracy all focus on the seasonally-adjusted unemployment rate, which came in at 9.5%.

The reason for seasonal adjustment is simple: unemployment rises and falls at certain points during the year because of seasonal hiring patterns. If you want to discern how the economy is doing, it makes sense to strip out those seasonal variations. That’s what happens in virtually all economic data. (There are situations in which you might be interested in the unadjusted data. If you run the agency that pays unemployment insurance, for example, your monthly spending will depend on the unadjusted data. But those cases are very rare.)
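
To make the idea of stripping out seasonal variation concrete, here is a minimal sketch in Python. It is not the procedure that statistical agencies actually use, which is far more elaborate. It simply estimates an additive effect for each calendar month (that month's average minus the overall average) and subtracts it. The function names and the numbers are made up for illustration, and the underlying rate is assumed to be flat, which is the easy case for a method this crude.

```python
from statistics import mean

def seasonal_factors(series):
    """Estimate an additive seasonal effect for each calendar month.

    `series` is a list of (month, value) pairs with month in 1..12.
    A month's factor is its average value minus the overall average.
    """
    overall = mean(value for _, value in series)
    by_month = {}
    for month, value in series:
        by_month.setdefault(month, []).append(value)
    return {month: mean(values) - overall for month, values in by_month.items()}

def seasonally_adjust(series):
    """Subtract each month's estimated seasonal effect from the raw values."""
    factors = seasonal_factors(series)
    return [(month, value - factors[month]) for month, value in series]

# Hypothetical seasonal swing, in percentage points, that sums to zero over a year.
pattern = {1: 0.6, 2: 0.5, 3: 0.2, 4: -0.3, 5: -0.2, 6: 0.2,
           7: 0.3, 8: 0.0, 9: -0.3, 10: -0.4, 11: -0.3, 12: -0.3}

# Two years of invented data: an underlying rate of 5.0 plus the seasonal swing.
raw = [(month, 5.0 + pattern[month]) for _year in range(2) for month in range(1, 13)]
adjusted = seasonally_adjust(raw)

for (month, raw_value), (_, adj_value) in zip(raw[:12], adjusted[:12]):
    print(f"month {month:2d}: raw {raw_value:.1f}  adjusted {adj_value:.1f}")
```

In this toy case the adjusted series is flat while the raw series bounces around by up to a percentage point, which is exactly the kind of noise you do not want to mistake for a change in the economy. With a real series that trends up or down, this simple averaging would blur the trend into the seasonal factors, which is one reason the real methods are more involved.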

My point here is not that Google goofed. It did, and it should switch to the seasonally-adjusted data as soon as possible. But that’s not the moral.

The moral is that data rarely speak for themselves. There’s almost always some folklore, known to initiates, about how data should and should not be used. As the web transforms the availability and use of data, it’s essential that the folklore be democratized as much as the raw data themselves.

How should that work in this case? Well, I would recommend that Google default users into the seasonally-adjusted data, since that’s what the vast majority of users will want. Most of those users never need to learn about seasonal adjustment. But Google should also allow users to access (“opt into,” if you will) the non-seasonally-adjusted data, as long as they are also presented with something like the paragraph above on the rationale for seasonal adjustment. Voilà: democratized data and democratized folklore.
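
For the programmers in the audience, the default-plus-opt-in idea might look something like the sketch below. Everything here is hypothetical: the `SeriesView` type, the `unemployment_rate` function, and the wording of the note are my own illustration, not anything Google offers. The only real numbers are the two June rates discussed above.

```python
from dataclasses import dataclass
from typing import Optional

# The "folklore" that travels with the unadjusted series (wording is illustrative).
RATIONALE = (
    "Unemployment rises and falls at certain points during the year because "
    "of seasonal hiring patterns. Most analysis uses the seasonally adjusted "
    "rate, which strips out those variations."
)

@dataclass
class SeriesView:
    label: str
    value: float
    note: Optional[str] = None  # explanatory text shown alongside the number

def unemployment_rate(data, seasonally_adjusted=True):
    """Return the adjusted rate by default; attach the rationale on opt-in."""
    if seasonally_adjusted:
        return SeriesView("Unemployment rate (seasonally adjusted)", data["sa"])
    return SeriesView(
        "Unemployment rate (not seasonally adjusted)", data["nsa"], note=RATIONALE
    )

# June 2009 rates from the discussion above.
june_2009 = {"sa": 9.5, "nsa": 9.7}

default_view = unemployment_rate(june_2009)
opt_in_view = unemployment_rate(june_2009, seasonally_adjusted=False)

print(default_view.label, default_view.value)
print(opt_in_view.label, opt_in_view.value)
print(opt_in_view.note)
```

The design choice is that a casual user gets the headline number with no extra reading, while anyone who asks for the unadjusted series cannot get it without the explanation attached.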

P.S. Of course, there’s lots more folklore about the unemployment rate (U3 vs. U6, for example). More on that in the future.