Charles Petzold



Google Books: A Bibliographic Disaster

September 7, 2007
New York, N.Y.

This is not a blog entry by an author who thinks that Google Books is evil because it promotes the idea of free books.

This is a blog entry by an author who sees Google Books as an invaluable research tool but who gets frustrated and enraged every time he tries to use it.

I know there's a Feedback page on Google Books where problems can be reported. But the problems I experience are so fundamental that the Feedback page seems grossly inadequate. In short, Google Books seems to have been put together with a deficient sense of bibliographic integrity, which is a real problem if you're trying to assemble an online library.

Here's today's real-life example:

I wanted to get a sense of the historic occurrence of the phrase "Turing Machine" in books and periodicals published since the phrase was first coined by Alonzo Church in the March 1937 issue of The Journal of Symbolic Logic in a review of Alan Turing's 1936 paper that introduced the concept. I was most interested in the earliest references I could find to "Turing Machine" outside the periodicals and books devoted to mathematical logic.

This is an ideal job for Google Books — or so I thought. I began very simply on the Google Books home page with the phrase "Turing Machine." This turned up 1,345 hits but, alas, no way to sort them by date. Instead, I needed to impose my own chronological limits by going to the Advanced Book Search page and typing a range in the Publication Date fields. Alternatively, you can perform the search like this:

to get all references to "Turing Machine" with publication dates between 1800 and 1930, for example. There shouldn't be any, of course, but that search turns up 63 hits. A few of these are amusing and result from word breaks at the end of lines, for example, "lec-turing-machine" in an Edith Wharton novel.

But what you soon discover is that articles in periodicals seem to be categorized not by the date of the article but by the date that the periodical began publication! Or, the date is simply incorrect.

The first hit from this search is apparently a book entitled Fundamenta Mathematicae with an author indicated as Polska Akademia Nauk and a line that says "Published 1920." But the little scan you get is a bibliography that has dates of 1959.

The second hit is an issue of Analog Science Fiction/science fact with a "Published 1930" date and an anachronistic reference to "Turing machine." The page that Google Books generates includes a map of the world with push-pins indicating all the places mentioned in the issue, but no clue as to when the issue was actually published.

The third hit is the Edith Wharton novel.

The fourth hit is apparently a pamphlet that Google says is entitled "The Problem Presented by Periodicals in College and University Libraries" but which the thumbnail says is entitled "The Future of the Book" with an indication of "Published 1930."

The fifth hit is a book with a reference to a "sculp-turing machine."

The sixth hit is an article in The Quarterly Review of Biology (precisely the type of thing I was looking for), and although it says "Published 1926," it also indicates that this particular article, the title of which is unavailable, is from "v.65 1990."

The seventh hit is an article from the Annals of the New York Academy of Sciences with an indication of "Published 1877" but the little scan from the article contains a date of 1949. What date is this particular issue? No indication whatsoever.

The eighth hit is an 1897 book with a reference to an "ice-manufac-turing machine." I'm not complaining about hits like this. A few of these would be amusing if everything else weren't so screwed up.
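The date mismatches in the hits above amount to a simple consistency rule: a scan should not mention a year later than the publication date attached to it. Here is a minimal Python sketch of that check. The record layout and the field names ("title", "published", "snippet") are my own assumptions for illustration, not anything Google Books exposes, and the snippet strings are paraphrased stand-ins for the scans described above.

```python
import re

# Hypothetical records modeled on the hits described above. The field names
# and snippet wording are illustrative assumptions, not Google Books data.
HITS = [
    {"title": "Fundamenta Mathematicae", "published": 1920,
     "snippet": "bibliography entry dated 1959"},
    {"title": "Annals of the New York Academy of Sciences", "published": 1877,
     "snippet": "text containing a date of 1949"},
    {"title": "1897 book", "published": 1897,
     "snippet": "an ice-manufac-turing machine"},
]

# Four-digit years from 1500 through 2099, matched as whole tokens.
YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def latest_year_in_text(text):
    """Return the latest four-digit year mentioned in text, or None."""
    years = [int(y) for y in YEAR.findall(text)]
    return max(years) if years else None


def is_anachronistic(record):
    """True when the snippet mentions a year after the stated publication year."""
    latest = latest_year_in_text(record["snippet"])
    return latest is not None and latest > record["published"]


suspect_titles = [r["title"] for r in HITS if is_anachronistic(r)]
```

With these stand-in records, the two periodicals are flagged and the 1897 book is not, because its snippet mentions no year at all. A check like this only catches records whose text happens to contain a later date. It says nothing about what the correct publication date actually is.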

I could go on and on, but you can try your own searches and encounter your own frustrations.

Google Books is not an online library. It's a massive heap of digitized books and periodicals thrown together with a complete disregard for what these objects actually are and when they were published.