Unglue.it joins GITenberg

Unglue.it has been of two minds about public domain ebooks. On the one hand, we recognize that the public domain contains the greatest literary works ever produced, and ebooks of these works need to be on any serious reader’s ebook shelf. On the other hand, there are plenty of websites already focused on the public domain, and Project Gutenberg is the granddaddy of them all. Other websites (Manybooks.net and Feedbooks.com, to call out the best) have done a pretty good job of taking public domain books and making them easy to find and download. Some sites are based in countries where books enter the public domain sooner than in the US. You can get books like “The Great Gatsby” on Feedbooks or ebooks@Adelaide, even while they’re still under copyright here in the US.

To be frank, the websites focusing on public domain books haven’t met many of the needs of libraries. I can search for Huckleberry Finn in many library catalogs and be told that the only ebook the library holds is checked out. There’s a good reason for that. Many of the ebooks in Project Gutenberg aren’t formatted very well for EPUB or Mobi, despite the high quality of the plain-text digitization. With 50,000 texts in Project Gutenberg, it’s hard to tell which ones are top quality and which ones would cause support problems for overworked librarians. We loaded a few hundred titles from Project Gutenberg into Unglue.it to see what would happen.

As you might expect, these classics accumulated a lot of faves, so we’d occasionally go in and clean up some of the ebook files. Since we use GitHub to manage our website code, the natural thing to do was to put the cleaned-up ebook files on GitHub in case someone else wanted to use them; there was no obvious way to get them into Project Gutenberg itself. I thought it would be cool if more of Project Gutenberg were on GitHub.

Then I discovered GITenberg. Back in 2012, Seth Woodworth, an ebook technologist, wanted nicer ebook editions of classics from Project Gutenberg, and GitHub was the obvious platform for collaboration. So he created a GitHub organization, named it “GITenberg”, and created thousands of GitHub repositories for Gutenberg texts. It was a no-brainer for Unglue.it to join the effort.

When I heard about the Knight News Challenge for Libraries, I suggested to Seth that GITenberg might have a chance. Working together, we wrote up a proposal, adding some library spin.

There were 676 entrants in the News Challenge, and, believe it or not, GITenberg was one of 22 entries to receive funding. The team has been awarded a $35,000 “Prototype Grant”, which will allow us to spend some real development time turning the idea into something that actually works. More to the point, we now have a deadline (in late June!) for demonstrating the GITenberg concept.

But aside from 45,000+ repos on GitHub (a significant achievement by itself), GITenberg has so far been more concept than reality. If you try to adopt a repo and submit a pull request, you’ll quickly see that the GITenberg of today is more of a sketch than a working system. To make it a working system, we’ll have to assemble a lot of cooperating components. Thankfully, most of the components we need already exist, and people are working on them. This became very clear at the Hack Day sponsored by the New York Public Library in January.

So what does this all mean for Unglue.it?

The obvious benefit is that the quality of public domain ebooks on Unglue.it will get a big boost if GITenberg succeeds: the work in GITenberg will be 100% free and open, and Unglue.it will make sure that all that data flow really works. But in the bigger picture, the machinery built for GITenberg will offer solutions for free ebooks in general. New ways to collaborate around free and open metadata are something Unglue.it really needs if it is to become the comprehensive database of freely licensed ebooks that we’ve been striving toward.
