
70,000 Free Ebooks, 1.8 Million Served

70,681 and 1,853,832 as of November 1, 2019, but who’s counting?

It’s been a while since we’ve reported on our progress building the free ebook catalog of all free ebook catalogs, but yes, we’ve been busy. Since people ask, here are some statistics.

The two biggest components of the catalog are Project Gutenberg (56,986 works) via GITenberg and Directory of Open Access Books (9,747 works). That leaves 3,949 books that we’ve gathered from all over, and those are some of the most interesting ones. Free ebook aficionados may wonder about the ~3,400 Gutenberg titles and ~12,000 DOAB titles we’re missing. In the case of Gutenberg, we’re omitting a bunch of entries that aren’t books (yes, Gutenberg includes a million digits of pi). DOAB also has a lot of items that aren’t books or aren’t open-access enough for our purposes (we need to be able to serve them from Unglue.it).

Almost all of our catalog is available in PDF (70,507 titles), while EPUB versions are available for 60,282 titles and Kindle versions for 59,364 titles. Half of our downloads are PDF, a third are EPUB and a sixth are for Kindle.
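To put those format counts in proportion, here is a small sketch that turns the figures quoted in this post into shares of the 70,681-title catalog. The constant and function names are illustrative only.

```python
# Figures quoted in this post (as of November 1, 2019).
TOTAL_TITLES = 70_681
TITLES_BY_FORMAT = {"pdf": 70_507, "epub": 60_282, "kindle": 59_364}


def coverage_percent(count, total=TOTAL_TITLES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for fmt, count in TITLES_BY_FORMAT.items():
        print(f"{fmt}: {coverage_percent(count)}% of titles")
```

By this measure PDF covers nearly the whole catalog, while EPUB and Kindle each cover roughly five titles in six.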

The licensing for our books is also interesting, except for the Project Gutenberg books, which are 99.9% public domain (so no license applies).

[Chart: licenses]

Ebook publishers really like the NC licenses, although CC BY is gaining popularity. I think the NC licenses are used so the publisher can make money on selling print copies.

Over three quarters of the books in our catalog are in English. German, French, Finnish, Portuguese, Italian, Dutch and Spanish combine to make up a fifth of the catalog; 80 languages are represented in all. These proportions definitely reflect our collection bias, not the prevalence of free ebooks in one language or another.

[Chart: languages]

A catalog with 100,000 really-free ebooks seems attainable within one or two years. Our focus should be those books not covered by the huge scanned-book archives like Hathitrust and Internet Archive, the born digital, mission-oriented books. We’ve been working with Internet Archive and Internet-in-a-Box to make the catalog available in more ways, but there’s so much work to be done.

Time to do some fundraising, maybe.

57,000 free GITenberg ebooks are now in Unglue.it

Over the past month, the Free Ebook Foundation’s GITenberg program has rebuilt and refreshed over 57,000 ebooks from Project Gutenberg, and has loaded them into Unglue.it. These books are mostly in the public domain. They join our collection of about 7,000 open-licensed books that are still in copyright. As a result of the growth of our free ebook collection, we recently passed the million-downloads mark, including half a million reflowable EPUB or Kindle texts. Unglue.it metadata is automatically updated using a system devised by a team of students from Stevens Institute of Technology.

GITenberg is a prototype project that explores how Project Gutenberg might work if all the Gutenberg texts were on GitHub, so that version control, continuous integration, and pull-request workflows could be put to work. We hope that Project Gutenberg can take advantage of what we’ve learned; work in that direction has begun but needs resources and volunteers. Go check it out!

This influx of content has put some stress on Unglue.it’s search capability. We understand that users often want to filter out the “classic” books. This is now possible using our “no-Gutenberg” facet. We’re working on deploying a revamped, mobile-friendly redesign of Unglue.it that we think you’ll like. (Another team of Stevens students worked on that!)

For more details see the blog post on Go To Hellman

Translating the Khasis’ hidden treasure

You probably have never heard of Soso Tham. We certainly hadn’t until a few weeks ago. You’ve probably never even heard of the Khasi people, 1.6 million of whom live in the foothills of the Himalaya. Soso Tham is the best known poet of the Khasi language. If you’ve never heard of the Khasi in the first place, that’s just an abstraction.

But imagine being poet Janet Hujon, who grew up in Shillong, the capital of the Meghalaya state in eastern India. She lives in England, where most folks, like us, probably haven’t heard of Khasi, Shillong, OR Meghalaya. She writes in English, but the stories she heard as a child continue to fire her imagination, and the Khasi culture, as embodied in the poetry of Soso Tham, forms the foundation of her world view and literary work. It must frustrate her that Soso Tham’s poetry is inaccessible to the English-speaking world.

When you love a poem, you have to find a way to share it. At unglue.it, we’re all about sharing the writing we love, and Janet Hujon’s heart must be extra-extra large for her to take on the monumental task of finally sharing Soso Tham’s work with the rest of us.

We have the privilege of being able to help Janet and Open Book Publishers share her new translation of Tham’s masterpiece The Old Days of the Khasis with the world, under a Creative Commons license. The campaign ends midnight Friday (EDT), and already, $1,400 has been raised.

“I was motivated to write this book not only because ‘Tales of Darkness and Light’ is not widely known in English, but also because it is not well-known by many in India. There are those (who are not native speakers of Khasi) who recognise the poem’s greatness but they are few and far between, partly because Khasi is a minority language. Northeast India has ‘exotic’ connotations, because unlike the dominant Hindu and Muslim identities we did not have a script until relatively recently and were considered ‘backward’. Those old colonial prejudices towards the East have played a role in shaping the rest of India’s comparative ignorance of life in Northeast India and translating Tham’s work into English has, perhaps paradoxically, offered one way to address this issue.

What really made me take on the challenge to translate, however, was my late father’s belief that I should do it. The relatives of the poet also felt that my long association with English and the fact that I still speak my own language made me the ideal candidate to carry the torch! I hope the book will do justice to their faith in me.” 

— Janet Hujon, March 2018

If you want to learn more about the Khasi and their matrilineal society, you should watch this 20 minute documentary on YouTube.

To get a sample of Janet Hujon’s evocative poetry, here’s a selection of her poems (also Creative Commons licensed!)

To help us help Janet, go to https://unglue.it/work/291736/ and chip in.

Unglue.it has resumed crowdfunding

Government funding for the humanities, the arts, and education has come under attack. The President’s budget proposal announced in March would eliminate the NEH and NEA. The US Department of Education wants partners to develop open educational resources, but has no funding to support them. So when the Free Ebook Foundation’s strategic planning process began last year, it was clear that our most pressing challenge was to diversify funding mechanisms for free ebooks that advance the humanities, the arts, and education.

Guess what! That’s exactly why we built Unglue.it. In fact, the Foundation itself was created because we felt that Unglue.it could best succeed in its mission as part of a charitable non-profit organization. We’ve been working to revise and re-focus the platform, and so we’re resuming our efforts to raise money for new free ebook projects.

The first of these projects is “The Jewish Unions in America: Pages of History and Memories”, an ebook from Open Book Publishers (OBP). A memoir of life as an immigrant worker in New York and originally published in Yiddish, it’s been brilliantly translated by Maurice Wolfthal and will soon be available to read for free online and in affordable print editions, because of OBP’s strong commitment to making works like this as available as possible. OBP usually manages to break even on books using a combination of sales, a library subscription service and grant funding here and there, but wants to be able to publish books on merit rather than funding availability. For this book, ungluers, including donors to the Foundation, will be the “here and there”.

To support this campaign, go to https://unglue.it/work/252946/ and click “Support”. Ungluers can now choose to make their support a tax-deductible donation rather than a pledge. Facebook users can donate in support of the campaign at the Free Ebook Foundation Facebook page, or share with their friends.

Ungluing campaigns for 2 other free ebook projects are being prepared, and we welcome new project submissions. We’re also exploring ways for donors to support groups or categories of books.

Unglue.it Website is now Open Source

As part of our shift to operation as a community-supported 501(c)(3) not-for-profit organization, we’ve opened up the source code to the Unglue.it web application and website. You can now report issues, help us fix bugs, or run your own version of unglue.it from the git repository on GitHub. (You can’t use the name unglue.it without our permission; the name is a trademark of the Free Ebook Foundation.)

Unglue.it is a Django application written in Python, running with a MySQL backend on Amazon Web Services. We use Vagrant to build production and test servers, and a Jenkins instance for continuous integration and testing.

In the coming weeks and months, we’ll be adding our development roadmap to GitHub, and we’ll mark issues that are suitable to be worked on by volunteers. The main focus of Unglue.it has shifted from crowdfunding for free ebooks to the cataloguing and distribution of free ebooks, but this isn’t so obvious from the website design and documentation. We started Unglue.it before practices such as responsive design matured; we want to make it work much better on mobile.

We’re particularly happy with the work we’ve done to make free books available via APIs; any facet or list on the website can be accessed as ONIX, MARC, and OPDS feeds, and there are also facilities to push ebooks via FTP to other sites. Code that imports ebooks from other sources (ONIX, MARC, OAI-PMH) has been more work because metadata is always messy.

Other areas of our code show the signs of disruptions long past, particularly the payment module, which was designed for Paypal, redesigned for Amazon Payments, then redesigned again for Stripe. Not something we’d wish on anyone, but it works!

The trickiest part of opening up the source code has been password hygiene. We had to comb through the entire git history (over 6,000 commits!) to find and deactivate passwords, accounts and secret keys that had been put into the repo. To allow us to continue using the open repo without exposing secrets, we’re using Ansible Vault to encrypt all the secrets. A master key to the vault decrypts the vault during the server configuration process; this master key never leaves the secure environment of the admin’s computer.
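As a rough illustration of that kind of sweep, the sketch below scans patch text (for example, the output of `git log -p`) for added lines that look like credentials. The patterns and the function name are assumptions made for this example, not the exact checks used on the Unglue.it repository, and a real audit needs a much broader set.

```python
import re

# Illustrative patterns only (assumptions for this sketch).
SECRET_PATTERNS = [
    # Assignments such as DATABASE_PASSWORD = '...' or secret_key: ...
    re.compile(r"(?i)(password|passwd|secret|api_key|token)\w*\s*[:=]\s*\S+"),
    # AWS access key IDs start with AKIA followed by 16 uppercase letters or digits.
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]


def find_suspect_lines(patch_text):
    """Return (line_number, text) for added patch lines that match a pattern."""
    hits = []
    for number, line in enumerate(patch_text.splitlines(), start=1):
        # Only look at added lines; skip the "+++ b/file" header lines.
        # A secret that was later removed still shows up as an added line
        # in the earlier commit that introduced it.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append((number, line[1:].strip()))
    return hits
```

Every hit still needs a human decision: anything that was ever committed should be treated as exposed and deactivated, not just deleted from the current tree.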

There isn’t a master key to building a strong community around a project for the public benefit. Luckily, we can get some pointers by reading Karl Fogel’s open-licensed book “Producing Open Source Software”, a new version (2.0) of which is available on Unglue.it!


DOAB and Project Gutenberg books in Unglue.it

Slow and steady. That’s how we’ve been improving Unglue.it, turning it into a better place to find free ebooks. A lot of that work has been invisible; our new APIs are being used by organizations like New York Public Library to offer ebooks that deliver value without draining acquisition budgets. We’ve also installed tools that ebook creators will be able to use to better understand how their ebooks are being used. We’ve improved our data model to support relationships between works. So for example, when Peter Suber’s book on Open Access is translated into another language, links between the works are displayed on the unglue.it page. Similarly, Richard Herley’s The Stone Arrow is linked to its sequel, The Flint Lord. And have you noticed that author names are clickable?

Our biggest effort over the last year has been the expansion of our database of free ebooks. Two big sources are worth noting:

  • Directory of Open Access Books (DOAB). DOAB has been tracking books written by academics and published with peer review, often by university presses. Any book that’s in DOAB now has a page in Unglue.it, and it’s labeled as such. We’ve added a DOAB facet so you can restrict your browsing to books from DOAB. You can use the DOAB label as a mark of quality and know that a book is being relied upon by scholars, scientists, and researchers.
  • Project Gutenberg. Project Gutenberg is the oldest and largest collection of public domain ebooks. Through GITenberg, we’ve been exploring ways to make this collection more discoverable and maintainable. So far, we’ve loaded about 5,000 ebooks from GITenberg into Unglue.it. GITenberg allows programmatic access to the ebooks, unlike Project Gutenberg, so Unglue.it can do things like send them to your Kindle. You can use GitHub to suggest improvements to these books, and to their metadata. And we’ve added a Project Gutenberg facet to help you browse these books.

For both DOAB and Project Gutenberg, your Unglue.it “Faves” help us rank the books, and help other ungluers (and our library partners) know which of them to pay more attention to.

We have a lot of improvements to make. Don’t hesitate to make suggestions, either in the comments here or by email to unglue.it support. Another way you can support Unglue.it is to put our featured ebook widget on your website.

Free eBooks by ISBN

After reflecting on the coming demise of xISBN, we decided to add an endpoint for free ebooks to the unglue.it API.

The API documentation is at https://unglue.it/api/help

With an API key, you can check if there’s a free ebook for any ISBN. ISBNs can be 10 or 13 digits, and can include dashes. This service returns all free-licensed ebooks for a work associated with an ISBN, and for each ebook includes information about file type, rights, and the provider hosting the file.
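The endpoint accepts dashes, so cleaning up an ISBN beforehand is optional. Still, a client may want to catch typos before spending a request. The sketch below strips separators and verifies the standard ISBN-10 and ISBN-13 check digits; the function name `normalize_isbn` is hypothetical and not part of the unglue.it API.

```python
def normalize_isbn(raw):
    """Strip dashes and spaces and return the bare ISBN.

    Raises ValueError if the result is not a valid ISBN-10 or ISBN-13.
    """
    isbn = raw.replace("-", "").replace(" ", "").upper()
    if isbn.isascii():
        if len(isbn) == 10 and isbn[:9].isdigit() and (isbn[9].isdigit() or isbn[9] == "X"):
            # ISBN-10: weights 10 down to 1, a final X counts as 10, sum divisible by 11.
            digits = [int(c) for c in isbn[:9]] + [10 if isbn[9] == "X" else int(isbn[9])]
            total = sum(w * d for w, d in zip(range(10, 0, -1), digits))
            if total % 11 == 0:
                return isbn
        elif len(isbn) == 13 and isbn.isdigit():
            # ISBN-13: weights alternate 1, 3, 1, 3, ...; sum divisible by 10.
            total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(isbn))
            if total % 10 == 0:
                return isbn
    raise ValueError(f"not a valid ISBN: {raw!r}")
```

For instance, `normalize_isbn("978-0-7653-3369-8")` yields the 13-digit form used in the example request.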

For example, here’s how to get a list of ebook files for “Homeland”.

JSON: https://unglue.it/api/v1/free/?isbn=9780765333698&format=json&api_key={your_api_key}&username={your_username}

{
  "meta": {"total_count": 3},
  "objects": [
    {"filetype": "pdf", "href": "/download_ebook/2576/", "provider": "Internet Archive", "rights": "CC BY-NC-ND"},
    {"filetype": "epub", "href": "/download_ebook/2577/", "provider": "Internet Archive", "rights": "CC BY-NC-ND"},
    {"filetype": "mobi", "href": "/download_ebook/2578/", "provider": "Internet Archive", "rights": "CC BY-NC-ND"}
  ]
}
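A minimal Python sketch of a client for the JSON form follows. The query parameters are copied from the example URL in this post; the helper names are hypothetical, and resolving each `href` against `https://unglue.it` is an assumption based on the relative paths in the sample response.

```python
import json
from urllib.parse import urlencode, urljoin

BASE_URL = "https://unglue.it"  # assumed base for the relative hrefs


def build_free_url(isbn, api_key, username, fmt="json"):
    """Build a request URL with the same parameters as the example in this post."""
    query = urlencode(
        {"isbn": isbn, "format": fmt, "api_key": api_key, "username": username}
    )
    return f"{BASE_URL}/api/v1/free/?{query}"


def parse_free_json(body):
    """Turn a JSON response body into a list of ebook records with absolute URLs."""
    data = json.loads(body)
    return [
        {
            "filetype": obj["filetype"],
            "url": urljoin(BASE_URL, obj["href"]),
            "provider": obj["provider"],
            "rights": obj["rights"],
        }
        for obj in data["objects"]
    ]
```

To fetch a live response, pass the URL to `urllib.request.urlopen` and hand the decoded body to `parse_free_json`; error handling is left out of this sketch.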

XML: https://unglue.it/api/v1/free/?isbn=9780765333698&format=xml&api_key={your_api_key}&username={your_username}

<response>
  <objects type="list">
    <object>
      <href>/download_ebook/2576/</href>
      <filetype>pdf</filetype>
      <provider>Internet Archive</provider>
      <rights>CC BY-NC-ND</rights>
    </object>
    <object>
      <href>/download_ebook/2577/</href>
      <filetype>epub</filetype>
      <provider>Internet Archive</provider>
      <rights>CC BY-NC-ND</rights>
    </object>
    <object>
      <href>/download_ebook/2578/</href>
      <filetype>mobi</filetype>
      <provider>Internet Archive</provider>
      <rights>CC BY-NC-ND</rights>
    </object>
  </objects>
  <meta type="hash">
    <total_count type="integer">3</total_count>
  </meta>
</response>
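The XML form can be read with Python’s standard library. This sketch assumes the element names shown in the sample response above; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_free_xml(body):
    """Return (total_count, ebooks) from an XML response body.

    Each ebook is a dict keyed by the child element names
    (href, filetype, provider, rights in the sample response).
    """
    root = ET.fromstring(body)
    ebooks = [
        {child.tag: child.text for child in obj}
        for obj in root.findall("./objects/object")
    ]
    total = int(root.findtext("./meta/total_count", default="0"))
    return total, ebooks
```

Because the records are built from whatever child elements are present, fields beyond the four in the sample would be carried along unchanged.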

We’ll soon be integrating GITenberg ebooks into this feed, too.