Translating the Khasis’ hidden treasure

You probably have never heard of Soso Tham. We certainly hadn’t until a few weeks ago. You’ve probably never even heard of the Khasi people, 1.6 million of whom live in the foothills of the Himalaya. Soso Tham is the best known poet of the Khasi language. If you’ve never heard of the Khasi in the first place, that’s just an abstraction.

But imagine being poet Janet Hujon, who grew up in Shillong, the capital of the Meghalaya state in eastern India. She lives in England, where most folks, like us, probably haven’t heard of Khasi, Shillong, OR Meghalaya. She writes in English, but the stories she heard as a child continue to fire her imagination, and the Khasi culture, as embodied in the poetry of Soso Tham, forms the foundation of her world view and literary work. It must frustrate her that Soso Tham’s poetry is inaccessible to the English-speaking world.

When you love a poem, you have to find a way to share it. At unglue.it, we’re all about sharing the writing we love, and Janet Hujon’s heart must be extra-extra large for her to take on the monumental task of finally sharing Soso Tham’s work with the rest of us.

We have the privilege of being able to help Janet and Open Book Publishers share her new translation of Tham’s masterpiece The Old Days of the Khasis with the world, under a Creative Commons license. The campaign ends midnight Friday (EDT), and already, $1,400 has been raised.

“I was motivated to write this book not only because ‘Tales of Darkness and Light’ is not widely known in English, but also because it is not well-known by many in India. There are those (who are not native speakers of Khasi) who recognise the poem’s greatness but they are few and far between, partly because Khasi is a minority language. Northeast India has ‘exotic’ connotations, because unlike the dominant Hindu and Muslim identities we did not have a script until relatively recently and were considered ‘backward’. Those old colonial prejudices towards the East have played a role in shaping the rest of India’s comparative ignorance of life in Northeast India and translating Tham’s work into English has, perhaps paradoxically, offered one way to address this issue.

What really made me take on the challenge to translate, however, was my late father’s belief that I should do it. The relatives of the poet also felt that my long association with English and the fact that I still speak my own language made me the ideal candidate to carry the torch! I hope the book will do justice to their faith in me.” 

— Janet Hujon, March 2018

If you want to learn more about the Khasi and their matrilineal society, you should watch this 20-minute documentary on YouTube.

To get a sample of Janet Hujon’s evocative poetry, here’s a selection of her poems (also Creative Commons licensed!)

To help us help Janet, go to https://unglue.it/work/291736/ and chip in.

Unglue.it has resumed crowdfunding

Government funding for the humanities, the arts, and education has come under attack. The President’s budget proposal announced in March would eliminate the NEH and NEA. The US Department of Education wants partners to develop open educational resources, but has no funding to support them. So when the Free Ebook Foundation’s strategic planning process began last year, it was clear that our most pressing challenge was to diversify funding mechanisms for free ebooks that advance the humanities, the arts, and education.

Guess what! That’s exactly why we built Unglue.it. In fact, the Foundation itself was created because we felt that Unglue.it could best succeed in its mission as part of a charitable non-profit organization. We’ve been working to revise and re-focus the platform, and so we’re resuming our efforts to raise money for new free ebook projects.

The first of these projects is “The Jewish Unions in America: Pages of History and Memories”, an ebook from Open Book Publishers (OBP). A memoir of life as an immigrant worker in New York and originally published in Yiddish, it’s been brilliantly translated by Maurice Wolfthal and will soon be available to read for free online and in affordable print editions, because of OBP’s strong commitment to making works like this as available as possible. OBP usually manages to break even on books using a combination of sales, a library subscription service and grant funding here and there, but wants to be able to publish books on merit rather than funding availability. For this book, ungluers, including donors to the Foundation, will be the “here and there”.

To support this campaign, go to https://unglue.it/work/252946/ and click “Support”. Ungluers can now choose to make their support a tax-deductible donation rather than a pledge. Facebook users can donate in support of the campaign at the Free Ebook Foundation Facebook page, or share with their friends.

Ungluing campaigns for 2 other free ebook projects are being prepared, and we welcome new project submissions. We’re also exploring ways for donors to support groups or categories of books.

Unglue.it Website is now Open Source

As part of our shift to operation as a community-supported 501(c)3 not-for-profit organization, we’ve opened up the source code to the Unglue.it web application and website. You can now report issues, help us fix bugs, or run your own version of unglue.it from the git repository on GitHub. (You can’t use the name unglue.it without our permission; the name is a trademark of the Free Ebook Foundation.)

Unglue.it is a Django application written in Python running with a MySQL backend on Amazon Web Services. We use Vagrant to build production and test servers; we use a Jenkins instance for continuous integration and testing.

In the coming weeks and months, we’ll be adding our development roadmap to GitHub, and we’ll mark issues that are suitable to be worked on by volunteers. The main focus of Unglue.it has shifted from crowdfunding for free ebooks to the cataloguing and distribution of free ebooks, but this isn’t so obvious from the website design and documentation. We started Unglue.it before practices such as responsive design matured; we want to make it work much better on mobile.

We’re particularly happy with the work we’ve done to make free books available via APIs; any facet or list on the website can be accessed as ONIX, MARC, and OPDS feeds; there are also facilities to push ebooks via FTP to other sites. Code that imports ebooks from other sources (ONIX, MARC, OAI-PMH) has been more work because metadata is always messy.

Other areas of our code show the signs of disruptions long past, particularly the payment module, which was designed for Paypal, redesigned for Amazon Payments, then redesigned again for Stripe. Not something we’d wish on anyone, but it works!

The trickiest part of opening up the source code has been password hygiene. We had to comb through the entire git history (over 6,000 commits!) to find and deactivate passwords, accounts and secret keys that had been put into the repo. To allow us to continue using the open repo without exposing secrets, we’re using Ansible Vault to encrypt all the secrets. A master key to the vault decrypts the vault during the server configuration process; this master key never leaves the secure environment of the admin’s computer.

There isn’t a master key to building a strong community around a project for the public benefit. Luckily, we can get some pointers by reading Karl Fogel’s open-licensed book “Producing Open Source Software”, a new version (2.0) of which is available on Unglue.it!

DOAB and Project Gutenberg books in Unglue.it

Slow and steady. That’s how we’ve been improving Unglue.it, turning it into a better place to find free ebooks. A lot of that work has been invisible; our new APIs are being used by organizations like New York Public Library to offer ebooks that deliver value without draining acquisition budgets. We’ve also installed tools that ebook creators will be able to use to better understand how their ebooks are being used. We’ve improved our data model to support relationships between works. So for example, when Peter Suber’s book on Open Access is translated into another language, links between the works are displayed on the unglue.it page. Similarly, Richard Herley’s The Stone Arrow is linked to its sequel, The Flint Lord. And have you noticed that author names are clickable?

Our biggest effort over the last year has been the expansion of our database of free ebooks. Two big sources are worth noting:

  • Directory of Open Access Books (DOAB). DOAB has been tracking books written by academics and published with peer-review, often by university presses. Any book that’s in DOAB now has a page in Unglue.it, and it’s labeled as such. We’ve added a DOAB facet so you can restrict your browsing to books from DOAB. You can use the DOAB label as a mark of quality and know that a book is being relied upon by scholars, scientists, and researchers.
  • Project Gutenberg. Project Gutenberg is the oldest and largest collection of public domain ebooks. Through GITenberg, we’ve been exploring ways to make this collection more discoverable and maintainable. So far, we’ve loaded about 5,000 ebooks from GITenberg into Unglue.it. GITenberg allows programmatic access to the ebooks, unlike Project Gutenberg, so Unglue.it can do things like send them to your Kindle. You can use GitHub to suggest improvements to these books, and to their metadata. And we’ve added a Project Gutenberg facet to help you browse these books.

For both DOAB and Project Gutenberg, your Unglue.it “Faves” help us rank the books, and help other ungluers (and our library partners) know which of them to pay more attention to.

We have a lot of improvements to make. Don’t hesitate to make suggestions, either in the comments here or by email to unglue.it support. Another way you can support Unglue.it is to put our featured ebook widget on your website.

Free eBooks by ISBN

After reflecting on the coming demise of xISBN, we decided to add an endpoint for free ebooks to the unglue.it API.

The API documentation is at https://unglue.it/api/help

With an API key, you can check if there’s a free ebook for any ISBN. ISBNs can be 10 or 13 digits, and can include dashes. This service returns all free-licensed ebooks for a work associated with an ISBN, and for each ebook includes information about file type, rights, and the provider hosting the file.

For example, here’s how to get a list of ebook files for “Homeland”.

JSON: https://unglue.it/api/v1/free/?isbn=9780765333698&format=json&api_key={your_api_key}&username={your_username}

{
  "meta": {"total_count": 3},
  "objects": [
    {"filetype": "pdf", "href": "/download_ebook/2576/", "provider": "Internet Archive", "rights": "CC BY-NC-ND"},
    {"filetype": "epub", "href": "/download_ebook/2577/", "provider": "Internet Archive", "rights": "CC BY-NC-ND"},
    {"filetype": "mobi", "href": "/download_ebook/2578/", "provider": "Internet Archive", "rights": "CC BY-NC-ND"}
  ]
}
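As a rough illustration, the request and response above could be handled from Python along these lines. This is only a sketch: the helper names `build_free_url` and `download_links` are made up for this example, and it assumes that each `href` is relative to https://unglue.it. No network call is made; the sample text is copied from the response shown above.

```python
import json
from urllib.parse import urlencode, urljoin

# Assumption: hrefs in the response are relative to the main site.
SITE = "https://unglue.it"
FREE_ENDPOINT = SITE + "/api/v1/free/"  # endpoint from the example URL above


def build_free_url(isbn, api_key, username, fmt="json"):
    """Build the lookup URL for one ISBN (hypothetical helper)."""
    query = urlencode(
        {"isbn": isbn, "format": fmt, "api_key": api_key, "username": username}
    )
    return FREE_ENDPOINT + "?" + query


def download_links(response_text):
    """Map each file type in a JSON response to an absolute download URL."""
    data = json.loads(response_text)
    return {item["filetype"]: urljoin(SITE, item["href"]) for item in data["objects"]}


# Sample response text, copied from the example above.
SAMPLE_RESPONSE = """
{
  "meta": {"total_count": 3},
  "objects": [
    {"filetype": "pdf", "href": "/download_ebook/2576/", "provider": "Internet Archive", "rights": "CC BY-NC-ND"},
    {"filetype": "epub", "href": "/download_ebook/2577/", "provider": "Internet Archive", "rights": "CC BY-NC-ND"},
    {"filetype": "mobi", "href": "/download_ebook/2578/", "provider": "Internet Archive", "rights": "CC BY-NC-ND"}
  ]
}
"""

links = download_links(SAMPLE_RESPONSE)
```

To fetch live data, the URL returned by `build_free_url` could be passed to any HTTP client and the response body handed to `download_links`.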

XML: https://unglue.it/api/v1/free/?isbn=9780765333698&format=xml&api_key={your_api_key}&username={your_username}

<response>
 <objects type="list">
 <object>
 <href>/download_ebook/2576/</href>
 <filetype>pdf</filetype>
 <provider>Internet Archive</provider>
 <rights>CC BY-NC-ND</rights>
 </object>
 <object>
 <href>/download_ebook/2577/</href>
 <filetype>epub</filetype>
 <provider>Internet Archive</provider>
 <rights>CC BY-NC-ND</rights>
 </object>
 <object>
 <href>/download_ebook/2578/</href>
 <filetype>mobi</filetype>
 <provider>Internet Archive</provider>
 <rights>CC BY-NC-ND</rights>
 </object>
 </objects>
 <meta type="hash">
 <total_count type="integer">3</total_count>
 </meta>
</response>
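If you prefer the XML format, the response above can be read with Python's standard library. The sketch below is illustrative only: `parse_free_ebooks` is a hypothetical helper name, and the sample is shortened to a single `<object>` from the response shown above.

```python
import xml.etree.ElementTree as ET

# Shortened sample with the same structure as the XML response above.
SAMPLE_XML = """<response>
 <objects type="list">
  <object>
   <href>/download_ebook/2576/</href>
   <filetype>pdf</filetype>
   <provider>Internet Archive</provider>
   <rights>CC BY-NC-ND</rights>
  </object>
 </objects>
 <meta type="hash">
  <total_count type="integer">1</total_count>
 </meta>
</response>"""


def parse_free_ebooks(xml_text):
    """Return (total_count, list of dicts) from an XML response."""
    root = ET.fromstring(xml_text)
    # Each <object> becomes a dict keyed by its child element names.
    ebooks = [{child.tag: child.text for child in obj} for obj in root.iter("object")]
    total = int(root.findtext("meta/total_count"))
    return total, ebooks


total, ebooks = parse_free_ebooks(SAMPLE_XML)
```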

We’ll soon be integrating GITenberg ebooks into this feed, too.

Unglue.it Goes Non-Profit

Since its beginning 4 years ago, Unglue.it has been a part of Gluejar, Inc., a privately held for-profit company. We initially thought Unglue.it would be mostly about crowd-funding books into the public commons. While unglue.it has always put a public benefit at the center of its mission, the for-profit status made sense for a crowdfunding business. Over the past two years, Unglue.it has shifted into the nuts and bolts of distributing and promoting freely-licensed ebooks, because we realized how dysfunctional the commercial ebook supply chain had become. The for-profit status made less and less sense.

Over the last year, we’ve also started working on GITenberg, an effort to improve the ebooks in Project Gutenberg. To our great surprise and pleasure, we got grant funding for this work from the Knight Foundation, and fiscal sponsorship from the Miami Foundation. Suddenly, our eyes opened to the realization that we would be better able to continue our work as part of a non-profit entity.

So a bunch of us have created the Free Ebook Foundation. It will be the corporate home for both Unglue.it and GITenberg. There might even be some new projects. We’re really excited about it.

There’s a lot to do in setting up a non-profit. We’ve applied for charitable tax status (it usually takes several months to receive it). We’ll be creating new accounts for the Foundation and transferring over licenses, subscriptions, and assets. We hope to have everything switched over by the end of the year. Unglue.it users should not notice much difference.

Q. Will Unglue.it continue to be developed?

A. Yes! The combination with GITenberg gives us more resources to work on Unglue.it. Expect new distribution agreements to be announced soon.

Q. Can I donate to the Free Ebook Foundation?

A. Not just yet. When we receive confirmation of our tax status, we’ll start offering ways for you to support us on the website.

Q. Will the software running Unglue.it be released as open source?

A. We expect that most of the software will be released under appropriate open source licenses.

Q. Will Unglue.it continue to run crowd-funding campaigns?

A. Yes, but only to the extent that doing so is consistent with its tax status.

Q. Will Gluejar, Inc. continue to exist?

A. Yes, Eric Hellman will continue his patent and privacy consulting as businesses of Gluejar. He has helped companies invalidate bad patents through the Inter Partes Review process, and has begun helping libraries identify privacy leakages in their digital services.

Q. The logo is ugly. Could I design you a better one?

A. Oooooh please!

Unglue.it joins GITenberg

Unglue.it has been of two minds about public domain ebooks. On the one hand, we recognize that the public domain contains the greatest literary works ever produced, and ebooks of these works need to be on any serious reader’s ebook shelf. On the other hand, there are plenty of web sites already focused on the public domain; Project Gutenberg is the granddaddy of them all. Other websites (Manybooks.net and Feedbooks.com, to call out the best) have done a pretty good job of taking public domain books and making them easy to find and download. Some sites are based in countries where books enter the public domain sooner than in the US. You can get books like “The Great Gatsby” on Feedbooks or ebooks@Adelaide, even while they’re under copyright here in the US.

To be frank, the websites focusing on public domain books haven’t met many of the needs of libraries. I can search for Huckleberry Finn on many library catalogs and be told that the only ebook held by the library is checked out. There’s a good reason for that. Many of the ebooks in Project Gutenberg aren’t formatted so well for epub or mobi, despite the high quality of the plain text digitization. With 50,000 texts in Project Gutenberg, it’s hard to tell which ones are top quality and which ones would cause support problems for overworked librarians. We loaded a few hundred titles from Project Gutenberg into Unglue.it to see what happened.

As you might expect, these classics accumulated a lot of faves, and so we’d occasionally go and clean up some ebook files. Since we use GitHub to manage our website code, the natural thing to do was to put the cleaned-up ebook files in GitHub, in case someone else wanted to use them; there was no obvious way to get them into Project Gutenberg itself. I thought it would be cool if more of Project Gutenberg was in GitHub.

Then I discovered GITenberg. Back in 2012, Seth Woodworth, an ebook technologist, wanted nicer ebook editions of classics from Project Gutenberg. And GitHub was the obvious platform for collaboration. So he created a GitHub organization, named it “GITenberg”, and created thousands of GitHub repositories for Gutenberg texts. It was a no-brainer for Unglue.it to join the effort.

When I heard about the Knight News Challenge for Libraries, I suggested to Seth that GITenberg might have a chance. Working together, we wrote up a proposal, adding some library spin.

There were 676 entrants in the News Challenge, and believe it or not, GITenberg was one of 22 entries to receive funding. The team has been awarded a $35,000 “Prototype Grant”, which will allow us to spend some real development time to start turning the idea into something that really works. More to the point, we have a deadline (in late June!) for demonstrating the GITenberg concept.

But aside from 45,000+ repos on GitHub (a significant achievement by itself) GITenberg has so far been more concept than reality. If you try to adopt a repo and submit a pull request, you’ll become aware that the GITenberg of today is more of a sketch than a working system. To make it a working system, we’ll have to assemble a lot of cooperating components. Thankfully most of the components we need exist, and people are working on them. This became very clear at the Hack Day sponsored by New York Public Library in January.

So what does this all mean for Unglue.it?

The obvious benefit is that the quality of public domain ebooks in unglue.it will get a big boost if GITenberg succeeds: the work in GITenberg will be 100% free and open, and Unglue.it will be making sure that all that data flow really works. But in the bigger picture, the machinery that gets built for GITenberg will offer solutions for free ebooks in general. New ways to collaborate around free and open metadata are something that Unglue.it really needs if it is to become the comprehensive database for freely licensed ebooks that we’ve been striving towards.