On-demand publishing tool for free ebooks

Devices such as the Apple iPad, Amazon Kindle, Opera Reader, and Sony Reader have made ebooks increasingly popular. However, before the large body of classic material can be enjoyed on these devices, it must be converted to one or more open ebook formats.

The main goal of this master's thesis project is to design, build and deploy an on-demand publishing tool for free ebooks that automates this conversion.

The publishing tool should be a set of command-line scripts for on-demand, automatic batch conversion of publication units (books consisting of several tagged HTML files, metadata files and pagination files) into standard, open ebook formats. The output should be of good quality and should include all available metadata and pagination data.
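
As an illustration of the conversion step, the following is a minimal sketch of packaging already converted chapters into an EPUB container, one example of an open ebook format. The function names (build_opf, write_epub), the OEBPS/ directory layout and the three metadata fields are assumptions made for this example and do not come from the Runeberg repository. The sketch omits the navigation file and pagination data, so its output is not yet a complete, valid EPUB.

```python
import io
import zipfile
from xml.sax.saxutils import escape, quoteattr

# Tells the reading system where the package document lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_opf(title, language, identifier, chapter_names):
    """Return a minimal package document with Dublin Core metadata (hypothetical helper)."""
    items = "\n".join(
        f'    <item id="ch{i}" href={quoteattr(name)} media-type="application/xhtml+xml"/>'
        for i, name in enumerate(chapter_names)
    )
    refs = "\n".join(f'    <itemref idref="ch{i}"/>' for i in range(len(chapter_names)))
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{escape(title)}</dc:title>
    <dc:language>{escape(language)}</dc:language>
    <dc:identifier id="bookid">{escape(identifier)}</dc:identifier>
  </metadata>
  <manifest>
{items}
  </manifest>
  <spine>
{refs}
  </spine>
</package>
"""


def write_epub(out, title, language, identifier, chapters):
    """Write chapters, a list of (file_name, xhtml_text) in reading order, to `out`.

    `out` may be a path or a binary file-like object.
    """
    names = [name for name, _ in chapters]
    with zipfile.ZipFile(out, "w") as zf:
        # The container format expects 'mimetype' first and uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr(
            "OEBPS/content.opf",
            build_opf(title, language, identifier, names),
            compress_type=zipfile.ZIP_DEFLATED,
        )
        for name, text in chapters:
            zf.writestr("OEBPS/" + name, text, compress_type=zipfile.ZIP_DEFLATED)
```

A batch script could call write_epub once per publication unit, passing the chapters in reading order and the metadata read from the accompanying metadata files.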

There are several repositories around the world of free books stored as tagged HTML files accompanied by metadata files, such as:

The student will cooperate with Projekt Runeberg, and the project will be targeted towards processing the material that already exists in the Runeberg repository. Runeberg uses a subset of HTML for mark-up, but the source files were produced over a span of several years and may be of varying quality. The system therefore needs to be robust with respect to mark-up. It may be a good idea to also develop a tool that analyses the source files and points out problems for possible manual correction before the ebook is generated.
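
Such an analysis tool could start as something like the sketch below, which uses the HTMLParser class from Python's standard library to report unbalanced tags together with their line numbers. The names MarkupChecker and check_markup are hypothetical, and the check assumes that every non-void element must be closed explicitly. The actual rules of the Runeberg HTML subset may differ and would have to be taken from the project itself.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID = {"br", "hr", "img", "meta", "link", "input", "area", "base", "col"}


class MarkupChecker(HTMLParser):
    """Collect (line, message) pairs for tags that do not balance."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, line) for each element that is still open
        self.problems = []   # (line, message) for each problem found

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        line = self.getpos()[0]
        if tag not in [open_tag for open_tag, _ in self.stack]:
            self.problems.append((line, f"closing </{tag}> without matching opening tag"))
            return
        # Pop until the matching tag; everything popped on the way was left open.
        while self.stack:
            open_tag, open_line = self.stack.pop()
            if open_tag == tag:
                break
            self.problems.append((open_line, f"<{open_tag}> not closed before </{tag}>"))


def check_markup(text):
    """Return a sorted list of (line, message) problems found in `text`."""
    checker = MarkupChecker()
    checker.feed(text)
    checker.close()
    for tag, line in checker.stack:
        checker.problems.append((line, f"<{tag}> never closed"))
    return sorted(checker.problems)
```

Run over every file of a publication unit, the resulting list could be printed as a report for manual correction, or used to decide whether a book is clean enough for automatic conversion.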

In addition to creating the publishing tool, the student should evaluate the mark-up and metadata standards used by the Runeberg project, as well as public metadata standards such as Dublin Core, DAISY, and IDPF.org. The student should comment on their suitability for ebook creation and suggest improvements to the mark-up and metadata used in Projekt Runeberg.

The tools created in this project should use a widely available scripting language (e.g. Perl or Python) for portability. The student is expected to make use of existing free software tools (e.g. ImageMagick) for tasks such as automatic cover creation, and the student should try to engage a user community on sourceforge.net to augment the design and testing of the publishing tool.
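
As a sketch of how a script might delegate cover creation to ImageMagick, the example below builds a command line for a plain, text-only cover and runs it as a subprocess. The function names, image size, font sizes and text offsets are assumptions chosen for illustration. The sketch also assumes that ImageMagick's convert program is on the PATH; newer installations may provide it as magick instead.

```python
import shutil
import subprocess


def cover_command(title, author, out_path, size="600x800"):
    """Build an ImageMagick argument list for a white cover with centred text."""
    return [
        "convert",
        "-size", size, "xc:white",          # blank white canvas
        "-gravity", "center",
        "-pointsize", "48", "-annotate", "+0-100", title,    # title above centre
        "-pointsize", "32", "-annotate", "+0+100", author,   # author below centre
        out_path,
    ]


def make_cover(title, author, out_path):
    """Run ImageMagick to create the cover; fail clearly if it is not installed."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick 'convert' not found on PATH")
    subprocess.run(cover_command(title, author, out_path), check=True)
```

Passing the arguments as a list rather than as a single shell string avoids quoting problems with titles that contain spaces or punctuation.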

Keywords: free software, Dublin Core, publishing on demand, free culture, public domain, metadata
