Monday, December 29, 2008

Reference: GutenMark

GutenMark Home Page
Attractively formatting Project Gutenberg texts
What is GutenMark?

GutenMark is a command-line tool for automatically creating high-quality HTML or LaTeX markup from Project Gutenberg etexts. As of April 2008, there is also a graphical front-end called GUItenMark that greatly simplifies usage for casual users. Both Windows and Linux x86 are supported. Mac OS X is also supported, though in some respects it lags behind the others. Limited iPhone support is also possible.

In combination with other freely available conversion tools, GutenMark aims to convert Project Gutenberg etexts into publication-quality PostScript or PDF, for print-on-demand applications. The goal is for this conversion to be completely automatic, without manual markup or editing, but for the foreseeable future some manual intervention will almost always be needed, at least if your standards are as high as mine.

I took the Project Gutenberg plain text file of The Adventures of Sherlock Holmes and ran it through GutenMark.

Amazingly, this:

To Sherlock Holmes she is always THE woman.

was transformed to this, with the all-caps emphasis turned into a lowercase, italicized "the":

To Sherlock Holmes she is always the woman.

As it should be!
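
To make the idea concrete, here is a minimal sketch of that kind of transformation in Python. It is not GutenMark's actual algorithm, which I have not inspected; the function name `italicize_all_caps` and the rule "two or more capital letters in a row means emphasis" are my own assumptions for illustration.

```python
import re

# Hypothetical rule, not GutenMark's real logic: a standalone run of two or
# more capital letters is treated as emphasis.
ALL_CAPS_WORD = re.compile(r"\b[A-Z]{2,}\b")


def italicize_all_caps(line: str) -> str:
    """Replace each all-caps word with its lowercase form wrapped in <i> tags."""
    return ALL_CAPS_WORD.sub(lambda m: f"<i>{m.group(0).lower()}</i>", line)


print(italicize_all_caps("To Sherlock Holmes she is always THE woman."))
# prints: To Sherlock Holmes she is always <i>the</i> woman.
```

A rule this naive would also italicize acronyms and Roman numerals such as "II", so a real converter presumably needs smarter heuristics than this.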

I was impressed with the available options and did some light testing. It could be a very useful tool for Project Gutenberg etexts that have only a plain text version available.

On the other hand, I also downloaded the Project Gutenberg HTML version of the same Holmes book, and it was superior to the GutenMark output.

But this tool remains a very painless way of changing those text files into a format that can then go on to further processing to create an eBook.
