Simple ThML Markup in Microsoft Word

Version 0.93, Wednesday, August 19, 1998

The Christian Classics Ethereal Library uses a markup language known as ThML (Theological Markup Language) for electronic texts. This markup language encodes a great deal of information about the text, such as document structure, notes, scripture references, subjects, synchronization points, and much more. The inclusion of all of this information will make it possible for users to access the CCEL in ways that are not possible in paper libraries. You will be able to see what your favorite authors had to say on a particular scripture passage, for example, or look up everything in the church fathers on a particular topic.

To support these applications, texts must have special formatting codes called "markup" added. This document tells how to prepare electronic texts for the CCEL in Microsoft Word. It describes the basics of preparing and submitting a document and then discusses particular ways you can format a document (using a simpler subset of the required formatting) to make integration with the CCEL easier.

At the most basic level, to prepare a document for the CCEL, just prepare the document in Word to look pretty much like the book from which it came. Include all of the text of the book exactly as it appears. Use fonts, sizes, and styles to look similar to the print edition. It is not necessary to retain margins, line breaks, page breaks, or unambiguous end-of-line hyphenation of the print edition.

Insert footnotes as ordinary footnotes in Word, using the Insert | Footnote… menu item or the Insert-Footnote-Now shortcut, Alt+Ctrl+F. If there are images or tables, use the normal Word facilities. Then save the document in RTF format and email it to the CCEL ([email protected]) or upload it via anonymous FTP to ccel.wheaton.edu, in the incoming directory. You could also put it on a floppy disk and mail it via the US postal service.

Most contributors will do that sort of formatting and stop there. A few may wish to do further formatting that will make it easier to add the text to the CCEL. This further formatting involves using paragraph styles and inserting special codes. To do so, you have to get the document template used for the CCEL, ThML093.doc, and put it in the Templates folder, inside the Microsoft Office folder, probably in your Program Files folder. (The template can be downloaded from the ThML web page, http://ccel.wheaton.edu/ThML.) Then open the document you are working on and choose Format | Style…. Click the Organizer button and open ThML093.doc in the right-hand window. Copy all of the styles from ThML093.doc into the document you are formatting.

 

Paragraph and Character Styles

Much of the formatting in Word is done by applying character and paragraph styles to the document. Paragraph styles are named groupings of formatting settings for paragraphs, such as single spacing, first-line indent, Times New Roman 11-point, etc. A paragraph style can be applied to a paragraph by selecting it from the left-most drop-down box on the formatting toolbar. The ThML template provides several paragraph styles that should be used for formatting documents, such as Body Text, Body Text First Indent, Heading 1, Verse, BlockQuote, and others.

Character styles are similar to paragraph styles, except that they contain only character formatting and may be applied to text within a paragraph. Two of the character styles used for ThML are "HTML Markup" and "Default". Keyboard shortcuts have been provided for certain common paragraph and character styles:

Style Name           Shortcut Keys  Description
Default (character)  ctrl-alt-d     Default paragraph font
Heading 1            ctrl-alt-1     Level-1 heading
Heading 2            ctrl-alt-2     Level-2 heading
Heading 3            ctrl-alt-3     Level-3 heading
Heading 4            ctrl-alt-4     Level-4 heading
HTML (character)     ctrl-alt-h     HTML (or XML) markup
P                    ctrl-alt-p     Normal paragraphs
P_First              ctrl-alt-r     First paragraph of a section
Verse                ctrl-alt-v     Poetry, verse, etc.
XML                  ctrl-alt-x     XML (or HTML) markup

 

So, for example, if you want to format a chapter title, you could place the cursor on the line containing the chapter title and press ctrl-alt-2. The remainder of this paper will describe how to use a few essential paragraph styles and a few additional codes. A full description of how to format documents in Word for the CCEL can be found in the paper ThML Markup in Microsoft Word for the Christian Classics Ethereal Library.

XML and HTML Markup

When paragraph styles are not sufficient, special markup codes called XML or HTML tags are used, for example, <foreign lang="el">logos</foreign> or <pb n="37" />. These tags are represented in a Word document as red, hidden Courier New text. To apply this formatting, select the markup and press ctrl-alt-x, which applies the XML style.

Document Structure

Headings

Headings for the preface, table of contents, and index, as well as chapter titles, section heads, and the like, should all be formatted using the styles Heading 1, Heading 2, Heading 3, or Heading 4. These styles can also be applied with ctrl-alt-1 through ctrl-alt-4, and the headings can be viewed or modified in the outline view of a document.

Page Breaks

It is often useful to know the page breaks from the print edition of a book. They may be used as targets for subject index entries that identify the page of the entry or to display a text with the pagination of the print edition. Page breaks are marked by inserting <pb /> tags, with the n attribute giving the number of the page that follows (e.g. <pb n="37" /> or <pb n="xii" />). These elements should appear at the start of the identified page.
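As a rough illustration of how a conversion program might make use of these tags, the following Python sketch collects the page labels from a marked-up string. The function name find_page_breaks and the sample text are hypothetical and are not part of any CCEL tool; the sketch also assumes that the n attribute is written with double quotes, as in the examples above.

```python
import re

# Matches <pb n="37"/> or <pb n="xii" />; assumes double-quoted attribute values.
PB_TAG = re.compile(r'<pb\s+n="([^"]*)"\s*/>')

def find_page_breaks(text):
    """Return (page_label, offset) pairs, one for each <pb /> tag in text."""
    return [(m.group(1), m.start()) for m in PB_TAG.finditer(text)]

# Hypothetical sample text with one roman and one arabic page number.
sample = 'end of the preface. <pb n="xii" />More preface. <pb n="37"/>Chapter text.'
for label, offset in find_page_breaks(sample):
    print(label, offset)
```

Because the page label is kept as a string, roman-numeral front matter such as "xii" and ordinary page numbers such as "37" can be handled the same way.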

Paragraphs

Normal paragraphs of text may be formatted with the P style. The first paragraph of each section or chapter, if formatted differently, may be formatted with the P_First style. By default, P is indented and P_First is not.

Block Quotes

The BlockQuote paragraph style should be used for extended quotations. A BlockQuote paragraph is normally indented on both sides. There is also some extra space before and after a BlockQuote paragraph.

Verse

Theological books often contain verse: poetry, hymns, or versified presentations of material such as the Psalms. Verse is often typeset with varying levels of indentation. These are represented with the Verse 1, Verse 2, and Verse 3 paragraph styles. In the example below, the first and third lines of each stanza are of style Verse 1, the second line is Verse 2, and the fourth is Verse 3.

O God, a world of empty show,
    Dark wilds of restless, fruitless quest
Lie round me wheresoe'er I go:
        Within, with Thee, is rest.

And sated with the weary sum
    Of all men think, and hear, and see,
O more than mother's heart, I come,
        A tired child to Thee.

Sweet childhood of eternal life!
    Whilst troubled days and years go by,
In stillness hushed from stir and strife,
        Within Thine Arms I lie.

Thine Arms, to whom I turn and cling
    With thirsting soul that longs for Thee;
As rain that makes the pastures sing,
        Art Thou, my God, to me.

G. Ter Steegen

Scripture

In theological texts, scripture passages may be cited, quoted, or explained. A citation refers to a passage, whereas a quotation includes the text of the passage in the document. Citations and quotations do not need to be marked, as there will be a program to find them automatically. However, explanations or commentary should be marked.
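To give a sense of why citations can be left unmarked, here is a minimal Python sketch of how a program might detect references such as "Mark 7:16" in plain text. It is not the CCEL's program. The book list is a deliberately short, hypothetical sample, and the function name find_citations is an assumption made for illustration only.

```python
import re

# Hypothetical, deliberately short list of book names; a real tool would
# need every book plus the common abbreviations.
BOOKS = ["Genesis", "Psalm", "Psalms", "Matthew", "Mark", "Luke", "John", "Romans"]

# Longer names are tried first so that "Psalms" is not cut short at "Psalm".
CITATION = re.compile(
    r'\b(' + '|'.join(sorted(BOOKS, key=len, reverse=True)) + r')'
    r'\s+(\d+):(\d+)(?:-(\d+))?'
)

def find_citations(text):
    """Return (book, chapter, first_verse, last_verse) tuples found in text."""
    results = []
    for m in CITATION.finditer(text):
        book, chapter, first = m.group(1), int(m.group(2)), int(m.group(3))
        last = int(m.group(4)) if m.group(4) else first
        results.append((book, chapter, first, last))
    return results

print(find_citations("See Mark 7:16 and Psalms 23:1-4 for comparison."))
```

A pattern this simple only handles the "Book chapter:verse" form; it says nothing about how the real citation finder treats abbreviations or other reference styles.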

Explanation or commentary on a passage will be marked with the <scripCom> tag. That is, if a book contains an explanation of the meaning of a scripture passage, you can mark it up as in this example:

<scripCom passage="Mark 7:16">Mark 7:16. This admonition seems to apply to most everyone . . .</scripCom>
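The closing tag must match the opening tag exactly, including capitalization, or an XML parser will reject the markup. The following Python sketch, which uses the standard xml.etree.ElementTree module, shows one way to read the passage attribute and to catch a mismatched tag. The function name scripcom_passages and the wrapping <root> element are hypothetical and serve only to make the fragment parse as a complete document.

```python
import xml.etree.ElementTree as ET

def scripcom_passages(fragment):
    """Return the passage attribute of every <scripCom> element in fragment.

    The fragment is wrapped in a hypothetical <root> element so that it
    parses as a single XML document.
    """
    root = ET.fromstring("<root>" + fragment + "</root>")
    return [el.get("passage") for el in root.iter("scripCom")]

good = '<scripCom passage="Mark 7:16">Mark 7:16. This admonition ...</scripCom>'
print(scripcom_passages(good))

# XML is case-sensitive, so </scripcom> does not close <scripCom>.
bad = '<scripCom passage="Mark 7:16">Mark 7:16. This admonition ...</scripcom>'
try:
    scripcom_passages(bad)
except ET.ParseError as err:
    print("not well-formed:", err)
```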

Foreign Languages

Passages in foreign languages may be marked with the foreign tag and the lang attribute. For example, a Hebrew passage may be marked as <foreign lang="he" dir="rtl">yhwh</foreign>. The optional dir attribute specifies the direction of the text, rtl or ltr, and the lang attribute values are as specified in ISO 639. Some examples are Dutch: nl, English: en, French: fr, German: de, Greek: el, Hebrew: he, Latin: la, Spanish: es, Portuguese: pt, Russian: ru.
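A simple check of these attributes could look like the Python sketch below. It knows only the language codes listed above, whereas ISO 639 defines many more, and the function name check_foreign_tags is hypothetical. It also assumes double-quoted attribute values.

```python
import re

# Only the codes listed in the text above; ISO 639 defines many more.
LANG_CODES = {
    "nl": "Dutch", "en": "English", "fr": "French", "de": "German",
    "el": "Greek", "he": "Hebrew", "la": "Latin", "es": "Spanish",
    "pt": "Portuguese", "ru": "Russian",
}
VALID_DIRS = {"rtl", "ltr"}

FOREIGN_TAG = re.compile(r'<foreign\b([^>]*)>')   # opening tags only
ATTR = re.compile(r'(\w+)="([^"]*)"')             # name="value" pairs

def check_foreign_tags(text):
    """Return a list of problems found in <foreign> opening tags."""
    problems = []
    for tag in FOREIGN_TAG.finditer(text):
        attrs = dict(ATTR.findall(tag.group(1)))
        lang = attrs.get("lang")
        if lang is None:
            problems.append("missing lang attribute: " + tag.group(0))
        elif lang not in LANG_CODES:
            problems.append("unknown lang code: " + lang)
        # dir is optional, but if present it must be rtl or ltr.
        if "dir" in attrs and attrs["dir"] not in VALID_DIRS:
            problems.append("bad dir value: " + attrs["dir"])
    return problems

print(check_foreign_tags('<foreign lang="he" dir="rtl">yhwh</foreign>'))
print(check_foreign_tags('<foreign lang="xx">abc</foreign>'))
```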

If the language uses characters not available in the Latin-1 character set, they may be represented in Unicode by selecting a Unicode font such as Lucida Sans Unicode and using Insert-Symbol as in this Greek example (λογος) and this Hebrew example (הלהי). This is the preferred approach, but it is also possible to use an appropriate font. For example, <foreign lang="el" style="font-family: SIL Galatia">logov</foreign>. The Greek and Hebrew fonts used for the CCEL are the excellent, freeware SIL Galatia and SIL Ezra fonts and related software from the Summer Institute of Linguistics, used here in a Greek example (logov) and a Hebrew example (hwhy).

Horizontal Rules

 

Horizontal rules that span 30% of the page can be inserted with a paragraph using the HR30 style. These would be rendered in HTML as <hr align="center" width="30%">. The above paragraph is an example. The paragraph below, of style HR, represents a horizontal rule that spans the entire page.

 

Conclusion

Electronic texts formatted according to these guidelines can be converted to HTML by a computer program (after a bit more formatting). Furthermore, all of this special formatting will make the library much more usable: it will enable automatic conversion of texts to other formats, automatic construction of subject and scripture indexes for the whole library, and other uses yet to be imagined.


This document (last modified August 19, 1998) from Believerscafe.com