original in en Egon Willighagen
Joined the Dutch LF team in 1999 and became second editor earlier this year. Is an informational chemistry student at the University of Nijmegen. Plays basketball and enjoys hiking.
The first part of this article focuses on the format of DocBook documents. Once DocBook has been introduced, I will explain which tools are needed to convert these DocBook documents into PDF documents that can be viewed with Acrobat.
DocBook [1] is an SGML application developed to mark up documents, just as HTML marks up web documents. In contrast to HTML, DocBook carries no information about the layout of the document. That is why DocBook documents need to be converted to other formats before they can be viewed. The conversion is done by tools that apply a stylesheet to the DocBook document.
Later in this article I will explain which stylesheet to use for this conversion and which tool applies the stylesheet to the DocBook document. First we are going to see how documents are put together.
DocBook can mark up two kinds of documents: articles and books. Since they are in principle the same, I will use the article markup as an example. Before giving an example of a simple article document, I will first cover some basic principles of DocBook.
DocBook is in principle an SGML application, just like HTML, but there is also an XML version of DocBook. The XML version is stricter, but easier to read and therefore easier to learn. Since XML itself is also an SGML application, all SGML tools can still be used. The main differences between the SGML and XML variants are the following (and this holds for every XML application):
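As a minimal sketch of what the stricter XML syntax means in practice, assuming only the general XML well-formedness rules, the following fragment reuses element names and the file name some_picture.gif from the examples in this article:

```xml
<!-- Element names are case-sensitive: <para>, not <PARA> or <Para>. -->
<section>
  <title>Introduction</title>
  <!-- Every opening tag needs an explicit closing tag. -->
  <para>Some text.</para>
  <mediaobject>
    <imageobject>
      <!-- Empty elements end in "/>", and attribute values are quoted. -->
      <imagedata fileref="some_picture.gif" format="gif"/>
    </imageobject>
  </mediaobject>
</section>
```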
Now that we have covered these important formalities, we can start writing articles in DocBook.
    <?xml version="1.0"?>
    <article>
      <title>Writing DocBook articles</title>
      <artheader>
        <abstract>
          <para>
            This article describes how you can use DocBook to develop PDF
            documents and will cover tools you need to edit DocBook articles
            and tools to translate them to PDF documents.
          </para>
        </abstract>
        <author>
          <firstname>Egon</firstname>
          <surname>Willighagen</surname>
        </author>
        <date></date>
      </artheader>
    </article>
Not that difficult, I would say. We have started an article with a title, a short abstract, the name of the author and the date on which it was written.
The next step is to add sections to the article by making use of section elements:
    <?xml version="1.0"?>
    <article>
      <title>Writing DocBook articles</title>
      <artheader>
        ... the articles header ...
      </artheader>
      <section>
        <title>Introduction</title>
      </section>
      ... other sections ...
    </article>
We have now added an Introduction section to the article. Additional section elements can be used to give Results, Conclusion or any other section.
All text is contained in para elements, comparable with HTML's p elements:
    <section>
      <title>Introduction</title>
      <para>
        DocBook is an SGML application developed to markup documents,
        just like HTML marks up webdocuments.
      </para>
    </section>
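Putting the pieces so far together, a hedged sketch of an article with several sections could look like this. The section titles are the ones mentioned above; the paragraph texts are placeholders, and the nested section inside Results is an addition for illustration (DocBook allows a section element to contain further section elements):

```xml
<?xml version="1.0"?>
<article>
  <title>Writing DocBook articles</title>
  <section>
    <title>Introduction</title>
    <para>Placeholder text for the introduction.</para>
  </section>
  <section>
    <title>Results</title>
    <para>Placeholder text for the results.</para>
    <!-- A nested section acts as a subsection of Results. -->
    <section>
      <title>Details</title>
      <para>Placeholder text for a subsection.</para>
    </section>
  </section>
  <section>
    <title>Conclusion</title>
    <para>Placeholder text for the conclusion.</para>
  </section>
</article>
```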
But besides text a lot of other elements are available. In the rest of this section it is shown how information like examples, lists, pictures and some others can be inserted into the article.
Adding examples

Examples can be added with the example element, as in the following case, where a sample program is given:
    <example>
      <title>Perl program that converts an XML document into a HTML page.</title>
      <programlisting>
    #!/usr/bin/perl -w
    use diagnostics;
    use strict;
    use XML::XSLT;
    my $XSLTparser = XML::XSLT->new();
    $XSLTparser->open_project ("file.xml", "stylesheet.xsl", "FILE", "FILE");
    $XSLTparser->process_project;
    $XSLTparser->print_result();
      </programlisting>
    </example>

An example can also contain text, pictures and other information.

Adding lists
As in HTML, DocBook documents can contain lists. Lists are defined by the itemizedlist element, which contains one or more listitem elements:
    <itemizedlist>
      <listitem>
        <para>an item</para>
      </listitem>
      <listitem>
        <para>another item</para>
      </listitem>
      <listitem>
        <para>and again an item</para>
      </listitem>
    </itemizedlist>

Note that here, too, the text is contained in a para element. Text must always be contained within this element!
Lists can also be ordered. In that case, use the orderedlist element instead of the itemizedlist element. By adding a numeration attribute (e.g. <orderedlist numeration="arabic">) you can set the numbering style.
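A minimal sketch of such an ordered list, reusing the items from the itemized list above. The numeration value is written in lower case here, since the XML version is case-sensitive; other values defined by DocBook include loweralpha, upperalpha, lowerroman and upperroman:

```xml
<orderedlist numeration="arabic">
  <listitem>
    <para>an item</para>
  </listitem>
  <listitem>
    <para>another item</para>
  </listitem>
  <listitem>
    <para>and again an item</para>
  </listitem>
</orderedlist>
```

How the numbers are finally rendered is still up to the stylesheet that processes the document.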
Adding pictures

Images can be put into the article:
    <mediaobject>
      <imageobject>
        <imagedata fileref="some_picture.gif" format="gif"/>
      </imageobject>
      <textobject>
        <para>
          If you were not using <productname>Lynx</productname> you could
          now see a picture.
        </para>
      </textobject>
    </mediaobject>

You can see that, besides the picture itself, a text alternative is also given. As a matter of fact, I could have added a movie as well. The stylesheet processor that converts the DocBook document into PDF could then choose the best medium, which would probably be the picture.
Also note that the word Lynx is marked up. This is a feature specific to markup languages in which layout is separated from information. The article simply states that Lynx is the name of a product. The stylesheet later determines that a productname must be shown in a specific layout, for example in italics. In the following section we will see some additional markup for words.
Markup of words

As was shown in the picture example just above, words themselves can be marked up. The table below lists some markup elements for words:
Element | Description |
---|---|
abbrev | An abbreviation, especially one followed by a period. Example: <para><abbrev>e.g.</abbrev> means for example.</para> |
acronym | An acronym. Example: <para><acronym>DSM</acronym> (chemical company) means "De StaatsMijnen" (=The State Mines).</para> |
email | Someone's email address. Example: <para>My email is <email>[email protected]</email></para> |
keyword | One of the article keywords. Example: <para>In my humble opinion <keyword>chemistry</keyword> is very important.</para> |
Now that the DocBook elements have been briefly introduced, it is time to move on and make a PDF document.
Once we have a DocBook document, we can convert it to several formats. Besides the obvious PDF, we could also convert the document to a website, a PostScript document, a TeX source file or an RTF (Rich Text Format) document that can be read with WordPerfect, Word, StarWriter and other word processors. In this article, however, we are only concerned with conversion into a PDF document.
DocBook documents can be written with any editor, such as Vi or Nedit. Even better is Emacs: Norman Walsh wrote an Emacs major mode for DocBook [3], which adds some useful features, like completing element names or inserting complete template elements.
Besides making your own test article, you can also download my version, which contains the examples given in this article.
As explained at the beginning of this article, we need both a stylesheet and a tool that uses this stylesheet to convert the DocBook article to PDF. The stylesheet does not actually convert DocBook directly into PDF; there is a TeX step in between. The stylesheets we use are Norman Walsh's Modular DocBook Stylesheets [4], which are written in DSSSL.
To use these DSSSL stylesheets for conversions we need a DSSSL processor. The processor I used is called Jade [5] and was developed by James Clark (who has stopped supporting this tool). It has been replaced by OpenJade [6], but I haven't used that tool yet.
On my Debian system, Walsh's Modular Stylesheets for conversion to PDF are installed in /usr/lib/sgml/stylesheets/dsssl/docbook/nwalsh/print/; the stylesheet file in that directory is passed to Jade with the "-d" option. The "-t" option tells Jade to use a TeX backend:
    egonw@localhost> ls -al
    total 3
    -rw-r--r-- 1 egonw egonw 2887 Apr 8 22:06 docbook_article.xml
    egonw@localhost> jade -t tex -d /usr/lib/sgml/stylesheets/dsssl/docbook/nwalsh/print/docbook.dsl docbook_article.xml
    egonw@localhost> ls -al
    total 21
    -rw-r--r-- 1 egonw egonw 2887 Apr 8 22:06 docbook_article.xml
    -rw-r--r-- 1 egonw egonw 17701 Apr 8 22:29 docbook_article.tex

As you can see, Jade generates a TeX file. This TeX file can then be converted to a PDF file with the pdfjadetex tool contained in the JadeTeX package [7]:
    egonw@localhost> ls -al
    total 21
    -rw-r--r-- 1 egonw egonw 2887 Apr 8 22:06 docbook_article.xml
    -rw-r--r-- 1 egonw egonw 17701 Apr 8 22:29 docbook_article.tex
    egonw@localhost> pdfjadetex docbook_article.tex

This produces a nice docbook_article.pdf. Note that a lot of layout is added, such as the article title at the top of each page and a different font for the program listing. When I started working with DocBook, most of my time went into understanding which combinations of tools and stylesheets were possible. This article shows only one such combination.
The DocBook XML language is very extensive, and so are the means of converting it into other formats. This article gives only a very short introduction. Questions can be posted on the talkback pages for this article. More information can be found in references [8] and [9]. Note that the last reference is itself written completely in DocBook!
Advanced topics that are not covered by this article but are available with DocBook are: