Abstract: This article is a basic introduction to the new web markup language XML and the transformation language XSL. I show how the Apache web server can be configured, using the servlet engine JServ, to do server side XML/XSL transformation with Apache's Cocoon servlet.
Future updates for this article will be located at http://www.inconn.ie/article/cocoon.htm
(The domain name is currently non-functional but is expected soon.)
The eXtensible Markup Language (XML) is a new web markup language (a W3C Recommendation since February 1998). It is a powerful way of separating web content and style. A lot has been written about XML, but to be used effectively in web design the technologies behind it must be understood. To this end I have added my own two pence worth to the already vast amount of literature on the subject. This article is not a place to learn XML, nor is it a place where the capabilities of XML are explored to their fullest, but it is a place where the technologies behind XML can be put into practice immediately.
Before I go any further, I should recommend the two sites where definitive information on XML can be obtained. The first is the World Wide Web Consortium (W3C) site, http://www.w3.org/. The W3C are responsible for the XML specification. The second is the XML frequently asked questions site (http://www.ucc.ie/xml/), which will answer any other questions. I also recommend the XML pages hosted by IBM, http://www.ibm.com/xml/, where you will find a wide range of excellent tutorials and articles on XML.
SGML (an ISO standard since 1986) is the mother of all mark-up languages. SGML can be used to document any conceivable system, from complex aeronautical design to ancient Chinese dialects. However, it suffers from being overly complex and unwieldy for routine web applications. HTML is basically a very cut-down application of SGML, originally designed with the scientific publishing community in mind. It is a simple mark-up language (it has been said "anyone with a pulse can learn it") and with the explosion of the web it is clear that the people with pulses have spoken. Since its foundation the web has grown in complexity, and it has long outgrown its lowly beginnings in the scientific community.
Today web pages need to be dynamic, interactive, back-ended with databases, secure and eye catching to compete in an ever more crowded cyberspace. Enter XML, a new mark-up language to deal with the complexities of modern web design. It is often said that XML is only 20 percent as complex as SGML and can handle 80 percent of SGML situations (believe me, when you are talking about coding ancient Chinese dialects, 80 percent is plenty). In the following section I will briefly compare two markup examples, one in HTML and one in XML, demonstrating the benefits of an XML approach. In the final section I will show you how to set up an Apache web server to serve an XML document, so that you can start using XML in your web design immediately.
The following example is a very simple HTML document that everyone will be familiar with:
<html>
  <head>
    <title>This is my article</title>
  </head>
  <body>
    <h1 align="center">This is my article</h1>
    <h3 align="center">by <a href="mailto:[email protected]">Eoin Lane</a></h3>
    ...
  </body>
</html>
Two important points can be made about this document.
The XML equivalent is as follows:
<?xml version="1.0"?>
<page>
  <title>This is my article</title>
  <author>
    <name>Eoin Lane</name>
    <mail>[email protected]</mail>
  </author>
  ...
</page>
The first thing to note is that this document, like every valid XML document, is well formed. To be well formed, every opening tag must have a matching closing tag and the elements must nest properly. A program searching for the mail address then has only to locate the text between the opening and closing mail tags.
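To make the point about locating the mail address concrete, here is a minimal sketch using the XML parser that ships with a current JDK (not the JDK 1.1.8 used later in this article). The class name MailFinder and the address author@example.org are placeholders of mine, not part of the original example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class MailFinder {

    // Returns the text of the first <mail> element, or null if there is none.
    // The parser throws a SAXException if the document is not well formed.
    static String findMail(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList mails = doc.getElementsByTagName("mail");
        return mails.getLength() == 0 ? null : mails.item(0).getTextContent();
    }

    // True if the parser accepts the document, false if it is not well formed.
    static boolean isWellFormed(String xml) throws Exception {
        try {
            findMail(xml);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder data modelled on the page document above.
        String page =
            "<?xml version=\"1.0\"?>"
          + "<page>"
          + "<title>This is my article</title>"
          + "<author><name>Eoin Lane</name><mail>author@example.org</mail></author>"
          + "</page>";
        System.out.println(findMail(page));
        // The <title> element is never closed, so this document is rejected.
        System.out.println(isWellFormed("<page><title>oops</page>"));
    }
}
```

Because the parser refuses anything that is not well formed, the program never has to guess where an element ends, which is exactly what makes this kind of lookup so simple.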
The second and crucial point is that this XML document contains just data. There is nothing in this document that dictates how to display the author's name or his mail address. In practice it is easier to think about web design in terms of data and presentation separately. In the design of medium to large web sites, where all the pages have the same look and only the data changes from page to page, this is clearly a better solution. It also allows a division of labour, where style and content can be handled by two different departments working independently, and it makes it possible to have one set of data with a number of ways of presenting it.
An XML document can be presented using two different methods. One is to use a Cascading Style Sheet (CSS) (see http://www.w3.org/style/css/) to style the marked-up text. The second is to use a transformation language called XSL, which converts the XML document into HTML, XML, PDF, PostScript or LaTeX. As to which one to use, the W3C (the people responsible for these specifications) has this to say:
Use CSS when you can, use XSL when you must.
They go on to say: The reason is that CSS is much easier to use, easier to learn, thus easier to maintain and cheaper. There are WYSIWYG editors for CSS and in general there are more tools for CSS than for XSL. But CSS's simplicity means it has its limitations. Some things you cannot do with CSS, or with CSS alone. Then you need XSL, or at least the transformation part of XSL.
So what are the things you cannot do with CSS? In general everything that needs transformations. For example, if you have a list and want it displayed in lexicographical order, or if words have to be replaced by other words, or if empty elements have to be replaced by text. CSS can do some text generation, but only for generating small things, such as numbers of section headers.
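The list-sorting case mentioned in the quotation can be sketched in a few lines. The example below is my own illustration, not taken from the W3C text: the list and item element names are made up, the stylesheet uses the namespace of the final XSLT 1.0 Recommendation, and it is run with the XSLT processor bundled with a current JDK rather than with Cocoon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SortedList {

    // A stylesheet that writes each <item> on its own line, sorted as text.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/list\">"
      + "<xsl:for-each select=\"item\">"
      + "<xsl:sort select=\".\"/>"
      + "<xsl:value-of select=\".\"/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet above to the given XML and returns the result.
    static String sort(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: an unsorted list of three items.
        String list =
            "<list><item>pear</item><item>apple</item><item>fig</item></list>";
        System.out.print(sort(list));
    }
}
```

The reordering is done entirely by the xsl:sort instruction. A style sheet language that only decorates elements in document order has no equivalent, which is the limitation the W3C text is pointing at.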
XSL (eXtensible Stylesheet Language) is the language used to transform and display XML documents. It is not yet finished, so beware! It is a complex document formatting language whose stylesheets are themselves XML documents. It can be subdivided into two parts: transformation (XSLT) and formatting objects (sometimes referred to as FO, XSL:FO or simply XSL). For the sake of simplicity I will only deal with XSLT here.
On the 16th of November 1999 the World Wide Web Consortium announced the publication of XSLT as a W3C Recommendation. This basically means that version 1.0 of XSLT is stable and will not change. The above XML document can be transformed into an HTML document, and subsequently displayed in any browser, using the following XSLT file.
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/XSL/Transform/1.0">
  <xsl:template match="page">
    <html>
      <head>
        <title>
          <xsl:value-of select="title"/>
        </title>
      </head>
      <body bgcolor="#ffffff">
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="title">
    <h1 align="center">
      <xsl:apply-templates/>
    </h1>
  </xsl:template>
  <xsl:template match="author">
    <h3 align="center">
      by <xsl:apply-templates/>
    </h3>
  </xsl:template>
  <xsl:template match="mail">
    <h2 align="left">
      <xsl:apply-templates/>
    </h2>
  </xsl:template>
</xsl:stylesheet>
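If you want to see what such a transformation produces before setting up a server, the following sketch applies a condensed form of the stylesheet to the page document using the XSLT processor bundled with a current JDK. Two things here are assumptions of mine: the stylesheet is declared with the namespace of the final XSLT 1.0 Recommendation (http://www.w3.org/1999/XSL/Transform) and a version attribute, which is what that processor expects, and the mail address is a placeholder.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PageToHtml {

    // The page document from this article; the mail address is a placeholder.
    static final String PAGE =
        "<?xml version=\"1.0\"?>"
      + "<page>"
      + "<title>This is my article</title>"
      + "<author><name>Eoin Lane</name><mail>author@example.org</mail></author>"
      + "</page>";

    // Condensed form of the stylesheet, declared with the Recommendation namespace.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"page\">"
      + "<html><head><title><xsl:value-of select=\"title\"/></title></head>"
      + "<body bgcolor=\"#ffffff\"><xsl:apply-templates/></body></html>"
      + "</xsl:template>"
      + "<xsl:template match=\"title\">"
      + "<h1 align=\"center\"><xsl:apply-templates/></h1>"
      + "</xsl:template>"
      + "<xsl:template match=\"author\">"
      + "<h3 align=\"center\">by <xsl:apply-templates/></h3>"
      + "</xsl:template>"
      + "<xsl:template match=\"mail\">"
      + "<h2 align=\"left\"><xsl:apply-templates/></h2>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given document and returns the output.
    static String transform(String xml, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Prints the generated HTML page.
        System.out.println(transform(PAGE, STYLESHEET));
    }
}
```

The name element has no template of its own, so the processor's built-in rule simply copies its text into the output, which is why the author's name appears after "by" without any extra markup.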
To learn more about XSLT, I recommend the XSLINFO site (http://www.xslinfo.com/) as a good starting point. I also found the revised Chapter 14 of the XML Bible to be very good. This revision is based on the specifications that eventually became the Recommendation.
With the arrival of the next generation of browsers, i.e. Netscape 5 (currently under construction, see http://www.mozilla.org/), this transformation will be done client side. When an XML file is requested, the corresponding XSL file will be sent along with it, and the transformation will be done by the browser. Currently many browsers are only capable of displaying HTML, and until that changes the transformation must be done server side. This can be accomplished using Java servlets (Java server side programs).
Cocoon is such a servlet, written by some very clever people at Apache (http://www.apache.org/). It basically takes an XML document and transforms it using an XSL document. An example of such a transformation would be to convert the XML document into HTML so that the browser can display it. So if your web server is configured to run servlets, and you include the Cocoon servlet, then you can start designing your web pages using XML. The rest of this article will show exactly how to do this.
I have tested the following instructions on a fresh installation of Red Hat 6.0, so I know they work.
First set up the Apache web server. On Red Hat this comes pre-installed, but I want you to blow it away using:
rpm -e --nodeps apache
Then unpack the Apache 1.3.9 source, and configure, make, install and start it:
tar zxvf apache_1.3.9.tar.gz
cd apache_1.3.9
./configure --prefix=/usr/local/apache --mandir=/usr/local/man --enable-shared=max
make
make install
/usr/local/apache/bin/apachectl start
As of October, IBM have released the Java Development Kit 1.1.8 for Linux. It claims to be faster than the corresponding JDKs from Blackdown (http://www.blackdown.org/) and Sun. Download the IBM JDK (see http://www.ibm.com/java/) and again untar and unzip it, this time into the /usr/local/src/jdk118 directory. Next, download JavaSoft's JSDK 2.0, the Solaris version (not JSDK 2.1 or any other flavour you might be tempted to get), and untar and unzip it; again, I put it in /usr/local/src/JSDK2.0. Add the following or equivalent to /etc/profile to make them available to your system.
JAVA_HOME="/usr/local/src/jdk118"
JSDK_HOME="/usr/local/src/JSDK2.0"
CLASSPATH="$JAVA_HOME/lib/classes.zip:$JSDK_HOME/lib/jsdk.jar"
PATH="$JAVA_HOME/bin:$JSDK_HOME/bin:$PATH"
export PATH CLASSPATH JAVA_HOME JSDK_HOME
To check the setup, type
java -version
at the command prompt, and you should get back the following message:
java version "1.1.8"
Then type
servletrunner
and if all goes well you should get back the following:
servletrunner starting with settings:
port = 8080
backlog = 50
max handlers = 100
timeout = 5000
servlet dir = ./examples
document dir = ./examples
servlet propfile = ./examples/servlet.properties
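If either command fails, a common cause is a CLASSPATH entry that points at a file which is not there. As a hedged aid, here is a small sketch that lists such entries. It is my own helper rather than part of the JDK or JSDK, the name ClasspathCheck is made up, and it is written for a current JDK rather than the JDK 1.1.8 installed above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClasspathCheck {

    // Returns the classpath entries that do not exist on disk.
    static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<>();
        if (classpath == null) {
            return missing;
        }
        // Entries are separated by ':' on Linux (File.pathSeparator).
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty() && !new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String classpath = System.getenv("CLASSPATH");
        if (classpath == null) {
            System.out.println("CLASSPATH is not set");
            return;
        }
        for (String entry : missingEntries(classpath)) {
            System.out.println("missing: " + entry);
        }
    }
}
```

Run it from a shell that has read /etc/profile. No output means every entry in CLASSPATH was found.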
Again, download the latest Apache JServ (version 1.0 at this time, although version 1.1 is in its final beta stage) from Apache's Java site (http://java.apache.org/) and expand it into /usr/local/src/ApacheJServ-1.0/. Configure, make and install it using the following instructions:
./configure --with-apache-install=/usr/local/apache --with-jsdk=/usr/local/src/JSDK2.0
make
make install
Then add the following line to Apache's httpd.conf file and restart Apache:
Include /usr/local/src/ApacheJServ-1.0/example/jserv.conf
/usr/local/apache/bin/apachectl restart
Now comes the moment of truth: point your web browser to http://localhost/example/Hello. If all is well you will get back the following two lines:
Example Apache JServ Servlet
Congratulations, Apache JServ is working!
Finally, download the latest version of Cocoon (version 1.5 at this time) from Apache's Java site (http://java.apache.org/). Cocoon is distributed as a Java jar file and can be extracted using the jar command. First, create the directory /usr/local/src/cocoon and then expand the Cocoon jar file into it:
mkdir /usr/local/src/cocoon
jar -xvf Cocoon_1.5.jar
Locate the file jserv.properties, which you will find in the directory /usr/local/src/ApacheJServ-1.0/example/, and at the end of the section that begins
# CLASSPATH environment value passed to the JVM
add wrapper.classpath entries such as:
wrapper.classpath=/usr/local/src/cocoon/bin/xxx.jar
wrapper.classpath=/usr/local/src/cocoon/bin/fop.0110.jar
wrapper.classpath=/usr/local/src/cocoon/bin/openxml.106-fix.jar
wrapper.classpath=/usr/local/src/cocoon/bin/xslp.19991017-fix.jar
repositories=/usr/local/src/cocoon/bin/Cocoon.jar
repositories=/usr/local/src/ApacheJServ-1.0/example
repositories=/usr/local/src/ApacheJServ-1.0/example,/usr/local/src/cocoon/bin/Cocoon.jar
servlet.org.apache.cocoon.Cocoon.initArgs=properties=/usr/local/src/cocoon/bin/cocoon.properties
The JServ engine is now properly configured, and all that is left for us to do is to tell Apache to direct any call to an XML file (or any other file you want Cocoon to process) to the Cocoon servlet. For this we need the JServ configuration file, jserv.conf, mentioned earlier (again in the same directory). Include the following line:
ApJServAction .xml /example/org.apache.cocoon.Cocoon
In order to access the Cocoon documentation and examples, add the following lines to the alias section of your httpd.conf file:
Alias /xml/ "/usr/local/src/cocoon/"
<Directory "/usr/local/src/cocoon/">
  Options Indexes MultiViews
  AllowOverride None
  Order allow,deny
  Allow from all
</Directory>
Alias /xml/example/ "/usr/local/src/cocoon/example/"
<Directory "/usr/local/src/cocoon/example/">
  Options Indexes MultiViews
  AllowOverride None
  Order allow,deny
  Allow from all
</Directory>
Restart the web server for this to take effect:
/usr/local/apache/bin/apachectl restart
store.memory = 150000
The Cocoon 1.x series has basically been a work in progress. What started out as a simple servlet for static XSL transformation has grown into something much more. With this ongoing development, design decisions taken at the beginning of the project are now hampering future development as the scale and scope of the project become apparent. To add to this, XSL is also a work in progress, although the current version of XSLT has become a W3C Recommendation (as of November 16, 1999).
Cocoon 2 intends to address these issues and provide us with a servlet for XML transformations that can scale to handle large quantities of web traffic. Web design of medium to large sites in the future will be based entirely around XML as its benefits become apparent, and the Cocoon 2 servlet will hopefully provide us with a way to use it effectively.
Even as I have been writing this article, Apache have opened a new site dedicated exclusively to XML (see http://xml.apache.org/). The Cocoon project has obviously grown beyond all expectations, and with the coming of Cocoon 2 it should become a commercially viable servlet that makes the design of web sites in XML a reality. The people at Apache deserve a lot of credit for this, so write to them and thank them, join the mailing list and generally lend your support. After all, this is open source code and this is what Linux is all about.