<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="/stylesheet.xsl"?> 
<!DOCTYPE doc [ <!ENTITY copy "&#169;"> ]>

<page 	name="Server-side XML Without Tears"
	author="Hugh Sparks"
	description="Server-side XML processing with Redhat 9, Apache, Tomcat, and Cocoon"
	date="December 31, 2003"
	version="1.3"
	xmlns="http://www.csparks.com/XMLWithoutTears"	
>

<int name="Introduction: Why do this?">	
	<p>
	You like the idea of using your own markup for web pages.
	You've created some XML documents and XSL stylesheets
	that suit your bizarre tastes. When you publish these documents,
	you discover that many people are still using Microsoft
	Internet Explorer 5.x or KDE Konqueror and can't appreciate
	your vital missives.
	</p>

	<p>
	If this sounds familiar, you need server-side XML processing:
	This allows you to use all the XML/XSL tricks and tools but
	still serve up plain-vanilla HTML to the vast unwashed.
	</p>
	
	<p>
	A unique feature of the treatment presented here is the
	ability to mix XML documents with normal HTML documents
	on your website. If your site contains XML documents
	configured for client-side processing, these techniques
	will allow you to do server-side processing without
	changing anything. This "transparency" means there
	are no special directories or weird URLs. You can 
	switch back to client-side processing in the future by
	simply turning off the servlet engine.
	</p>
	
	<p>
	If you're just starting out and want to set up server-side XML
	processing with Apache, Tomcat and Cocoon, this is the quick
	fix.
	</p>
	
</int>

<contents  name="Contents"/>

<s name="The Agenda">
	<p>
	1) Install and configure recent versions of:
	</p>
	<code><![CDATA[
	Java SDK
	Jakarta Tomcat
	Jakarta Tomcat Connector JK2
	Apache Cocoon ]]></code>
	
	<p>
	2) Get the Apache web server to handle XML documents by passing
	them to Cocoon WITHOUT rebuilding Apache from source. We will use 
	the stock Redhat httpd rpm.
	</p>
	
	<p>
	3) Minimize fun: You will not be able to play around with the
	Tomcat or Cocoon demos, databases, dancing bears or
	anything else but processing XML files through XSL stylesheets.
	As you can see, we are serious people here.
	</p>
	
	<picture name="No Dancing Bears" url="NoDancingBears.jpg"/>
</s>

<s name="Prerequisites">
	<p>
	You will need your own Linux server or permission to 
	install and configure software on someone else's machine.
	</p>

	<p>
	You need to know about processing XML documents with XSLT.
	A quick introduction is at <link name="XML Web Pages Without Tears."
	ref="XMLWithoutTears"/>
	</p>
	
	<p>
	Although no programming is required, experience with Linux,
	the Bash shell, and software installation from tarballs is
	required. Frankly, it would astonish me to hear that you're
	not a typical software geek if you've read this far.
	</p>	
</s>

<s name="What is all this stuff?">

	<picture name="Pandora's Box" url="PandorasBox.jpg"/>

	<p>
	Java <i>servlets</i> are programs that run on the web server
	to generate responses when special web documents are requested
	by a client browser.
	</p>
	
	<p>
	Java <i>applets,</i> in contrast, are programs that get
	downloaded to run on the client when special web documents
	are requested by a browser.
	</p>
	
	<p>
	The Apache web server doesn't know anything about Java, so
	it must use an extension to run servlets. Such a program
	is called a <i>servlet container,</i> or sometimes a
	<i>servlet engine.</i> 
	</p>
	
	<p>
	We'll be using <i>Jakarta Tomcat</i> as our servlet container.
	Tomcat can also function as a fairly complete web server, 
	fill out tax forms, and sort laundry, but we won't be using these
	extra capabilities here.
	</p>
	
	<p>
	<i>Apache Cocoon</i> is a Java servlet that processes XML documents
	using XSLT stylesheets. It can do a lot more. In fact, it can do
	more than man was meant to know.
	</p>
			
	<p>
	We have to get Apache talking to Tomcat when special URLs
	are recognized. The software that does this is called a
	<i>connector.</i> There are several different connectors
	described at the
	<link name="Jakarta Tomcat Connectors site." ref="TomcatConnectors"/>
	The method presented here, <i>JK2</i>, uses an Apache module
	called <i>mod_jk2.</i>
	</p>

</s>

<s name="Linux distribution dependencies">
	<p>
	This article was developed on a Redhat 9 system, but
	the components described here are installed
	from tar.gz files rather than rpms. In most cases,
	this is the only format available on the sites that
	originate the software. One exception is the Apache web
	server, which is installed from the binary rpm supplied
	by Redhat.
	</p>
	
	<p>
	In the discussion that follows, I'll use the expression
	$serverRoot to designate the location of the Apache
	configuration and module directories. You don't actually
	have to make a definition for serverRoot, just keep in
	mind where your distribution wants these files.
	</p>

	
	<p>
	In the Redhat distribution, $serverRoot would refer to
	<c>/etc/httpd.</c> 
	</p>
</s>

<s name="Installing the Java SDK">
	
	<p>
	I'm using:
	</p>
	
	<code><![CDATA[
	j2sdk-1.4.2-nb-3.5-linux.bin ]]></code>
		
	<p>
	This version includes "NutBeans", Sun's attempt
	at making an interactive GUI for Java developers.
	If you're not amused by such things, they also offer
	a much smaller "no beans" version. If you're not going
	to do Java programming, this might be a better choice.
	</p>
	
	<p>
	The file is a "shar" archive.
	To install the software, make the bin file executable
	and run it like a program:
	</p>
	
	<code><![CDATA[
	chmod +x j2sdk-1.4.2-nb-3.5-linux.bin
	./j2sdk-1.4.2-nb-3.5-linux.bin ]]></code>
	
	<p>
	The installer will offer to put everything in the directory:
	</p>
	
	<code><![CDATA[
	/opt/j2sdk-nb ]]></code>
		
	<p>For Redhat-ish reasons, I changed this to:</p>
	
	<code><![CDATA[
	/usr/java ]]></code>
		
	<p>Go into the /usr/java directory and create a symbolic link: </p>
	
	<code><![CDATA[
	cd /usr/java
	ln -s j2sdk1.4.2 javaHome ]]></code>
		
	<p>Create the file: <c>/etc/profile.d/java.sh</c> containing:</p>
	
	<code><![CDATA[
	# java.sh - Path variables for Java SDK

	export JAVA_HOME=/usr/java/javaHome
	export PATH=$PATH:$JAVA_HOME/bin
	export JAVA_OPTS=-Djava.awt.headless=true ]]></code>

	<p>
	The <i>headless</i> option enabled in the last definition above
	is only required if you operate a headless server:
	a machine that is not running X-Windows. On a headless machine,
	Java programs cannot perform graphical operations without this
	feature.
	</p>
	
	<p>
	Q: Why would someone want Java to do graphics without a
	monitor? <cr/>
	A: Many Java servlets generate off-screen images that
	get downloaded to the client browser.
	</p>
	
	<p>Continuing, we fix the permissions and install the definitions:</p>
	
	<code><![CDATA[
	chmod a+rx /etc/profile.d/java.sh
	source /etc/profile.d/java.sh ]]></code>
	
	<p>To test your path, try:</p>
	
	<code><![CDATA[
	which java
	which javac ]]></code>
		
	<p>If you get good answers, you are ready to go on.</p>
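
	<p>
	As an optional extra check, you can confirm that the variables
	point at the SDK you just installed. This is only a sketch and
	assumes the java.sh file above has been sourced in the current
	shell:
	</p>

	<code><![CDATA[
	# Show where JAVA_HOME points and which JVM answers
	echo $JAVA_HOME
	ls -l $JAVA_HOME/bin/java
	java -version ]]></code>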
</s>

<s name="Installing Jakarta Tomcat">

	<p>I'm using:</p>
	
	<code><![CDATA[
	jakarta-tomcat-4.1.29.tar.gz ]]></code>
		
	<p>Unpack the archive and create a symbolic link:</p>
	
	<code><![CDATA[
	cd /usr/local/src
	tar xzf jakarta-tomcat-4.1.29.tar.gz
	ln -s jakarta-tomcat-4.1.29 tomcat ]]></code>
			
	<p>Create the file: <c>/etc/profile.d/tomcat.sh</c> containing:</p>
	
	<code><![CDATA[
	# tomcat.sh - Path variables for the Tomcat servlet container

	export TOMCAT_HOME=/usr/local/src/tomcat
	export PATH=$PATH:$TOMCAT_HOME/bin
	export LD_ASSUME_KERNEL=2.2.5 ]]></code>
	
	<p>
	The definition for <c>LD_ASSUME_KERNEL</c> is some kind
	of stability hack suggested by the release notes. On my
	server, Tomcat had to be restarted once a day to prevent
	hang-ups before I read about this trick.
	</p>
	
	<p>Fix the permissions and load the file:</p>
	
	<code><![CDATA[
	chmod a+x /etc/profile.d/tomcat.sh
	source /etc/profile.d/tomcat.sh ]]></code>
		
	<p>To test your path, try:</p>
	
	<code><![CDATA[
	which catalina.sh ]]></code>
		
	<p>
	If you get an answer, you are ready to go on.
	Start tomcat in "testing mode":
	</p>
	
	<code><![CDATA[
	catalina.sh run ]]></code>
		
	<p>
	Wait for the messages to stop.
	It takes longer than you think.
	Fire up your browser and try:
	</p>
	
	<code><![CDATA[
	http://localhost:8080 ]]></code>
		
	<p>If you see the welcome screen from Tomcat, you have success. </p>
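
	<p>
	On a server with no browser handy, a command-line client can
	stand in. This is only a sketch: it assumes curl is installed
	and that you run it from a second shell while "catalina.sh run"
	is still going in the first. A status line containing 200
	suggests that Tomcat is answering.
	</p>

	<code><![CDATA[
	# Print just the status line of Tomcat's response
	curl -sI http://localhost:8080/ | head -n 1 ]]></code>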

	<p>Use &lt;control&gt;C to stop the server before proceeding to the next
	step. </p>

	<picture name="  " url="CheshireCat.gif"/>

</s>
	
<s name="Installing Apache Cocoon">

	<p>
	Recent and future versions of Cocoon are being distributed in
	source form only. It is surprisingly easy to build Cocoon,
	so don't be squeaked.<cr/>
	I'm using:
	</p>
	
	<code><![CDATA[
	cocoon-2.1.3-src.tar.gz ]]></code>
	
	<p>Unpack the archive and create a symbolic link:</p>
	
	<code><![CDATA[
	cd /usr/local/src
	tar xzf cocoon-2.1.3-src.tar.gz
	ln -s cocoon-2.1.3 cocoon ]]></code>
		
	<p>Create the file: <c>/etc/profile.d/cocoon.sh</c> containing:</p>
	
	<code><![CDATA[
	# cocoon.sh - Path variables for cocoon

	export COCOON=/usr/local/src/cocoon
	export PATH=$PATH:$COCOON ]]></code>
	
	<p>Fix the permissions and load the file:</p>
	
	<code><![CDATA[
	chmod a+x /etc/profile.d/cocoon.sh
	source /etc/profile.d/cocoon.sh ]]></code>
		
	<p>To test your path, try:</p>
	
	<code><![CDATA[
	which cocoon.sh ]]></code>
		
	<p>If you get an answer, you are ready to go on.</p>

	<p>
	This version of cocoon has worked well for me, but
	the "unstable" parts create lots of error messages
	during the compile and also at runtime when starting.
	I like to get rid of as much of this as possible by
	removing the unstable blocks before I build. This
	eliminates all the build and runtime errors. Doing
	this also reduces the size of Cocoon from about
	150 megs down to 50 megs.
	</p>
	
	<p>
	To eliminate the foof, we need to create two files
	in the top level cocoon directory. These contain
	overrides for the build script that will eliminate
	unwanted components.
	</p>
		
	<p>
	Create <c>$COCOON/local.build.properties</c> containing:
	</p>
	<code><![CDATA[
	exclude.webapp.documentation=true
	exclude.webapp.javadocs=true
	exclude.webapp.idldocs=true
	exclude.webapp.samples=true
	exclude.javadocs=true
	exclude.idldocs=true ]]></code>
	
	<p>
	Create <c>$COCOON/local.blocks.properties</c> containing:
	</p>
	<code><![CDATA[
	exclude.block.apples=true
	exclude.block.asciiart=true
	exclude.block.axis=true
	exclude.block.cron=true
	exclude.block.deli=true
	exclude.block.eventcache=true
	exclude.block.jms=true
	exclude.block.linotype=true
	exclude.block.mail=true
	exclude.block.midi=true
	exclude.block.ojb=true
	exclude.block.petstore=true
	exclude.block.portal=true
	exclude.block.precept=true
	exclude.block.proxy=true
	exclude.block.qdox=true
	exclude.block.repository=true
	exclude.block.scratchpad=true
	exclude.block.slide=true
	exclude.block.slop=true
	exclude.block.stx=true
	exclude.block.taglib=true
	exclude.block.webdav=true
	exclude.block.woody=true ]]></code>
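
	<p>
	If typing that list by hand feels error-prone, a small shell
	loop can generate the same kind of file. This is a sketch and
	not part of the Cocoon build system: the variable names are
	made up for illustration, and the list here is deliberately a
	short subset that you would extend with the block names shown
	above.
	</p>

	<code><![CDATA[
	# Block names to exclude (a subset, for illustration only)
	blocks="apples asciiart axis cron"

	# Write to $COCOON if it is set, otherwise the current directory
	out="${COCOON:-.}/local.blocks.properties"

	# Emit one exclude line per block, replacing any existing file
	for b in $blocks ; do
		echo "exclude.block.$b=true"
	done > "$out" ]]></code>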
	
	<p>Now we can build Cocoon from the source:</p>
		
	<code><![CDATA[
	cd $COCOON
	./build.sh ]]></code>
				
	<p>Test cocoon using its built-in servlet container:<cr/>
	(This test doesn't require Tomcat.)</p>
	
	<code><![CDATA[
	cocoon.sh servlet ]]></code>
		
	<p>Wait for the messages to stop, fire up your browser and try:</p>
	
	<code><![CDATA[
	http://localhost:8888 ]]></code>
		
	<p>If you see the welcome screen, you have success.</p>
	
	<p>
	Use &lt;control&gt;C to stop the servlet before proceeding to 
	the next step.
	</p>

	<picture name="Cocoon Welcome" url="DementedAlienCocoon.jpg"/>

</s>

<s name="Updating the xml parser in Tomcat">
	<p><i>
	You only need to do this with versions of tomcat older
	than 4.1.29.</i></p>
	<p>
	First remove the old XML parser:
	</p>
		
	<code><![CDATA[
	rm $TOMCAT_HOME/common/endorsed/xercesImpl.jar ]]></code>
		
	<p>Then copy updated files from Cocoon:</p>
	
	<code><![CDATA[
	cd $COCOON/build/webapp/WEB-INF/lib
		
	cp xerces-*.jar xalan-*.jar xml-apis.jar \
		$TOMCAT_HOME/common/endorsed ]]></code>	
</s>

<s name="Testing Cocoon with Tomcat">
	
	<p>Move the cocoon webapp to tomcat's webapps directory:</p>
	
	<code><![CDATA[
	mv $COCOON/build/webapp $TOMCAT_HOME/webapps
	mv $TOMCAT_HOME/webapps/webapp $TOMCAT_HOME/webapps/cocoon ]]></code>
			
	<p>Start tomcat in your shell window:</p>
	
	<code><![CDATA[
	catalina.sh run ]]></code>
	
	<p>Wait for the messages to stop, fire up your browser and visit:</p>
	
	<code><![CDATA[
	http://localhost:8080/cocoon/ ]]></code>
	
	<p>When you see the welcome screen, you have success.<cr/>
	Use &lt;control&gt;C in the shell window to stop tomcat.</p>
</s>

<s name="Starting Tomcat as a service at boot time">

	<p>
	When using Tomcat in production on a server, it is considered
	wise to run it under a user account rather than <i>root.</i>
	</p>
	
	<p>
	To prepare for this arrangement, we create a tomcat user and group:
	</p>
	<code><![CDATA[
	useradd tomcat 
	# Non-redhat systems may require an additional command to create the group. ]]></code>

	<p>
	Modify the ownership of the distribution files:
	</p>
	<code><![CDATA[
	chown -R tomcat:tomcat $TOMCAT_HOME ]]></code>

	<p>
	If you tinker with Tomcat or Cocoon while logged in as
	root (shame!) you must remember to do this chown step
	again to make sure all the files remain accessible to
	the tomcat user.
	</p>
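
	<p>
	To see whether anything has slipped out of the tomcat user's
	hands, find can list the strays. A minimal sketch, assuming
	TOMCAT_HOME is set as described earlier:
	</p>

	<code><![CDATA[
	# List files under Tomcat that are NOT owned by user tomcat.
	# The trailing slash makes find descend through the symbolic link.
	find $TOMCAT_HOME/ ! -user tomcat -print ]]></code>

	<p>No output means there is nothing to fix.</p>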
	
	<p>Create the file: <c>/etc/rc.d/init.d/tomcat</c> containing:</p>

	<code><![CDATA[
	#!/bin/bash
	#
	# Startup script for the Jakarta Tomcat servlet container
	#
	# chkconfig: 345 20 80
	# description: Starts the Tomcat servlet engine
	
	. /etc/init.d/functions
	. /etc/profile.d/tomcat.sh

	if [ -z "$TOMCAT_HOME" ] ; then
		echo "Please define TOMCAT_HOME in /etc/profile.d/tomcat.sh"
		exit 1
	fi
	
	TOMCAT_LOCK=/var/lock/subsys/tomcat
	
	case "$1" in
		start)
			chown -R tomcat:tomcat $TOMCAT_HOME/*
			action $"Starting Apache Tomcat: " \
				su -l tomcat -c '$TOMCAT_HOME/bin/startup.sh'
			if [ $? = 0 ] ; then
				touch $TOMCAT_LOCK
			fi
			;;
		stop)
			action $"Stopping Apache Tomcat: " \
				su -l tomcat -c '$TOMCAT_HOME/bin/shutdown.sh'
			rm -f $TOMCAT_LOCK
			rm -rf $TOMCAT_HOME/work
			;;
		status)
			if [ -e $TOMCAT_LOCK ] ; then
				echo $"Tomcat appears to be running"
			else
				echo $"Tomcat is not running"
			fi
			;;
		restart)
			$0 stop
			sleep 2
			$0 start
			;;
		*)
			echo $"Usage: $0 {start|stop|status|restart}"
			exit 1
	esac

	exit 0 ]]></code>
		
	<p>
	Note that the script fixes the tomcat ownership for
	all webapps each time it starts. This is a very quick
	step if you don't compile in all the examples and it prevents
	truly obscure errors when you forget and edit something
	as root.
	</p>
	
	<p>
	Note that the $TOMCAT_HOME/work directory gets deleted when
	we shut down. This is where Tomcat keeps its dynamic state
	information, compiled jsp pages and other detritus. I have
	found that getting rid of these files between runs prevents
	many peculiar and irritating behaviors such as persistently 
	serving old versions of modified web pages and spraying
	on my furniture.
	</p>
	
	<ss name="About the Tomcat startup script">
	
	<p>
	This script implicitly depends on several environment variables.
	These were defined by /etc/profile.d scripts listed in the
	<link name="java" sec="Installing the Java SDK"/> and
	<link name="tomcat" sec="Installing Jakarta Tomcat"/>
	installation procedures: 
	</p>
	
	<code><![CDATA[
	Defined in /etc/profile.d/java.sh:

		JAVA_HOME	: Location of java installation
		JAVA_OPTS	: Options for starting java
	
	Defined in /etc/profile.d/tomcat.sh:

		TOMCAT_HOME	 : Location of Tomcat installation
		LD_ASSUME_KERNEL : Stability fix suggested by the release notes ]]></code>
	
	</ss>

	<ss name="Test the Tomcat startup script">
	<p>
	In the shell window, execute:
	</p>
	<code><![CDATA[
	service tomcat start ]]></code>
	<p>
	You should be able to visit these urls in your browser:
	</p>
	
	<code><![CDATA[
	Tomcat	http://localhost:8080

	Cocoon	http://localhost:8080/cocoon ]]></code>

	<p>
	To install the tomcat script for automatic activation at boot time:
	</p>
	<code><![CDATA[
	chkconfig --add tomcat ]]></code>
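
	<p>
	To confirm that the registration took, chkconfig can report
	the runlevels. The "chkconfig: 345 20 80" header in the script
	asks for runlevels 3, 4 and 5, so those are the ones that
	should show as enabled:
	</p>

	<code><![CDATA[
	# Show the runlevels in which the tomcat service is enabled
	chkconfig --list tomcat ]]></code>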
	
	<p>
	Leave tomcat running and proceed to the next step.
	</p>

	</ss>
</s>

<s name="Building mod_jk2 from source">

	<p>
	We now build the <i>jk2 connector,</i> an Apache web server module.
	This module lets Apache redirect requests for designated documents 
	to Tomcat. All other documents are handled in Apache as usual.
	</p>

	<p>
	If you peruse the Apache web site, you will find four 
	different connector projects. All of them do pretty much
	the same thing as far as we're concerned.
	</p>
	
	<code><![CDATA[
		mod_jserv	Obsolete, damned, blasted.
		mod_webapp	Deprecated, despised, shamed.
		mod_jk		Tolerated. Probably immoral.
		mod_jk2		Approved. Enabled by default. ]]></code>
	
	<p>
	I used mod_webapp for a year with no problems at all.
	Next, to be fashionable, I tried several versions of mod_jk,
	which never worked quite as well. Early this year, I tried
	mod_jk2 and found that it wouldn't stay running overnight.
	After reverting to mod_jk for a while, I finally switched
	back to the newest version of mod_jk2. I feel better already.	
	</p>

	<picture name="Users of Deprecated Software" url="DeprecatedSoftware.jpg"/>

	<p>I'm using:</p>
	
	<code><![CDATA[
	jakarta-tomcat-connectors-jk2-2.0.2-src.tar.gz ]]></code>

	<p>Unpack the archive to obtain:</p>
	
	<code><![CDATA[
	jakarta-tomcat-connectors-jk2-2.0.2-src ]]></code>
			
	<p>We will call this directory $con for short:</p>
	
	<code><![CDATA[
	export con=/usr/local/src/jakarta-tomcat-connectors-jk2-2.0.2-src ]]></code>
	
	<p>Some libraries in Redhat 9 need new symbolic links:</p>
	
	<code><![CDATA[
	cd /usr/lib
	ln -s libapr.so libapr-0.so ]]></code>
	
	<p>
	You must have the httpd-devel rpm installed, so take
	care of that if necessary.
	</p>
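
	<p>
	A quick way to check for the httpd-devel package and the apxs
	tool it provides. This is a sketch that assumes the stock
	Redhat package name and the apxs path used by the configure
	step in this section:
	</p>

	<code><![CDATA[
	# Report whether the Apache development package is installed
	rpm -q httpd-devel

	# The configure step expects apxs at this location
	ls -l /usr/sbin/apxs ]]></code>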

	<p>Go into the "native2" directory:</p>
	
	<code><![CDATA[
	cd $con/jk/native2 ]]></code>
		
	<p>Run the "pre-configure" script:</p>
	
	<code><![CDATA[
	./buildconf.sh ]]></code>
		
	<p>Run configure:</p>
	
	<code><![CDATA[
	./configure --with-apxs2=/usr/sbin/apxs ]]></code>
	
	<p>Now do the make:</p>
	
	<code><![CDATA[
	make ]]></code>
		
	<p>The make script asks you to run libtool:</p>
	
	<code><![CDATA[
	libtool --finish /usr/lib/httpd/modules ]]></code>
	
	<p>The result is in:</p>
	
	<code><![CDATA[
	$con/jk/build/jk2/apache2/mod_jk2.so ]]></code>
	
</s>

<s name="Installing and configuring mod_jk2">

	<ss name="Configuring Apache">
	
	<p>Copy mod_jk2.so to the Apache modules directory:</p>
	
	<code><![CDATA[
	cp $con/jk/build/jk2/apache2/mod_jk2.so $serverRoot/modules ]]></code>
	
	<p>Create the file: <c>$serverRoot/conf.d/mod_jk2.conf</c> containing:</p>

	<code><![CDATA[
	LoadModule jk2_module modules/mod_jk2.so ]]></code>

	<p>Edit <c>$serverRoot/conf/httpd.conf:</c><cr/>
	You must explicitly configure the port number for your host:
	</p>
	
	<code><![CDATA[
	ServerName www.csparks.com:80 ]]></code>
	
	<p>If you use any virtual hosts, they each need port numbers:</p>
	<code><![CDATA[
	NameVirtualHost *:80

	<VirtualHost *:80>
		ServerName www.csparks.com
		DocumentRoot /var/www/html
	</VirtualHost>

	<VirtualHost *:80>
		ServerName hardinge.csparks.com
		DocumentRoot /var/www/html/hardinge
	</VirtualHost> ]]></code>

	</ss>
	
	<ss name="Configuring mod_jk2">
	
	<p>
	I found many aspects of the documentation about mod_jk2 at Apache.org
	a bit frustrating. By now, no doubt, that unfortunate state has been
	rectified by the dedicated writers who contribute to the project.
	</p>
	
	<p>
	The jk2 module uses the configuration file <c>$serverRoot/conf/workers2.properties.</c>
	The following example worked for me. I chose to keep the shared memory
	file and log file in the Redhat directory for the Apache logs. For other
	Linux distributions, there would be different appropriate locations.
	</p>
	
	<picture name="WARNING: Folklore ahead" url="Folklore.jpg"/>
	
	<p>Create the file: <c>$serverRoot/conf/workers2.properties</c> containing:</p>

	<code><![CDATA[

	[logger.file:]
	level=EMERG
	file=/var/log/httpd/mod_jk2.log

	[channel.socket:]
	info=Ajp13 channel forwarding over a tcp socket
	host=localhost
	port=8009

	[shm:]
	info=Shared memory for multiprocessing
	file=/var/log/httpd/mod_jk2.shm
	size=1048576

	[status:]
	info=Status worker

	[uri:/jkstatus/*]
	info=Display jk2 status page
	group=status

	[uri:/cocoon/*]
	info=Display Cocoon welcome page ]]></code>
		
	<p>
	The <c>[shm:]</c> section sets up a "shared memory" file
	used when running with multiple processes. We're not doing
	that here, but configuring the file prevents multiple error
	messages in the log file. You have to create this file by
	hand:
	</p>
	
	<code><![CDATA[
	dd if=/dev/zero of=/var/log/httpd/mod_jk2.shm bs=1048576 count=1 ]]></code>
	
	<p>
	Some developers like to put this file in the $TOMCAT_HOME/work directory.
	This seems like a good idea, but I like to blast the work directory
	every time Tomcat restarts. Because Apache and mod_jk2 may still be
	running, it doesn't seem like a good idea to delete this file.
	</p>
		
	<p>
	The <c>[uri:/cocoon/*]</c> section tells Apache to send all URLs
	that begin with "/cocoon/" to Tomcat.
	</p>
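
	<p>
	Before restarting, it may be worth letting Apache check its
	own configuration syntax. This is an optional step and only a
	sketch: as far as I know it validates the Apache directives,
	not the contents of workers2.properties.
	</p>

	<code><![CDATA[
	# Parse the Apache configuration files and report syntax errors
	apachectl configtest ]]></code>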

	<p>Restart Apache to load the new configuration:</p>
	<code><![CDATA[
	service httpd restart ]]></code>	
	
	<p>The <c>[status:]</c> section configures a url where you should see
	a status report in your browser window:</p>
	<code><![CDATA[
	http://localhost/jkstatus ]]></code>

	</ss>
	
	<ss name="Configuring the Tomcat side">

	<p>
	The $TOMCAT_HOME/conf/server.xml that comes with the distribution
	will work 'out of the box' but I use the following minimal configuration.
	It gets rid of all the examples and disables the Tomcat web server.
	It only allows Tomcat to service requests sent through the mod_jk2 connector.
	This is, IMHO, a security advantage.
	</p>
	
	<p>
	If you want to preserve the original server.xml file, rename or
	move it somewhere else.
	</p>
	
	<p>Create the file: <c>$TOMCAT_HOME/conf/server.xml</c> containing:</p>
	
	<code><![CDATA[
	<Server port="8005" shutdown="SHUTDOWN" debug="0">
	<Service name="Tomcat-Standalone">
		<Connector className="org.apache.coyote.tomcat4.CoyoteConnector"
			port="8009" minProcessors="5" maxProcessors="75"
			enableLookups="true" redirectPort="8443"
			acceptCount="10" debug="0" connectionTimeout="20000"
			useURIValidationHack="false"
			protocolHandlerClassName="org.apache.jk.server.JkCoyoteHandler"/>
		<Engine name="Standalone" defaultHost="localhost" debug="0">
			<Logger className="org.apache.catalina.logger.SystemErrLogger"/>
			<Host name="localhost" debug="0" appBase="webapps"
				unpackWARs="true" autoDeploy="true">
				<Context path="" docBase="ROOT" debug="0" reloadable="true"/>
			</Host>
		</Engine>
	</Service>
	</Server> ]]></code>
	
	<p>Edit the file: <c>$TOMCAT_HOME/conf/jk2.properties</c> so it contains only this line:</p>
	<code><![CDATA[
	shm.file=/var/log/httpd/mod_jk2.shm ]]></code>
	
	<p>The location of the shared memory file must agree with the value set in workers2.properties.</p>
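
	<p>
	One way to compare the two settings side by side. This is a
	sketch that assumes the Redhat location /etc/httpd for
	$serverRoot:
	</p>

	<code><![CDATA[
	# Both files should name the same shared memory file
	grep 'mod_jk2.shm' /etc/httpd/conf/workers2.properties \
		$TOMCAT_HOME/conf/jk2.properties ]]></code>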
		
	<p>
	Make sure the distribution files still belong to tomcat:
	</p>
	<code><![CDATA[
	chown -R tomcat:tomcat $TOMCAT_HOME ]]></code>
	
	<p>Restart tomcat to load the new configuration:</p>
	<code><![CDATA[
	service tomcat restart ]]></code>	
	
	</ss>

	<ss name="Testing">
			
	<p>In a web browser window try this URL:</p>
	<code><![CDATA[
	http://localhost/cocoon/ ]]></code>
	
	<p>If you see the Cocoon welcome page, all is well.</p>
	
	</ss>
	
</s>

<s name="Configuring log rotation">
	<picture name=" " url="LogRolling.jpg"/>
	
	<p>
	Both Cocoon and Tomcat like to pile up numerous huge log files
	on your server.
	</p>

	<p>
	Tomcat log files can be difficult to manage. By default, Tomcat
	creates and rotates several logfiles by itself, but never deletes the oldest
	ones. We have mitigated this problem in the server.xml file shown above.
	It configures the Engine container to use SystemErrLogger, which goes
	to catalina.out by default.
	</p>
	
	<p>
	Since we aren't using Tomcat to normalize the axis of the Earth or bring
	back the Elder Gods, we don't need the Administrator and Manager
	applications. To eliminate them and their annoying log files, 
	simply remove or rename these files:
	</p>
	
	<code><![CDATA[
	$TOMCAT_HOME/webapps/admin.xml
	$TOMCAT_HOME/webapps/manager.xml ]]></code>
	
	<p>
	With these changes, we end up with only one log file:
	</p>
	
	<code><![CDATA[
	$TOMCAT_HOME/logs/catalina.out ]]></code>
	
	<p>To manage this file, create: <c>/etc/logrotate.d/tomcat.rotate</c> containing:</p>
	
	<code><![CDATA[
	/usr/local/src/tomcat/logs/catalina.out
	{	copytruncate
		daily
		rotate 5
		missingok
	} ]]></code>
	
	<p>
	Cocoon has a well-behaved logger that will rotate under
	the control of a configuration file:
	</p>
	<code><![CDATA[
		$TOMCAT_HOME/webapps/cocoon/WEB-INF/logkit.xconf ]]></code>
	<p>
	After becoming weary of editing this large file every time I updated
	Cocoon, I decided to go with the default settings and let logrotate
	take care of the mess. Create the file:
	<c>/etc/logrotate.d/cocoon.rotate</c> containing:
	</p>
	
	<code><![CDATA[
	/usr/local/src/tomcat/webapps/cocoon/WEB-INF/logs/*.log
	{	copytruncate
		daily
		rotate 5
	} ]]></code>
	
	<p>
	Note the use of full path names in the logrotate scripts.
	I found that the shell script variables defined in /etc/profile.d
	for tomcat and cocoon are not available to the logrotate program.
	</p>
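
	<p>
	The rotation files can be tried out without touching any logs
	by running logrotate in debug mode. A sketch, assuming the
	file names used in this section:
	</p>

	<code><![CDATA[
	# -d is debug mode: report what would be done, change nothing
	logrotate -d /etc/logrotate.d/tomcat.rotate
	logrotate -d /etc/logrotate.d/cocoon.rotate ]]></code>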
	
	<p>
	At this point you might want to stop Tomcat, clean out the
	logfiles and restart to get the "one log to rule them all"
	configuration:
	</p>
	<code><![CDATA[
	service tomcat stop
	rm -f $TOMCAT_HOME/logs/*
	service tomcat start ]]></code>
	
</s>

<s name="Making your website a Cocoon sub-site">
	<p>
	By making your main website a Cocoon subsite, you
	can mix xml files served by Cocoon with all your other
	web documents served by Apache.
	</p>
	
	<p>Stop tomcat:</p>
	<code><![CDATA[
	service tomcat stop ]]></code>

	<p>
	We will make cocoon the default webapp by editing
	<c>$TOMCAT_HOME/conf/server.xml.</c> Inside the &lt;Host&gt;
	element, change the value of the docBase attribute to read:
	</p>
	
	<code><![CDATA[
	docBase="cocoon" ]]></code>

	<p>
	In the top-level cocoon directory, create a symbolic link
	to your Apache web site:
	</p>
		
	<code><![CDATA[
	cd $TOMCAT_HOME/webapps/cocoon
	ln -s /var/www/html xml ]]></code>

	<p>
	The name of this symbolic link must match a trigger url
	configured for mod_jk2.<cr/>
	Edit the file <c>$serverRoot/conf/workers2.properties</c>
	and change the trigger <c>[uri:/cocoon/*]</c> so it reads:
	</p>

	<code><![CDATA[
	[uri:/xml/*]
	info=Access xml documents on the website]]></code>
	
	<p>
	You must restart Apache for this change to take effect:
	</p>
	<code><![CDATA[
	service httpd restart ]]></code>
		
	<p>
	Any xml file on your website will get sent to cocoon if
	it has the "xml/" prefix:
	</p>

	<code><![CDATA[
	http://localhost/xml/test.xml ]]></code>

	<p>
	Note that there is no "xml" directory on your website.
	We'll get rid of the "xml/" prefix completely in the 
	next section.
	</p>
</s>

<s name="Achieving complete transparency">
	<p>
	At this point, you can integrate xml files with the other
	documents on your website. Normal html will be handled by
	Apache while xml files will go to Cocoon via Tomcat. The
	remaining annoyance is that pesky "/xml" path element
	in the URL: This gives away all your secrets!
	</p>
	
	<p>
	The motive for hiding the "/xml" trigger is more than
	cosmetic: You would like to organize your website so that
	someday, when the majority of client browsers support xml,
	you will be able to make them do all the work.
	Toward this end, we will now hide the "/xml" path element
	using Apache's mod_rewrite feature.
	</p>
	
	<p>
	Using mod_rewrite with Tomcat connectors has one or two
	pitfalls that have discouraged some developers. By following
	these guidelines, you will avoid all difficulties.
	</p>
	
	<p>
	The first pitfall concerns the order of url processing:
	we must have the rewrite rules applied before the
	Tomcat connector sends the request to cocoon. This can
	be ensured by loading mod_rewrite <i>before</i> we load
	mod_jk2. If you are using the Redhat 9 httpd package, the
	default /etc/httpd/conf/httpd.conf file loads mod_rewrite
	before it includes the module configuration files in
	/etc/httpd/conf.d, so all is well. If you are 
	using your own Apache configuration file, you must ensure
	that mod_rewrite loads before mod_jk2.
	</p>
	
	<p>
	The second pitfall concerns virtual hosts. The method for
	dealing with these is given in the configuration examples
	that follow.
	</p>
	
	<p>
	We will be adding some directives to the end of your
	/etc/httpd/conf/httpd.conf file:
	</p>
	
	<ss name="Rewrite directives without virtual hosts">
	
	<code><![CDATA[
	RewriteEngine on
	RewriteRule (.*)\.xml$ xml/$1.xml [P] ]]></code>

	</ss>
	
	<ss name="Rewrite directives with virtual hosts">

	<code><![CDATA[
	NameVirtualHost *:80

	<VirtualHost *:80>
		ServerName www.yourDomain.com
		DocumentRoot /var/www/html
		RewriteEngine on
		RewriteRule (.*)\.xml$ xml/$1.xml [P]
	</VirtualHost>

	<VirtualHost *:80>
		ServerName host1.yourDomain.com
		DocumentRoot /var/www/html/host1Root
		RewriteEngine on
		RewriteRule (.*)\.xml$ xml/$1.xml [P]
	</VirtualHost>

	<VirtualHost *:80>
		ServerName host2.yourDomain.com
		DocumentRoot /var/www/html/host2Root
		RewriteEngine on
		RewriteRule (.*)\.xml$ xml/$1.xml [P]
	</VirtualHost> ]]></code>
	
	</ss>
	
	<p>
	The only difference is the placement of the rewrite
	directives. When using virtual hosts, you must 
	configure mod_rewrite in each virtual host that needs
	to handle xml files.
	</p>
	
	<p>
	In either case, the RewriteRule that does the magic is the
	same. It simply prepends the "xml/" path element to any
	URL that ends with ".xml".
	</p>
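
	<p>
	The effect of the pattern can be imitated outside Apache with
	sed. This is only an illustration of the regular expression,
	not of mod_rewrite itself: the helper name "rewrite" is made
	up, and the paths are written without a leading slash for
	simplicity.
	</p>

	<code><![CDATA[
	# Apply the same pattern and substitution to a path
	rewrite() {
		printf '%s\n' "$1" | sed -E 's#^(.*)\.xml$#xml/\1.xml#'
	}

	rewrite docs/test.xml	# prints xml/docs/test.xml
	rewrite index.html	# prints index.html (no match, unchanged) ]]></code>

	<p>
	Only names ending in lower-case ".xml" pick up the prefix.
	Everything else is left for Apache to serve as usual.
	</p>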
	
	<p>
	The "[P]" flag on the end of the rule makes the browser
	display the original URL rather than the rewritten version.
	</p>
	
	<p>
	This method is so successful, there is no way to see the
	original xml file in a web browser.
	In order to force client-side processing for testing, we
	add this rewrite rule:
	</p>
	
	<code><![CDATA[
	RewriteRule (.*)\.XML$ xml/$1.XML [P] ]]></code>
	
	<p>
	You will also need this match pattern in your sitemap.xmap:
	</p>
	
	<code><![CDATA[
	<map:match pattern="**.XML">
		<map:generate src="{1}.xml"/>
		<map:serialize type="xml"/>
	</map:match> ]]></code>
	
	<p>
	With these changes, you can request an xml file
	by changing the URL so it ends with the capital
	letters ".XML". The file will be sent directly to your
	browser without server-side processing.
	</p>
	
	<p>
	Members of the audience that are not tranced-out at this
	point may note that a simpler match pattern will work:
	</p>
	
	<code><![CDATA[
	<map:match pattern="**.XML">
		<map:read mime-type="text/xml" src="{1}.xml"/>
	</map:match> ]]></code>
	
	<p>
	This rule will send the raw xml file to the client browser,
	but it will not allow the browser to view the document source.
	</p>
	
</s>

<s name="Configuring your sitemap.xmap">
	<p>
	You don't need to edit the default Cocoon sitemap in any way.
	Instead, create a sub-sitemap in your website directory.
	</p>
	
	<p>
	Your needs will vary, but here is a minimal sitemap that will
	process all XML documents through a single XSL stylesheet.
	</p>
	
	<p>Create the file: <c>/var/www/html/sitemap.xmap</c> containing:</p>
	
	<code><![CDATA[
	<?xml version="1.0" encoding="UTF-8"?>
	<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
	<map:pipelines>

	<map:pipeline>
		<map:match pattern="**.xml">
			<map:generate src="{1}.xml"/>
			<map:transform src="test.xsl"/>
			<map:serialize type="html"/>
		</map:match>
	</map:pipeline>

	</map:pipelines>
	</map:sitemap> ]]></code>
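	
	<p>
	If you also want the ".XML" escape hatch described earlier, both
	match rules can live in the same pipeline. The following is only a
	sketch that combines the two fragments already shown. It assumes
	that Cocoon's wildcard matching is case-sensitive, so that the
	"**.xml" and "**.XML" patterns never collide.
	</p>
	
	<code><![CDATA[
	<?xml version="1.0" encoding="UTF-8"?>
	<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
	<map:pipelines>

	<map:pipeline>
		<!-- Lowercase .xml: transform to HTML on the server -->
		<map:match pattern="**.xml">
			<map:generate src="{1}.xml"/>
			<map:transform src="test.xsl"/>
			<map:serialize type="html"/>
		</map:match>

		<!-- Uppercase .XML: send the untransformed document to the client -->
		<map:match pattern="**.XML">
			<map:generate src="{1}.xml"/>
			<map:serialize type="xml"/>
		</map:match>
	</map:pipeline>

	</map:pipelines>
	</map:sitemap> ]]></code>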
</s>

<s name="Testing the whole thing">
	<p>
	If you've come this far, you're ready to test everything.
	Create an xml document: <c>/var/www/html/test.xml</c> containing:
	</p>
	
	<code><![CDATA[
	<?xml version="1.0"?>
	<?xml-stylesheet type="text/xsl" href="test.xsl"?>
	
	<page name="My XML Web Page">
		<p>Here we see the little man</p>
		<p>Behind the little curtain.</p>
		<p>If Tomcat doesn't drive you nuts,</p>
		<p>Cocoon will almost certain.</p>
	</page> ]]></code>
	
	<p>
	Note: The xml-stylesheet processing instruction in the example
	above is not used by our server-side processing. We include it
	to illustrate how the same documents could be set up for either
	client-side or server-side processing.
	</p>
	
	<p>
	Create an XSL stylesheet: <c>/var/www/html/test.xsl</c> containing:
	</p>
	
	<code><![CDATA[
	<?xml version="1.0"?>
	<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

	<xsl:template match="page">
		<html>
		<head><title>
			<xsl:value-of select="@name"/>
		</title></head>
		<body>
			<h3><xsl:value-of select="@name"/></h3>
			<xsl:apply-templates/>
		</body>
		</html>
	</xsl:template>	

	<xsl:template match="p">
		<i><xsl:value-of select="."/></i><br/>
	</xsl:template>
	
	</xsl:stylesheet> ]]></code>
	
	<p>
	Stop and restart everything to make sure you have a clean slate:
	</p>
	
	<code><![CDATA[
	service tomcat stop
	service httpd stop
	service tomcat start
	service httpd start 
	
	# It takes Tomcat about 20 seconds to get going... ]]></code>
	
	<p>
	Now fire up your browser and visit:
	</p>
	
	<code><![CDATA[
	http://localhost/test.xml ]]></code>
	
	<p>You should see the little man behind the curtain.</p>
	
	<p>
	To force client-side processing, use this URL:
	</p>
	<code><![CDATA[
	http://localhost/test.XML ]]></code>
	
	<p>
	Everything should look the same.
	</p>
	
	<picture
		name="Inspect the result of all your efforts"
		url="InspectResults.jpg"/>

</s>


<references name="Downloads">
	<def ref="DownloadJava"
		name="Java"
		url="http://java.sun.com/j2se/1.4.2/download.html"/>
	<def ref="DownloadTomcat"
		name="Tomcat"
		url="http://jakarta.apache.org/site/binindex.cgi"/>
	<def ref="DownloadConnector"
		name="Tomcat Connectors"
		url="http://jakarta.apache.org/builds/jakarta-tomcat-connectors"/>
	<def ref="DownloadCocoon"
		name="Cocoon"
		url="http://cocoon.apache.org/mirror.cgi"/>
</references>

<references name="Primary References">
	<def ref="Java"
		name="Java at Sun.com" 
		url="http://java.sun.com"/>		
	<def ref="Apache"
		name="Apache Software Foundation" 
		url="http://httpd.apache.org"/>
	<def ref="Tomcat"
		name="Jakarta Tomcat Project"
		url="http://jakarta.apache.org/tomcat/index.html"/>
	<def ref="TomcatConnectors"
		name="Tomcat Connectors"
		url="http://jakarta.apache.org/tomcat/tomcat-4.1-doc/jk2"/>
	<def ref="Cocoon"
		name="Apache Cocoon Project"
		url="http://cocoon.apache.org"/>
</references>

<references name="Other tutorials">
	<def 	name="John Turner - Apache Tomcat HOWTO"
		url="http://www.johnturner.com/howto/"/>
	<def 	name="Michael Cardon - Apache Tomcat on Linux"
		url="http://www.cardon.biz/docs/tomcat/"/>
	<def 	name="Pascal Chong - Apache Tomcat on Linux"
		url="http://www.cymulacrum.net/tomcat/tomcat_toc.html"/>
	<def 	name="Oscar Carrillo - Installing Web Services"
		url="http://daydream.stanford.edu/tomcat/install_web_services.html"/>
	<def	name="James Goodwill - Demystifying Tomcat 4's server.xml File"
		url="http://www.onjava.com/lpt/a/1618"/>
	<def 	name="Hundreds more..."
		url="http://google.com/search?q=(configuring+OR+tutorial)+AND+(tomcat+OR+cocoon)"/>
</references>

<references name="More of my stuff">
	<def 	ref="XMLWithoutTears"
		name="XML Web Pages Without Tears"
		url="http://www.csparks.com/XMLWithoutTears/index.xhtml"/>
	<def	name="Email complaints and corrections to me"
		url="mailto:hugh@csparks.com"/>
</references>

<s name="Credits and Apologies">
	<p>
	<cite title="Dancing Bears" name="William Beard"/>
	<cite title="Pandora's Box" name="Arthur Rackham"/>
	<cite title="Cheshire Cat" name="Walt Disney Studios&copy;"/>
	<cite title="Alien Egg" name="Twentieth Century Fox&copy;"/>
	<cite title="Scene from Dante's Inferno" name="Gustave Dore"/>
	<cite title="Aebleskivers" name="Joe Nolte"/>
	<cite title="Log Rolling" name="Matt Pranger"/>
	<cite title="New York Art Critics" name="Glen Baxter&copy;"/>
	</p>
</s>

</page>
