Metaware

The Technology Behind the Text

This book is being written in DocBook XML, the format O'Reilly prefers for cross-publishing to paper, the web, and other media. We're bringing that vision to the wiki as well: a scripted pipeline automatically converts every chapter to DokuWiki markup, and the results are posted here periodically.

Building a DokuWiki from a DocBook XML document

I've hacked together something out of Saxon-B, Perl, tidy, and sed, all running on Cygwin on my PC. Don't look below if you have a weak stomach.

— Randy

todoku.sh

#!/bin/sh
# Usage: todoku.sh chapter.xml  (writes chapter.xml.doku)
rm -f "$1.doku" temp*.doku
# Replace the dagger, em-dash and ellipsis with ASCII, then normalize with tidy.
cat "$1" | sed 's/†/X/g' | sed 's/—/-/g' | sed 's/…/\.\.\./g' | tidy -xml -bare -w 3000 >temp.tidy.doku
# Join paragraph lines; center every table cell that has no attributes.
cat temp.tidy.doku | parafix.pl | sed -e 's/<entry>/<entry align="center">/g' >temp.parafix.doku
# Restore the space after </emphasis>, except before punctuation.
cat temp.parafix.doku | sed -e 's|\/emphasis>|\/emphasis> |g'| sed -e 's|\/emphasis>  |\/emphasis> |g' | sed -e 's|\/emphasis> \([^0-9a-zA-Z(]\)|\/emphasis>\1|g' >temp.emphasis.doku
java -jar saxon9.jar -t -s:temp.emphasis.doku -xsl:docbook2dokuwiki.xsl -o:temp.xslt.doku
# Unescape tags, fix image paths and chapter links, drop Saxon's first line.
sed -e 's/&lt;/</g' < temp.xslt.doku | sed -e 's/&gt;/>/g' | sed -e 's|figs/incoming/||' | sed -e 's/amp\;//g' | sed -e 's|#Chapter_|/doku.php?id=chapter|g' |  sed -e 's|\"#Chap_\([0-9]*\)\-|\"/doku.php?id=chapter\1#Chap_\1-|g' | tail -n +2 >"$1.doku"
mv "$1.doku" .

 

Here's what it does

  • Uses sed to replace a few non-ASCII characters with plain ASCII: the dagger becomes X, the em-dash becomes a hyphen, and the ellipsis (…) becomes three periods.
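  • Uses tidy to reformat the XML and suppress leading whitespace, which DokuWiki treats as significant (a line indented by two or more spaces renders as preformatted text). Sigh.
  • Pipes the result through parafix.pl (below), which joins the lines of each paragraph.
  • Tidy also messes up the whitespace after closing emphasis tags, so I fix that with sed: add a space after every </emphasis>, collapse any resulting double space, and drop the space again before punctuation.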
  • I don't understand XSL variables yet, so sed adds align="center" to every bare <entry> tag (one with no attributes); an entry that already sets align keeps its own value.
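  • DokuWiki plugin tags take bare parameters with no equals sign (for example, <note tip>), which isn't well-formed XML, so the stylesheet emits them (and some raw HTML tags) as escaped text, and sed turns each &lt; and &gt; back into < and >. Stripping amp; likewise turns &amp;nbsp; into &nbsp;.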
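  • I rewrite inter-chapter IDs as wiki URLs: #Chapter_3 becomes /doku.php?id=chapter3, and "#Chap_3-name becomes "/doku.php?id=chapter3#Chap_3-name. sed also strips the figs/incoming/ prefix from image paths.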
  • The tail command strips off the first line, the XML declaration that Saxon adds at the top, for easy cut and paste.
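To see what the sed stages actually do, here is a small sketch that replays several of them on made-up snippets. The sed expressions follow todoku.sh; the function names (emphasis_fix, center_entries, unescape_tags, rewrite_links) and the sample strings are hypothetical, chosen only for illustration. Note that the second emphasis expression keeps the leading slash in its replacement, so a closing tag stays a closing tag.

```shell
#!/bin/sh
# Sketch: replay some of todoku.sh's sed stages on hypothetical input.
# Function names and sample strings are made up for illustration.

# Emphasis spacing: add a space after every closing emphasis tag, collapse
# a resulting double space back to one, then remove the space again when
# the next character is anything but a letter, a digit or "(".
emphasis_fix() {
  sed -e 's|/emphasis>|/emphasis> |g' \
      -e 's|/emphasis>  |/emphasis> |g' \
      -e 's|/emphasis> \([^0-9a-zA-Z(]\)|/emphasis>\1|g'
}

# Table cells: only a bare <entry> gets align="center"; an entry that
# already carries any attribute is not matched by this pattern.
center_entries() {
  sed -e 's/<entry>/<entry align="center">/g'
}

# Unescape: turn escaped plugin/HTML tags back into real angle brackets
# and drop "amp;" so that &amp;nbsp; becomes &nbsp;.
unescape_tags() {
  sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/amp;//g'
}

# Cross-chapter links: #Chapter_N becomes a wiki page URL, and
# "#Chap_N-anchor becomes that page plus the original anchor.
rewrite_links() {
  sed -e 's|#Chapter_|/doku.php?id=chapter|g' \
      -e 's|"#Chap_\([0-9]*\)-|"/doku.php?id=chapter\1#Chap_\1-|g'
}

# Demo on hypothetical snippets.
printf '%s\n' '<emphasis>big</emphasis>deal, <emphasis>small</emphasis>.' | emphasis_fix
printf '%s\n' '<entry>a</entry><entry align="left">b</entry>' | center_entries
printf '%s\n' '&lt;note tip&gt;Careful&amp;nbsp;now&lt;/note&gt;' | unescape_tags
printf '%s\n' '<a href="#Chap_3-intro">' | rewrite_links
# tail -n +2 drops the first line (here, an XML declaration).
printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>' '==== Title ====' | tail -n +2
```

Keeping each stage in its own function makes it easy to test one substitution at a time, which is handy when a change to one sed expression quietly breaks another.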

parafix.pl

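#!/usr/bin/perl
# parafix.pl: join every line from <para> up to the line holding </para>.
# Expects each line to end in CR+LF: chomp strips the "\n", chop the "\r".
use strict;
my $s = <STDIN>;
my $inpara = 0;
# chomp returns the number of characters removed, so the loop ends at EOF
# (and also on a final line with no trailing newline, which is dropped).
while (chomp($s) > 0) {
  if (index($s,"<para>") > -1 ) {$inpara = 1;}
  if (index($s,"</para>") > -1 ) {$inpara = 0;}
  if ($inpara) {
    chop($s);              # remove the "\r"; print with no line break
    print ($s);
  } else {
    print ($s . "\n");     # the "\r" is kept, so the line ends CR+LF
  }
  $s = <STDIN>;
}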

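This Perl script simply joins all the lines between <para> and </para> tags into one line, with no separator between them. It is very sensitive to input and output line-ending conventions: after chomp removes the trailing newline, chop removes one more character, which is the carriage return only when the line ended in CR+LF. With plain LF endings it eats the last real character of each joined line instead. If you shorten the .sh script above, you'll likely break the behavior, so watch out.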
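If the line-ending sensitivity bites, one alternative (a swap-in, not what parafix.pl does) is to strip an optional carriage return from every line before deciding whether to join it. The awk sketch below keeps parafix.pl's joining rule and also keeps a final line that lacks a trailing newline, but its output uses plain LF endings throughout, whereas parafix.pl leaves the CR on lines it doesn't join. The function name join_paras is hypothetical, and it is only a drop-in if the rest of the pipeline is happy with LF endings.

```shell
#!/bin/sh
# Sketch: a line-ending-agnostic variant of parafix.pl in awk (hypothetical
# name join_paras). Every line loses an optional trailing CR, so CR+LF and
# LF input behave alike. As in the Perl script, a line containing <para>
# starts joining, a line containing </para> stops it (checked second, so a
# line holding both is not joined), and joined lines get no separator.
join_paras() {
  awk '
    { sub(/\r$/, "") }                       # drop a CR left over from CR+LF
    index($0, "<para>")  { inpara = 1 }
    index($0, "</para>") { inpara = 0 }
    { printf "%s%s", $0, (inpara ? "" : "\n") }
  '
}

# Demo: CR+LF input with a paragraph split over two lines.
printf '<para>one\r\ntwo</para>\r\n<x/>\r\n' | join_paras
```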

docbook2dokuwiki.xsl

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

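<!-- Anchor any element that has an id. The anchor tag is written escaped and turned back into real HTML by sed in todoku.sh. -->
<xsl:template match="sect1|sect2|sect3|sect4|example|varlistentry"><xsl:if test="@id"><html>&lt;a name='<xsl:value-of select="@id"/>'&gt;&lt;/a&gt;</html></xsl:if><xsl:apply-templates/></xsl:template>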

<xsl:template match="sect1/title">
==== <xsl:apply-templates/> ====
</xsl:template>

<xsl:template match="sect2/title">
=== <xsl:apply-templates/> ===
</xsl:template>

<xsl:template match="sect3/title">
== <xsl:apply-templates/> ==
</xsl:template>

<xsl:template match="chapter/title">
===== <xsl:apply-templates/> =====
</xsl:template>

<xsl:template match="title">
** <xsl:apply-templates/><xsl:text> **

</xsl:text></xsl:template>


<xsl:template match="para">
<xsl:apply-templates/><xsl:text>

</xsl:text>
</xsl:template>

<xsl:template match="indexterm"></xsl:template>

<xsl:template match="xref"><html><a href="#{@linkend}"><xsl:value-of select="@linkend"/></a>&amp;nbsp;</html></xsl:template>
<xsl:template match="emphasis">//<xsl:apply-templates/>//</xsl:template>
<xsl:template match="code">''<xsl:apply-templates/>'' </xsl:template>
<xsl:template match="citetitle">//<xsl:apply-templates/>//</xsl:template>
<xsl:template match="citation">//<xsl:apply-templates/>//</xsl:template>
<xsl:template match="itemizedlist/listitem">  * <xsl:apply-templates/></xsl:template>
<xsl:template match="listitem/itemizedlist/listitem">    * <xsl:apply-templates/></xsl:template>
<xsl:template match="orderedlist/listitem">  - <xsl:apply-templates/></xsl:template>
<xsl:template match="listitem/para"><xsl:apply-templates/><xsl:text>
</xsl:text></xsl:template>

<xsl:template match="variablelist"><xsl:apply-templates/></xsl:template>

<xsl:template match="varlistentry/term">
  * **<xsl:apply-templates/>**</xsl:template>

<xsl:template match="varlistentry/listitem">
    * <xsl:apply-templates/></xsl:template>

<xsl:template match="varlistentry/para"><xsl:apply-templates/></xsl:template>

<xsl:template match="informalfigure"><xsl:apply-templates/></xsl:template>

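<!-- After sed, the escaped &lt;/html&gt; and &lt;html&gt; become real tags: they close the HTML block so the caption renders as DokuWiki italics, then reopen it. -->
<xsl:template match="figure">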
<html><a name="{@id}"><center>&lt;/html&gt;// <xsl:value-of select="@id"/>: <xsl:value-of select="title"/> //&lt;html&gt;</center></a></html><xsl:text>
</xsl:text>
<xsl:apply-templates/></xsl:template>

<xsl:template match="figure/title"></xsl:template>

<xsl:template match="mediaobject"><xsl:apply-templates/></xsl:template>

<xsl:template match="imageobject"><xsl:apply-templates/></xsl:template>

<xsl:template match="imagedata">
<html><center><img width="{@srccredit}" src="http://buildingreputation.com/lib/exe/fetch.php?media={@fileref}"/><xsl:value-of select="."/></center></html><xsl:text>

</xsl:text>
</xsl:template>

<xsl:template match="textobject"><xsl:text>
</xsl:text></xsl:template>

<xsl:template match="table|informaltable">
<html><xsl:if test="@id"><a name="{@id}"><center>&lt;/html&gt;// <xsl:value-of select="@id"/>: <xsl:value-of select="title"/> //&lt;html&gt;</center></a></xsl:if>
<table align="center" border="1">
<xsl:apply-templates/>
</table>
</html>
</xsl:template>

<xsl:template match="table/title"></xsl:template>
<xsl:template match="tgroup"><xsl:apply-templates/></xsl:template>

<xsl:template match="thead">
<thead>
<xsl:apply-templates/>
</thead>
</xsl:template>

<xsl:template match="tbody">
<tbody>
<xsl:apply-templates/>
</tbody>
</xsl:template>

<xsl:template match="row">
<tr>
<xsl:apply-templates/>
</tr>
</xsl:template>

<xsl:template match="ulink">[[<xsl:value-of select="@url"/>|<xsl:apply-templates/>]]</xsl:template>

<xsl:template match="entry"><td align="{@align}"><xsl:apply-templates/></td></xsl:template>
<xsl:template match="entry/itemizedlist"><ul><xsl:apply-templates/></ul></xsl:template>
<xsl:template match="entry/itemizedlist/listitem"><li><xsl:apply-templates/></li></xsl:template>
<xsl:template match="entry/itemizedlist/listitem/para"><xsl:apply-templates/></xsl:template>

<xsl:template match="remark"><xsl:text>
</xsl:text><html>&lt;!-- <xsl:apply-templates/> --&gt;</html><xsl:text>
</xsl:text>
</xsl:template>

<xsl:template match="blockquote/attribution"><xsl:text>&lt;blockquote </xsl:text><xsl:apply-templates/><xsl:text>&gt;
</xsl:text></xsl:template>
<xsl:template match="blockquote/para"><xsl:apply-templates/><xsl:text>
&lt;/blockquote&gt;
</xsl:text></xsl:template>

<xsl:template match="sidebar">
<xsl:text>&lt;note tip&gt;</xsl:text><xsl:apply-templates/><xsl:text>&lt;/note&gt;
</xsl:text></xsl:template>

<xsl:template match="caution">
<xsl:text>&lt;note caution&gt;</xsl:text><xsl:apply-templates/><xsl:text>&lt;/note&gt;
</xsl:text></xsl:template>

<xsl:template match="warning">
<xsl:text>&lt;note warning&gt;</xsl:text><xsl:apply-templates/><xsl:text>&lt;/note&gt;
</xsl:text></xsl:template>

<xsl:template match="tip">
<xsl:text>&lt;note tip&gt;</xsl:text><xsl:apply-templates/><xsl:text>&lt;/note&gt;
</xsl:text></xsl:template>

<xsl:template match="note">
<xsl:text>&lt;note&gt;</xsl:text><xsl:apply-templates/><xsl:text>&lt;/note&gt;
</xsl:text></xsl:template>

<xsl:template match="important">
<xsl:text>&lt;note important&gt;</xsl:text><xsl:apply-templates/><xsl:text>&lt;/note&gt;
</xsl:text></xsl:template>

</xsl:stylesheet>

This is oversimplified. Don't use it. Really. I can't believe I'm using it, but I'm in a hurry at the moment. You'll need the box DokuWiki plugin, plus artwork for annotations such as Tip and Caution. I've got scalable versions of the O'Reilly animal traps and tracks; don't forget those…

metaware.txt · Last modified: 2010/01/29 19:53 (external edit)