Archives: A Conversion From DocBook SGML 3.x

I’ve added an undergraduate AI project I wrote called A Study of Selected Algorithms Against the Sherlock and Zebra Problems. Converting it was an interesting exercise, as the original report was written in DocBook, but in an older version (3.x, the SGML variant) than is readily available nowadays.

The document required some work before it could be converted for this site [1]. There were, however, only a few ‘patterns’ that needed to be fixed throughout the document.

  1. The document had been split up into smaller files using entities. The first step was to replace that approach with a single document.
  2. Most SGML entities needed to be replaced with the appropriate UTF-8 characters. For instance, &ne; needed to be replaced by ≠.
  3. Citations with linked cross references were not readily available with the tools at hand, so I replaced them with simple (non-hyperlinked) citations (and updated the bibliography to match the citation notation).
  4. Code blocks no longer use the <literallayout> tags and now use <programlisting> tags.
  5. There were a few syntax errors.
  6. The DocBook 4.1.2 DTD was added and the file was given a ‘.xml’ extension.
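
As an illustration of step 2, here is a minimal sketch of how the entity replacement could be scripted. It is not the method actually used for this conversion. It assumes that the ISO entity names in the SGML source match the names in Python's `html.entities.html5` table, which holds for common ones such as `ne` but may not hold for every entity. The function name `replace_entities` is hypothetical.

```python
import re
from html.entities import html5

# The five entities predefined by XML must stay escaped,
# otherwise the converted document would no longer parse.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}

# Matches named entity references such as "&ne;".
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_entities(text: str) -> str:
    """Replace known named entities with their Unicode characters.

    Assumption: SGML entity names coincide with HTML5 entity names.
    Unknown names (for example, custom file entities) are left as-is.
    """
    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        # Keys in the html5 table include the trailing semicolon.
        return html5.get(name + ";", match.group(0))

    return ENTITY_RE.sub(substitute, text)


if __name__ == "__main__":
    print(replace_entities("x &ne; y, but a &lt; b"))
```

Leaving unknown names untouched means that file-inclusion entities from step 1 survive the pass and can be dealt with separately.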

Once those changes were made, all that remained was to use Pandoc to convert the file to ‘GitHub Flavoured Markdown’ with the command line:

pandoc --from docbook --to gfm --output cis4570-project.md ReportFull.xml
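
If the conversion needs to be repeated, the same command can be driven from a script. The following is a sketch only: the helper names `build_pandoc_command` and `convert` are hypothetical, and it assumes `pandoc` is installed and on the `PATH`. The flags are exactly those in the command above.

```python
import subprocess


def build_pandoc_command(source: str, output: str) -> list:
    """Build the argument list for the DocBook-to-GFM conversion."""
    return [
        "pandoc",
        "--from", "docbook",
        "--to", "gfm",
        "--output", output,
        source,
    ]


def convert(source: str, output: str) -> None:
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pandoc_command(source, output), check=True)


if __name__ == "__main__":
    convert("ReportFull.xml", "cis4570-project.md")
```

Passing the arguments as a list avoids shell quoting problems if a file name contains spaces.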

  1. This site primarily uses CommonMark, which attempts to apply rigour to the original Markdown specification and related variants.