2017-12-02T15:39:54 heirloom nroff -mux

Hello all,

Resuming my little series of articles, today I explain how Utmac is
linked to the XML world.

Troff and XML

We all have in mind the various attempts to produce XML files from a
troff document. Some aim to be universal and, dealing with the raw troff
requests, can only output non-semantic HTML with hardcoded styles.
Others, dedicated to a particular macro package, fail to consider the
raw troff requests the user may need in his document (cf. the source of
ms2html, in which the author comments that he is implementing more and
more raw troff requests).

XML files are nothing but plain text files with semantic information.
On the other side, a troff document contains structured information
which gets its meaning within the context of a macro package. When we
think about it, we already have a tool which interprets a troff source
within the context of a macro package to produce plain text files:
nroff. Could we use nroff to produce XML files? I tried, and it appears
that this solution works well.

The idea is simple: one only has to write a macro file which interprets
all the interface macros (paragraphs, headers...) so that they add XML
tags to the output file. For example, here are simple macros to produce
XML paragraphs and headings:

    .de PP
    .  \" first, we close the previous block
    .  \" by printing its recorded tag
    .  if d xml-block \\*[xml-block]
    .  \" secondly, we define the closing tag for the block
    .  ds xml-block </p>
    .  \" and last, we print the opening tag.
    <p>
    ..
    .de H1
    .  if d xml-block \\*[xml-block]
    .  rm xml-block
    <h1>\\$*</h1>
    ..

Nroff has to be configured to produce correct XML files: we do not want
hyphenation, lines do not need to be adjusted, and the page length has
to be defined correctly.

    .\" page length is one line
    .pl 1v
    .ll 75
    .\" neither adjust nor hyphenate
    .na
    .nh
    .\" Ending macro is doc:end
    .em doc:end
    .\" Print header
    <?xml version="1.0" encoding="UTF-8"?>
    .\" Open the root tag
    <utmac>
    .de doc:end
    .  \" doc:end needs some more space to output text
    .  pl \\n(nlu+3v
    .  \" close the previous block
    .  if d xml-block \\*[xml-block]
    .  \" Close the root tag.
    </utmac>
    .  \" set correct page length
    .  pl \\n(nlu
    ..

Since the fonts are hierarchical and defined as strings in Utmac, they
are easy to implement as well.

    .ds font-bold0 </B>
    .ds font-bold1 <B>
    .nr f-b 0
    .ds B \ER'f-b 1-\En[f-b]'\E*[font-bold\En[f-b]]

The only real problem with using nroff to produce XML documents is that,
as with troff, it is not easy to deal with automatically inserted
spaces. I tried to use .chop and \c, but without reliable results.

To solve that problem, and to escape the restricted characters a user
may insert in his document ('<', '>', and '&'), I wrote a small
post-processor, postxml, which translates a custom set of tags to XML
special characters. Amongst those tags, a special one removes newlines:

    #[      becomes <
    #]      becomes >
    #(      becomes &
    #)      becomes ;
    \n#-\n  is deleted from the stream, and is used to delete newlines.

So, instead of directly writing XML tags, the nroff macros write those
custom tags, which are later translated by postxml. Our paragraph macro
becomes:

    .de PP
    .  if d xml-block \{\
    .    \" tag to remove unwanted newlines
    #-
    .    \" closing xml tag
    \\*[xml-block]
    .  \}
    .  ds xml-block #[/pp#]
    .  \" opening xml tag
    #[pp#]
    .  \" tag to remove unwanted newlines
    #-
    ..

A preprocessor, prexml, escapes any occurrence of those tags in the user
document.

The troffxml archive provides prexml, postxml, and two XSL stylesheets
to produce HTML and FODT (flat OpenDocument) files, and Utmac provides
the macro file ux for that purpose.
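As an illustration only, the translation rules listed above for postxml
can be sketched in a few lines of Python. This is not the actual postxml
from the troffxml archive: the function name postxml_sketch and the
sample input are made up for the example, and the rules are applied by
plain string replacement, which is an assumption about how the real tool
behaves.

```python
# Hypothetical sketch of the postxml translation step, not the real tool.
# The table below is copied from the list of custom tags in the mail.
TAG_TABLE = {
    "#[": "<",
    "#]": ">",
    "#(": "&",
    "#)": ";",
}


def postxml_sketch(text: str) -> str:
    """Translate the custom tags written by the nroff macros."""
    # A "#-" alone on its line is deleted together with the newline
    # before it and the newline after it, joining the two neighbours.
    text = text.replace("\n#-\n", "")
    # The remaining tags map one-to-one to XML special characters.
    for tag, char in TAG_TABLE.items():
        text = text.replace(tag, char)
    return text


# Made-up input, shaped like what the PP macro might write.
sample = "#[pp#]\n#-\nTom #(amp#) Jerry\n#-\n#[/pp#]\n"
print(postxml_sketch(sample), end="")  # prints <pp>Tom &amp; Jerry</pp>
```

The newline marker is handled first so that the joined lines already
form the final text when the bracket tags are translated.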

So, the commands to produce XML, HTML, and FODT documents from a troff
source are:

    prexml < f.tr | nroff -Tlocale -mux | postxml > f.xml
    xsltproc utohtml.xsl f.xml > f.html
    xsltproc utofodt.xsl f.xml > f.fodt

Since I believe you want to have a look at the result, you will find
attached the XML, HTML, and FODT versions of this mail, as produced by
this system (which reveals that the fodt code block needs some more
work...).

In my next mail about Utmac, I will present some goodies.

Kind Regards,
Pierre-Jean