guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: sxml simple, sxml->xml and namespaces


From: Andy Wingo
Subject: Re: sxml simple, sxml->xml and namespaces
Date: Mon, 20 Jun 2016 10:56:26 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

Greetings gentle Guiler,

Apologies for the long delay here.  I'm with you regarding namespaces
and sxml->xml.  In the past I made sure to always get the namespaces
attached to the root element via the @ xmlns attributes, and then have
namespaced uses just be local names, not qnames, and that way sxml->xml
works fine.  But, perhaps that doesn't cover all cases in a nice way.
Do you still have thoughts on this patch?  Is the right thing for you?
In any case we need better documentation in the manual about how to deal
with namespaces and SXML, in practice, with examples.

Regards,

Andy

On Wed 08 Apr 2015 22:55, <address@hidden> writes:

> Gentle guile folks,
>
> I'm playing around with (sxml simple) and stumbled upon something
> I think might be a bug. Consider the following snippet:
>
>   #!/usr/bin/guile -s
>   !#
>   (use-modules (sxml simple))
>   
>   ;; An XML with two namespaces (one default)
>   (define the-svg "<svg xmlns='http://www.w3.org/2000/svg'
>        xmlns:xlink='http://www.w3.org/1999/xlink'>
>     <rect x='5' y='5' width='20' height='20'
>           stroke-width='2' stroke='purple' fill='yellow'
>           id='rect1' />
>     <rect x='30' y='5' width='20' height='20'
>           ry='5' rx='8' stroke-width='2' stroke='purple' fill='blue'
>           xlink:href='#rect1' />
>   </svg>")
>   
>   ;; Note how SXML handles QNames (just concatenating NS and
>   ;; local-name with a colon):
>   (define the-sxml
>     (with-input-from-string the-svg xml->sxml))
>   (format #t "~A\n" the-sxml)
>   
>   ;; If we try to serialize this: kaboom!
>   (sxml->xml the-sxml)
>   
> The parsing into SXML goes well, the (format ...) outputs what
> I'd expect. But the (sxml->xml ...) dies with:
>
>   ERROR: In procedure scm-error:
>   ERROR: Invalid QName: more than one colon http://www.w3.org/2000/svg:svg
>
> I had a look at sxml simple and think the problem is that the
> function check-name (which is the one throwing the error) expects
> the name to be a QName (i.e. either a Name or a namespace abbreviation
> plus a colon plus a Name).
>
> But SXML tacks the whole namespaces to names (i.e. the whole
> "http://www.w3.org/1999/xlink";, for example -- not the "xlink").
>
> When serializing to XML, we should go the way back, finding abbreviations
> for the namespaces used, prefixing the names with those abbreviations
> and issuing namespace declarations for those abbreviations (those funny
> xmlns:foo attributes).
>
> I've tried my hand at a patch which "works for me". Basically, what it
> does is to thread an extra parameter "nsmap", representing a mapping
> (namespace -> ns-abbreviation) valid at "this" position and below in
> the tree. When new, unseen namespaces come up, new abbreviations are
> "invented" (ns-abbrev-new), collected and the corresponding declarations
> printed. When recursing to sub-elements, the new mappings are added to
> the nsmap passed down.
>
> The result after the patch for the above example (a bit embellished)
> looks like this:
>
>   <ns1:svg xmlns:ns1="http://www.w3.org/2000/svg";>
>     <ns1:rect y="5" x="5" width="20" stroke-width="2"
>               stroke="purple" id="rect1" height="20" fill="yellow" />
>     <ns1:rect ns2:href="#rect1" y="5" x="30" width="20" stroke-width="2"
>               stroke="purple" ry="5" rx="8" height="20" fill="blue"
>               xmlns:ns2="http://www.w3.org/1999/xlink"; />
>   </ns1:svg>
>   
> Pretty clumsy, but basically correct.
>
> The attached patch is against "GNU Guile 2.0.5-deb+1-3". The relevant
> code hasn't changed up to the current development version.
>
> I'm not very happy with the patch as-is. Among other things,
>
>  - I had a hard time doing what I wanted in a non-clumsy way.
>    Especially, ns-abbr is a strange function and not very clear
>    because it tries to do several things at once: replace the
>    namespace by its abbreviation, signal a new mapping item
>    whenever this abbreviation was new. But how to achieve this
>    elegantly without doing several look-ups?
>
>  - The namespace declarations are tacked at the end of the attribute
>    list. This is plain opportunism: the tag may carry a namespace,
>    and each of the attribute names too. Thus, it's very handy to
>    collect all the unseen mappings (new-namespaces in element->xml)
>    and output them at the end of the attribute list.
>
>    But in XML it is usual to put the namespace declarations before
>    the attributes (the "canonical" XML order even prescribes that).
>
>  - The sxml code is pretty careful to not munge around too much
>    with strings, but to output things ASAP to the port. I think
>    I might be a bit more careful in that department.
>
>  - In other XML libraries the user gets a choice on preferred
>    namespace mappings (e.g. I'd like http://www.w3.org/2000/svg
>    to be the default namespace -- or http://www.w3.org/1999/xlink
>    to be abbreviated as 'xlink'). This could be achieved by
>    passing a function as an optional parameter which gets a try
>    at a new namespace before ns-abbr-new gets at it.
>
> I'd be happy to prepare a patch against whatever version makes
> sense once we get some consensus on how to do it right.
>
> Thanks & regards
> -- tomás
>
> --- /usr/share/guile/2.0/sxml/simple.scm      2012-03-18 20:16:21.000000000 
> +0100
> +++ /home/tomas/lib/guile/sxml/simple.scm     2015-04-08 22:29:30.049277842 
> +0200
> @@ -37,29 +37,38 @@
>  argument, @var{port}, which defaults to the current input port."
>    (ssax:xml->sxml port '()))
>  
> -(define check-name
> -  (let ((*good-cache* (make-hash-table)))
> -    (lambda (name)
> -      (if (not (hashq-ref *good-cache* name))
> -          (let* ((str (symbol->string name))
> -                 (i (string-index str #\:))
> -                 (head (or (and i (substring str 0 i)) str))
> -                 (tail (and i (substring str (1+ i)))))
> -            (and i (string-index (substring str (1+ i)) #\:)
> -                 (error "Invalid QName: more than one colon" name))
> -            (for-each
> -             (lambda (s)
> -               (and s
> -                    (or (char-alphabetic? (string-ref s 0))
> -                        (eq? (string-ref s 0) #\_)
> -                        (error "Invalid name starting character" s name))
> -                    (string-for-each
> -                     (lambda (c)
> -                       (or (char-alphabetic? c) (string-index 
> "0123456789.-_" c)
> -                           (error "Invalid name character" c s name)))
> -                     s)))
> -             (list head tail))
> -            (hashq-set! *good-cache* name #t))))))
> +(define (ns-lookup ns nsmap)
> +  "Look up namespace ns in nsmap. Return its abbreviation or #f"
> +  (assoc-ref nsmap ns))
> +
> +(define ns-abbr-new
> +  (let ((*nscounter* 0))
> +    (lambda ()
> +      (set! *nscounter* (1+ *nscounter*))
> +      (string-append "ns" (number->string *nscounter*)))))
> +
> +(define (ns-abbr name nsmap)
> +  "Takes a QName, SXML style (i.e a symbol whose string value is either a
> +clean local name or a colon-concatenated pair of namespace:name, and returns
> +a list whose car is a string <nsabbrev>:<local-name> and which has as cdr
> +a pair (<namespace> . nsabbrev) whenever <namespace> wasn't found in nsmap"
> +  ;; FIXME check for empty ns (e.g ":foo")
> +  ;; check (worse!) for empty locname (e.g. "foo:")
> +  (let* ((str (symbol->string name))
> +         (i (string-rindex str #\:))
> +         (ns (and i (substring str 0 i)))
> +         (locname (or (and i (substring str (1+ i))) str)))
> +    (if ns
> +        (let ((nsabbr (ns-lookup ns nsmap)))
> +          (if nsabbr
> +              ;; known namespace:
> +              (list (string-append nsabbr ":" locname))
> +              ;; unknown namespace
> +              (let ((nsabbr (ns-abbr-new)))
> +                (list (string-append nsabbr ":" locname)
> +                      (cons ns nsabbr)))))
> +        ;; empty namespace: clean local-name:
> +        (list locname))))
>  
>  ;; The following two functions serialize tags and attributes. They are
>  ;; being used in the node handlers for the post-order function, see
> @@ -82,42 +91,58 @@
>       port))))
>  
>  (define (attribute->xml attr value port)
> -  (check-name attr)
>    (display attr port)
>    (display "=\"" port)
>    (attribute-value->xml value port)
>    (display #\" port))
>  
> -(define (element->xml tag attrs body port)
> -  (check-name tag)
> -  (display #\< port)
> -  (display tag port)
> -  (if attrs
> -      (let lp ((attrs attrs))
> -        (if (pair? attrs)
> -            (let ((attr (car attrs)))
> -              (display #\space port)
> -              (if (pair? attr)
> -                  (attribute->xml (car attr) (cdr attr) port)
> -                  (error "bad attribute" tag attr))
> -              (lp (cdr attrs)))
> -            (if (not (null? attrs))
> -                (error "bad attributes" tag attrs)))))
> -  (if (pair? body)
> -      (begin
> -        (display #\> port)
> -        (let lp ((body body))
> -          (cond
> -           ((pair? body)
> -            (sxml->xml (car body) port)
> -            (lp (cdr body)))
> -           ((null? body)
> -            (display "</" port)
> -            (display tag port)
> -            (display ">" port))
> -           (else
> -            (error "bad element body" tag body)))))
> -      (display " />" port)))
> +(define (element->xml tag attrs body port nsmap)
> +  (let* ((ab (ns-abbr tag  nsmap))
> +         (abname (car ab))
> +         (new-namespaces (cdr ab)))
> +    (display #\< port)
> +    (display abname port)
> +    (if attrs
> +        (let lp ((attrs attrs))
> +          (if (pair? attrs)
> +              (let ((attr (car attrs)))
> +                (display #\space port)
> +                (if (pair? attr)
> +                    (let* ((ab (ns-abbr (car attr) nsmap))
> +                           (abname (car ab))
> +                           (nsplus (cdr ab)))
> +                      (unless (null? nsplus)
> +                        (set! new-namespaces
> +                              (cons (car nsplus) new-namespaces)))
> +                      (attribute->xml abname (cdr attr) port))
> +                    (error "bad attribute" tag attr))
> +                (lp (cdr attrs)))
> +              (if (not (null? attrs))
> +                  (error "bad attributes" tag attrs)))))
> +    ;; Output namespace declarations
> +    (let lp ((new-namespaces new-namespaces))
> +      (unless (null? new-namespaces)
> +        ;; remember: car is namespace, cdr is abbrev
> +        (let ((ns (caar new-namespaces))
> +              (nsabbr (cdar new-namespaces)))
> +          (display #\space port)
> +          (attribute->xml (string-append "xmlns:" nsabbr) ns port))
> +        (lp (cdr new-namespaces))))
> +    (if (pair? body)
> +        (begin
> +          (display #\> port)
> +          (let lp ((body body))
> +            (cond
> +             ((pair? body)
> +              (sxml->xml (car body) port (append new-namespaces nsmap))
> +              (lp (cdr body)))
> +             ((null? body)
> +              (display "</" port)
> +              (display abname port)
> +              (display ">" port))
> +             (else
> +              (error "bad element body" tag body)))))
> +        (display " />" port))))
>  
>  ;; FIXME: ensure name is valid
>  (define (entity->xml name port)
> @@ -133,7 +158,8 @@
>    (display str port)
>    (display "?>" port))
>  
> -(define* (sxml->xml tree #:optional (port (current-output-port)))
> +(define* (sxml->xml tree #:optional (port (current-output-port))
> +                    (nsmap '()))
>    "Serialize the sxml tree @var{tree} as XML. The output will be written
>  to the current output port, unless the optional argument @var{port} is
>  present."
> @@ -144,7 +170,7 @@
>          (let ((tag (car tree)))
>            (case tag
>              ((*TOP*)
> -             (sxml->xml (cdr tree) port))
> +             (sxml->xml (cdr tree) port nsmap))
>              ((*ENTITY*)
>               (if (and (list? (cdr tree)) (= (length (cdr tree)) 1))
>                   (entity->xml (cadr tree) port)
> @@ -158,9 +184,9 @@
>                      (attrs (and (pair? elems) (pair? (car elems))
>                                  (eq? '@ (caar elems))
>                                  (cdar elems))))
> -               (element->xml tag attrs (if attrs (cdr elems) elems) port)))))
> +               (element->xml tag attrs (if attrs (cdr elems) elems) port 
> nsmap)))))
>          ;; A nodelist.
> -        (for-each (lambda (x) (sxml->xml x port)) tree)))
> +        (for-each (lambda (x) (sxml->xml x port nsmap)) tree)))
>     ((string? tree)
>      (string->escaped-xml tree port))
>     ((null? tree) *unspecified*)



reply via email to

[Prev in Thread] Current Thread [Next in Thread]