emacs-devel

From: Mark Lillibridge
Subject: why unrmail fails with raw-text on version 22 [WAS: Re: help needed with coding systems (unrmail problems)]
Date: Sat, 08 Jan 2011 21:52:27 -0800

    OK, I think I now understand why unrmail fails with raw-text*.
Unrmail reads and decodes BABYL files in a subtly different way than
Rmail does.  In particular, it does:

  (with-temp-buffer
    ;; Read in the old Rmail file with no decoding.
    (let ((coding-system-for-read 'raw-text))
      (insert-file-contents file))
    ;; But make it multibyte.
    (set-buffer-multibyte t)
    (setq buffer-file-coding-system 'raw-text-unix)

Not obvious, but important: with-temp-buffer creates a multibyte buffer,
so insert-file-contents decodes from raw-text into a multibyte buffer,
producing raw 8-bit bytes for 0x80-0xff.  As far as I can tell, the
(set-buffer-multibyte t) here is a no-op, because the buffer is already
multibyte at that point.
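A quick way to check the no-op claim: a freshly created buffer takes its multibyteness from the default value of `enable-multibyte-characters`, which is normally t, so calling `set-buffer-multibyte` with t inside `with-temp-buffer` should not change anything.  This is an illustrative check, not code from unrmail.

```elisp
;; A new buffer inherits the default value of
;; `enable-multibyte-characters', which is normally t.
(with-temp-buffer
  (let ((before enable-multibyte-characters))
    ;; The buffer is already multibyte, so this changes nothing.
    (set-buffer-multibyte t)
    (message "multibyte before: %S, after: %S"
             before enable-multibyte-characters)))
```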


    It then decodes the region between FROM and TO just as Rmail does:

      (unless (and coding-system
                   (coding-system-p coding-system))
        (setq coding-system
              ;; Emacs 21.1 and later writes RMAIL files in emacs-mule, but
              ;; earlier versions did that with the current buffer's encoding.
              ;; So we want to favor detection of emacs-mule (whose normal
              ;; priority is quite low), but still allow detection of other
              ;; encodings if emacs-mule won't fit.  The call to
              ;; detect-coding-with-priority below achieves that.
              (car (detect-coding-with-priority
                    from to
                    '((coding-category-emacs-mule . emacs-mule))))))
      (unless (memq coding-system
                    '(undecided undecided-unix))
        (set-buffer-modified-p t)       ; avoid locking when decoding
        (let ((buffer-undo-list t))
          (decode-coding-region from to coding-system))
        (setq coding-system last-coding-system-used))


    So Rmail reads unibyte, decodes, and then converts to multibyte,
while unrmail reads into a multibyte buffer and then decodes.  This
produces the same results for all coding systems except raw-text*.  The
reason is that reading raw-text into a unibyte buffer and then
converting to multibyte produces a different result than reading
raw-text directly into a multibyte buffer!  The latter produces raw
bytes, while the former produces code points.  Needless to say, encoding
from raw bytes instead of code points gives different results.
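To isolate the difference, one can read the same file both ways and compare the buffer contents.  The helper below, `my-unrmail-compare-reads`, is hypothetical (not part of Emacs or unrmail) and deliberately skips the decoding step so that only the unibyte-then-multibyte versus direct-multibyte conversion is tested.  Per the analysis above, on Emacs 22 it would be expected to return nil for a BABYL file containing non-ASCII text; for pure ASCII input both routes agree.

```elisp
(defun my-unrmail-compare-reads (file)
  "Hypothetical helper: read FILE as raw-text in two ways.
Return non-nil if reading unibyte and then converting to multibyte
\(the Rmail way) gives the same text as reading straight into a
multibyte buffer (the unrmail way)."
  (let ((rmail-way
         (with-temp-buffer
           ;; Read into a unibyte buffer, then convert.
           (set-buffer-multibyte nil)
           (let ((coding-system-for-read 'raw-text))
             (insert-file-contents file))
           (set-buffer-multibyte t)
           (buffer-string)))
        (unrmail-way
         (with-temp-buffer
           ;; Temp buffers are already multibyte; read directly.
           (let ((coding-system-for-read 'raw-text))
             (insert-file-contents file))
           (buffer-string))))
    (string= rmail-way unrmail-way)))
```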


    My testing so far shows that this problem can be fixed for version
22 by switching to the Rmail way of doing things (i.e., read unibyte and
only convert to multibyte at the end).  A more complicated solution will
be needed for version 23.  Should I produce a patch for version 22, given
that it will not work for version 23?
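In outline, the Rmail-style ordering for version 22 might look like the sketch below.  It assumes the per-message decoding loop quoted earlier can be wrapped in a function; `my-unrmail-read-babyl` and its DECODE-FN argument are hypothetical placeholders for illustration, not the actual patch.

```elisp
(defun my-unrmail-read-babyl (file decode-fn)
  "Hypothetical sketch: read FILE in the order Rmail uses.
Insert FILE undecoded into a unibyte buffer, call DECODE-FN (a
placeholder for the per-message decoding) with no arguments, and
only then make the buffer multibyte.  Return the buffer text."
  (with-temp-buffer
    ;; Start unibyte so raw-text input stays a sequence of bytes.
    (set-buffer-multibyte nil)
    (let ((coding-system-for-read 'raw-text))
      (insert-file-contents file))
    ;; Decode while still unibyte, as Rmail does.
    (funcall decode-fn)
    ;; Convert to multibyte only at the end.
    (set-buffer-multibyte t)
    (buffer-string)))
```

Passing `#'ignore` as DECODE-FN exercises only the read-and-convert ordering.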

- Mark
