emacs-devel

Re: decode-coding-string gone awry?


From: David Kastrup
Subject: Re: decode-coding-string gone awry?
Date: Mon, 14 Feb 2005 21:09:46 +0100
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)

Stefan Monnier <address@hidden> writes:

>> Give me a clue: what happens if a process inserts stuff with
>> 'raw-text encoding into a multibyte buffer?  'raw-text is a
>> reconstructible encoding, isn't it, so the stuff will get converted
>> into some prefix byte indicating "isolated single-byte entity
>> instead of utf-8 char" and the byte itself or something, right?
>> And decode-coding-string does not want to work on something like
>> that?
>
> If you want accented chars to appear as accented chars in the
> (process) buffer (i.e. you don't want to change the AUCTeX part),
> then raw-text is not an option anyway.

Yes, I figured as much.  I had better explain what I am doing in the
first place.  AUCTeX does the basic management of the buffer:
creating it, associating processes with it, and installing a filter
routine that inserts the strings after some scanning for key phrases
and so on.

preview-latex uses all of this folderol, but sets the output coding
system of its own processes to raw-text.  This is something that
AUCTeX does _not_ yet do for its own processes.  AUCTeX's own
process output is more likely to be viewed by the user, anyway.  We
can't hope to get a really readable UTF-8 display for AUCTeX's own
processes at the moment, but AUCTeX's behavior right now leads to
user-readable output in all current cases _except_ when TeX thinks it
is in some Latin-1 locale while working on utf-8 input.

Now with the AUCTeX processes, user readability is the most important
thing.  If AUCTeX can't locate the buffer position exactly, it will at
least locate the line, and that's tolerable for all practical
purposes.

With preview-latex, it is not tolerable.  On the other hand, the
output from preview-latex processes is usually not shown to the user
at all: having an unreadable output buffer due to raw-text encoding is
quite ok.

So that is basically the reason why we can easily make the process
coding system raw-text, but far less easily make the buffer unibyte:
AUCTeX will use the same buffer for its next run, just erasing it, and
if it has turned unibyte, we get into trouble.

> If you don't mind about accented chars appearing as \NNN, then you
> can make the buffer unibyte and use `raw-text' as the process's
> output coding-system.  That's the more robust approach.

If the accented chars (in fact, the whole upper 8bit page) appeared as
\NNN, this would actually mostly be a _win_ over the current situation
where we fairly often get a mixture of raw bytes and nonsense
characters.  However, I am afraid that this is not quite possible
right now.

We are now in the process of preparing the last major standalone
release of preview-latex.  After that, it will get folded into AUCTeX,
and we will streamline the whole junk.  But in the next weeks, I still
want to get out a preview-latex that works with the current AUCTeX
releases and vice versa.

After that, we will probably make the process encoding raw-text for
the _whole_ of AUCTeX and use a CCL program for converting TeX's ^^
escape sequences back into raw bytes, essentially creating an
efficient illusion of a TeX that outputs sane error messages in all
environments.
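
To make the idea concrete, here is a rough sketch in plain Emacs Lisp
rather than CCL.  The function name my-tex-unhat is made up for
illustration, it handles only the two-digit lowercase hex form of the
^^ notation, and it assumes an Emacs that provides `unibyte-string'
and that the input is the raw (unibyte) process output:

```elisp
;; Hypothetical helper, not part of AUCTeX or preview-latex.
(defun my-tex-unhat (string)
  "Return STRING with each TeX ^^xx escape replaced by the byte it denotes.
Only two lowercase hex digits after ^^ are recognized."
  (let ((case-fold-search nil))         ; TeX writes the hex digits in lowercase
    (replace-regexp-in-string
     "\\^\\^\\([0-9a-f][0-9a-f]\\)"
     (lambda (m)
       ;; Turn the two hex digits into a one-byte unibyte string.
       (unibyte-string (string-to-number (match-string 1 m) 16)))
     string
     t      ; FIXEDCASE: do not adjust the case of the replacement
     t)))   ; LITERAL: a resulting backslash byte is inserted as-is
```

Calling (my-tex-unhat "na^^efve") would replace the ^^ef by the single
byte #xEF.  A CCL program would do the same job inside the decoding
step itself instead of in a separate pass over the string.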

> If that option is out (i.e. you have to use a multibyte buffer),
> you'll have to basically recover the original byte-sequence by
> replacing the
>
>    (regexp-quote (substring string 0 (match-beginning 1)))
>
> with
>
>    (regexp-quote (encode-coding-string
>                   (substring string 0 (match-beginning 1))
>                   buffer-file-coding-system))
>
> [assuming buffer-file-coding-system is the process's output
> coding-system]

The process output coding system being raw-text.  Do I really need to
actually encode raw-text?

>    (regexp-quote (string-make-unibyte
>                   (substring string 0 (match-beginning 1))))
>
> which is basically equivalent except that you lose control over
> which coding-system is used.

I have to admit to being befuddled.  I'll probably have to experiment
until I find something that works and cross fingers.  I don't think I
have much of a chance to actually understand all of the involved
intricacies.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum



