From: Ikumi Keita
Subject: [Bug-AUCTeX] 11.84; Locating image position sometimes fails for shift-jis encoded document.
Date: Wed, 07 Mar 2007 23:29:29 +0900
When editing a Japanese LaTeX document encoded in shift-jis, preview-latex
sometimes fails to place images in the right place (they appear in the
wrong place or not at all). Such cases are often accompanied by error
messages like:
error in process sentinel: Invalid regexp: "Unmatched [ or [^"
and
error in process sentinel: Invalid regexp: "Trailing backslash"
etc.
How to reproduce:
Standard (La)TeX cannot deal with Japanese documents encoded in
shift-jis. Japanese (La)TeX variants are needed to investigate the
case, so I don't give an example here. Instead, I explain the
details below.
Background:
I start with a short summary of the shift-jis encoding (SJIS for
short). SJIS is one of the major encodings for Japanese text.
Basically, it represents one Japanese character by two bytes. Examples
of such two-byte sequences are, in hexadecimal form:
8E 82
and
81 5B
While the first byte of the sequence is always 8-bit data (MSB on), the
second byte is not necessarily so. In the above two examples, the second
byte of the first example (82) is 8-bit, but that of the second example
(5B) is 7-bit (MSB off). For historical reasons, SJIS is the standard
encoding on Japanese Windows and Macintosh.
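The two byte pairs above can be inspected directly in Emacs. The following is a minimal sketch, not code from AUCTeX: the helper name demo-sjis-bytes is hypothetical, and the snippet assumes an Emacs that provides unibyte-string (Emacs 23 or later) and the shift_jis coding system.

```elisp
;; Hypothetical helper for illustration; not part of AUCTeX or preview-latex.
;; Assumes `unibyte-string' is available (Emacs 23 or later).
(defun demo-sjis-bytes (string)
  "Return the shift_jis bytes of STRING as a list of (HEX . MSB) pairs.
MSB is the symbol `msb-on' for 8-bit bytes and `msb-off' for 7-bit bytes."
  (mapcar (lambda (byte)
            (cons (format "%02X" byte)
                  (if (>= byte #x80) 'msb-on 'msb-off)))
          ;; `append' turns the unibyte result into a list of raw byte values.
          (append (encode-coding-string string 'shift_jis) nil)))

;; Decode the two byte pairs quoted above, then look at their encoding again.
(demo-sjis-bytes (decode-coding-string (unibyte-string #x8e #x82) 'shift_jis))
;; => (("8E" . msb-on) ("82" . msb-on))
(demo-sjis-bytes (decode-coding-string (unibyte-string #x81 #x5b) 'shift_jis))
;; => (("81" . msb-on) ("5B" . msb-off))
```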
Problem:
The second byte of SJIS text causes a problem in the sequence encoding
-> regexp-quote -> decoding. The relevant part of preview.el is in the
function preview-error-quote:
----------------------------------------------------------------------
2603 (defun preview-error-quote (string run-coding-system)
2604   "Turn STRING with potential ^^ sequences into a regexp.
2605 To preserve sanity, additional ^ prefixes are matched literally,
2606 so the character represented by ^^^ preceding extended characters
2607 will not get matched, usually."
2608   (let (output case-fold-search)
2609     (when (featurep 'mule)
2610       (setq string (encode-coding-string string run-coding-system)))
2611     (while (string-match "\\^\\{2,\\}\\(\\(address@hidden)\\|[8-9a-f][0-9a-f]\\)"
2612                          string)
2613       (setq output
2614             (concat output
2615                     (regexp-quote (substring string
2616                                              0
2617                                              (- (match-beginning 1) 2)))
[...]
2631     (setq output (concat output (regexp-quote string)))
2632     (if (featurep 'mule)
2633         (decode-coding-string output
2634                               (or (and (boundp 'TeX-japanese-process-output-coding-system)
2635                                        TeX-japanese-process-output-coding-system)
2636                                   buffer-file-coding-system))
2637       output)))
----------------------------------------------------------------------
On Japanese Windows, this function is called with both
run-coding-system and TeX-japanese-process-output-coding-system bound to
'shift_jis-dos, according to the setting in tex-jp.el.
On lines 2609-2610, the multibyte Japanese string is turned into a byte
sequence in SJIS encoding. On lines 2615-2617 and 2631, this byte
sequence is transformed through regexp-quote. On lines 2633-2636, the
byte sequence is turned back into a multibyte string, assuming the
sequence is encoded in SJIS.
However, the second byte of an SJIS sequence is sometimes 7-bit data and,
quite unfortunately, sometimes happens to be a regexp meta character.
In the above example, the second byte of the sequence 81 5B is actually
the char `['. regexp-quote inserts a backslash before it, so the
sequence no longer decodes back into the expected multibyte string and
a lone `[' is left behind. Later, this causes the error 'Invalid regexp:
"Unmatched [ or [^"', and the corresponding image is not shown in the
document buffer.
The second byte of the SJIS sequence of another character can be the
backslash `\', which sometimes leads to 'Invalid regexp: "Trailing
backslash"'. In another case it is `^', and the image is placed at the
beginning of the line, not at the right place in the document buffer.
The following example illustrates what is going on:
(let* ((s1 (char-to-string (make-char 'japanese-jisx0208 33 60)))
       ;; s1 is a multibyte Japanese string (JIS X 0208 code 21 3C).
       ;; Encode s1 with SJIS.
       (s2 (encode-coding-string s1 'shift_jis)))
  ;; At this point s2 is equal to "\201[", the byte sequence 81 5B.
  (setq s2 (regexp-quote s2))
  ;; Now s2 is "\201\\[".
  ;; Then decode back, assuming SJIS encoding.
  (setq s2 (decode-coding-string s2 'shift_jis))
  (string-equal s1 s2))
=> nil ;; s2 no longer goes back to the original string s1.
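The three troublesome characters mentioned above (`[', `\' and `^') can be found mechanically. The sketch below is illustrative only: demo-sjis-trail-specials is a hypothetical name, and the range 40-7E for 7-bit second bytes is taken from the general definition of Shift_JIS rather than from preview.el.

```elisp
;; Hypothetical helper for illustration; not part of AUCTeX or preview-latex.
;; Assumption: 7-bit second bytes of shift_jis lie in the range #x40 to #x7e.
(defun demo-sjis-trail-specials ()
  "Return the 7-bit shift_jis second bytes that `regexp-quote' escapes."
  (let (specials)
    (dolist (byte (number-sequence #x40 #x7e))
      (let ((s (string byte)))
        ;; A byte is affected when quoting changes the one-char string.
        (unless (string= s (regexp-quote s))
          (push s specials))))
    (nreverse specials)))

(demo-sjis-trail-specials)
;; => ("[" "\\" "^")
```

Any character whose second byte is one of these gets an extra backslash inserted into the middle of its byte sequence when regexp-quote is applied to the encoded string.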
Summary:
We cannot assume that multibyte characters are always encoded with 8-bit
bytes only. To cope with encodings like SJIS, whose multibyte sequences
can contain 7-bit bytes, regexp-quote must not be applied to encoded
strings. It should operate on decoded strings only.
I confirmed that after removing the encoding and decoding operations in
preview-error-quote, the error does not occur and the images are
displayed in the right place for my test case (with the help of a
Japanese-capable dvips variant and a ghostscript tuned to handle Japanese
postscript files correctly). Of course I understand that such a rough
modification is not acceptable. I'm just illustrating one aspect of the
problem.
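As a minimal sketch of the point made in the summary, quoting after decoding keeps the character intact. This is not a proposed patch: demo-quote-decoded is a hypothetical helper, it assumes an Emacs with unibyte-string (Emacs 23 or later), and it ignores the ^^ sequence handling that preview-error-quote also has to perform.

```elisp
;; Hypothetical helper for illustration; not part of AUCTeX or preview-latex.
;; It deliberately omits the ^^ sequence handling of `preview-error-quote'.
(defun demo-quote-decoded (bytes coding-system)
  "Decode the unibyte string BYTES with CODING-SYSTEM, then regexp-quote it."
  (regexp-quote (decode-coding-string bytes coding-system)))

(let* ((bytes (unibyte-string #x81 #x5b))
       (text (decode-coding-string bytes 'shift_jis))
       (regexp (demo-quote-decoded bytes 'shift_jis)))
  (list
   ;; The decoded character is not a regexp meta character, so it is unchanged.
   (string-equal regexp text)
   ;; The resulting regexp finds the character at its real position.
   (string-match regexp (concat "abc" text))))
;; => (t 3)
```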
N.B. In the following log, Japanese texts are replaced with `*'s.
Emacs : GNU Emacs 21.4.1 (i386-mingw-nt5.0.2195)
of 2005-08-28 on CUBE
Package: 11.84
Run buffer contents:
Running `Preview-LaTeX' on `bbb' with ``platex
"\nonstopmode\nofiles\PassOptionsToPackage{active,tightpage,auctex}{preview}\AtBeginDocument{\ifx\ifPreview\undefined\RequirePackage[displaymath,floats,graphics,textmath,sections,footnotes]{preview}[2004/11/05]\fi}"
"\input" "bbb.tex"''
This is pTeX, Version 3.141592-p3.1.9 (sjis) (Web2C 7.5.5)
pLaTeX2e <2005/01/04>+0 (based on LaTeX2e <2003/12/01> patch level 0)
Babel <v3.8g> and hyphenation patterns for english, usenglishmax, ukenglish, ba
sque, bulgarian, coptic, welsh, czech, slovak, german, ngerman, danish, spanish
, catalan, estonian, finnish, french, irish, polygreek, monogreek, ancientgreek
, croatian, hungarian, interlingua, ibycus, bahasa, icelandic, italian, latin,
mongolian, dutch, norsk, polish, portuguese, pinyin, romanian, russian, samin,
slovene, usorbian, serbian, swedish, turkish, ukrainian, dumylang, nohyphenatio
n, loaded.
No auxiliary output files.
(./bbb.tex (c:/usr/share/texmf/ptex/platex/base/jarticle.cls
Document Class: jarticle 2002/04/09 v1.4 Standard pLaTeX class
(c:/usr/share/texmf/ptex/platex/base/jsize10.clo))
No file bbb.aux.
(c:/usr/share/texmf/tex/latex/preview/preview.sty
(c:/usr/share/texmf/tex/latex/preview/prtightpage.def)
(c:/usr/share/texmf/tex/latex/preview/prauctex.def
No auxiliary output files.
(c:/usr/share/texmf/tex/latex/preview/prauctex.cfg))
(c:/usr/share/texmf/tex/latex/preview/prfootnotes.def)
Preview: Fontsize 10pt
)
! Preview: Snippet 1 started.
<-><->
l.3 ************************ \(
l\)
Preview: Tightpage -32891 -32891 32891 32891
! Preview: Snippet 1 ended.(455111+0x208442).
<-><->
l.3 ************************ \(l\)
[1] )
(see the transcript file for additional information)
Output written on bbb.dvi (1 page, 1584 bytes).
Transcript written on bbb.log.
Preview-LaTeX exited as expected with code 1 at Wed Mar 07 17:36:09
Running `Preview-DviPS' with ``dvipsk -Pdl "bbb.dvi" -o
"bbb.prv/tmp440Xaa"/preview.ps''
Preview-DviPS unknown at Wed Mar 07 17:36:09
LaTeX: Invalid regexp: "Unmatched [ or [^"
current state:
==============
Output from running `GSWIN32C.EXE -h':
GNU Ghostscript 7.07 (2003-05-17)
Copyright (C) 2003 artofcode LLC, Benicia, CA. All rights reserved.
Usage: gs [switches] [file1.ps file2.ps ...]
Most frequently used switches: (you can use # in place of =)
-dNOPAUSE no pause after page | -q `quiet', fewer messages
-g<width>x<height> page size in pixels | -r<res> pixels/inch resolution
-sDEVICE=<devname> select device | -dBATCH exit after last file
-sOutputFile=<file> select output file: - for stdout, |command for pipe,
embed %d or %ld for page #
Input formats: PostScript PostScriptLevel1 PostScriptLevel2 PDF
Default output device: display
Available devices:
bbox bit bitcmyk bitrgb bj10e bj200 bjc600 bjc800 bmp16 bmp16m bmp256
bmpgray bmpmono cdeskjet cdj550 cdjcolor cdjmono declj250 deskjet display
djet500 djet500c eps9high eps9mid epson epsonc epswrite ibmpro ijs
jetp3852 jpeg jpeggray laserjet lbp8 lj250 ljet2p ljet3 ljet3d ljet4
ljet4d ljetplus m8510 mswindll mswinpr2 necp6 nullpage pbm pbmraw pcx16
pcx24b pcx256 pcxcmyk pcxgray pcxmono pdfwrite pgm pgmraw pgnm pgnmraw pj
pjxl pjxl300 png16 png16m png256 pngalpha pnggray pngmono pnm pnmraw ppm
ppmraw psmono pswrite pxlcolor pxlmono r4081 st800 stcolor t4693d2
t4693d4 t4693d8 tek4696 tiff12nc tiff24nc tiffcrle tiffg3 tiffg32d tiffg4
tifflzw tiffpack uniprint
Search path:
. ; C:\gs\gs7.07\lib ; C:\gs\gs7.07\kanji ; C:\gs\fonts ;
c:/gs/gs7.07/lib ; c:/gs/gs7.07/kanji ; c:/gs/fonts ; c:/winnt/fonts ;
c:/usr/sysfonts ; c:/windows/fonts ; c:/winnt35/fonts
For more information, see c:/gs/gs7.07/doc/Use.htm.
Report bugs to address@hidden, using the form in Bug-form.htm.
(setq
AUC-TeX-version "11.84"
LaTeX-command-style '(("^j-\\(article\\|report\\|book\\)$" "%(PDF)jlatex
%S%(PDFout)")
("^[jt]s?\\(article\\|report\\|book\\)$"
"%(PDF)platex %S%(PDFout)")
("" "%(PDF)%(latex) %S%(PDFout)"))
image-types '(YUV YCbCrA YCbCr Y XWD XV xpm XCF XC xbm X WPG WMZ WMFWIN32 WMF
WBMP VST
VIFF VID VICAR VDA UYVY UIL TXT TTF TTC TIM TILE tiff TIF TGA
TEXT SVGZ
SVG SUN STEGANO SHTML SGI SFW SCT SCR RLE RLA RGBO RGBA RGB RAS
R PWP
PTIF PSD PS3 PS2 postscript PREVIEW PPM PNM PNG8 PNG32 PNG24 png
PLASMA
PJPEG PIX PICT PICON PGX PGM PFB PFA PDF PDB PCX PCT PCL PCDS
PCD pbm
PATTERN PALM PAL P7 OTB O NULL MVG MTV MSL MPG MPEG MPC MONO MNG
MIFF
MATTE MAT MAP M2V M LABEL K JPX JPG jpeg JPC JP2 JNG JBIG JBG
INFO ICON
ICO ICB HTML HTM HISTOGRAM HDF GRAY GRADIENT GIF87 gif G3 G
FRACTAL FPX
FITS FAX EPT3 EPT2 EPT EPSI EPSF EPS3 EPS2 EPS EPI EPDF EMF DPX
DPS DOT
DNG DCX DCM CUT CUR CMYKA CMYK CLIPBOARD CLIP CIP CIN CAPTION
CACHE C
BMP3 BMP2 BMP BIE B AVS AVI ART A bmp)
preview-image-type 'png
preview-image-creators '((dvipng (open preview-gs-open
preview-dvipng-process-setup)
(place preview-gs-place) (close
preview-dvipng-close))
(png (open preview-gs-open) (place preview-gs-place)
(close preview-gs-close))
(jpeg (open preview-gs-open) (place preview-gs-place)
(close preview-gs-close))
(pnm (open preview-gs-open) (place preview-gs-place)
(close preview-gs-close))
(tiff (open preview-gs-open) (place preview-gs-place)
(close preview-gs-close))
)
preview-dvipng-image-type 'png
preview-dvipng-command "dvipng -picky -noghostscript %d -o
\"%m/prev%%03d.png\""
preview-pdf2dsc-command "pdf2dsc %s.pdf %m/preview.dsc"
preview-gs-command "GSWIN32C.EXE"
preview-gs-options '("-q" "-dSAFER" "-dNOPAUSE" "-DNOPLATFONTS" "-dPrinted"
"-dTextAlphaBits=4" "-dGraphicsAlphaBits=4" "-dWINKANJI")
preview-gs-image-type-alist '((png png "-sDEVICE=png16m")
(dvipng png "-sDEVICE=png16m")
(jpeg jpeg "-sDEVICE=jpeg") (pnm pbm
"-sDEVICE=pnmraw")
(tiff tiff "-sDEVICE=tiff12nc"))
preview-fast-conversion t
preview-prefer-TeX-bb nil
preview-dvips-command "dvipsk -Pdl -i -E %d -o %m/preview.000"
preview-fast-dvips-command "dvipsk -Pdl %d -o %m/preview.ps"
preview-scale-function 'preview-scale-from-face
preview-LaTeX-command '("%`%l \"\\nonstopmode\\nofiles\\PassOptionsToPackage{"
("," . preview-required-option-list)
"}{preview}\\AtBeginDocument{\\ifx\\ifPreview\\undefined"
preview-default-preamble "\\fi}\"%' %t")
preview-required-option-list '("active" "tightpage" "auctex"
(preview-preserve-counters "counters"))
preview-preserve-counters nil
preview-default-option-list '("displaymath" "floats" "graphics" "textmath"
"sections"
"footnotes")
preview-default-preamble '("\\RequirePackage[" ("," .
preview-default-option-list)
"]{preview}[2004/11/05]")
preview-LaTeX-command-replacements nil
preview-dump-replacements '(preview-LaTeX-command-replacements
("\\`\\([^ ]+\\)\\(\\( +-\\([^
\\\\\"]\\|\\\\\\.\\|\"[^\"]*\"\\)*\\)*\\)\\(.*\\)\\'" "\\1 -ini
-interaction=nonstopmode \"&\\1\" " preview-format-name ".ini \\5")
)
preview-undump-replacements '(("\\`\\([^ ]+\\) .*? \"\\\\input\" \\(.*\\)\\'"
"\\1 -interaction=nonstopmode \"&"
preview-format-name
"\" \\2")
)
preview-auto-cache-preamble 'ask
preview-TeX-style-dir nil
)