emacs-bug-tracker
bug#20140: closed (24.4; M17n shaper output rejected)


From: GNU bug Tracking System
Subject: bug#20140: closed (24.4; M17n shaper output rejected)
Date: Wed, 16 Feb 2022 15:14:01 +0000

Your message dated Wed, 16 Feb 2022 17:13:56 +0200
with message-id <83y22a26a3.fsf@gnu.org>
and subject line Re: bug#20140: 24.4; M17n shaper output rejected
has caused the debbugs.gnu.org bug report #20140,
regarding 24.4; M17n shaper output rejected
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs@gnu.org.)


-- 
20140: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20140
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
--- Begin Message ---
Subject: 24.4; M17n shaper output rejected
Date: Wed, 18 Mar 2015 22:20:40 +0000

I am running Emacs 24.4 in an Ubuntu 12.04 Precise Pangolin
installation, for which the version of libm17n-0 is 1.6.3-1.  I am
attempting to induce Emacs to render the Tai Tham script.  There
appears to be a bug/feature in Emacs which makes this unnecessarily
difficult.

To achieve Tai Tham rendering, I added the following to a new file,
tai-tham.el, which I then loaded:

(defvar tai-tham-composable-pattern
  (let ((table
         ;; C is letters, independent vowels, digits, punctuation and
         ;; symbols.
         '(("C" . "[\u1A20-\u1A54\u1A80-\u1A89\u1A90-\u1A99\u1AA0-\u1AAD]")
           ("M" . "[\u1A55-\u1A5E\u1A61-\u1A7C\u1A7F]") ; Mark
           ("S" . "[\u1A75-\u1A7C]") ; Marks commuting with sakot
           ("H" . "\u1A60") ; sakot
           ("N" . "\u1A58"))) ; mai kang lai - also included in M.
        ;; Which orthographic syllable mai kang lai belongs to can
        ;; depend on the font!
        (regexp "C\\(M\\|HS*C?\\)*\\(NC\\(M\\|HS*C?\\)*\\)*N?"))
    (let ((case-fold-search nil))
      (dolist (elt table)
        (setq regexp (replace-regexp-in-string (car elt) (cdr elt)
                                               regexp t t))))
    regexp))

(let ((elt (list (vector tai-tham-composable-pattern 0 'font-shape-gstring)
                 (vector "." 0 'font-shape-gstring))))
  (set-char-table-range composition-function-table '(#x1A20 . #x1AAD)
                        elt))

I added the following (cut-down) file LANA-OTF.flt to the m17n database:

(font layouter lana-otf nil
      (font (nil nil unicode-bmp :otf=lana)))
(category
 ;; H: SAKOT
 ;; N: Other character with non-zero canonical combining class
 ;; Z: Character with ccc=0 or other with ccc=9 
 (0x0000 0x1A5F ?Z)
 (0x1A60        ?H)
 (0x1A61 0x1A74 ?Z)
 (0x1A75 0x1A7C ?N)
 (0x1A7D 0xFFFF ?Z)
)

(generator
  (0
    (cond
      ("(H)(N+)" (2 = *) (1 =))
      ("." =)
    ) *
  )
)

(category
 ;; C: Consonant and non-mark (lenient processing)
 ;; H: SAKOT
 ;; P: Preposed vowel
 ;; R: Medial RA (preposed dependent consonant)
 ;; M: Mark
 (0x1A20 0x1A54 ?C)
 (0x1A55 0x1A55 ?R)
 (0x1A56 0x1A5E ?M)
 (0x1A5F        ?C) ; Unassigned
 (0x1A60        ?H)
 (0x1A61 0x1A6D ?M)
 (0x1A6E 0x1A72 ?P)
 (0x1A73 0x1A7C ?M)
 (0x1A7D 0x1A7E ?C) ; Unassigned
 (0x1A7F        ?M)
 (0x1A80 0x1A89 ?C)
 (0x1A8A 0x1A8F ?C) ; Unassigned
 (0x1A90 0x1A99 ?C)
 (0x1A9A 0x1A9F ?C) ; Unassigned
 (0x1AA0 0x1AAC ?C) ; Punctuation
 (0x1AAD        ?C) ; Can take a vowel!
 (0x1AAE 0x1AAF ?C) ; Unassigned
)

(generator
  (0
    (cond
      ("(C)(R|P)" (2 =) (1 =) )
      ("." =)
    )*
  )
)

(generator (0 otf:lana))
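
As a rough illustration of what the two reordering stages in the FLT
above are meant to do before the OTF stage runs, here is a minimal C
sketch that applies the same two rules to plain arrays of code points.
It is not m17n code: the function names (ccc_category, shape_category,
move_sakot, prepose_vowels) are hypothetical, the category tests simply
transcribe the two (category ...) tables, and glyph lookup and from/to
bookkeeping are left out entirely.

```c
#include <stddef.h>

/* First table: H = SAKOT, N = other mark with non-zero ccc, Z = rest. */
static char ccc_category(unsigned c)
{
    if (c == 0x1A60) return 'H';
    if (c >= 0x1A75 && c <= 0x1A7C) return 'N';
    return 'Z';
}

/* Second table: C, H, P, R, M; 0 for code points the table omits. */
static char shape_category(unsigned c)
{
    if (c < 0x1A20 || c > 0x1AAF) return 0;
    if (c == 0x1A60) return 'H';
    if (c == 0x1A55) return 'R';
    if (c >= 0x1A6E && c <= 0x1A72) return 'P';
    if ((c >= 0x1A56 && c <= 0x1A5E) || (c >= 0x1A61 && c <= 0x1A6D)
        || (c >= 0x1A73 && c <= 0x1A7C) || c == 0x1A7F)
        return 'M';
    return 'C'; /* lenient: everything else counts as a consonant */
}

/* Rule ("(H)(N+)" (2 = *) (1 =)): emit the run of N marks, then SAKOT.
   OUT must have room for N_IN code points; returns the output length. */
static size_t move_sakot(const unsigned *in, size_t n_in, unsigned *out)
{
    size_t i = 0, o = 0;
    while (i < n_in) {
        size_t j = i + 1;
        if (ccc_category(in[i]) == 'H')
            while (j < n_in && ccc_category(in[j]) == 'N')
                j++;
        for (size_t k = i + 1; k < j; k++) /* the N run, if any */
            out[o++] = in[k];
        out[o++] = in[i];                  /* then the current character */
        i = j;
    }
    return o;
}

/* Rule ("(C)(R|P)" (2 =) (1 =)): put MEDIAL RA or a preposed vowel
   in front of the consonant it follows in logical order. */
static size_t prepose_vowels(const unsigned *in, size_t n_in, unsigned *out)
{
    size_t i = 0, o = 0;
    while (i < n_in) {
        char next = (i + 1 < n_in) ? shape_category(in[i + 1]) : 0;
        if (shape_category(in[i]) == 'C' && (next == 'R' || next == 'P')) {
            out[o++] = in[i + 1];
            out[o++] = in[i];
            i += 2;
        } else {
            out[o++] = in[i++];
        }
    }
    return o;
}
```

Under these assumptions, passing 1A23 1A55 through prepose_vowels gives
1A55 1A23, and passing 1A20 1A60 1A75 1A37 through move_sakot gives
1A20 1A75 1A60 1A37, which is the order in which the corresponding
glyphs appear in the monitoring output quoted later in this report.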

However, much Tai Tham text failed to render properly.  To determine
what was wrong, I added some monitoring code to ftfont.c:

*** ftfont.c.orig       2014-03-21 05:34:40.000000000 +0000
--- ftfont.c    2015-03-18 19:47:30.032718995 +0000
***************
*** 2516,2522 ****
--- 2516,2553 ----
      flt = mflt_get (msymbol ("combining"));
    for (i = 0; i < 3; i++)
      {
+       int k;
+       fprintf(stdout, "mflt_run(");
+       if (gstring.glyphs[0].encoded) {
+       for (k = 0; k < len; k++) {
+         fprintf(stdout, " %d", gstring.glyphs[k].code);
+       }
+       } else {
+       for (k = 0; k < len; k++) {
+         fprintf(stdout, " %4.4X", gstring.glyphs[k].c);
+       }
+       }
        int result = mflt_run (&gstring, 0, len, &flt_font_ft.flt_font,
                               flt);
+       if (-1 == result) {
+       fprintf(stdout, ") failed.\n");
+       } else if (result >= 0) {
+       fprintf(stdout, ") produced (");
+       for (k = 0; k < result; k++) {
+ #if 0
+         fprintf(stdout, " %d", gstring.glyphs[k].code);
+ #else
+         fprintf(stdout, " %4.4X>%d:%d:%d",
+                 gstring.glyphs[k].c, gstring.glyphs[k].code,
+                 gstring.glyphs[k].from, gstring.glyphs[k].to);
+ #endif
+       }
+       fprintf(stdout, ")\n");
+       if (result != gstring.used) {
+         fprintf(stdout, "Anomalously, gstring.used = %d\n",
+                 (int) gstring.used);
+       }
+       fflush(0);
+       }
        if (result != -2)
        break;
        if (INT_MAX / 2 < gstring.allocated)

The sample Tai Tham text was:
;; ᩈᩣᩴᩁᩢ᩠ᨷᨽᩣᩈᩣᩃ᩶ᩣ᩠ᨶᨶᩣ / ᨣᩣᩴᨾᩮᩬᩥᨦ - ᩈᩢᨬ᩠ᨬᩣ ᨠ᩠᩵ᨷ ᩃ᩠᩶ᨯ ᨮ᩠ ᨳᩫ᩠᩵ᨶ
ᨠᩢ᩠᩵ᨷᨠᩫ᩠᩶ᨯᨿᩥ᩠ᨷᨶᩦ᩠᩵ᨷ
;; ᨣᩕ   ᨲᩱ

I extract and analyse what was rendered as shaped ('accepted') and what
was not ('rejected'), quoting the monitoring output.  I suspect the
problem is the strict testing of the from and to fields in Lisp function
font-shape-gstring, which is defined in file font.c.

The shaping of the following was accepted:
mflt_run( 1A48 1A63 1A74) produced ( 1A48>820:0:0 1A63>858:1:1 1A74>878:2:2)

mflt_run( 1A41 1A62 1A60 1A37) produced ( 1A41>813:0:1 1A62>853:0:1
0000>953:2:3)

mflt_run( 1A3D 1A63) produced ( 1A3D>808:0:0 1A63>858:1:1)

mflt_run( 1A48 1A63) produced ( 1A48>820:0:0 1A63>858:1:1)

mflt_run( 1A43 1A76 1A63 1A60 1A36) produced ( 1A43>815:0:1
1A76>890:0:1 1A63>858:2:4 0000>952:2:4) 

mflt_run( 1A36 1A63) produced ( 1A36>800:0:0 1A63>858:1:1)

mflt_run( 1A23 1A63 1A74) produced ( 1A23>777:0:0 0000>859:1:2)

mflt_run( 1A26) produced ( 1A26>780:0:0)

mflt_run( 1A48 1A62) produced ( 1A48>820:0:1 1A62>853:0:1)

mflt_run( 1A2C 1A60 1A2C 1A63) produced ( 0000>789:0:2 1A63>858:3:3)

mflt_run( 1A43 1A60 1A76 1A2F) produced ( 1A43>815:0:3 1A76>890:0:3
0000>941:0:3) 

mflt_run( 1A2E 1A60) produced ( 1A2E>792:0:1 1A60>851:0:1)

mflt_run( 1A33 1A6B 1A60 1A75 1A36) produced ( 1A33>797:0:4
1A6B>868:0:4 1A75>889:0:4 0000>952:0:4) 

mflt_run( 1A20 1A6B 1A76 1A60 1A2F) produced ( 1A20>774:0:4
1A6B>868:0:4 1A76>890:0:4 0000>941:0:4)

mflt_run( 1A3F 1A65 1A60 1A37) produced ( 1A3F>811:0:1 1A65>862:0:1
0000>953:2:3)

The shaping of the following, with vowels or MEDIAL RA that should be
rendered before the consonant, was rejected:

mflt_run( 1A3E 1A6E 1A6C 1A65) produced ( 1A6E>872:1:1 1A3E>810:0:3
1A6C>869:0:3 1A65>862:0:3) 

mflt_run( 1A23 1A55) produced ( 1A55>835:1:1 1A23>777:0:0)

mflt_run( 1A32 1A71) produced ( 1A71>875:1:1 1A32>796:0:0)

The problem is that the first glyph does not derive from the first
character.

The shaping of the following was rejected:

mflt_run( 1A20 1A60 1A75 1A37) produced ( 1A20>774:0:2 1A75>889:0:2
0000>953:1:3)

In this case, character 2 is stacked below character 0,
and characters 1 and 3 combine to form a spacing glyph.

mflt_run( 1A20 1A62 1A60 1A75 1A37) produced ( 1A20>774:0:1
1A62>853:0:3 1A75>889:0:3 0000>953:2:4)

Character 1 is mounted on character 0, and character 3 on character 1.
Characters 2 and 4 combine to form a spacing glyph.  

mflt_run( 1A36 1A66 1A75 1A60 1A37) produced ( 1A36>800:0:1
1A66>863:0:2 1A75>889:0:2 0000>953:3:4)

Character 1 is mounted on character 0, and character 2 on character 1.
Characters 3 and 4 form a spacing glyph.
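
All of the accepted and rejected cases above are consistent with a
check of the following kind on the from:to pairs: the first glyph must
start at character 0, consecutive glyphs with the same "from" must
share the same "to", and any other glyph must start immediately after
the previous glyph's "to".  The C sketch below encodes exactly that
rule.  It is an assumption inferred from the monitoring output, not a
copy of the code in font.c, and the names glyph_span and
spans_acceptable are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Only the from/to character indices of a shaped glyph (hypothetical). */
struct glyph_span {
    int from;
    int to;
};

/* Return true if the spans tile the characters in order: the first
   starts at 0, equal "from" implies equal "to", and each new cluster
   starts right after the previous one ends.  An empty result is
   treated as unacceptable. */
static bool spans_acceptable(const struct glyph_span *g, size_t n)
{
    if (n == 0 || g[0].from != 0)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (g[i].from > g[i].to)
            return false;
        if (i == 0)
            continue;
        if (g[i].from == g[i - 1].from) {
            /* Same cluster as the previous glyph. */
            if (g[i].to != g[i - 1].to)
                return false;
        } else if (g[i].from != g[i - 1].to + 1) {
            /* A new cluster must not overlap or skip characters. */
            return false;
        }
    }
    return true;
}
```

With this rule, the output for 1A41 1A62 1A60 1A37 (spans 0:1, 0:1,
2:3) passes, while 1A23 1A55 (spans 1:1, 0:0) fails on the first glyph
and 1A20 1A60 1A75 1A37 (spans 0:2, 0:2, 1:3) fails because the last
glyph overlaps the cluster before it.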

There does appear to be a workaround, which is to have m17n declare
the orthographic syllables it receives to be 'grapheme clusters'.  It
solves at least some of the problems above.  However, it then makes
editing of the 'clusters' more difficult.  Note that there are examples
above with 5 characters in a cluster, and this is by no means the limit.

Richard.



--- End Message ---
--- Begin Message ---
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
Date: Wed, 16 Feb 2022 17:13:56 +0200

> Date: Tue, 15 Feb 2022 01:27:34 +0000
> From: Richard Wordingham <richard.wordingham@ntlworld.com>
> Cc: 20140@debbugs.gnu.org, larsi@gnus.org
> 
> On Mon, 14 Feb 2022 22:14:27 +0000
> Richard Wordingham <richard.wordingham@ntlworld.com> wrote:
> 
> > On Mon, 14 Feb 2022 15:19:36 +0200
> > Eli Zaretskii <eliz@gnu.org> wrote:
> > 
> > > > Date: Sun, 13 Feb 2022 20:53:10 +0000
> > > > From: Richard Wordingham <richard.wordingham@ntlworld.com>
> > > > Cc: larsi@gnus.org, 20140@debbugs.gnu.org
> 
> > > > You should also add CGJ and ZWNJ, and some people may appreciate
> > > > ZWJ - the Khottabun font has ligatures involving ZWJ, though it
> > > > may just be an experimental feature - and ultimately WJ, for when
> > > > someone writes a Tai Tham word breaker.    
> > > 
> > > How should I add CGJ and ZWNJ?  What are the rules?
> > >   
> > > > Oh, and Thai and Lao mai t(r)i and mai chat(t)awa and U+0324
> > > > COMBINING DIAERESIS BELOW turn up occasionally - U+0324 is
> > > > supported in Thep's Khottabun font, and my Da Lekh series
> > > > supports Thai mai tri and mai chattawa. These characters seem to
> > > > work with HarfBuzz.    
> > > 
> > > Not sure I understand: what patterns/rules should be added for
> > > these?  
> > 
> > Add them all to "M" in the definition of tai-tham-composable-pattern.
> > Strictly, U+0324 should also be added to "S", but I'd be surprised to
> > see it in a genuine spelling.
> 
> In view of Wyn Owen's report (A Description and Linguistic Analysis of
> the Tai Khuen Writing System, JSEALS 10.1 (2017)
> https://evols.library.manoa.hawaii.edu/bitstream/10524/52403/1/09_Owen2017description.pdf)
> on Tai Khuen spelling, one should also add U+0E49 THAI CHARACTER MAI
> THO to "M". And, of course, as all 5 non-Tai Tham tone marks used with
> the Tai Tham script have canonical combining class greater than 9, they
> should be added to "S" - i.e. add U+0E49 to U+0E4B and U+0EC9 and
> U+0ECB to "S".

Thanks, done that as well, and installed the changes for Emacs 29.

And with that, I'm closing this bug report.  Thanks a lot for your
code and helpful discussions.


--- End Message ---
