guile-devel

[patch] i18n, l10n, gettext and something more


From: Daniel Skarda
Subject: [patch] i18n, l10n, gettext and something more
Date: 29 Jul 2001 23:03:51 +0000
User-agent: Gnus/5.0806 (Gnus v5.8.6) Emacs/20.6

<PRELUDE> 
  If you find some of the ideas in the following lines a little bit crazy,
  please be tolerant. The sun was very hot today :-) 
</PRELUDE>

Hello,

  this mail is about guile and internationalization/localization issues (it
starts with i18n, it ends with i18n, but most of it is not related to these
issues at all :). Because the patch itself may seem illogical, I will try to
explain it as it evolved.

I18N:

 * About a year ago I wrote gettext support for guile, and it passed quite
   unnoticed. Some time ago I was asked to update it, so I started.

 * One of the tasks is to write xgettext for scheme - the program that extracts
   translatable strings out of scheme sources. That means parsing expressions
   and identifying strings for translation. 

   Because laziness is the basic moving force in the universe, I started to
   think how to twist the existing `read' primitive in guile to meet my needs.
   From my point of view it has a few flaws:

     1) it does not record the exact position of strings 

     2) it always interprets escape sequences (even when you do not want it
        to)

     3) it discards comments (surprising, isn't it? :-)

   In my previous attempt I solved 2) by adding a special option to the parser
   (ugly!). Today I extended the parser in a similar way to what
   read-hash-extend already does - but for any starting character. Now I do
   not have to write a slightly altered parser from scratch - I only need to
   write two functions (a special string parser and a comment parser) - and
   the parser is ready.

   I think I extended the parser in quite a clean way and it greatly improves
   the power of programming in scheme, much as scheme macros do. Macros offer
   a powerful tool for modifying the tree of expressions; the `read'
   extension allows you to control how that tree is built from input
   files/strings. In both cases you write code in scheme: the first
   manipulates lists, the second transforms strings (input from ports)
   into lists.
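
   To make the xgettext requirements above concrete, here is a minimal
   sketch in C. It is not code from the patch: `scan_strings', `struct
   str_hit' and the 64-byte limit are made up for this illustration. It
   records where each string literal starts and keeps its escape sequences
   raw. Comments are only skipped here, not collected, and the only syntax
   it knows is `;' comments, #\x character literals and double-quoted
   strings.

```c
#include <stddef.h>

#define RAW_MAX 64   /* arbitrary limit chosen for this sketch */

/* Hypothetical record for one string literal found in Scheme source. */
struct str_hit
{
  int line;             /* 1-based line of the opening quote */
  int column;           /* 0-based column of the opening quote */
  char raw[RAW_MAX];    /* text between the quotes, escapes untouched */
};

/* Scan SRC for "..." literals, store at most MAX of them in OUT and
   return how many were stored.  */
size_t
scan_strings (const char *src, struct str_hit *out, size_t max)
{
  size_t found = 0;
  int line = 1, col = 0;
  const char *p = src;

  while (*p != '\0')
    {
      if (*p == ';')            /* comment: skip up to the newline */
        {
          while (*p != '\0' && *p != '\n')
            p++;
          continue;
        }
      if (p[0] == '#' && p[1] == '\\' && p[2] != '\0')
        {                       /* character literal such as #\" */
          p += 3;
          col += 3;
          continue;
        }
      if (*p == '"')
        {
          struct str_hit hit;
          size_t len = 0;

          hit.line = line;
          hit.column = col;
          p++;
          col++;
          while (*p != '\0' && *p != '"')
            {
              if (*p == '\\' && p[1] != '\0')
                {               /* keep the backslash itself */
                  if (len < RAW_MAX - 1)
                    hit.raw[len++] = *p;
                  p++;
                  col++;
                }
              if (len < RAW_MAX - 1)
                hit.raw[len++] = *p;
              if (*p == '\n')
                { line++; col = 0; }
              else
                col++;
              p++;
            }
          hit.raw[len] = '\0';
          if (*p == '"')        /* step over the closing quote */
            { p++; col++; }
          if (found < max)
            out[found++] = hit;
          continue;
        }
      if (*p == '\n')
        { line++; col = 0; }
      else
        col++;
      p++;
    }
  return found;
}
```

   A real extractor would also have to keep the comments (for translator
   notes) and decide which strings are marked for translation. The sketch
   only shows the bookkeeping that plain `read' throws away.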

Here we stop talking about i18n...
... and start to talk about extending parser:

 * Another possible extension comes to mind - for example, guile's string
   parser is not aware of the encoding of Japanese characters

    <WARNING>
      He knows very little about coding of Japanese characters !
    </WARNING>
  
   In some Japanese encodings, the Japanese part of a string is enclosed
   between two escape sequences - and between them there can be arbitrary
   characters - even #\" can be there! - so the scheme parser gets confused by
   such sequences and string parsing is terminated too early.  It would be
   very hard to teach the guile parser all about strings - but the `read'
   extension gives you the ability to alter the standard behaviour when you
   need to.
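
   A custom string reader for such input could look roughly like the C
   sketch below. It assumes the encoding behaves like ISO-2022-JP, where
   ESC $ <byte> switches to two-byte characters and ESC ( <byte> switches
   back to one-byte text. That choice, and the name `find_closing_quote',
   are assumptions of this example and are not taken from the patch.

```c
#include <stddef.h>

/* S[0] is the opening quote of a string literal of LEN bytes.
   Return the index of the quote that really closes it, or -1 if
   there is none.  Bytes seen while in two-byte mode are never
   treated as delimiters, even when they have the value of '"'.  */
long
find_closing_quote (const unsigned char *s, size_t len)
{
  int two_byte = 0;
  size_t i = 1;                 /* skip the opening quote */

  while (i < len)
    {
      if (s[i] == 0x1B && i + 2 < len)
        {
          if (s[i + 1] == '$')  /* ESC $ x: enter two-byte mode */
            { two_byte = 1; i += 3; continue; }
          if (s[i + 1] == '(')  /* ESC ( x: back to one-byte mode */
            { two_byte = 0; i += 3; continue; }
        }
      if (two_byte)             /* one character = two bytes */
        { i += 2; continue; }
      if (s[i] == '\\')         /* backslash protects the next byte */
        { i += 2; continue; }
      if (s[i] == '"')
        return (long) i;
      i++;
    }
  return -1;
}
```

   A scanner that only looks for the next `"' byte would stop inside the
   escaped region whenever a two-byte character contains the byte 0x22.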

 * When I had a parser with an extendable `read' function, another idea came
   to my mind: how hard would it be to extend guile syntax with infix
   expressions?  I started coding...

      (use-modules (ice-9 infix))

      (format #t "~a~%" [sin (10) ^ 2 + cos (10) * cos (10)])

   Here '[' calls a "sub-parser" which parses the infix expression. The infix
   parser (everything in scheme - see ice-9/infix.scm) is quite short yet
   powerful. 

   It was very entertaining to extend guile in this way. I have to admit
   that at one moment I had to say "stop" (I caught myself playing with the
   idea of writing a parser for a C or Perl like language... it is a shame I
   do not have enough spare time (or a sponsor :))
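
   To give an idea of what an infix sub-parser has to do, here is a small
   recursive-descent sketch in C that turns infix text into a prefix
   expression. It is not ice-9/infix.scm (that one is written in scheme
   and builds lists, not strings). The function names, the fixed 256-byte
   buffers, the choice of `expt' for '^' and the missing error handling
   are all assumptions of this illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define BUF 256                 /* every output buffer has this size */

static const char *cur;         /* cursor into the infix text */

static void parse_left_assoc (int level, char *out);

static void
skip_ws (void)
{
  while (isspace ((unsigned char) *cur))
    cur++;
}

/* If the next non-blank character is C, consume it and return 1. */
static int
eat (char c)
{
  skip_ws ();
  if (*cur != c)
    return 0;
  cur++;
  return 1;
}

/* primary := number | name | name '(' sum ')' | '(' sum ')' */
static void
parse_primary (char *out)
{
  char word[BUF], arg[BUF];
  size_t n = 0;

  if (eat ('('))
    {
      parse_left_assoc (0, out);
      eat (')');
      return;
    }
  while (isalnum ((unsigned char) *cur) && n < sizeof word - 1)
    word[n++] = *cur++;
  word[n] = '\0';
  if (n > 0 && isalpha ((unsigned char) word[0]) && eat ('('))
    {                           /* function call with one argument */
      parse_left_assoc (0, arg);
      eat (')');
      snprintf (out, BUF, "(%s %s)", word, arg);
    }
  else
    snprintf (out, BUF, "%s", word);
}

/* power := primary [ '^' power ]        (right associative) */
static void
parse_power (char *out)
{
  char base[BUF], exponent[BUF];

  parse_primary (base);
  if (eat ('^'))
    {
      parse_power (exponent);
      snprintf (out, BUF, "(expt %s %s)", base, exponent);
    }
  else
    snprintf (out, BUF, "%s", base);
}

/* level 0: sum     := product { ('+' | '-') product }
   level 1: product := power   { ('*' | '/') power }
   Both are left associative; level 2 hands over to parse_power.  */
static void
parse_left_assoc (int level, char *out)
{
  static const char *const ops[] = { "+-", "*/" };
  char lhs[BUF], rhs[BUF];

  if (level == 2)
    {
      parse_power (out);
      return;
    }
  parse_left_assoc (level + 1, lhs);
  for (;;)
    {
      skip_ws ();
      char op = *cur;
      if (op == '\0' || strchr (ops[level], op) == NULL)
        break;
      cur++;
      parse_left_assoc (level + 1, rhs);
      snprintf (out, BUF, "(%c %s %s)", op, lhs, rhs);
      strcpy (lhs, out);        /* result becomes the new left side */
    }
  strcpy (out, lhs);
}

/* Translate INFIX into a prefix expression stored in OUT (BUF bytes). */
void
infix_to_prefix (const char *infix, char *out)
{
  cur = infix;
  parse_left_assoc (0, out);
}
```

   With this sketch the text between the brackets in the example above,
   sin (10) ^ 2 + cos (10) * cos (10), comes out as
   (+ (expt (sin 10) 2) (* (cos 10) (cos 10))). Unlike a parser built on
   top of `read', it splits names and operators by itself, so a*b and
   a * b give the same result here.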

Further extensions of parser...
 ... if you are a scheme syntax puritan, please do not read the following section...

 * When I reflect on the code I wrote, I think it is "rough" and can be
   improved in several ways (post 1.6.0? :)

   - The parser extension (lookup array: char -> procedure) is global. That's
     a bad thing - IMHO each port should have one such array (arrays could be
     shared, so do not worry about wasted memory)

   - It should cooperate with modules. When you write 
   
      (use-modules (ice-9 infix)) 

     you turn on the infix sub-parser for all modules/ports you are working
     with. It would be nicer to enable it per module:
     
       (define-module (my super module)
         :use-modules (srfi srfi...)
         :parser-extension infix
         :parser-extension perl

        .... [ sqrt (a ^ 2 + b ^ 2) ] ....

           .... [perl:
                  while (<>)  
                    {
                      print if (/..../);

     That was a quite wild (and useless) example, but some cooperation
     between modules and "sub-parsers" would be nice.

   - Enough dreams, let's return to real life. There are two possible
     improvements to the guile parser:
    
     Modularize it. scm_lreadr now looks something like this:

       switch (getc ())
         {
         case '"':

     I think the speed would not decrease too much if we write

       call (array [c = getc ()], c, port)

     Also, it can sometimes be handy to have functions that extract numbers/
     strings/symbols/... from an input port (read-number, read-symbol...)

     Another solution is to have two parsers - a basic and very fast
     parser in libguile and a second one, slightly slower but very
     extensible, in a module.
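
     The table-driven dispatch suggested above can be sketched as follows.
     This is only an illustration in plain C, not libguile code: the
     `struct port' stand-in, the token names and `reader_extend' are all
     hypothetical, and real handlers would build scheme objects and not
     just return a token kind.

```c
#include <limits.h>
#include <stddef.h>

enum token { TOK_EOF, TOK_OPEN, TOK_CLOSE, TOK_STRING, TOK_ATOM };

/* A very small stand-in for a port: a string plus a read position. */
struct port { const char *text; size_t pos; };

/* A handler receives the character that selected it and the port. */
typedef enum token (*reader_fn) (int c, struct port *port);

/* The lookup array char -> procedure; NULL means "part of an atom". */
static reader_fn read_table[UCHAR_MAX + 1];

static int
port_getc (struct port *port)
{
  unsigned char c = (unsigned char) port->text[port->pos];
  if (c == '\0')
    return -1;
  port->pos++;
  return c;
}

static enum token
read_open (int c, struct port *port)
{
  (void) c; (void) port;
  return TOK_OPEN;
}

static enum token
read_close (int c, struct port *port)
{
  (void) c; (void) port;
  return TOK_CLOSE;
}

/* Consume up to the closing quote; a backslash protects the next char. */
static enum token
read_string (int c, struct port *port)
{
  while ((c = port_getc (port)) != -1 && c != '"')
    if (c == '\\')
      port_getc (port);
  return TOK_STRING;
}

/* Default handler: consume until whitespace or a char with a handler. */
static enum token
read_atom (int c, struct port *port)
{
  (void) c;
  for (;;)
    {
      unsigned char next = (unsigned char) port->text[port->pos];
      if (next == '\0' || next == ' ' || next == '\n' || next == '\t'
          || read_table[next] != NULL)
        break;
      port->pos++;
    }
  return TOK_ATOM;
}

/* Install FN as the handler for every token starting with C. */
void
reader_extend (char c, reader_fn fn)
{
  read_table[(unsigned char) c] = fn;
}

void
reader_init (void)
{
  reader_extend ('(', read_open);
  reader_extend (')', read_close);
  reader_extend ('"', read_string);
}

enum token
next_token (struct port *port)
{
  int c;

  do
    c = port_getc (port);
  while (c == ' ' || c == '\n' || c == '\t');
  if (c == -1)
    return TOK_EOF;
  /* One indexed call where the big switch used to be. */
  return (read_table[c] != NULL ? read_table[c] : read_atom) (c, port);
}
```

     After reader_init (), a call such as reader_extend ('[', read_open)
     is enough to make '[' start a list. Making the table a member of the
     port and not a global would give the per-port behaviour wished for
     earlier.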

   - The second improvement is stolen (oops! ^H^H^H^H^H^H :) borrowed from
     TeX - introduce something like catcodes. 

     For example, some people tend to complain that it is hard to count
     scheme parentheses (they have not found the church of Emacs - yet :) and
     they would like to use brackets (or braces) in the same way they use
     parentheses:

        [+ (+ 1 (+ 1 [+ 1 (+ 1 (+ 1 (+ 1 1)))]))]

     But scheme hackers say - no way! So what should such a poor little fellow
     do? (right answer: change the catcode of braces and brackets so they
     would open/close lists...)

     Another example: my current infix parser uses 'read' for reading
     symbols - so you are limited by the assumptions 'read' makes about scheme
     symbols. You have to write [a * b] (note the spaces) because a*b is one
     symbol in scheme. But when there is a possibility to change the catcode
     of '*' ....
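
     A catcode table in the spirit of TeX could be as simple as the C
     sketch below. The category names, `set_catcode' and `count_tokens'
     are invented for this example. Nothing like this exists in the
     patch; it is only meant to show how changing one table entry changes
     the way the same text is split into tokens.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical categories; TeX has more and names them differently. */
enum catcode
{
  CAT_CONSTITUENT,      /* part of a symbol or number */
  CAT_SPACE,            /* separates tokens */
  CAT_OPEN,             /* opens a list */
  CAT_CLOSE,            /* closes a list */
  CAT_OPERATOR          /* a token of its own, e.g. for infix */
};

static enum catcode catcodes[UCHAR_MAX + 1];

/* Restore plain scheme-like defaults: only ( and ) delimit lists. */
void
catcode_reset (void)
{
  size_t i;

  for (i = 0; i <= UCHAR_MAX; i++)
    catcodes[i] = CAT_CONSTITUENT;
  catcodes[' '] = catcodes['\t'] = catcodes['\n'] = CAT_SPACE;
  catcodes['('] = CAT_OPEN;
  catcodes[')'] = CAT_CLOSE;
}

void
set_catcode (char c, enum catcode cat)
{
  catcodes[(unsigned char) c] = cat;
}

/* Count the tokens in TEXT under the current catcodes.  A run of
   constituent characters is one token; every open, close or
   operator character is a token by itself.  */
size_t
count_tokens (const char *text)
{
  size_t n = 0;
  int in_symbol = 0;

  for (; *text != '\0'; text++)
    switch (catcodes[(unsigned char) *text])
      {
      case CAT_CONSTITUENT:
        if (!in_symbol)
          n++;
        in_symbol = 1;
        break;
      case CAT_SPACE:
        in_symbol = 0;
        break;
      default:
        n++;
        in_symbol = 0;
        break;
      }
  return n;
}
```

     With the defaults, a*b is one token. After
     set_catcode ('*', CAT_OPERATOR) it is three. Giving '[' and ']' the
     categories CAT_OPEN and CAT_CLOSE makes them behave like parentheses
     in the same way.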

Back to I18N business...

  * Because I am quite worried that I caught less than '1%' of your
    attention (and many lines ago most of you already hit 'D' or 'E' in your
    favourite mailer :-) I promise to be brief (no breathtaking ideas left
    in my pocket :)

  * My guile-core/scripts/xgettext implementation tries to mimic most of
    the features found in xgettext from the GNU gettext package.

  * Language bindings for the *gettext functions (libintl) are in the module 
    (i18n gettext). 

  * You will need GNU gettext 0.10.38.

BUGS:

  * There are exactly 27 bugs (unfortunately, all of them are unknown).

  * Argument parsing is affected by getopt-long bugs I reported earlier today.

Future development:

  * (post 1.6 ?) localize all messages in libguile/*.c and ice-9/*.scm

  * Debug, debug, debug.

  * Even though I tried really hard to release polished code, it is
    possible that I left many rough/untested/buggy lines in my code.
    Please let me know if something does not work for you.

Open questions:

  * I know about CVS branches, feature freeze etc. But the patch is clean,
    (afaik) does not break anything (the infix module can be marked as
    experimental) and i18n/l10n features can be quite important.
    
    Could it be included in this stable release?

  * What do you think about the extensions to the parser (and the other
    proposed improvements)? For me the ability to alter the way input is
    parsed is quite appealing and it should at least be implemented as an
    alternative parser. 

Oops, I almost forgot :-)

   ftp://atrey.karlin.mff.cuni.cz/pub/local/0rfelyus/guile/

 Just place the new files/dirs into guile-core and apply guile.patch to the
 old files...
   
                   -- End of long and boring email --
                     -- What do you think now ? --

Best regards,
Dan Skarda


