[patch] i18n, l10n, gettext and something more
From: Daniel Skarda
Subject: [patch] i18n, l10n, gettext and something more
Date: 29 Jul 2001 23:03:51 +0000
User-agent: Gnus/5.0806 (Gnus v5.8.6) Emacs/20.6
<PRELUDE>
If you find some of the ideas in the following lines a little bit crazy,
please be tolerant. The sun was very hot today :-)
</PRELUDE>
Hello,
this mail is about Guile and internationalization/localization issues (it
starts with i18n and ends with i18n, but most of it is not related to these
issues at all :). Because the patch itself may seem illogical, I will try to
explain it in the order in which it evolved.
I18N:
* About a year ago I wrote gettext support for Guile, and it passed quite
unnoticed. Some time ago I was asked to update it, so I started.
* One of the tasks is to write an xgettext for Scheme: the program that
extracts translatable strings from Scheme sources. That means parsing
expressions and identifying the strings to translate.
Because laziness is the basic moving force of the universe, I started to
think about how to twist the existing `read' primitive in Guile to meet my
needs. From my point of view it has a few flaws:
1) it does not record the exact position of strings
2) it always interprets escape sequences (even when you do not want it to)
3) it discards comments (surprising, isn't it? :-)
In my previous attempt I solved 2) by adding a special option to the parser
(ugly!).
Today I extended the parser in a way similar to what read-hash-extend
already does, but for any starting character. Now I do not have to write a
slightly altered parser from scratch: I only need to write two functions (a
special string parser and a comment parser) and the parser is ready.
I think I extended the parser in quite a clean way, and it greatly improves
the power of programming in Scheme, much as Scheme macros do. Macros offer
a powerful tool for modifying the tree of expressions; the 'read' extension
lets you control how that tree is built from input files/strings. In both
cases you write code in Scheme: the first manipulates lists, the second
transforms strings (input from ports) into lists.
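To make the idea concrete, here is a minimal Python sketch of a reader driven by a "starting character -> handler" table. It is only a toy model, not the code from the patch: `Reader`, `extend`, `read_raw_string`, `keep_comment` and the `String` record are hypothetical names, and the default handlers only approximate what Guile's `read' does. It shows how two replacement handlers are enough to get a reader that keeps escape sequences verbatim, records line numbers and collects comments.

```python
from dataclasses import dataclass
from typing import Optional

SKIP, CLOSE, EOF = object(), object(), object()  # internal reader signals


@dataclass
class String:
    text: str
    line: Optional[int] = None  # the default string handler leaves this unset


def read_list(reader, _char):
    items = []
    while (datum := reader.read()) is not CLOSE:
        if datum is EOF:
            raise SyntaxError("unexpected end of input inside a list")
        items.append(datum)
    return items


def close_list(reader, _char):
    return CLOSE


def skip_comment(reader, _char):
    # Default behaviour: the comment is thrown away.
    while not reader.at_end() and reader.next_char() != "\n":
        pass
    return SKIP


def read_string(reader, _char):
    # Default behaviour: escapes are interpreted, the position is not kept.
    escapes = {"n": "\n", "t": "\t"}
    chars = []
    while (char := reader.next_char()) != '"':
        if char == "\\":
            escaped = reader.next_char()
            char = escapes.get(escaped, escaped)
        chars.append(char)
    return String("".join(chars))


def read_symbol(reader, char):
    chars = [char]
    while not reader.at_end() and reader.peek() not in ' \t\n()";':
        chars.append(reader.next_char())
    return "".join(chars)


DEFAULT_TABLE = {"(": read_list, ")": close_list, ";": skip_comment, '"': read_string}


class Reader:
    def __init__(self, text):
        self.text, self.pos, self.line = text, 0, 1
        self.table = dict(DEFAULT_TABLE)  # starting character -> handler
        self.comments = []                # filled only by keep_comment

    def at_end(self):
        return self.pos >= len(self.text)

    def peek(self):
        return self.text[self.pos]

    def next_char(self):
        char = self.text[self.pos]
        self.pos += 1
        if char == "\n":
            self.line += 1
        return char

    def extend(self, char, handler):
        self.table[char] = handler

    def read(self):
        while not self.at_end():
            char = self.next_char()
            if char.isspace():
                continue
            datum = self.table.get(char, read_symbol)(self, char)
            if datum is not SKIP:
                return datum
        return EOF


# The two handlers an xgettext-like tool would plug in.
def read_raw_string(reader, _char):
    line, chars = reader.line, []
    while (char := reader.next_char()) != '"':
        chars.append(char)
        if char == "\\":
            chars.append(reader.next_char())  # keep the escape verbatim
    return String("".join(chars), line)


def keep_comment(reader, _char):
    line, start = reader.line, reader.pos
    while not reader.at_end() and reader.peek() != "\n":
        reader.next_char()
    reader.comments.append((line, reader.text[start:reader.pos].lstrip("; ").rstrip()))
    return SKIP


SOURCE = '; TRANSLATORS: shown at startup\n(display (_ "Hello\\n"))\n'

extractor = Reader(SOURCE)
extractor.extend('"', read_raw_string)
extractor.extend(";", keep_comment)
print(extractor.read(), extractor.comments)
```

In this sketch the table lives in each `Reader` instance, so extending one reader does not affect any other. An unterminated string simply raises `IndexError`; a real reader would report that properly.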
Here we stop talking about i18n...
... and start talking about extending the parser:
* Another possible extension comes to mind. For example, the Guile string
parser is not aware of the encoding of Japanese characters.
<WARNING>
The author knows very little about the encoding of Japanese characters!
</WARNING>
In some Japanese encodings, the Japanese part of a string is enclosed
between two escape sequences, and between them there can be arbitrary
bytes, even #\" can be there! So the Scheme parser gets confused by such
sequences and terminates string parsing too early. It would be very hard to
teach the Guile parser all about strings, but the `read' extension gives
you the ability to alter the standard behaviour when you need to.
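The problem can be reproduced in a few lines of Python. The sketch below assumes ISO-2022-JP as the encoding (the mail does not name one) and uses two hypothetical helpers, `naive_string_end` and `aware_string_end`. In ISO-2022-JP the ideographic comma is the byte pair 0x21 0x22, and 0x22 is also the ASCII double quote. Backslash escapes are deliberately left out to keep the example short.

```python
TO_KANJI = b"\x1b$B"  # ISO-2022-JP: switch to JIS X 0208, two bytes per character
TO_ASCII = b"\x1b(B"  # ISO-2022-JP: switch back to ASCII


def naive_string_end(data):
    """Index of the first '"' byte after the opening quote at index 0."""
    return data.index(b'"', 1)


def aware_string_end(data):
    """Index of the closing quote, skipping bytes inside a kanji segment."""
    pos, in_kanji = 1, False
    while pos < len(data):
        if data.startswith(TO_KANJI, pos):
            in_kanji, pos = True, pos + len(TO_KANJI)
        elif data.startswith(TO_ASCII, pos):
            in_kanji, pos = False, pos + len(TO_ASCII)
        elif in_kanji:
            pos += 2  # one two-byte character; either byte may equal 0x22
        elif data[pos:pos + 1] == b'"':
            return pos
        else:
            pos += 1
    raise ValueError("unterminated string")


# A string literal whose only character is the ideographic comma (0x21 0x22).
SOURCE = b'"' + TO_KANJI + b'!"' + TO_ASCII + b'" tail'

print(naive_string_end(SOURCE))  # stops inside the kanji segment
print(aware_string_end(SOURCE))  # finds the real closing quote
print(SOURCE[1:aware_string_end(SOURCE)].decode("iso2022_jp"))
```

A handler like `aware_string_end` is the sort of thing that could be registered for the `"` character without touching the rest of the reader.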
* Once I had a parser with an extendable `read' function, another idea came
to my mind: how hard would it be to extend Guile syntax with infix
expressions? I started coding...
    (use-modules (ice-9 infix))
    (format #t "~a~%" [sin (10) ^ 2 + cos (10) * cos (10)])
Here '[' calls a "sub-parser" which parses the infix expression. The infix
parser (everything in Scheme, see ice-9/infix.scm) is quite short yet
powerful.
It was very entertaining to extend Guile in this way. I have to admit
that at one moment I had to say "stop" (I caught myself playing with the
idea of writing a parser for a C- or Perl-like language... it is a shame I
do not have enough spare time (or a sponsor :))
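For readers who want to see what such a sub-parser has to do, here is a small Python sketch that turns an infix expression into a prefix tree using precedence climbing. It is not a port of ice-9/infix.scm. The operator set, the precedences, and the mapping of '^' to `expt` are assumptions made for the example, and `parse_infix` and `to_sexp` are hypothetical names.

```python
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOCIATIVE = {"^"}
PREFIX_NAME = {"^": "expt"}  # assumption: '^' means exponentiation
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/^(),])")


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"cannot tokenize {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_infix(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected or 'an operand'}, got {token!r}")
        pos += 1
        return token

    def arguments():
        args = [expression(1)]
        while peek() == ",":
            take(",")
            args.append(expression(1))
        take(")")
        return args

    def primary():
        token = take()
        if token == "(":
            inner = expression(1)
            take(")")
            return inner
        if token[0].isdigit():
            return float(token) if "." in token else int(token)
        if not (token[0].isalpha() or token[0] == "_"):
            raise SyntaxError(f"unexpected {token!r}")
        if peek() == "(":  # function call such as sin (10)
            take("(")
            return [token, *arguments()]
        return token  # plain variable

    def expression(min_precedence):
        left = primary()
        while (op := peek()) in PRECEDENCE and PRECEDENCE[op] >= min_precedence:
            take(op)
            tighter = 0 if op in RIGHT_ASSOCIATIVE else 1
            right = expression(PRECEDENCE[op] + tighter)
            left = [PREFIX_NAME.get(op, op), left, right]
        return left

    tree = expression(1)
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree


def to_sexp(tree):
    if isinstance(tree, list):
        return "(" + " ".join(to_sexp(item) for item in tree) + ")"
    return str(tree)


print(to_sexp(parse_infix("sin (10) ^ 2 + cos (10) * cos (10)")))
```

For the expression from the mail this prints `(+ (expt (sin 10) 2) (* (cos 10) (cos 10)))`. Unlike a parser that reuses `read' for symbols, this tokenizer splits `a*b` into three tokens by itself.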
Further extensions of the parser...
... if you are a Scheme syntax puritan, please do not read the following
section...
* When I reflect on the code I wrote, I think it is "rough" and can be
improved in several ways (post 1.6.0? :)
- The parser extension (a lookup array: char -> procedure) is global. That's
a bad thing. IMHO each port should have one such array (arrays could be
shared, so do not worry about wasted memory).
- It should cooperate with modules. When you write
    (use-modules (ice-9 infix))
you turn on the infix sub-parser for all modules/ports you are working
with. Something like this would be nicer:
    (define-module (my super module)
      :use-modules (srfi srfi...)
      :parser-extension infix
      :parser-extension perl)
    .... [ sqrt (a ^ 2 + b ^ 2) ] ....
    .... [perl:
           while (<>)
           {
             print if (/..../);
           }
         ]
That was quite a wild (and useless) example, but some cooperation between
modules and "sub-parsers" would be nice.
- Enough dreams, let's return to real life. There are two possible
improvements to the Guile parser:
Modularize it. scm_lreadr now looks something like this:
    switch (getc ())
      {
      case '"':
        ...
      }
I think the speed would not decrease too much if we wrote
    call (array [c = getc ()], c, port)
It would also sometimes be handy to have functions that extract numbers/
strings/symbols/... from an input port (read-number, read-symbol...).
Another solution is to have two parsers: a basic and very fast one in
libguile, and a second, slightly slower but very extensible one in a
module.
- The second improvement is stolen (oops! ^H^H^H^H^H^H :) borrowed from
TeX: introduce something like catcodes.
For example, some people tend to complain that it is hard to count
Scheme parentheses (they have not found the church of Emacs - yet :) and
they would like to use brackets (or braces) in the same way they use
parentheses:
    [+ (+ 1 (+ 1 [+ 1 (+ 1 (+ 1 (+ 1 1)))]))]
But Scheme hackers say: no way! So what should such a poor little fellow
do? (Right answer: change the catcode of braces and brackets so that they
open/close lists...)
Another example: my current infix parser uses 'read' for reading
symbols, so you are limited by the assumptions 'read' makes about Scheme
symbols. You have to write [a * b] (note the spaces) because a*b is one
symbol in Scheme. But if it were possible to change the catcode of '*' ....
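A rough Python model of the catcode idea: every character is looked up in a category table, and the reader acts on the category rather than on the character itself. The category names below are invented for this sketch and are not TeX's actual catcode numbers, and `read_all` is a hypothetical helper. Changing three table entries is enough to make brackets behave like parentheses and to make '*' a token of its own.

```python
from enum import Enum


class Cat(Enum):
    SPACE = "space"              # separates tokens
    OPEN = "open"                # starts a list
    CLOSE = "close"              # ends a list
    SINGLE = "single"            # always a one-character token
    CONSTITUENT = "constituent"  # part of a symbol or number (the default)


def default_catcodes():
    table = {char: Cat.SPACE for char in " \t\n"}
    table["("] = Cat.OPEN
    table[")"] = Cat.CLOSE
    return table


def read_all(text, catcodes):
    """Read every datum in text, classifying characters via catcodes."""
    stack, current = [[]], []

    def flush():
        # Finish the symbol collected so far, if any.
        if current:
            stack[-1].append("".join(current))
            current.clear()

    for char in text:
        cat = catcodes.get(char, Cat.CONSTITUENT)
        if cat is Cat.CONSTITUENT:
            current.append(char)
            continue
        flush()
        if cat is Cat.OPEN:
            stack.append([])
        elif cat is Cat.CLOSE:
            if len(stack) == 1:
                raise SyntaxError(f"unexpected {char!r}")
            finished = stack.pop()
            stack[-1].append(finished)
        elif cat is Cat.SINGLE:
            stack[-1].append(char)
    flush()
    if len(stack) != 1:
        raise SyntaxError("unexpected end of input inside a list")
    return stack[0]


TEXT = "[+ 1 (+ 1 1)] a*b"

codes = default_catcodes()
print(read_all(TEXT, codes))   # '[', ']' and '*' are ordinary symbol characters

codes.update({"[": Cat.OPEN, "]": Cat.CLOSE, "*": Cat.SINGLE})
print(read_all(TEXT, codes))   # brackets delimit lists, '*' stands alone
```

In this sketch any CLOSE character closes any OPEN character, so `[+ 1 2)` is accepted. A stricter reader would remember which opener started each list.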
Back to I18N business...
* Because I am quite worried that I caught less than 1% of your
attention (and many lines ago most of you already hit 'D' or 'E' in your
favourite mailer :-) I promise to be brief (no breathtaking ideas left
in my pocket :)
* My guile-core/scripts/xgettext implementation tries to mimic most of the
features found in xgettext from the GNU gettext package.
* Language bindings for the *gettext functions (libintl) are in the module
(i18n gettext).
* You will need GNU gettext 0.10.38.
BUGS:
* There are exactly 27 bugs (unfortunately, all of them are unknown).
* Argument parsing is affected by the getopt-long bugs I reported earlier
today.
Future development:
* (post 1.6 ?) localize all messages in libguile/*.c and ice-9/*.scm
* Debug, debug, debug.
* Even though I tried really hard to release polished code, it is
possible that I left many rough/untested/buggy lines in it.
Please let me know if something does not work for you.
Open questions:
* I know about CVS branches, feature freeze etc. But the patch is clean,
(afaik) does not break anything (the infix module can be marked as
experimental), and i18n/l10n features can be quite important.
Could it be included in this stable release?
* What do you think about the extensions to the parser (and the other
proposed improvements)? For me the ability to alter the way input is parsed
is quite appealing, and it should at least be implemented as an alternative
parser.
Oops, I almost forgot :-)
ftp://atrey.karlin.mff.cuni.cz/pub/local/0rfelyus/guile/
Just place the new files/dirs into guile-core and apply guile.patch to the
old files...
-- End of long and boring email --
-- What do you think now? --
Best regards,
Dan Skarda