[Freecats-Dev] First review of Free CATS specification docs by Yves Savourel


From: Henri Chorand
Subject: [Freecats-Dev] First review of Free CATS specification docs by Yves Savourel
Date: Sun, 22 Jun 2003 19:34:38 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Hi all,

Here is some food for thought: Yves reviewed our specification documents and sent some initial feedback.

I hope the attachment will not get lost (I allowed attachments in the mailing list setup).


Cheers,

Henri

-------- Original Message --------
Subject: RE: Free CATS specification documents
Date: Sun, 22 Jun 2003 09:55:11 -0600
From: "Yves Savourel" <address@hidden>
To: <address@hidden>



Hi Henri,

Thanks for the documents. I've looked at them and come up with the few
notes and ideas below, in no specific order. Please take them for what
they are: just 'thinking out loud' notes.


One thing I would suggest for the TM is to take into account a possible
third type of match. An exact match is when the two source segments are
100% identical, a fuzzy match is any match under 100%, and a 'perfect
match' (I'm not sure what to call it) is the case where it's an exact
match and you are able, somehow, to detect that the context is also
identical. This is an important distinction because perfect matches
normally don't have to be looked at, while exact matches have to be
looked at by the translator in context. A perfect match could be tied
to an ID, for example, like that of a string in a resource file. Having
identical source text doesn't mean it's the same instance of text, so
the context may be different, and so may the translation. But having
the same text and the same ID makes the match much 'safer'.
This could obviously come at a later stage of development.
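
The three match types described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
Segment class, its seg_id field (standing in for something like a
resource string ID) and the classify_match function are hypothetical
names, not part of the Free CATS specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    text: str
    # Hypothetical context key, e.g. the ID of a string in a resource file.
    seg_id: Optional[str] = None


def classify_match(new: Segment, stored: Segment) -> str:
    """Classify a TM hit as 'perfect', 'exact' or 'fuzzy'."""
    if new.text != stored.text:
        # Anything under 100%; a real engine would also compute a score
        # and discard candidates below some threshold.
        return "fuzzy"
    if new.seg_id is not None and new.seg_id == stored.seg_id:
        # Same text and same ID: the context is assumed to be identical.
        return "perfect"
    # Same text, but the context is unknown or different.
    return "exact"
```

With this sketch, classify_match(Segment("Open", "IDS_OPEN"),
Segment("Open", "IDS_OPEN")) gives "perfect", while the same text under
a different or missing ID gives "exact" and still needs review.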

Linked to this, there is a huge aspect of TM that should really be done
without TM. In my opinion, a large part of what TMs do is just a patch, a
remedy for the symptom of the real problem: the fact that you don't know
what part of the document has changed between versions 1 and 2. An
updater module would be a huge step forward: a way to compare source doc
version 1, source doc version 2 and translated doc version 1, and create
the translated version 2 with only the delta left to edit or translate
(and then, at that point, the translator+TM takes over).
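
A minimal sketch of such an updater, assuming that both source versions
are already split into segments and that translated version 1 is
aligned one-to-one with source version 1. The function name
update_translation is hypothetical; the diff itself uses Python's
standard difflib module.

```python
from difflib import SequenceMatcher


def update_translation(src_v1, src_v2, tgt_v1):
    """Return (source, translation or None) pairs for version 2.

    Segments unchanged between src_v1 and src_v2 reuse their version 1
    translation; new or modified segments get None, which is the delta
    left for the translator (and the TM).
    """
    if len(src_v1) != len(tgt_v1):
        raise ValueError("src_v1 and tgt_v1 must be aligned one-to-one")
    result = [(seg, None) for seg in src_v2]
    matcher = SequenceMatcher(a=src_v1, b=src_v2, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            result[block.b + k] = (src_v2[block.b + k], tgt_v1[block.a + k])
    return result
```

Because the comparison works on positions in the document rather than
on text alone, a carried-over segment keeps the translation it had in
the same place, which is exactly the context that a plain exact match
cannot guarantee.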

A note on the TM server repository: you seem to be looking into XML
databases with XML-based indexing engines. It's certainly a possibility,
but don't discard simpler, classic databases. Something like MySQL, for
example, is free and performs very well. SQL offers operators such as
LIKE that are quite powerful and already give very good query results
for fuzzy matching (without you doing anything). If on top of that you
run such queries on a key field generated from the text of the TU, it
can be quite efficient.
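
As a rough illustration of the LIKE-plus-key-field idea, here is a
sketch that uses SQLite (through Python's standard sqlite3 module)
instead of MySQL so that it runs without a server. The table layout,
the match_key column and the make_key normalisation are assumptions
made for the example, not something taken from the specification.

```python
import re
import sqlite3


def make_key(text):
    """Hypothetical key: lowercase ASCII words only, punctuation dropped."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))


def build_tm(pairs):
    """Create an in-memory TM table from (source, target) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tu (id INTEGER PRIMARY KEY, "
        "source TEXT, target TEXT, match_key TEXT)"
    )
    conn.executemany(
        "INSERT INTO tu (source, target, match_key) VALUES (?, ?, ?)",
        [(s, t, make_key(s)) for s, t in pairs],
    )
    return conn


def find_candidates(conn, text):
    """Return TUs whose key starts and ends with the same words as text."""
    words = make_key(text).split()
    if not words:
        return []
    # '%' matches any run of characters between the first and last word.
    pattern = words[0] if len(words) == 1 else f"{words[0]}%{words[-1]}"
    rows = conn.execute(
        "SELECT source, target FROM tu WHERE match_key LIKE ? ORDER BY id",
        (pattern,),
    )
    return rows.fetchall()


tm = build_tm([
    ("Click the OK button.", "Cliquez sur le bouton OK."),
    ("Click the Cancel button.", "Cliquez sur le bouton Annuler."),
    ("Save the file.", "Enregistrez le fichier."),
])
candidates = find_candidates(tm, "Click the Apply button!")
```

Here candidates holds the two "Click the ... button." units. LIKE only
narrows down the candidate set; ranking the candidates by similarity
still has to happen in a separate scoring step.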

Related to fuzzy matching: I've also attached an old article
(Waikoloa.zip) that explains one way to create a simple TM engine. It's
certainly nothing fancy, but it may help a little in understanding how
calculating fuzzy matches can work. You can download the source code and
the executable of the sample at _http://www.opentag.com/Waikoloa.zip_.
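
To give an idea of how a fuzzy score can be calculated, here is one
common approach: a word-level edit distance turned into a percentage.
It is not necessarily the method used in the Waikoloa article; the
function names and the scoring formula are assumptions chosen for the
example.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def fuzzy_score(source, candidate):
    """Similarity from 0 to 100, where 100 means identical word sequences."""
    a = source.lower().split()
    b = candidate.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - word_edit_distance(a, b) / longest))
```

For example, fuzzy_score("Click the OK button", "Click the Cancel
button") is 75, since one word out of four differs.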

On the topic of interfaces (meaning 'API', not UI): you have probably
heard of T-Remote, a product from Telelingua, which is basically the
same thing as your TM server. They have developed an interface and the
'connectors' to plug their workbench client into various existing TM
suites. Maybe they would be interested in some form of collaboration.
They may see FreeCATS as a threat to their own solution, but even so.
See for example Philippe Mercier's article in one of the LISA newsletters
(_http://www.lisa.org/archive_domain/newsletters/2003/1.4/mercier.html_).


One other minor thing: getting files in SXW format made me smile. While
OO is a good application and a free one, very few people in the Windows
world (which is, whether we like it or not, the world where by far most
people in our industry work) have it installed. It's no big deal: the
files are really ZIP files with XML documents inside, but I would have
expected an open source project to have documentation output in a very
common format like HTML. This is nothing against OO as an authoring
tool, but SXW is probably not the best format to use for distribution
if you publish the documents to a wider public :)

This actually made me think about a possible problem of open source
projects. Many are a little biased toward Linux, Java, etc., often in
reaction against Microsoft. But the mainstream of possible users is on
Windows, and expects Windows-like applications. Cross-platform
applications often have the drawback of being generic, and therefore
not exactly what users expect. Those aspects don't seem very important
at first: in Windows, for example, it would be that you can't use Ctrl+C
to copy to the Clipboard, or can't use Alt+key to access a menu, etc. On
the Mac, the user will expect to use whatever Mac special key sequences
he/she usually uses, etc. In the long run, not having such expected
behavior becomes very annoying to a user, and it becomes a reason not to
use the application. I guess my point is that as soon as you talk about
UI, it is very important to have a UI that really fits the platform it
targets, or the users will give up. That means a cross-platform
application may sometimes need to have parts developed specifically for
each targeted platform.


OK, that's all for today. The weather is too nice outside to keep
typing on a keyboard :)

Kenavo,
-yves


Attachment: Waikoloa Article.zip
Description: Zip compressed data

