freecats-dev

[Freecats-Dev] Standalone client editor / OO integration (from Keith)


From: Henri Chorand
Subject: [Freecats-Dev] Standalone client editor / OO integration (from Keith)
Date: Thu, 27 Feb 2003 00:10:30 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Keith Godfrey wrote:

>> As translators, our first aim was to provide a full-fledged
>> standalone translation editor, because it might be the most
>> productive solution. We then quickly realized that we would
>> need as many conversion filters as possible in order to be
>> able to translate whatever customers require, and we thought
>> about the huge job done by Open Office team. We soon realized
>> their conversion filters would have to be integrated into
>> Free CATS client(s).

> Building a standalone translation editor of good quality, capable
> of representing source files of different natures reasonably well
> (forget accurately) will be a significant undertaking, imho, and
> one that will be very difficult to accomplish in a cross-platform
> manner.

This is why we believe our tool should work only with its own working format (a pivot format) and rely on conversion filters. It's true that there is no hope of designing a custom bilingual working format for each source file format we want to translate, then adapting our tool to each of those.

OO's XML format might well be the one we're looking for, and it seems you have already done a lot with it. Such a working format needs (among other things) to include a set of formatting attributes that is as comprehensive as possible, at both paragraph level and character level. Thierry also believes it must be XML-based, and I see no other solution. That said, XML is a very broad thing, and the way OO sees and implements it seems to be what we need (we would only have to embed our TU delimiters in it). The fact that OmegaT supports OO's format is another reason we would really like to cooperate with you on this project.

> One possible solution would be to take an open source word
> processor that has some filters to start with (such as Abiword)
> and build upon that, but then you're tying yourself to a specific
> platform.  On the plus side, you've already got a solid
> infrastructure to start from.  A CAT tool embedded within
> OpenOffice.org, if it were possible, might provide the most
> optimal solution, but I've never heard if such a task would work.


I would just like to point out that our present talks deal with the following issues:
- the TM server part
- a bilingual file working format (handled by the translation client)
- the translation client itself (standalone or based on a WP).


At this stage, let me know if you agree with the following:

- Client-level issues should not prevent us from working towards
  building a TM server, starting from OmegaT.

- We should try to define and implement its API based on:
    - what OmegaT will need (it should be easy to plug its interface
      on top of its own internals, of course)
    - what Yves Champollion believes he needs in order to make
      WordFast support it


We might see more clearly what needs to be done for the translation
client once the TM server is precisely defined, and our architecture
should be modular enough to accept whatever translation client(s) we
develop on top of it.
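
To make the idea of a client-independent TM server API a bit more
concrete, here is a minimal sketch in Python. Everything in it is
an assumption made for illustration: the names TMServer, Match,
store and lookup are hypothetical, nothing here is taken from
OmegaT or WordFast, and the fuzzy matching simply uses the
standard difflib module rather than any real TM algorithm.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Protocol


@dataclass(frozen=True)
class Match:
    source: str
    target: str
    score: float  # similarity to the query, between 0.0 and 1.0


class TMServer(Protocol):
    """Hypothetical client-facing API, not an agreed Free CATS interface."""

    def store(self, source: str, target: str) -> None: ...

    def lookup(self, source: str, min_score: float = 0.7) -> list[Match]: ...


class InMemoryTM:
    """Toy backend so the interface can be tried without a real server."""

    def __init__(self) -> None:
        self._units: dict[str, str] = {}

    def store(self, source: str, target: str) -> None:
        self._units[source] = target

    def lookup(self, source: str, min_score: float = 0.7) -> list[Match]:
        matches = []
        for src, tgt in self._units.items():
            # difflib stands in for a real fuzzy-matching algorithm.
            score = SequenceMatcher(None, source, src).ratio()
            if score >= min_score:
                matches.append(Match(src, tgt, score))
        # Best matches first.
        return sorted(matches, key=lambda m: m.score, reverse=True)
```

Any client (standalone editor or word processor plugin) would
then only depend on the two calls above, whatever sits behind
them.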


>> In an ideal world, translators might ask for both a standalone
>> translation editor (like OmegaT) and integration within a word
>> processor.

> My background would suggest that a focus should be made on one
> or the other techniques - trying to satisfy both will be
> significantly more complex and likely never be completed.

True.

> Instead of using Abiword, it might be worth considering making
> a custom build of OOo with built in Trados like features (and
> port the parts of other open source tools, such as OmegaT,
> that would be needed to make the CAT side work).

At this stage, we might well try to contact the OO team in order to:
- Ask them how to best build our plugin
- Ask for their help (after all, it would make OO the heart of
  our solution - at least, Yves Champollion should agree here,
  that's the way we should let them see it)

> One thing to consider - all of the file filters in OmegaT
> would require a complete rewrite if text style information
> needs to be extracted, and separate output filters would
> need to be created if the user is allowed to modify the
> file formatting.

This would mean we would keep the TM server part of OmegaT
untouched (or nearly), but set aside some of its other layers.

> OmegaT's filters are reasonably simple - they extract bits
> of text from a stream of data (the source file) and simply
> replace that text with translated text when writing the
> translated file.  That method provides very strict
> enforcement of no formatting changes outside of the proper
> word processing editor.

Well, maybe we can take them as a basis for future work.
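
For reference, the extract-and-replace approach Keith describes
can be sketched roughly as follows. This is NOT OmegaT's actual
code: the function name translate_stream, the regular expression
and the assumption that the source is a simple tagged text stream
are all mine, for illustration only.

```python
import re
from typing import Callable

# A "text run" is whatever sits between the end of one tag and the
# start of the next, provided it contains a non-space character.
_TEXT_RUN = re.compile(r"(?<=>)([^<>]*\S[^<>]*)(?=<)")


def translate_stream(data: str, translate: Callable[[str], str]) -> str:
    """Replace each text run with its translation; markup is left untouched."""
    return _TEXT_RUN.sub(lambda m: translate(m.group(1)), data)


# Usage: the callback would normally query the TM or the translator.
translations = {"Hello": "Bonjour", "World": "Monde"}
result = translate_stream(
    "<p>Hello</p><p>World</p>",
    lambda text: translations.get(text, text),
)
```

Since only the text runs are ever rewritten, the markup cannot
be damaged, which is the strictness Keith mentions.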

Could we design our working format starting from OO's Writer
format, then add TU delimiter tags in a way similar to how
Trados and WordFast do with a MS Word file?
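
As a rough illustration of what I mean by TU delimiter tags, here
is a sketch using Python's standard xml.etree.ElementTree module.
The element names tu, src and tgt are hypothetical, the plain p
element is only a stand-in for a real OO Writer paragraph, and
the sketch only handles paragraphs made of plain text (no inline
formatting).

```python
import xml.etree.ElementTree as ET


def add_tu(paragraph: ET.Element, target: str = "") -> ET.Element:
    """Move the paragraph's plain text into a <tu> holding <src> and <tgt>."""
    tu = ET.Element("tu")          # hypothetical TU delimiter element
    src = ET.SubElement(tu, "src")
    src.text = paragraph.text or ""
    tgt = ET.SubElement(tu, "tgt")
    tgt.text = target
    paragraph.text = None          # the text now lives inside <src>
    paragraph.insert(0, tu)
    return paragraph


# Usage on a stand-in paragraph:
p = ET.fromstring("<p>Hello world</p>")
add_tu(p, "Bonjour le monde")
bilingual = ET.tostring(p, encoding="unicode")
```

The resulting paragraph keeps its own attributes and simply
carries both the source and the target text.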


> Unless one goes with a high quality word processor (such as
> OOo), it may be dangerous to try to modify formats - clients
> may end up with files that don't work for them (I've spent
> several years as a localization engineer and have seen
> plenty of corrupted files - another reason for OmegaT's
> strict formatting policy)

Of course, this should not be allowed in a "free" way.
We have not described this in detail in our specification
documents, but the way we see it, IF we work with a standalone
translation editor, the user would only be allowed to delete
tags (adequately recognized as such) or to insert new ones
(with a "super-insert" menu).
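
One way such a restriction could be checked, as a sketch only:
every tag found in the target segment must either exist in the
source segment or come from the insert menu. The function name
invalid_tags, the menu_tags parameter and the very simple notion
of a "tag" (anything between angle brackets) are assumptions of
mine, not something taken from our specification documents.

```python
import re
from typing import Iterable

_TAG = re.compile(r"<[^<>]+>")


def invalid_tags(target: str, source: str,
                 menu_tags: Iterable[str] = ()) -> list[str]:
    """Return the tags in target that come neither from source nor the menu."""
    allowed = set(_TAG.findall(source)) | set(menu_tags)
    return [tag for tag in _TAG.findall(target) if tag not in allowed]
```

Deleting a tag is always accepted by this check, and a tag typed
or altered by hand is reported so that the editor can refuse it.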

When I think about it, it looks simpler if we work from within
OO, because the user would be allowed to add formatting while
working on an OO document, which means this new formatting
would necessarily be OO-compatible. At the very least, when I
think about how Trados and WordFast work from within MS Word,
it looks feasible and I really hope it is.

>> At this stage, we need to know IF (and to what extent)
>> OmegaT keeps all such formatting (as found in OO's native
>> XML format files).

> As mentioned above, OmegaT ignores all formatting information
> - its only interest is in whether or not the formatting
> tags are 'hard' (like paragraph boundaries) or 'soft' (like
> formatting boundaries).  The tags are discarded after
> identification.
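
The hard/soft distinction above can be illustrated with a small
sketch. It is not OmegaT's code: the set of hard tags, the regular
expression and the function name segments are assumptions chosen
for the example. Hard tags end a segment, soft tags are simply
dropped.

```python
import re

# Assumed example set of "hard" tags (segment boundaries);
# every other tag is treated as "soft".
HARD_TAGS = {"p", "br", "h1", "h2", "li"}

_TAG = re.compile(r"</?([A-Za-z][\w:-]*)[^<>]*>")
_BREAK = "\x00"  # sentinel that should not occur in normal text


def segments(markup: str) -> list[str]:
    """Split on hard tags, discard soft tags, return non-empty segments."""
    def classify(match: re.Match) -> str:
        return _BREAK if match.group(1).lower() in HARD_TAGS else ""

    flat = _TAG.sub(classify, markup)
    return [part.strip() for part in flat.split(_BREAK) if part.strip()]


# Usage:
found = segments("<p>Click <b>OK</b> to go on.</p><p>Done.</p>")
```

As in Keith's description, the formatting information is lost
once the tags have been classified, which is exactly what we
would need to change in order to keep it.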

We might then try to improve things here.
Let me know what you feel about the above suggestions.
I also hope Yves can provide useful feedback on all this.


Henri




