
[smc-devel] [Fwd: address@hidden: Draft Approach Paper]]


From: Baiju M
Subject: [smc-devel] [Fwd: address@hidden: Draft Approach Paper]]
Date: Fri, 15 Nov 2002 09:17:03 +0400 (SCT)

Forwarding a message; I think some of the points will be relevant to
Malayalam also.

Regards,
Baiju M

-------- Original Message --------
Subject: address@hidden: Draft Approach Paper]
From: "Nagarjuna G." <address@hidden>
Date: Fri, November 15, 2002 8:25 am
To: Baiju M <address@hidden>


----- Forwarded message from jitendra <address@hidden> -----

Envelope-to: address@hidden
From: jitendra <address@hidden>
Subject: Draft Approach Paper
To: "Nagarjuna G." <address@hidden>

Dear Nagarjuna
May I request you to give your comments, and also to get Arun M and
Baiju to respond?
Jitendra


Dear Dr. Rekha Govil
    May I request you to give your comments, suggestions, criticisms,
etc.? May I also request you to send me the e-mail addresses of Dr Om
Vikas, Mrs Swarnalata and all the participants? After getting a first
reaction from some more people, I would like to forward the document to
them. It may not be right to keep burdening them with raw documents.
Jitendra



      Towards a Character Encoding and Document Modelling Standard

At the "Localisation Clinic" meet held under the auspices of
Vanasthali Vidyapeeth, and at the CoIL-NET meeting, both at Udaipur
during 11-13 November, we discussed the issues relating to the above
topics.
Those present included several members of the Indian language software
industry, i.e. V-Soft, Modular, Cyberscape and C-DAC, and the
representatives of MAIT and NCST. The meet also brought together several
members of academia, such as BITS Ranchi, IIIT Gwalior, IIT Roorkee and
BHU Varanasi, and above all the members from Vanasthali Vidyapeeth, as
well as the undersigned. The representatives of TDIL, MIT, New Delhi
included Dr Om Vikas, Mrs Swarnalata and Mr P. K. Chaturvedi.
While the issues were discussed in clinical detail, I feel honoured
that I was asked to prepare an approach paper on the basis of which all
concerned, that is industry, academia and TDIL, could meet and urgently
thrash out the issues.

I am jotting down here a few draft points as a starting point for all
others to add their concerns, views and substantiations. After one or two
iterations, which the undersigned will moderate, and, if necessary, a few
small get-togethers, we shall arrive at a cohesive agenda with backup
material and a sufficiently satisfactory approach paper with which to
begin a meeting and arrive at some consensus.

Objectives
    Ultimately we need to produce and communicate documents in Indian
languages efficiently and easily, and to do so in a universally
acceptable manner. Of course, for historical reasons we have legacy
systems, some of which have followed standards and some of which have
not. We should provide as much backward compatibility as feasible, and
also a transition programme. However, we should be guided more by the
future and by global trends, and not be restricted by local and
historical limitations.
    So far we have been restricted by 8-bit representations and by
OS-level support for them. Now a multi-byte representation is supported
in all the new versions of the OSes. However, if we go by the currently
floating proposals for encoding standards based on particular fonts,
there are many problems. Many of these problems were discussed in the
meet; perhaps they need to be documented in the light of the following.
    For standard documentation we need a standard character encoding,
standard input methods, standard document definitions, default fonts for
fallback, and standard rendering. The standard rendering mechanisms in
the systems can take care of display and printing. For multilingual data
and intra-lingual transfer, we need transliteration tools.
    To demonstrate all of this we need standard applications (without
frills), such as a word processor, a spreadsheet and a publishing tool.
    To make all this public knowledge, it should be well documented and
made available on web pages with full access, including download.
    To keep the standards alive, there must be a page maintenance system
with clearly assigned responsibility. The TDIL site would be the ideal
place for hosting the web pages.

1> To arrive at a standard for the character encoding:

     a> We shall no longer standardise fonts, but only the character
     encoding.
     b> Only such encodings will be accepted as standard as are free
     under the GPL or another such license, without any IPR restriction.
     c> Only such encodings will be accepted as are cross-platform.

          i) Initially this will be for the current versions
          of MS Windows and for Linux kernel 2.4.18 onwards.
          ii) The same should be convertible to and from legacy
          encodings (i.e. converters should be made available for
          legacy systems such as 8-bit ISCII).

     d> There can be several standards, but converters between them
     are a prerequisite for having more than one standard.
     e> To start with, we must have two standard encodings for each
     script: at least one standard encoding per script for TTF fonts
     and one for OTF fonts.
     f> At first we restrict ourselves to Devanagari.
     g> We take the Mangal font of Microsoft or the Raghu fonts of NCST
     as good starting points, with due acknowledgement, but in no way
     get restricted by them.
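
A converter of the kind item c> ii) asks for could look roughly like the
Python sketch below. It is only an illustration under stated assumptions:
the byte table holds a handful of entries believed to follow ISCII-91
(IS 13194:1991), the function names are hypothetical, and a real
converter would need the full table plus ISCII's special codes (nukta
combinations, INV, ATR/EXT). Deriving the reverse table from the forward
one keeps both directions consistent, which is the property item d>
relies on.

```python
# Sketch: 8-bit ISCII (Devanagari) <-> Unicode converters.
# ASSUMPTION: the table is a tiny, partial subset believed to follow
# ISCII-91; it is NOT a complete or verified mapping.

ISCII_TO_UNICODE: dict[int, str] = {
    0xA4: "\u0905",  # DEVANAGARI LETTER A
    0xA5: "\u0906",  # DEVANAGARI LETTER AA
    0xB3: "\u0915",  # DEVANAGARI LETTER KA
    0xB4: "\u0916",  # DEVANAGARI LETTER KHA
    0xE8: "\u094D",  # DEVANAGARI SIGN VIRAMA (halant)
}
# Reverse table derived from the forward one, so the two cannot drift apart.
UNICODE_TO_ISCII: dict[str, int] = {u: b for b, u in ISCII_TO_UNICODE.items()}


def iscii_to_unicode(data: bytes) -> str:
    """Decode ISCII bytes; the lower half (< 0x80) is ASCII and passes through."""
    out = []
    for byte in data:
        if byte < 0x80:
            out.append(chr(byte))
        elif byte in ISCII_TO_UNICODE:
            out.append(ISCII_TO_UNICODE[byte])
        else:
            raise ValueError(f"no mapping for ISCII byte 0x{byte:02X}")
    return "".join(out)


def unicode_to_iscii(text: str) -> bytes:
    """Encode Unicode text back to ISCII bytes using the derived table."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ord(ch))
        elif ch in UNICODE_TO_ISCII:
            out.append(UNICODE_TO_ISCII[ch])
        else:
            raise ValueError(f"no ISCII mapping for U+{ord(ch):04X}")
    return bytes(out)


if __name__ == "__main__":
    legacy = bytes([0xB3, 0xE8, 0xB4])  # KA + halant + KHA
    text = iscii_to_unicode(legacy)     # the conjunct written as three characters
    print(text)
    assert unicode_to_iscii(text) == legacy  # round trip through Unicode
```

Unmapped bytes raise an error rather than being silently dropped, so
gaps in a conversion table show up immediately.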

2> To arrive at standard input methods:
        Keyboard input is not the only input method. Pen input,
touch-screen input and speech input are becoming popular, and standards
may be evolved for them too. At least a framework may be evolved and a
public discussion on it initiated.

     a> Keyboard input: C-DAC has done an excellent job on InScript and
     a few other keyboards. Some more keyboards may be included as
     standard based on experience.
     b> Handwriting recognition: pen inputs may be standardised.
     c> Others??
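
As an illustration of what a keyboard standard fixes, the sketch below
maps keystrokes on a US-layout keyboard straight to Unicode code points,
independently of any font. The handful of key assignments are assumed to
follow the InScript layout and should be checked against C-DAC's
published chart; the table and function names are hypothetical.

```python
# Sketch of a table-driven keyboard input method.
# ASSUMPTION: these few entries are believed to match InScript; they are
# illustrative only, not the full layout.

INSCRIPT_SUBSET: dict[str, str] = {
    "k": "\u0915",  # DEVANAGARI LETTER KA
    "j": "\u0930",  # DEVANAGARI LETTER RA
    "h": "\u092A",  # DEVANAGARI LETTER PA
    "e": "\u093E",  # DEVANAGARI VOWEL SIGN AA
    "d": "\u094D",  # DEVANAGARI SIGN VIRAMA (halant)
}


def keystrokes_to_text(keys: str) -> str:
    """Translate each keystroke; unmapped keys (e.g. space) pass through."""
    return "".join(INSCRIPT_SUBSET.get(k, k) for k in keys)


if __name__ == "__main__":
    # KA + halant + RA is typed in phonetic order and stored as three characters.
    print(keystrokes_to_text("kdj"))
```

Because the input method emits characters in logical order, the same
keystrokes give the same stored text whatever font later displays it.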

3> Document definitions (Please note that the undersigned has only
perfunctory knowledge of XML and will be obliged if someone corrects any
overt or covert error.)

     XML is emerging as the future standard, and hence XML may be
     considered for document representation. This will also give the
     much-needed flexibility of character encoding for legacy
     representations (through the encoding declaration and, where
     needed, character entities defined in appropriate Document Type
     Definitions (DTDs)).
     Since XML's native representation is Unicode (this has an effect
     on the transport of files from one place to another), and since a
     character model is being standardised (this will affect the
     character encoding), it is better to accept Unicode
     representation. This does not at all mean that the Unicode
     encoding proposed by a specific company has to be the standard; it
     only means that a structure of multibyte representation is
     accepted. The specific encodings as proposed by Microsoft have not
     yet been accepted as an ISO standard, and we may propose a change
     which, once accepted as an ISO standard, the Unicode Consortium
     would automatically accept. If not, a converter may have to be
     devised.
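
What "a structure of multibyte representation" means in practice can be
seen in a few lines of Python: Unicode assigns each character a code
point, and UTF-8 or UTF-16 (both of which every XML processor must
accept) serialise those code points as bytes. The example also shows why
fonts need not be standardised: the conjunct क्ष, usually drawn as one
glyph, is stored as three characters. The helper name is hypothetical.

```python
# Inspect how a Devanagari string is stored: code points and byte counts.


def describe(text: str) -> dict:
    return {
        "code_points": [f"U+{ord(ch):04X}" for ch in text],
        "utf8_bytes": len(text.encode("utf-8")),
        # "utf-16-be" is used so that no byte-order mark is added.
        "utf16_bytes": len(text.encode("utf-16-be")),
    }


if __name__ == "__main__":
    print(describe("\u0915\u094D\u0937"))  # KA + VIRAMA + SSA
    # {'code_points': ['U+0915', 'U+094D', 'U+0937'], 'utf8_bytes': 9, 'utf16_bytes': 6}
```

Each Devanagari character takes three bytes in UTF-8 and two in UTF-16;
which glyph the conjunct is drawn with is left to the font and renderer.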

     Multilingual and intralingual: it remains for us to arrive at an
     alternative encoding which takes care of our multilingual and
     intralingual requirements. Do we have agreement on the best way to
     utilise the encoding space for this? We must collect all the
     published documents on the issue, and request experts in
     linguistics, character encoding and Unicode to give a set of
     suggestions that makes our job of deciding easy. The Acharya web
     site of IIT Madras and the C-DAC documents, if any, may have to be
     studied in the light of Unicode. Do we propose a change in the
     encoding for a standard for intralingual work? Those who do not
     want to interchange data may continue to work with whatever
     encoding they have adopted, or use their own converters.
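
One reason the encoding-space question matters: the Unicode Indic blocks
from Devanagari to Malayalam are laid out in parallel (they were derived
from ISCII-1988), so many letters sit at the same offset in every block.
The sketch below exploits that for naive transliteration between
scripts. It is a toy under that assumption, not a proposal: the scripts
do not have identical repertoires (Tamil, for instance, has no letters
for the aspirated consonants), so a real tool needs per-script tables.
The function and table names are hypothetical.

```python
import unicodedata

# Start of each Unicode Indic block; every block is 128 code points long.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}


def transliterate(text: str, source: str, target: str) -> str:
    """Naive offset-based transliteration between parallel Indic blocks."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        offset = ord(ch) - src
        if 0 <= offset < 0x80:
            candidate = chr(dst + offset)
            # Keep the original character where the target slot is
            # unassigned (e.g. the danda, which is shared with Devanagari).
            out.append(candidate if unicodedata.name(candidate, None) else ch)
        else:
            out.append(ch)  # spaces, ASCII and other scripts pass through
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("\u0928\u092e\u0938\u094d\u0924\u0947", "devanagari", "malayalam"))
```

For example, Devanagari नमस्ते comes out as Malayalam നമസ്തേ, because every
letter and sign in it has a counterpart at the same offset.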

     Document Type Declaration (as distinct from the Document Type
     Definition, which refers to entities of character type): the
     Document Type Declaration states whether the document is
     multilingual or unilingual; depending on that, the locale
     specification is decided and the corresponding encoding and other
     standards are used.
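
For marking which parts of a multilingual document are in which
language, XML 1.0 already defines the xml:lang attribute, which child
elements inherit; it may serve part of the purpose described above. The
sketch below builds a small Hindi/Malayalam document with Python's
standard ElementTree and reads back the effective language of each
paragraph. The element names document and para are hypothetical, not
taken from any existing DTD.

```python
import xml.etree.ElementTree as ET

# ElementTree spells the reserved xml:lang attribute with its namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_document() -> bytes:
    """Build a two-language document (element names are hypothetical)."""
    root = ET.Element("document", {XML_LANG: "hi-IN"})
    # No xml:lang here: this paragraph inherits hi-IN from the root.
    ET.SubElement(root, "para").text = "\u0928\u092e\u0938\u094d\u0924\u0947"
    # This paragraph overrides the inherited language.
    ml = ET.SubElement(root, "para", {XML_LANG: "ml-IN"})
    ml.text = "\u0d28\u0d2e\u0d38\u0d4d\u0d24\u0d47"
    # Serialise as UTF-8 with an explicit XML declaration (Python 3.8+).
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def paragraph_languages(xml_bytes: bytes) -> list[str]:
    """Effective language of each para (handles one level of inheritance)."""
    root = ET.fromstring(xml_bytes)
    default = root.get(XML_LANG)
    return [p.get(XML_LANG, default) for p in root.iter("para")]


if __name__ == "__main__":
    doc = build_document()
    print(doc.decode("utf-8"))
    print(paragraph_languages(doc))
```

The serialised text carries its encoding in the XML declaration, so a
receiving parser needs no out-of-band agreement to decode it.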

     Country and language standard: ISO standards exist for naming many
     Indian languages (e.g. hi_IN or hi-in for Hindi; sometimes the case
     matters and sometimes it does not). There is no such name for a
     multilingual document. We need to suggest one if we are to have a
     common encoding for all Indian languages.
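
Part of the hi_IN versus hi-in confusion is that two conventions are
mixed: hi_IN is a POSIX-style locale name, while hi-IN is an IETF
language tag (BCP 47). Such tags are case-insensitive; a lowercase
language and an uppercase region are only the conventional spelling. The
sketch below normalises both forms to one spelling. For the multilingual
case, ISO 639-2 does already define the code mul ("multiple languages"),
which may be worth considering as a starting point. The function names
are hypothetical.

```python
# Sketch: normalise POSIX locale names and loosely written tags to
# conventional BCP 47 casing, and compare tags case-insensitively.


def normalize_tag(tag: str) -> str:
    # Drop POSIX-only parts: codeset ("hi_IN.UTF-8") and modifier ("@mod").
    tag = tag.split(".", 1)[0].split("@", 1)[0]
    parts = tag.replace("_", "-").split("-")
    out = [parts[0].lower()]           # language (ISO 639): lowercase
    for sub in parts[1:]:
        if len(sub) == 4 and sub.isalpha():
            out.append(sub.title())    # script (ISO 15924), e.g. Deva
        elif len(sub) == 2 and sub.isalpha():
            out.append(sub.upper())    # region (ISO 3166), e.g. IN
        else:
            out.append(sub.lower())    # anything else: lowercase
    return "-".join(out)


def same_language_tag(a: str, b: str) -> bool:
    # Case is not significant in language tags, so compare folded forms.
    return normalize_tag(a).lower() == normalize_tag(b).lower()


if __name__ == "__main__":
    for raw in ["hi_IN", "hi-in", "hi_IN.UTF-8", "sa-deva-in", "mul"]:
        print(raw, "->", normalize_tag(raw))
```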

4> Tools for fonts: GTK 2.0 and Pango 1.1.1
        The undersigned felt that most font vendors have been working
within the framework of non-Unicode TTF fonts, with HTML as the standard
document format. Further, they are working without a general-purpose,
powerful rendering tool designed for complex text. As such, they work
with OS-specific (even version-specific) rendering tools, and even the
graphics infrastructure is specific to the operating system. Thus there
is a feeling among those working in the MS world that they have little
use for Linux-based tools.
        Luckily, GTK 2.0 and Pango 1.1.1 are precisely such tools: free
software (not just open source) which, with due acknowledgement, can be
used for cross-platform fonts and general applications.
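
Why a complex-text rendering engine is needed can be seen from
Devanagari's vowel sign I (ि): it is stored after its consonant cluster,
in logical order, but drawn before it. The toy function below performs
only that one reordering, on code points rather than glyphs; an engine
such as Pango does this and much more (conjunct formation, mark
positioning) using the font's own tables. The names are hypothetical,
and nukta consonants are ignored for brevity.

```python
# Toy illustration of one shaping step: moving the pre-base vowel sign I
# in front of its consonant cluster for display.

VIRAMA = "\u094D"
VOWEL_SIGN_I = "\u093F"  # stored after the consonant, drawn before it


def is_consonant(ch: str) -> bool:
    # Devanagari consonants KA..HA (U+0915..U+0939) only.
    return "\u0915" <= ch <= "\u0939"


def visual_order(text: str) -> str:
    """Return code points in display order for the vowel-sign-I case only."""
    out: list[str] = []
    i = 0
    while i < len(text):
        if is_consonant(text[i]):
            # Collect a cluster: consonant (+ virama + consonant)*
            j = i + 1
            while j + 1 < len(text) and text[j] == VIRAMA and is_consonant(text[j + 1]):
                j += 2
            cluster = text[i:j]
            if j < len(text) and text[j] == VOWEL_SIGN_I:
                out.append(VOWEL_SIGN_I + cluster)  # drawn before the whole cluster
                j += 1
            else:
                out.append(cluster)
            i = j
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print([f"U+{ord(c):04X}" for c in visual_order("\u0915\u094D\u0937\u093F")])
```

The stored text never changes; only the display step reorders, which is
why a shared, script-aware renderer matters more than per-vendor fonts.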
        Vendors may be glad to have a standard platform to which they can
add value and reap the benefits. The undersigned believes that the free
software (GPL) license does not prevent anyone from adding value and
earning from it.


Jitendra Shah





----- End forwarded message -----

-- 
------------------------------------------------------------------------
address@hidden                 www.hbcse.tifr.res.in/gn/
Key fingerprint = C1E2 1B8C 8E98 A697 68B7  ADAC E956 6D4B DE90 BF01






