bug-gne

[Bug-gnupedia] Content Format


From: Jean-Daniel Fekete
Subject: [Bug-gnupedia] Content Format
Date: Wed, 24 Jan 2001 19:11:17 +0100

Bob Dodd <address@hidden> wrote:

> Why are we getting so hung up on content format?
>
> Clearly there is a *minimum* level of information we need to know about
> the entry (however that gets submitted and stored), and we need to know
> what format the content is in, in order to present the content to the
> user. But that's all.
>
> This whole thing about HTML, XML, Tei, Latex, MathML for content
> description is meaningless, and so quickly outdated by the "march of
> technology" as to make discussion almost (but not quite) pointless...

No, this is not true.
All the "big" libraries, like the Library of Congress, are concerned
about the persistence of technology.  They have not been able to re-read
tapes that were written ten years ago.  They know what they are talking
about when it comes to data formats and encodings, and to keeping
textual information for a long time.

However, they now consider XML mature enough to serve as the
infrastructure for encoding their textual archives, and they consider
TEI the right format.

The point is not to have the latest technology here.  The point is to be
able to describe an encyclopedia faithfully without losing important
information.  If technology changes, it is easy to translate from
XML/TEI into something else that suits our needs.
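To make the translation point concrete, here is a minimal sketch in
Python (standard library only) that maps a simplified, namespace-free
TEI-like fragment to HTML.  The sample text, the TAG_MAP table and the
tei_to_html function are hypothetical illustrations, not part of TEI or
of any existing converter; real TEI documents carry a header and far
richer markup.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-like fragment: real TEI adds a teiHeader
# and much richer markup than this.
TEI_SAMPLE = (
    '<text><body><div type="article">'
    "<head>PostScript</head>"
    "<p>A page description language from <name>Adobe</name>.</p>"
    "</div></body></text>"
)

# Hypothetical mapping from TEI element names to HTML tags.
# Elements not listed (text, body) are unwrapped: only their content is kept.
TAG_MAP = {"div": "section", "head": "h2", "p": "p", "name": "em"}


def tei_to_html(elem: ET.Element) -> str:
    """Recursively render an element and its children as an HTML string."""
    # Text before the first child, escaped for HTML.
    parts = [html.escape(elem.text or "", quote=False)]
    for child in elem:
        parts.append(tei_to_html(child))
        # Text that follows the child but still belongs to this element.
        parts.append(html.escape(child.tail or "", quote=False))
    inner = "".join(parts)
    tag = TAG_MAP.get(elem.tag)
    return f"<{tag}>{inner}</{tag}>" if tag else inner


if __name__ == "__main__":
    print(tei_to_html(ET.fromstring(TEI_SAMPLE)))
```

Running it prints
<section><h2>PostScript</h2><p>A page description language from
<em>Adobe</em>.</p></section>.  Targeting another output format mostly
means changing TAG_MAP and the rendering function; the hard part,
recovering the structure of the article, has already been done by the
markup.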

To draw a parallel, consider the GNU project ten years ago: RMS chose
the "C" language because it was mature enough and could be used to
implement all the desired applications.  "C" was not chosen because it
was the "best" language, nor because it was RMS's preferred language,
but simply because it was mature.
XML and TEI play the same roles as Unix (the infrastructure) and "C"
(the language).

> There is a minimum practical limit on content formats in that our
> "editors" (however you wish to define that term) need to be able to
> check for spam etc. After that, it's up to presentation tools to decide
> how to handle the format of the material (e.g. some may not be able to
> display Chinese or Arabic fonts), and when they can't how they inform
> the user (e.g. a prompt to "Format xyz not supported: save to disk?")
>
> Let's be honest, the vast majority of people don't even know latex or
> Tei exists, would be terrified of writing XML/HTML/WML, and would
> expect to write their (formatted) entries in Microsoft Word, embedding
> pictures created using Excel and Visio. If we want their entries (and
> it's pointless even saying "text files", because most people would
> think that a Word document *is* a text file), we need to accept those
> formats too... So long as storing and retrieving the content doesn't
> have copyright issues, I think you have to leave it to presentation
> tools, as to which format they support, and when they can only show our
> "minimum" information.

You can already create XML/TEI documents using Emacs and FrameMaker.
It would be better if Microsoft Office could produce them, but I don't
expect that in the near future.
If we need free tools to help in the creation of Nupedia, we can write
them.  But it would be a much more difficult task to build a new XML DTD
plus its tools, since that DTD would keep evolving quickly.

> OK, you can encourage certain content formats, but you can't be too
> prescriptive. We also have to live in the real world, where most of our
> authors may be computer literate, but their idea of document production
> is to use commercial tools (that often come "free" with their PCs).

You are right: the format required for an encyclopedia is probably
complicated.  I have written some encyclopedic articles (about
PostScript and PDF) and it took me some time to get them right.
Somebody re-entered them in FrameMaker and did the editorial work needed
to meet the quality standards of the encyclopedia.

I don't know how to avoid that.  I could have done it myself, but you
cannot expect a specialist in, say, botany to follow the rules required
by a good encyclopedia.  Somebody will have to translate the MS Word
document and structure it, filling in the fields required to index it.

Accepting loosely structured documents will lower the quality of the
encyclopedia and that does not seem to match the goal of the project.

--
  Jean-Daniel Fekete
  Ecole des Mines de Nantes, 4 rue Alfred Kastler, La Chantrerie,
  BP 20722, 44307 Nantes Cedex 03, France
  Voice: +33-2-51-85-82-08  | Fax: +33-2-51-85-82-49
  address@hidden | http://www.emn.fr/fekete/




