Re: [Bug-gnupedia] Content Format


From: Jean-Daniel Fekete
Subject: Re: [Bug-gnupedia] Content Format
Date: Wed, 24 Jan 2001 22:19:02 +0100

If you allow any format, don't insist on editorial rules, and only build
indexes from full text, the result is closer to the Web than to an
encyclopedia.

I don't see *being different from Nupedia* as a goal for gnupedia.

Using an encoding format for the document does not mean we will use the same
format for showing the document.
For example, the Texinfo format of the GNU tools is designed to document
software but, for reading, it is translated into TeX, HTML or Info.
XML/TEI is not designed to be presented to humans.  It is designed for easy
processing and, as one kind of processing, can be translated into any
convenient display format such as HTML.
Relying on a high-level format for the original document provides a lot of
help to indexers and other processing tools.
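
To make the separation between encoding and display concrete, here is a
minimal sketch in Python.  It assumes a heavily simplified, TEI-like
article with no namespace and no schema validation; the sample text, the
author name and the helper names index_fields and to_html are invented
for illustration and are not part of TEI or of any gnupedia tool.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified TEI-like article (no namespace, not validated).
ARTICLE = """\
<TEI>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>PostScript</title>
        <author>A. N. Author</author>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>PostScript is a page description language.</p>
      <p>It is also a programming language.</p>
    </body>
  </text>
</TEI>
"""


def index_fields(root):
    """Collect the header fields an indexer would want."""
    stmt = root.find("teiHeader/fileDesc/titleStmt")
    return {
        "title": stmt.findtext("title", default=""),
        "author": stmt.findtext("author", default=""),
    }


def to_html(root):
    """Translate the structured article into a simple HTML fragment."""
    fields = index_fields(root)
    paragraphs = [
        "<p>" + escape("".join(p.itertext()).strip()) + "</p>"
        for p in root.iterfind("text/body/p")
    ]
    header = [
        "<h1>" + escape(fields["title"]) + "</h1>",
        "<p><em>" + escape(fields["author"]) + "</em></p>",
    ]
    return "\n".join(header + paragraphs)


root = ET.fromstring(ARTICLE)
print(index_fields(root))  # structured fields for the indexer
print(to_html(root))       # one possible display format
```

The same parsed document feeds both the indexer and the HTML output, so
the display format can change later without touching the stored article.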

  Jean-Daniel Fekete
  Ecole des Mines de Nantes, 4 rue Alfred Kastler, La Chantrerie,
  BP 20722, 44307 Nantes Cedex 03, France
  Voice: +33-2-51-85-82-08  | Fax: +33-2-51-85-82-49
  address@hidden | http://www.emn.fr/fekete/

----- Original Message -----
From: Tom Chance <address@hidden>
To: <address@hidden>
Sent: Wednesday, January 24, 2001 8:44 PM
Subject: Re: [Bug-gnupedia] Content Format


> I think what Rob is trying to say here (I might be
> wrong) is that we would be missing a great opportunity
> if we kept articles in exactly the same way as
> Nupedia. It may be that that way is simply the only
> option, or by far the best, but both Rob and I
> disagree.
>
> However, I suppose if people really want to keep the
> articles themselves in individual files, the way to go
> is to index them with a MySQL database (for searching
> etc.) and to keep the files themselves as simple text
> files in a large repository. All the searching,
> parsing and displaying of the articles would be done
> by Perl, the internet's best language, which could
> easily convert MS Word, LaTeX, MathML etc. documents
> into formatted plain text. It could also parse the
> text files for things like <author> and interpret them
> in the same sort of way that XML would. The articles
> would be displayed on the frontend in simple HTML form
> (as XML won't work with many browsers, even back to
> Netscape 4.6, whose support for XML is awful).
>
>
> Tom Chance
>
>
> --- Rob Scott <address@hidden> wrote:
> > WE ARE NOT NUPEDIA
> >
> > --- Jean-Daniel Fekete <address@hidden> wrote:
> > > Bob Dodd <address@hidden> wrote:
> > >
> > > > Why are we getting so hung up on content format?
> > > >
> > > > Clearly there is a *minimum* level of information we need to know
> > > > about the entry (however that gets submitted and stored), and we
> > > > need to know what format the content is in, in order to present
> > > > the content to the user. But that's all.
> > > >
> > > > This whole thing about HTML, XML, TEI, LaTeX, MathML for content
> > > > description is meaningless, and so quickly outdated by the "march
> > > > of technology" as to make discussion almost (but not quite)
> > > > pointless...
> > >
> > > No, this is not true.
> > > All the "big" libraries like the Library of Congress are concerned
> > > about persistence of technology.  They have not been able to
> > > re-read tapes that were written ten years ago.  They know what
> > > they are talking about for data format and encoding, as well as
> > > keeping textual information for a long time.
> > >
> > > However, they now consider XML mature enough to be the
> > > infrastructure for encoding their textual archives, and they
> > > consider TEI the right format.
> > >
> > > The point is not to have the latest technology here.  The point is
> > > to be able to describe an encyclopedia faithfully without losing
> > > important information.  If technology changes, it is easy to
> > > translate from XML/TEI into something else that suits our needs.
> > >
> > > To draw a parallel, if you consider the GNU project ten years ago,
> > > RMS chose the "C" language because it was mature enough and could
> > > be used to implement all desirable applications.  "C" was not
> > > chosen because it was the "best" language, nor the preferred
> > > language of RMS, just because it was mature.
> > > The point is the same with XML and TEI, which parallel Unix (the
> > > infrastructure) and "C" (the language) respectively.
> > >
> > > > There is a minimum practical limit on content formats in that our
> > > > "editors" (however you wish to define that term) need to be able
> > > > to check for spam etc. After that, it's up to presentation tools
> > > > to decide how to handle the format of the material (e.g. some may
> > > > not be able to display Chinese or Arabic fonts), and when they
> > > > can't, how they inform the user (e.g. a prompt to "Format xyz not
> > > > supported: save to disk?")
> > > >
> > > > Let's be honest, the vast majority of people don't even know
> > > > LaTeX or TEI exists, would be terrified of writing XML/HTML/WML,
> > > > and would expect to write their (formatted) entries in Microsoft
> > > > Word, embedding pictures created using Excel and Visio. If we
> > > > want their entries (and it's pointless even saying "text files",
> > > > because most people would think that a Word document *is* a text
> > > > file), we need to accept those formats too... So long as storing
> > > > and retrieving the content doesn't have copyright issues, I think
> > > > you have to leave it to presentation tools, as to which format
> > > > they support, and when they can only show our "minimum"
> > > > information.
> > >
> > > You can already create XML/TEI documents using Emacs and
> > > FrameMaker.  It would be better if Microsoft Office could produce
> > > it, but I don't expect that in the near future.
> > > If we need free tools to help in the creation of Nupedia, we can
> > > do it.  But it would be a much more difficult task to build
> > > another XML DTD plus the tools, given the DTD would evolve quickly.
> > >
> > > > OK, you can encourage certain content formats, but you can't be
> > > > too prescriptive. We also have to live in the real world, where
> > > > most of our authors may be computer literate, but their idea of
> > > > document production is to use commercial tools (that often come
> > > > "free" with their PCs).
> > >
> > > You are right, the format required for an encyclopedia is probably
> > > complicated.  I have written some encyclopedic articles (about
> > > PostScript and PDF) and it took me time to do it right.  Somebody
> > > re-entered them in FrameMaker and did the editorial work to fit
> > > the quality of the encyclopedia.
> > >
> > > I don't know how to avoid that.  I could have done it, but you
> > > cannot expect a specialist in, say, botany to follow the rules
> > > required by a good encyclopedia.  Somebody will have to translate
> > > the MS Word document and structure it, filling in the required
> > > fields to index it.
> > >
> > > Accepting loosely structured documents will lower the quality of
> > > the encyclopedia, and that does not seem to match the goal of the
> > > project.
> > >
> > > _______________________________________________
> > > Bug-gnupedia mailing list
> > > address@hidden
> > > http://mail.gnu.org/mailman/listinfo/bug-gnupedia



