Re: [Bug-gnupedia] Architecture Questions


From: Mike Warren
Subject: Re: [Bug-gnupedia] Architecture Questions
Date: 21 Jan 2001 00:18:00 -0700
User-agent: Gnus/5.0807 (Gnus v5.8.7) XEmacs/21.1 (20 Minutes to Nikko)

Bryce Harrington <address@hidden> writes:

> I suppose if I had to make a guess at what the architecture would
> end up being, it would store the articles themselves as text XML
> files (DocBook, perhaps), [..]

TEI has been suggested as well. I think -- especially early on -- that
a simple approach such as this is best, since the DTD/Schema will
likely change a fair amount as missing features, etcetera, are added
or removed.  Plus, making a tar.gz of a bunch of XML files is easy and
the result stays human-readable; changing the schema of a database and
repackaging it is hard.
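
Just as a sketch of how simple that packaging step could be (the
articles/ directory name is made up):

  import tarfile
  from pathlib import Path

  # Bundle every XML article into one distributable archive.
  with tarfile.open("articles.tar.gz", "w:gz") as archive:
      for xml_file in sorted(Path("articles").glob("*.xml")):
          archive.add(xml_file)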

> When the XML files are added to the repository, converters would be
> run to produce articles of other formats (text, word doc, pdf, ps,
> tex, etc.)  When the user requests an article, he or she would also
> specify the desired format.

To ease server load, this could even be part of a client or a proxy
interface to the database (i.e., a proxy grabs the XML and converts
it to whatever format the client wants).
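
A rough sketch of the plain-text case (the URL scheme is assumed, and
a real proxy would dispatch on whatever format the client asked for):

  import urllib.request
  import xml.etree.ElementTree as ET

  def fetch_as_text(article_url):
      # Grab the raw XML from a mirror and strip the markup; other
      # converters (PDF, TeX, ...) would slot in the same way.
      with urllib.request.urlopen(article_url) as response:
          root = ET.fromstring(response.read())
      return "".join(root.itertext())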

> When producing mirrors of the repository, only the XML files,
> commentary, makefiles, and conversion scripts and templates would
> need to be transferred.

Really, only the XML files and an index. You could make URLs like:

  http://www.server.org/gnupedia/the-unique-id-of-the-article.xml

and all that changes for a mirror is the server name. 
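
In code, picking a mirror is then a one-argument change (the function
name here is made up):

  def article_url(mirror, article_id):
      # Swapping mirrors means swapping only the first argument.
      return "%s/gnupedia/%s.xml" % (mirror, article_id)

  article_url("http://www.server.org", "the-unique-id-of-the-article")
  article_url("http://mirror.example.org", "the-unique-id-of-the-article")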

> I'm sure others even now have some ideas on how to handle
> distribution, but for the near term something rsync-ish would
> probably be sufficient.

I think the suggestion of separating out classification (index, view,
etc.) is the best. These could each run on different servers (or even
be mostly-independent projects) and just grab the appropriate XML
article files from a mirror.

The mirrors might keep a list of recently-added files and a master
index of all the files they store, to make life easy for anything
referencing them. That would be an extremely simple thing to build,
and it would let work concentrate on the parsers that get content
into the appropriate formats and on the actual DTD/schema of the
content itself. After those are solidified, more work might be done
on making the repositories ``nicer'', but I really don't see the
advantage of using databases at all; at some point, the clients are
probably all going to want the raw XML, so serving it through a
database seems like a waste of processing power on the server side.
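
Generating that master index could be as trivial as this sketch (the
file layout is an assumption):

  import os

  def write_master_index(article_dir, index_path):
      # One article filename per line; a recently-added list could be
      # built the same way from file modification times.
      names = sorted(f for f in os.listdir(article_dir)
                     if f.endswith(".xml"))
      with open(index_path, "w") as index:
          index.write("\n".join(names) + "\n")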

-- 
address@hidden
<URL:http://www.mike-warren.com>
GPG: 0x579911BD :: 87F2 4D98 BDB0 0E90 EE2A  0CF9 1087 0884 5799 11BD


