[Bug-gnupedia] A Voice From the Past


From: John Goodwin
Subject: [Bug-gnupedia] A Voice From the Past
Date: Sat, 20 Jan 2001 21:48:22 -0800

Well, I thought I'd chip in my two cents' worth
since I tried this sort of thing a while back. If
you don't know what I'm referring to, check the
GNU Bulletin from 1992 or 1993 for something
called the FreeLore Project (it wasn't part of
GNU). I've already failed at what you're trying to
do. (I'm not sure that's a credential :) ).

There is a widespread intuition that (1) there
needs to be a movement that makes texts free the
same way free software is free and (2) the
internet is the correct medium for that
development.

Point 1: This project you are proposing is every
bit as big as the whole GNU project--it's the GNU
project's second half, only for text and not
software. I'm not kidding. You can limit its scope
to something smaller, but you would be fencing
yourself in, not keeping the world out.

There are really two projects here, disguised as
one. The first is an encyclopedia. Something I can
put on a CD and know I have all of it. The other
is an amorphous, dynamic virtual library that is
available as soon as I connect to the internet.  I
will get away from this dualism a bit later, but not
right now.

The first thing we need to consider is why what we
have is not good enough. The internet already is a
virtual library with lots of links. How are you
going to improve on it?  There are already ways to
aggregate links, like Google, dmoz.org, and
Slashdot RSS channels.

I think the key is the "encyclopedia" notion.
There is a need to gather a lot of information in
one place.  But that presupposes selection, and
selection is like censorship.

Ergo: you should abandon the goal of creating one
encyclopedia [because you will fight over
censorship, editing, formatting...] and try to
create extensible "modules" that will
interoperate. In other words, you build the GNU
Text System but don't specify every program it can
run!

To build my copy of the encyclopedia and burn a CD,
I should be able to fetch the tar.gz, say
"./configure; make; make install", and have my
new encyclopedia module in the usual place
(/usr/doc/encyclopedia?).  I should be able to
import any of a gazillion formats and read it with
any of a gazillion interfaces.
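
Concretely (the module name and the download site
below are made up for illustration), the whole
operation should be no more exotic than:

  wget http://ftp.example.org/gnu-enc-astro-1.0.tar.gz
  tar xzf gnu-enc-astro-1.0.tar.gz
  cd gnu-enc-astro-1.0
  ./configure --prefix=/usr  # HTML ends up in /usr/doc/encyclopedia/astro, say
  make
  make install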

As you can see, I am saying we should think of the
Encyclopedia as the "GNU system" and the Virtual
Library as "free software that might run on the
GNU System if the user installs it".  The latter
is very broad and extensible; the former is a core
enabling technology.

Emulate the success of the GNU project. Everyone
has their own interpretation, but here's mine: (1)
early distribution of infrastructure tools like
emacs and gcc, (2) a clear, easily explained goal
[build a free software clone of Unix], (3) clear
coding guidelines that could be copied, with
supporting tools--texinfo, autoconf.

There was both a broader concept (free software,
internet is a virtual library), and a narrow
concept (The GNU System, an encyclopedia).  You
need *both*.  In particular, you need to figure
out how to enable the former by building the
latter.

I think one of the keys to building hyperlinked
texts that don't break is the notion of a
persistent URL (see http://www.purl.org if you
haven't heard of it).  However, there is serious
trouble ahead if Free Software doesn't do
something slightly different (I'll explain what and
why in a second).

The basic idea behind a purl is that you do the
obvious--you add a level of indirection so you can
fix up the links transparently. The first part of
the URL is the name resolver (http://purl.org/)
and the rest is the namespace managed by the
resolver. These PURLs are aliased to subtrees of
the current web site.
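
For instance (the PURL path and its target here are
invented for illustration), any HTTP client that
asks for the persistent name just gets redirected
to wherever the document currently lives:

  $ wget -S -O /dev/null \
      http://purl.org/NET/GNURL/physics/relativity.html
  ...
  HTTP/1.1 302 Found
  Location: http://www.example.org/texts/relativity.html
  ...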

Good points: (1) you can use real URLs and
existing software will resolve them; (2) you get
persistence.  Very simple, very nice.

Now for the bad news. (1) purl.org (OCLC.org)
becomes a single point of failure. One
organization (or a small number) owns the indexing
for the entire web. Everything that is good and
bad about DNS is good and bad about this. (2) You
can hijack whole realms of knowledge if you
subvert the index! Imagine DNS spoofing the whole
encyclopedia. It's a 1984-style mutable past all over
again. (Hence the authentication [proposal #6] below).

I would be especially leery of OCLC. They may be
really great guys now and have That True Free
Software Religion[tm], but in the past they were
terribly hung up on the we-own-the-database-
all-universities-use-to-catalog-their-libraries
trip. No public access except for holy-priest
librarians; university pays to put data in;
university pays to take it out.

No way can we trust these guys to own all the
persistent links in the web. [Much to their credit
the PURL concept imagines multiple registrars, but
a few may not be enough]. We need free software
and a trusted (a.k.a. decentralized) free-the-data
infrastructure here.

So, here are my practical suggestions for this
project (or a related one):

1. Start a "GnURL" project for persistent URLs
(make gnurl.gnu.org a PURL name resolver). GnURLs
will work with existing HTML and XML software
today. There's your "quick start" factor.

[I just reserved the http://purl.org/NET/GNURL/ subdomain,
if you want it.]

2. Call any tarred and gzipped HTML tree with local
references and GnURLs *only* a "GNU encyclopedia
extension module". Allow users to manage those
namespaces and hence their links.

3. Provide both moderated and unmoderated domains
within the GNU-administered namespace. Make the
"GNU encyclopedia" (the moderated, core subjects)
just one of many subdomains in the GNU namespace.
First among equals, like the GNU System.

4. Provide an "loopback GnURL resolver", somewhat
like "localhost" or "file://" for resolving
internal references using a lookup table of
transient URLs.  (A.K.A. implement footnotes and a
bibliography).

5. (the really hard part) Make it easy for anyone
to set up a similar name resolver and invent some
sort of peer-to-peer sharing so you don't lose a
whole extension module if you lose its (single)
name resolver, can discover resolvers you don't
know about, etc.

6. (optional) Add some sort of authentication
technology so I can find out what the checksum of
my tar.gz should be.  That way I can check whether
the past really is the past, i.e. verify
persistence.
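
For suggestion 4, the loopback resolver could start
life as nothing fancier than a local alias table
consulted before any network lookup. A minimal
sketch (the file name and the two-column format are
assumptions, not an existing standard):

  $ cat gnurl.aliases
  bib/turing1936   http://www.example.org/papers/turing1936.html
  fig/penrose      file:///usr/doc/encyclopedia/figs/penrose.png

  $ awk '$1 == "bib/turing1936" { print $2 }' gnurl.aliases
  http://www.example.org/papers/turing1936.html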
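
For suggestion 6, today's tools already get most of
the way there. Something like the following (file
names made up) lets me verify that the module I
fetched is the module that was signed:

  # compare against the checksum published through
  # the (authenticated) name resolver
  md5sum gnu-enc-astro-1.0.tar.gz

  # or check a detached signature from the module's
  # maintainer
  gpg --verify gnu-enc-astro-1.0.tar.gz.sig \
               gnu-enc-astro-1.0.tar.gz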

Let's see how this stacks up against the success
criteria:

(1) Where's the clear, easily explained concept?
Any "text+code that uses internal links and GnURLs
only" qualifies as an extension module and will
work with the system.  I think that was short and
clear.

(2) What is the GNU encyclopedia? Mostly a
distributed, managed namespace of pointers, plus
some core GNU-generated content. The namespace can
be extensible, managed in a decentralized fashion,
and house both moderated and unmoderated content.
GNU can run as many moderated (and unmoderated)
core projects as fit its philosophy and resources.

(3) Where is the infrastructure? Focus on the part
we don't have today: a trusted/distributed,
persistent, verifiable URL mechanism that works
with all existing HTML/XML-centric software. Use the
www.purl.org server to bootstrap the project, but
make the name resolver an M4 macro in the
configuration script. :) Provide clear guidelines
for future migration so authors can make their content
compatible with the future software.
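
The M4 details can come later; the effect at build
time would be roughly the following (the variable
name, the @GNURL_RESOLVER@ placeholder, and the
*.html.in convention are all assumptions for this
sketch):

  # substitute the chosen resolver base into every
  # page while building the module
  GNURL_RESOLVER=${GNURL_RESOLVER:-http://purl.org/NET/GNURL}
  for f in *.html.in; do
    sed "s|@GNURL_RESOLVER@|$GNURL_RESOLVER|g" "$f" > "${f%.in}"
  done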

I believe the main advantage of this proposal is
that we can explain to content providers what to
do today (make a website with any content and
software you want, provided that--

  (a) we can tar it up and install it in the usual GNU
fashion; and

  (b) you use internal linking technology and GnURLs
only.)
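
Requirement (b) is even mechanically checkable. A
crude sketch (the two GnURL prefixes below are
assumptions) that flags any absolute link which is
not a GnURL:

  # list every absolute href, drop the sanctioned
  # GnURL prefixes; anything left over violates (b)
  grep -rn 'href="http' . \
    | grep -v 'href="http://gnurl\.gnu\.org/' \
    | grep -v 'href="http://purl\.org/NET/GNURL/'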

* * *

Every free software project is lots of projects
[authors] all travelling adjacent world lines. :)

=googol=
address@hidden




