[Bug-gnupedia] Software for managing textual collections
From: <address@hidden>
Subject: [Bug-gnupedia] Software for managing textual collections
Date: Thu, 18 Jan 2001 09:22:47 +1300
Hello
[Bias warning: I helped develop the GNU-licensed software I'm about to
evangelize.]
There is a software system called Greenstone
(http://sourceforge.net/projects/greenstone/) from the New Zealand Digital
Library (www.nzdl.org) which would appear to be ideal for managing the text
of this kind of project. Written in Perl (collection building and
maintenance functionality) and C++ (runtime web serving), it's fully GPL and
runs on UNIX (the main development platform) and Windows.
Greenstone includes the following features:
*) functionality for mirroring documents from the web
*) functionality for full text indexing
*) document classification using the Dublin Core metadata standard
(http://purl.oclc.org/dc/), the most widespread metadata standard in the
digital library world
*) automatic language identification
*) automatic detection, extraction and markup of acronyms/dates/etc
*) Unicode throughout
*) An interface translated into multiple languages (Arabic, Chinese, Dutch,
English, French, Maori and Spanish)
*) Both a web interface and a CORBA interface
*) Work on a Z39.50 interface is underway
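For readers who haven't met Dublin Core, here is a rough sketch of what a
record using its elements can look like. This is not Greenstone's own
format or code; the function name, the "metadata" wrapper element and the
sample values are assumptions made up for illustration. Only the element
names (title, creator, date, language, format) and the namespace URI come
from the Dublin Core element set itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_xml(fields):
    """Serialize a dict of Dublin Core element names and values as XML.

    The enclosing <metadata> element is a hypothetical wrapper chosen for
    this sketch; it is not defined by Dublin Core or by Greenstone.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in fields.items():
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only, loosely describing this message.
record = {
    "title": "Software for managing textual collections",
    "creator": "stuart yeates",
    "date": "2001-01-18",
    "language": "en",
    "format": "text/plain",
}

print(dublin_core_xml(record))
```

Each key becomes one dc: element, so a classifier or browser can group
documents by creator, date, language and so on without looking at the
document text.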
Greenstone imports the following document formats:
*) Text (ASCII, UTF-8)
*) HTML
*) PS/PDF
*) Email
*) JPEG, GIF, TIFF ... (support for images as documents is limited; support
for images embedded within textual documents is very complete)
*) The above in the form of archives (zip, tar) and/or compressed documents
(zip, gzip, bzip).
There is a great set of sample collections at www.nzdl.org.
-- stuart yeates <address@hidden> aka `loam'
"Oh, havoc," cried Pooh, as he let slip the heffalumps of war.
X-no-archive:yes