[Bug-gnupedia] Software for managing textual collections


From: <address@hidden>
Subject: [Bug-gnupedia] Software for managing textual collections
Date: Thu, 18 Jan 2001 09:22:47 +1300

Hello

[Bias warning: I helped develop the GNU-licensed software I'm about to 
evangelize.]

There is a software system called Greenstone 
(http://sourceforge.net/projects/greenstone/) from the New Zealand Digital 
Library (www.nzdl.org) which would appear to be ideal for managing the text 
of this kind of project. Written in Perl (collection building and maintenance 
functionality) and C++ (runtime web serving), it is licensed entirely under 
the GPL and runs on UNIX (the main development platform) and Windows.

Greenstone includes the following features:
*) functionality for mirroring documents from the web 
*) functionality for full text indexing 
*) document classification using the Dublin Core metadata standard 
(http://purl.oclc.org/dc/), the most widespread metadata standard in the 
digital library world
*) automatic language identification
*) automatic detection, extraction and markup of acronyms/dates/etc
*) Unicode throughout
*) An interface translated into multiple languages (Arabic, Chinese, Dutch, 
English, French, Maori and Spanish)
*) Both a web interface and a CORBA interface
*) Work on a Z39.50 interface is underway 
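
To give a feel for what acronym detection involves, here is a minimal Python 
sketch of the general idea. It is not Greenstone's actual algorithm: the 
function name find_acronyms and the matching heuristic (capital letters in 
parentheses whose letters match the initials of the preceding words) are 
assumptions made purely for illustration.

```python
import re


def find_acronyms(text):
    """Return {acronym: expansion} for patterns like
    'New Zealand Digital Library (NZDL)'.

    Hypothetical helper for illustration only; not part of Greenstone.
    """
    found = {}
    # Candidate acronym: two or more capital letters inside parentheses.
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        acronym = match.group(1)
        # Take as many preceding words as the acronym has letters.
        words = text[:match.start()].split()
        candidate = words[-len(acronym):]
        # Accept only if the word initials spell out the acronym.
        if len(candidate) == len(acronym) and all(
            word[0].upper() == letter
            for word, letter in zip(candidate, acronym)
        ):
            found[acronym] = " ".join(candidate)
    return found
```

A real system would also need to cope with stop words ("of", "the"), 
lower-case acronyms and acronyms defined without parentheses, none of which 
this sketch attempts.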

Greenstone imports the following document formats:
*) Text (ASCII, UTF-8)
*) HTML
*) PS/PDF
*) Email
*) JPEG, GIF, TIFF ... (support for images as documents is limited; support 
for images embedded within textual documents is very complete)
*) The above in the form of archives (zip, tar) and/or compressed documents 
(zip, gzip, bzip).

There is a wide range of sample collections at www.nzdl.org

--    stuart yeates <address@hidden> aka `loam'
"Oh, havoc," cried Pooh, as he let slip the heffalumps of war.
X-no-archive:yes



