From: Carsten Pfeiffer
Date: Fri, 19 Oct 2001 04:14:02 +0200

On Friday, 19 October 2001 02:28, David Squire wrote:

> I have that in the pipeline. At present it handles pdf, ps, doc, txt and
> html. I have also written a tool to heuristically correct text that is
> mangled by ps2txt or pdf2txt (i.e. hyphens, missing ligatures, etc.).
> It's the last week of semester here, so there is a real chance that I'll be
> able to integrate this stuff into GIFT soon.
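David's tool itself is not shown in this thread. The Python sketch below only illustrates what heuristically correcting extractor output could mean, under assumptions of its own. The function names and the word-list approach are hypothetical and not GIFT code. The sketch rejoins a word split by a hyphen at a line end only when the joined form appears in a supplied vocabulary. It also expands ligatures that the extractor emitted as single Unicode characters (U+FB00 to U+FB04). Ligatures that were dropped entirely would need a separate dictionary-driven guess, which is not shown here.

```python
import re

# Assumption: the extractor emitted ligatures as Unicode presentation forms.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

# A word, a hyphen at the end of a line, then the continuation on the next line.
HYPHEN_BREAK = re.compile(r"(\w+)-\n(\w+)")


def expand_ligatures(text):
    """Replace each ligature code point with its plain-letter spelling."""
    for lig, plain in LIGATURES.items():
        text = text.replace(lig, plain)
    return text


def join_hyphenated(text, vocabulary):
    """Remove line-break hyphenation (hypothetical heuristic).

    If the two halves form a word in `vocabulary` (lower-case entries),
    drop the hyphen. Otherwise keep it, as in 'self-contained'. The line
    break is removed in both cases.
    """
    def repl(match):
        first, second = match.group(1), match.group(2)
        joined = first + second
        if joined.lower() in vocabulary:
            return joined
        return first + "-" + second

    return HYPHEN_BREAK.sub(repl, text)


def clean(text, vocabulary):
    # Expand ligatures first so vocabulary lookups see plain letters.
    return join_hyphenated(expand_ligatures(text), vocabulary)
```

For example, with a vocabulary containing "implementation", `clean("imple-\nmentation", vocab)` gives `"implementation"`, while `"self-\ncontained"` keeps its hyphen because "selfcontained" is not in the list.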

Wow, excellent! Looking forward to trying that out. How would you query for such
things? I don't remember all of MRML off the top of my head; were there any
elements for querying textual metadata?

Is there [going to be] a way not only to perform queries but also to access
all the data, so that one could try to visualize it as a tree or graph?

Carsten Pfeiffer

