koha-zebra

Re: [Koha-zebra] Zebra and non-filing characters


From: Sebastian Hammer
Subject: Re: [Koha-zebra] Zebra and non-filing characters
Date: Thu, 29 Dec 2005 11:05:24 -0500
User-agent: Mozilla Thunderbird 1.0.7 (Macintosh/20050923)

Paul POULAIN wrote:

Sebastian Hammer wrote:

Joshua Ferraro wrote:

Hello everyone,

This is just a generic question regarding Zebra's handling of
MARC non-filing characters. I know there is a 'stopwords'-like
function available using the 'map' directive:

map (^The\s) @

but I'm wondering whether Zebra is also capable of examining the
non-filing character specs within each MARC field to decide
whether to index or not to index ...

You mean using an indicator in the field to determine how many characters to skip? To the best of my knowledge, this is not supported at present, sorry.
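For context on the scheme Joshua describes: in MARC21, fields such as 245 carry an indicator (for 245, the second indicator) holding a digit 0-9 that says how many leading characters to skip when filing. The sketch below shows what honoring that indicator would amount to on a plain string. It is not a Zebra feature (per the reply above), it assumes the indicator has already been pulled out of the record, and the function name is hypothetical.

```python
def filing_form(value: str, indicator: str) -> str:
    """Return the part of a MARC21 field that should be filed/indexed.

    `indicator` is a non-filing-characters indicator such as the second
    indicator of field 245: a digit '0'-'9' giving how many leading
    characters to skip. A blank or non-digit indicator is treated as 0.
    """
    skip = int(indicator) if indicator.isdecimal() else 0
    return value[skip:]

print(filing_form("The Hobbit", "4"))   # Hobbit
print(filing_form("L'Étranger", "2"))   # Étranger
print(filing_form("The Hobbit", "0"))   # The Hobbit (indicator not set)
print(filing_form("Hamlet", " "))       # Hamlet
```

The third call shows the weakness Paul raises below: if the cataloger forgets to set the indicator, the article stays in the indexed form.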


That would really be a nice feature, at least for MARC-loving catalogers (they still exist!)

What I don't like about that approach anyway is that it leaves it ambiguous what happens when the user puts a leading article into a search term... I think you'd be better off just configuring the system to ignore the most common leading articles as described above.
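The prefix-ignoring alternative is the `map (^The\s) @` rule quoted earlier, which rewrites a leading "The " at the start of a field so it drops out of the indexed term. The Python below only mimics that effect on a plain string for illustration; it is not how Zebra applies character maps, it assumes "@" acts as a space in the character map, and the function name is hypothetical.

```python
import re

# Mimics, for illustration only, a charmap rule like `map (^The\s) @`:
# a "The" followed by whitespace at the very start of the field is removed.
LEADING_THE = re.compile(r"^The\s")

def apply_leading_article_rule(field: str) -> str:
    """Drop a leading 'The ' from a field value; matching is case-sensitive."""
    return LEADING_THE.sub("", field, count=1)

print(apply_leading_article_rule("The Hobbit"))       # Hobbit
print(apply_leading_article_rule("Theory of Games"))  # Theory of Games
print(apply_leading_article_rule("the hobbit"))       # the hobbit
```

Because the pattern is anchored and requires whitespace, "Theory" is left alone; because this sketch is case-sensitive, a lowercase "the" would need its own rule.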


Pro: it will work even if the cataloger forgets to set the indicator, and it makes the indicators more and more useless. Con: MARC-loving catalogers will hate such behaviour, because there are a few exceptions. I can already imagine the noise French catalogers will make ;-)

But I think the issue with searching is pretty serious, though. I've
been noticing lately a few Z39.50 servers that will return zero hits for
a full-field search if the user forgets (or doesn't know) to remove any
leading article himself. Now even for a MARC-fetishist, I think that is
just plain wrong. If you are going to eliminate leading articles from
searches, the least you can do is make it optional.
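One way to avoid the zero-hit problem is to run the same normalization over the query that was run over the indexed field, so a full-field search matches whether or not the user typed the article. A minimal sketch, assuming a small English article list and hypothetical function names:

```python
import re

# Assumed English article list, for illustration only.
ARTICLES = ("the", "a", "an")
_LEADING_ARTICLE = re.compile(r"^(?:%s)\s+" % "|".join(ARTICLES))

def normalize(phrase: str) -> str:
    """Lowercase, collapse runs of whitespace, and drop one leading article."""
    phrase = " ".join(phrase.split()).lower()
    return _LEADING_ARTICLE.sub("", phrase, count=1)

def full_field_match(query: str, field: str) -> bool:
    """Full-field match that works whether or not the user typed the article."""
    return normalize(query) == normalize(field)

print(full_field_match("The Lord of the Rings", "The Lord of the Rings"))  # True
print(full_field_match("Lord of the Rings", "The Lord of the Rings"))      # True
print(normalize("Anatomy of a Murder"))  # anatomy of a murder
```

The servers described above effectively normalize only the indexed side, which is why a query that keeps the article finds nothing.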

One way to do that with the dumb MARC21 character-skipping scheme would
be to generate two indexing entries for phrase indexes -- with and
without the offending leading article. That would fix searching, but it
would be a problem for sorting unless we were careful.
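The two-entry idea can be sketched as follows: a field with a non-filing prefix contributes both its full phrase and its stripped phrase to the phrase index, while sorting uses exactly one key per record, always the stripped form. The in-memory dictionary, sample records, and function names are hypothetical stand-ins for a real index.

```python
from collections import defaultdict

def phrase_terms(value: str, nonfiling: int) -> list[str]:
    """Index terms for one field: the full phrase and, when there is a
    non-filing prefix, the phrase without it."""
    terms = [value.lower()]
    if nonfiling > 0:
        terms.append(value[nonfiling:].lower())
    return terms

def sort_key(value: str, nonfiling: int) -> str:
    """Exactly one sort key per record, always without the non-filing prefix."""
    return value[nonfiling:].lower()

# Hypothetical sample records: id -> (title, non-filing character count)
records = {1: ("The Hobbit", 4), 2: ("Tarzan", 0)}

index: dict[str, set[int]] = defaultdict(set)
for rid, (title, nf) in records.items():
    for term in phrase_terms(title, nf):
        index[term].add(rid)

print(index["the hobbit"], index["hobbit"])                       # {1} {1}
print(sorted(records, key=lambda rid: sort_key(*records[rid])))  # [1, 2]
```

Sorting on every index term instead would place "The Hobbit" under both H and T; deriving a single sort key is the "careful" part.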

Browsing can also be a challenge.

My vote would be to start with the prefix-ignoring list, which in my
experience is enough to satisfy 99.9% of librarians, most of whom have
no clue about that feature of MARC21 anyway. Leave the other stuff as a
nice-to-have to be addressed at leisure at some point when we're
re-examining that part of the indexing logic anyway.

--Sebastian

It is true that this would require separate configuration for different languages, but you probably wouldn't get around that anyway, since many non-English-speaking countries use record formats other than MARC21, and the use of indicators to control indexing is not universal. The Danish MARC format (cleverly named DANMARC), for instance, uses a special character inside the subfields to mark the part that should not be indexed.
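An in-subfield marker works differently from a count in an indicator: the text itself says where the non-indexed part ends. The sketch below assumes a single marker placed right after the non-indexed prefix; the marker character "¤" is a placeholder only, since the thread does not say which character DANMARC actually uses, and the function name is hypothetical.

```python
# Placeholder only: the thread does not say which character DANMARC uses.
NONINDEX_MARKER = "¤"

def indexable_part(subfield: str, marker: str = NONINDEX_MARKER) -> str:
    """Return the text after the first marker (the marker and everything
    before it are not indexed); without a marker, return the whole subfield."""
    head, sep, tail = subfield.partition(marker)
    return tail if sep else head

print(indexable_part("The ¤Hobbit"))  # Hobbit
print(indexable_part("Hamlet"))       # Hamlet
```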

In what has already been developed for Koha 3.0, we will clearly have UNIMARC-French, MARC21-English, and probably other MARC/language flavours. So I agree with you.
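Per-flavour configuration could be as simple as one article list per MARC/language flavour, selected when a record is indexed. A sketch under that assumption; the flavour keys, the short article lists, and the function name are hypothetical. French needs at least one article ("l'") that is elided onto the next word rather than followed by a space.

```python
import re

# Hypothetical per-flavour article patterns; real lists would be longer and
# maintained by catalogers for each language.
ARTICLES_BY_FLAVOUR = {
    "MARC21-en": [r"the\s", r"an\s", r"a\s"],
    "UNIMARC-fr": [r"les\s", r"le\s", r"la\s", r"l'", r"une\s", r"un\s"],
}

def strip_leading_article(value: str, flavour: str) -> str:
    """Remove one leading article using the list configured for `flavour`."""
    patterns = ARTICLES_BY_FLAVOUR.get(flavour, [])
    if not patterns:
        return value
    leading = re.compile(r"^(?:%s)\s*" % "|".join(patterns), re.IGNORECASE)
    return leading.sub("", value, count=1)

print(strip_leading_article("The Hobbit", "MARC21-en"))        # Hobbit
print(strip_leading_article("L'Étranger", "UNIMARC-fr"))       # Étranger
print(strip_leading_article("Le Petit Prince", "UNIMARC-fr"))  # Petit Prince
print(strip_leading_article("The Hobbit", "UNIMARC-fr"))       # The Hobbit
```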

Happy New Year to everyone, with lots of free software and happiness!


--
Sebastian Hammer, Index Data
address@hidden   www.indexdata.com
Ph: (603) 209-6853









