[Bibulus-dev] Formatting bibliography entries


From: Thomas Widmann
Subject: [Bibulus-dev] Formatting bibliography entries
Date: Thu, 06 May 2004 20:14:59 +0100
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

Hello, everybody,

although what follows is a bit long, I hope you will all read it and
comment on it.  It concerns one of the areas most difficult to get
right.

BibTeX styles tend to be organised along the following lines: there is
a function for each entry type, and each function contains a sequence
of formatting instructions.  An example from plain.bst, in a syntax
that should make it easier to understand:

FUNCTION {article}
{
  bibitem
  authors
  new_block
  title
  new_block
  if missing(crossref)
    { 
      emphasize(journal)
      vol_num_pages
      date_as_year
    }
  else
    {
      article_crossref
      pages
    }
  new_block
  note
  fin_entry
}

I see two main problems with this approach:

1) Many entry types are very similar, but that is not expressed at all
   (except for certain functions like vol_num_pages that are used by
   more than one entry-formatting function).

2) In the example above, there is one *if* clause.  However, if all
   the bells and whistles of Custom-bib are implemented, the basic
   structure is totally obscured by nested conditionals.

All of this means it becomes very difficult to implement and maintain.
I learnt this the hard way half a year ago when I tried to make one
massive formatting function.  Basically I ended up with nearly a
thousand lines of stuff like this:

    if ($self->{STYLE}{datepos} ne 'afterauthor'
        and $self->{STYLE}{datepos} ne 'afternotes'
        and $self->{STYLE}{datepos} ne 'endbutjournal'
        and $self->{STYLE}{yearafternumber} ne 'space'
        and $self->{STYLE}{yearafternumber} ne 'comma'
        and $self->{STYLE}{yearaftervolume} ne 'spaceparentheses'
        and $self->{STYLE}{yearaftervolume} ne 'parentheses') {
      $self->formatdateasyear;
    }

I'm therefore increasingly of the opinion that we have to come up with
a new formatting model.  What do you all think about the following?

Basically, every element that is output has three associated
functions: location, punctuation and formatting.  That is, to deal
with the <url> element, three functions would be involved:

sub l_url { ... }

  The location function should determine where within the reference
  this element should be placed (if output at all).  I'm unsure how
  this should work -- I presume the end result should be a list, e.g.,
  ['author', 'year', 'title', 'journal', 'pages', 'note'], but I'm not
  at all sure what the best way to arrive at this would be.

  I can think of at least two approaches:

  - Each l_ function could return relative elements, such as 'before
    author', 'after journal' or 'at end', and we would then have to
    write a function to make sense of this.  However, this makes it
    very important to call the l_ functions in the right order (to
    take a simple case, just imagine if they all want to go 'at
    end').

  - Each l_ function could return a number, and afterwards we would
    just sort on this field.  That is, l_author might return 10,
    l_title 500, and l_year 250 or 850 depending on the formatting
    style.  In this case, we get problems if two l_ functions return
    the same number, but that need never happen.  I tend to favour
    this approach, not least because it makes it very easy to add
    additional fields.
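
To make the numerical approach a bit more concrete, here is a minimal
sketch.  The %style hash, the particular numbers, the convention that
an l_ function returns nothing to suppress an element, and the lookup
via can() are all assumptions made for illustration; none of this is
existing Bibulus code.

```perl
use strict;
use warnings;

# Assumed style option; 'afterauthor' is one of the datepos values.
my %style = (datepos => 'afterauthor');

# Hypothetical location functions.  Each returns a sort key, or
# nothing at all if the element should not be output.
sub l_author { 10 }
sub l_title  { 500 }
sub l_year   { $style{datepos} eq 'afterauthor' ? 250 : 850 }
sub l_url    { return }

# Call every l_ function (in no particular order), then sort.
sub element_order {
    my %pos;
    for my $name (qw(url title year author)) {
        my $fn = __PACKAGE__->can("l_$name") or next;
        my $n  = $fn->();
        $pos{$name} = $n if defined $n;
    }
    # Fall back on the name so that equal numbers still sort stably.
    my @sorted = sort { $pos{$a} <=> $pos{$b} or $a cmp $b } keys %pos;
    return @sorted;
}

print join(', ', element_order()), "\n";
```

The tie-break on the element name is only there to keep the output
deterministic if two functions do return the same number.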

sub p_url { ... }

  This function should basically return the desired punctuation before
  and after this element (by punctuation I mean 'new block', 'new
  sentence', 'comma' and 'space' and such things).  There would be a
  hierarchy, so that if the element to the left asks for a new block
  while the one to the right just wants a comma, the new block would
  win.

  I'm not sure whether these should be separate functions, but
  punctuation doesn't really seem to fit in well with the location
  functions.
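
  The hierarchy could be sketched roughly as follows.  The ranking in
  %rank, the (before, after) return convention and the sample p_
  functions are assumptions for illustration only.

```perl
use strict;
use warnings;

# Assumed hierarchy: the higher number wins when neighbours disagree.
my %rank = (space => 1, comma => 2, sentence => 3, block => 4);

# Hypothetical p_ functions returning (before, after) requests.
sub p_author  { ('block', 'block') }
sub p_title   { ('block', 'block') }
sub p_journal { ('comma', 'comma') }
sub p_pages   { ('comma', 'block') }

# For an ordered list of elements, choose the punctuation for each gap.
sub separators {
    my @elements = @_;
    my @seps;
    for my $i (1 .. $#elements) {
        my ($prev, $next)  = @elements[$i - 1, $i];
        my (undef, $after) = __PACKAGE__->can("p_$prev")->();
        my ($before)       = __PACKAGE__->can("p_$next")->();
        push @seps, $rank{$after} >= $rank{$before} ? $after : $before;
    }
    return @seps;
}

print join(' | ', separators(qw(author title journal pages))), "\n";
```

  Between title and journal the left side asks for a new block and the
  right side for a comma, so the new block wins; between journal and
  pages both sides agree on a comma.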

sub f_url { ... }

  The formatting function would normally be defined by the output
  modules.  For instance, Bibulus::LaTeX would probably output the
  contents wrapped in \url{...} or perhaps \texttt{...}, while
  Bibulus::HTML would wrap it in an <a href="..."></a>.
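
  As a rough sketch of how two output modules might each supply their
  own f_url: the package names Sketch::LaTeX and Sketch::HTML are
  stand-ins invented for this example (not the real modules), and
  escaping of special characters is deliberately left out.

```perl
use strict;
use warnings;

package Sketch::LaTeX;    # hypothetical stand-in for a LaTeX backend
sub f_url {
    my ($class, $url) = @_;
    return '\url{' . $url . '}';
}

package Sketch::HTML;     # hypothetical stand-in for an HTML backend
sub f_url {
    my ($class, $url) = @_;
    # No HTML escaping here; a real backend would need it.
    return '<a href="' . $url . '">' . $url . '</a>';
}

package main;

for my $backend ('Sketch::LaTeX', 'Sketch::HTML') {
    print $backend->f_url('http://www.nongnu.org/bibulus/'), "\n";
}
```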

To sum up, there would be three passes when formatting each reference:

1) The l_ functions would be called in arbitrary order, followed by a
   sort.  The result would be a list of elements.

2) The p_ functions would be called on each element of the list,
   inserting punctuation elements into it.

3) The f_ functions would be called in turn on each element, actually
   outputting things.
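
Put together, the three passes might look something like this.  It is
only a sketch: the %elem table, the %rank and %text hashes, the sample
functions and the name format_entry are all invented for the example,
and the entry is a plain hash rather than a real XML tree.

```perl
use strict;
use warnings;

my %rank = (space => 1, comma => 2, block => 3);          # assumed hierarchy
my %text = (space => ' ', comma => ', ', block => '. ');  # assumed rendering

# Hypothetical l/p/f functions for three elements.
my %elem = (
    author => { l => sub { 10 },  p => sub { ('block', 'space') },
                f => sub { $_[0] } },
    year   => { l => sub { 250 }, p => sub { ('space', 'block') },
                f => sub { '(' . $_[0] . ')' } },
    title  => { l => sub { 500 }, p => sub { ('comma', 'block') },
                f => sub { '\emph{' . $_[0] . '}' } },
);

sub format_entry {
    my ($entry) = @_;

    # Pass 1: locate.  Keep elements that have data, then sort.
    my %pos;
    for my $name (keys %elem) {
        next unless defined $entry->{$name};
        my $n = $elem{$name}{l}->();
        $pos{$name} = $n if defined $n;
    }
    my @order = sort { $pos{$a} <=> $pos{$b} } keys %pos;

    # Pass 2: punctuate.  The stronger request across each gap wins.
    my @items;
    for my $i (0 .. $#order) {
        if ($i > 0) {
            my (undef, $after) = $elem{ $order[$i - 1] }{p}->();
            my ($before)       = $elem{ $order[$i] }{p}->();
            my $punct = $rank{$after} >= $rank{$before} ? $after : $before;
            push @items, { punct => $punct };
        }
        push @items, { name => $order[$i] };
    }

    # Pass 3: format.  Emit punctuation and formatted fields in turn.
    my $out = '';
    for my $item (@items) {
        if (exists $item->{punct}) {
            $out .= $text{ $item->{punct} };
        }
        else {
            my $name = $item->{name};
            $out .= $elem{$name}{f}->($entry->{$name});
        }
    }
    return $out;
}

print format_entry({ author => 'A. Author', year => 2004,
                     title  => 'A Title' }), "\n";
```

Note that dropping the year from the entry changes the punctuation
between author and title without either of those elements having to
know about it.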

Taken together, this would mean that if a user wants to output the url
before everything else, they would just write the following (if using
the numerical location option described above):

sub l_url {
   return 1;
}

Furthermore, it makes it very easy to extend the system.  Let's say
that a user wants an extra field containing the library in which the
item is found.  All they would have to do would be to extend the DTD
with <library>, add the relevant data to their XML files, and then
provide the functions l_library, p_library and f_library.
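
For instance, the three user-supplied functions could be as small as
the ones below.  The position 900, the punctuation, the "Held at"
wording and the helper functions_for (which just looks the functions
up by naming convention) are assumptions for the sake of the example.

```perl
use strict;
use warnings;

# Hypothetical user-supplied functions for a new <library> element.
sub l_library { 900 }
sub p_library { ('block', 'block') }
sub f_library { my ($value) = @_; return "Held at: $value" }

# Assumed lookup: find the l_, p_ and f_ functions for an element by
# name, or return nothing if any of the three is missing.
sub functions_for {
    my ($name) = @_;
    my %fn;
    for my $prefix (qw(l p f)) {
        my $code = __PACKAGE__->can("${prefix}_$name") or return;
        $fn{$prefix} = $code;
    }
    return \%fn;
}

my $fn = functions_for('library');
print $fn->{f}->('Example Library'), "\n" if $fn;
```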

The only major problem I can think of is that it might be difficult to
group and split things -- that is, if two fields should be output
together, or if a field should be output twice.  I'm not sure whether
this would be a huge problem, though.  Hmmm, thinking about it again,
this might not even be a problem since the grouping and splitting can
be done by manipulating the XML tree before calling the formatting
functions.

I'm looking forward to hearing from you!

/Thomas
-- 
Thomas Widmann          Bye-bye to BibTeX: join the Bibulus project now!
address@hidden                        <http://www.nongnu.org/bibulus/>
Glasgow, Scotland, EU     <http://savannah.nongnu.org/projects/bibulus/>



