[bug-recutils] Re: [help-recutils] references between %rec data


From: Jose E. Marchesi
Subject: [bug-recutils] Re: [help-recutils] references between %rec data
Date: Tue, 25 Jan 2011 21:51:45 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.91 (gnu/linux)

    
Hi Zeus.

    for the sample from info recutils, can i cover in rec file
    situations when the same book was published by the same/different
    publisher in different years on different languages?

That entirely depends on what you want to model...

You see, the record sets that you store in recfiles can be viewed as
relations in a relational model.  In that way, a record set is like a
table. For example:

/
| %rec: Book
| 
| Title: GNU Emacs Manual
| Author: Richard M. Stallman
|
| Title: Mio Cid
| Author: Anonymous
\

The previous example can be seen as a table of books with two rows.
Each row has two columns with a title and an author.  Exactly like in a
relational database:

| Title            | Author              |
|------------------+---------------------|
| GNU Emacs Manual | Richard M. Stallman |
| Mio Cid          | Anonymous           |

But record sets are more flexible than tables, and the two concepts are
not exactly the same, because:

- It is possible to omit a field in a record, and
- It is possible to denote 1-N relationships with a single record.

The first case is quite simple.  We can have books without authors just
by omitting the Author field in the corresponding records:

/
| %rec: Book
| 
| Title: GNU Emacs Manual
| Author: Richard M. Stallman
|
| Title: Mio Cid
|
| Title: The Catcher in the Rye
| Author: J. D. Salinger
\

"Mio Cid" does not have any author.  One can "simulate" this feature
in relational databases by using the "null" value as the Author in the
book rows, but the column is still there:

| Title                  | Author              |
|------------------------+---------------------|
| GNU Emacs Manual       | Richard M. Stallman |
| Mio Cid                | <null>              |
| The Catcher in the Rye | J. D. Salinger      |

Things get complicated when the records of a set have many different
fields... the corresponding relational table would be huge and full of
nulls.
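
As a rough illustration of the comparison above, here is a minimal
Python sketch.  It is not part of recutils: the helpers parse_records
and to_table are made up for this example, and the parser assumes the
simplified "Field: value" form used in this message (it skips %rec
descriptors and comments and does not handle multi-line values).  It
flattens a record set into a table and fills every omitted field with
None:

```python
def parse_records(text):
    """Split simplified recfile text into records.

    Assumptions: one 'Field: value' per line, blank lines separate
    records, lines starting with '%' or '#' are skipped.
    Each record is a list of (field, value) pairs.
    """
    records, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                records.append(current)
                current = []
            continue
        if line.startswith(("%", "#")):
            continue
        field, _, value = line.partition(":")
        current.append((field.strip(), value.strip()))
    if current:
        records.append(current)
    return records


def to_table(records):
    """Return (columns, rows); a field omitted from a record becomes None."""
    columns = []
    for record in records:
        for field, _ in record:
            if field not in columns:
                columns.append(field)
    rows = [tuple(dict(record).get(col) for col in columns) for record in records]
    return columns, rows


BOOKS = """\
%rec: Book

Title: GNU Emacs Manual
Author: Richard M. Stallman

Title: Mio Cid

Title: The Catcher in the Rye
Author: J. D. Salinger
"""

columns, rows = to_table(parse_records(BOOKS))
print(columns)
for row in rows:
    print(row)
```

The table needs one column for every field that appears in any record,
which is why a set with many optional fields turns into a wide table
that is mostly None.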

The second case is quite interesting: the ability to denote 1-N
relationships in a single record.  Consider the following conceptual
datamodel:

  +------+             +--------+
  |      | 1         N |        |
  | Book |-------------| Author |
  |      |  writtenby  |        |
  +------+             +--------+

It features two entity sets (or simply entities): books and authors.
There is also a relationship between them: books are written by zero or
more authors.  The usual way to implement this conceptual datamodel in
the relational domain would be with two tables, one for books and
another one for authors.  Then the books table is expanded to include a
foreign key to identify an author.  Since a book can be associated with
several authors, there can be several rows in the books table with the
same book title:

| Title            | Author              |
|------------------+---------------------|
| GNU Emacs Manual | Richard M. Stallman |
| Mio Cid          | <null>              |
| Dragon Book      | Aho                 |
| Dragon Book      | Sethi               |
| Dragon Book      | Ullman              |


An equivalent record set "Book" for the previous table would be:

/
| %rec: Book
|
| Title: GNU Emacs Manual
| Author: Richard M. Stallman
|
| Title: Mio Cid
|
| Title: Dragon Book
| Author: Aho
|
| Title: Dragon Book
| Author: Sethi
|
| Title: Dragon Book
| Author: Ullman
\

But that record set is a bit confusing, isn't it?  For one thing, it is
supposed to contain books, but the actual contents are something
slightly different: it is implementing both the conceptual entity "book"
and the relationship between books and authors.  Nor is it easy to get a
clear picture of the stored data at first sight.

Fortunately, it is possible to encode such relationships in a more
compact way just by specifying several authors per title:

/
| %rec: Book
| 
| Title: GNU Emacs Manual
| Author: Richard M. Stallman
|
| Title: Mio Cid
|
| Title: Compilers. Principles, Techniques and Tools.
| Author: Aho 
| Author: Sethi
| Author: Ullman
\

Now we can feel that the records stored in the record set are books,
that happen to have zero or more authors.  It is also quite readable.
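
The step from the one-row-per-author table to the compact records can
be sketched in Python as follows.  This is only an illustration: the
function name compact and the (field, value) pair representation are
assumptions of the sketch, not anything recutils provides.

```python
def compact(rows):
    """Group (title, author) rows into one record per title.

    Each record is a list of (field, value) pairs, so a field name
    such as 'Author' may appear several times in the same record.
    """
    by_title = {}
    for title, author in rows:
        authors = by_title.setdefault(title, [])
        if author is not None:  # a null author means "no Author field"
            authors.append(author)
    return [
        [("Title", title)] + [("Author", a) for a in authors]
        for title, authors in by_title.items()
    ]


ROWS = [
    ("GNU Emacs Manual", "Richard M. Stallman"),
    ("Mio Cid", None),
    ("Dragon Book", "Aho"),
    ("Dragon Book", "Sethi"),
    ("Dragon Book", "Ullman"),
]

for record in compact(ROWS):
    for field, value in record:
        print(f"{field}: {value}")
    print()
```

Five table rows collapse into three records, and the null author simply
disappears instead of being stored.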

In the previous example, having just the Book record set, we did not use
an explicit record set of authors because we were not interested in them
beyond the fact that they write books.  Now let's suppose we want to
store additional information about them, in particular their
nationality.  What we have to do is to introduce a new set:

/
| %rec: Author
| 
| Name: Richard M. Stallman
| Country: USA
| 
| Name: Aho
| Country: Canada
| 
| Name: Sethi
| Country: India
|
| Name: Ullman
| Country: USA
\

And then change the Book records to use references to the authors'
names:

/
| %rec: Book
| 
| Title: GNU Emacs Manual
| Author:Name: Richard M. Stallman
|
| Title: Mio Cid
|
| Title: Compilers. Principles, Techniques and Tools.
| Author:Name: Aho 
| Author:Name: Sethi
| Author:Name: Ullman
\
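
One thing such references make possible is checking that every
referenced author really exists.  The Python sketch below shows the
idea.  It is purely illustrative: dangling_references is a made-up
helper, the data is copied by hand from the two record sets above, and
nothing here is a recutils feature.

```python
AUTHORS = [
    {"Name": "Richard M. Stallman", "Country": "USA"},
    {"Name": "Aho", "Country": "Canada"},
    {"Name": "Sethi", "Country": "India"},
    {"Name": "Ullman", "Country": "USA"},
]

BOOKS = [
    [("Title", "GNU Emacs Manual"), ("Author:Name", "Richard M. Stallman")],
    [("Title", "Mio Cid")],
    [("Title", "Compilers. Principles, Techniques and Tools."),
     ("Author:Name", "Aho"), ("Author:Name", "Sethi"), ("Author:Name", "Ullman")],
]


def dangling_references(records, ref_field, targets, key):
    """Return (record_index, value) for each ref_field value
    that matches no target record's key field."""
    known = {t[key] for t in targets if key in t}
    return [
        (i, value)
        for i, record in enumerate(records)
        for field, value in record
        if field == ref_field and value not in known
    ]


print(dangling_references(BOOKS, "Author:Name", AUTHORS, "Name"))  # prints []
```

A misspelled name in a Book record would show up in the returned list
together with the index of the offending record.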

Now let's take a look at your model (at last! :D).  You want to
associate publishers with books, attaching a publication date to the
_relationship_.  In an entity relationship diagram it would be something
like:

  +------+       +-------------+       +-----------+
  |      | 1     |             |     N |           |
  | Book |-------| publishedby |-------| Publisher |
  |      |       |   (date)    |       |           |
  +------+       +-------------+       +-----------+

In cases where the relationship has attributes it is not possible to use
the compact form in the records, because there may be ambiguities.  That
happens in your record:

    Title: GNU Emacs Manual
    Author: Richard M. Stallman
    PublID: 1
    PublDate: 1970-01-01
    PublID: 2
    PublDate: 2011-01-25
    Location: home

Which PublDate corresponds to which PublID?  The relationship could be
determined by the order of the fields, but it would be too easy to break
it with an accidental change.  In this case it would be much better to
introduce a new record type Edition:

/
| %rec: Edition
| 
| Publisher:Name: FSF
| Book:Title: GNU Emacs Manual
| Date: 1970-01-01
| 
| Publisher:Name: FSF
| Book:Title: GNU Emacs Manual
| Date: 2011-01-25
\

But wait! There is a 1-N relationship between Editions and "dates".
Thus:

/
| %rec: Edition
| 
| Publisher:Name: FSF
| Book:Title: GNU Emacs Manual
| Date: 1970-01-01
| Date: 2011-01-25
|
| Publisher:Name: Freedom Books
| Book:Title: GNU Emacs Manual
| Date: 2009-12-10
\

Note that I added an additional fictional publisher that made yet
another edition of the Emacs manual.
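
Going back to the ambiguity in the PublID/PublDate record, the
following Python sketch shows why pairing by field order is fragile and
why the Edition records do not have that problem.  The function names
and the (field, value) pair representation are assumptions made for the
illustration only.

```python
def pair_by_order(record):
    """Pair the i-th PublID with the i-th PublDate, relying only on field order."""
    ids = [v for f, v in record if f == "PublID"]
    dates = [v for f, v in record if f == "PublDate"]
    return list(zip(ids, dates))


record = [
    ("Title", "GNU Emacs Manual"),
    ("PublID", "1"), ("PublDate", "1970-01-01"),
    ("PublID", "2"), ("PublDate", "2011-01-25"),
]
print(pair_by_order(record))

# An accidental edit drops one date: the remaining date now
# silently attaches to PublID 1 instead of PublID 2.
damaged = [fv for fv in record if fv != ("PublDate", "1970-01-01")]
print(pair_by_order(damaged))

# With Edition records every date lives in the record it belongs to.
EDITIONS = [
    [("Publisher:Name", "FSF"), ("Book:Title", "GNU Emacs Manual"),
     ("Date", "1970-01-01"), ("Date", "2011-01-25")],
    [("Publisher:Name", "Freedom Books"), ("Book:Title", "GNU Emacs Manual"),
     ("Date", "2009-12-10")],
]


def dates_by_publisher(editions):
    """Map each publisher name to the dates found in its Edition records."""
    result = {}
    for rec in editions:
        publisher = next(v for f, v in rec if f == "Publisher:Name")
        result.setdefault(publisher, []).extend(v for f, v in rec if f == "Date")
    return result


print(dates_by_publisher(EDITIONS))
```

Removing a Date field from an Edition record loses that one date but
cannot move a date from one publisher to another.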

At this time the only recutil that is aware of the references between
records is recfix: it checks whether the value of a reference conforms
to its declared type in the referenced record set.

I am thinking about the best way to handle references in recsel,
effectively implementing "joins".  Something like:

/
| $ recsel -t Book -e "Title = 'Dragon Book'" -p Title,Author:Name,Author:Country
|
| Title: Dragon Book
| Author:Name: Aho
| Author:Country: Canada
| 
| Title: Dragon Book
| Author:Name: Sethi
| Author:Country: India
| 
| Title: Dragon Book
| Author:Name: Ullman
| Author:Country: USA
\
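
To make the intended semantics of such a join concrete, here is a small
Python emulation of the idea.  It is a sketch under assumptions, not an
implementation of recsel: the join function is hypothetical, the data
is typed in by hand, and the Book record is assumed to carry the title
"Dragon Book" as in the command above.  Records without any reference
produce no output, as in an inner join.

```python
AUTHORS = [
    {"Name": "Richard M. Stallman", "Country": "USA"},
    {"Name": "Aho", "Country": "Canada"},
    {"Name": "Sethi", "Country": "India"},
    {"Name": "Ullman", "Country": "USA"},
]

BOOKS = [
    [("Title", "GNU Emacs Manual"), ("Author:Name", "Richard M. Stallman")],
    [("Title", "Mio Cid")],
    [("Title", "Dragon Book"),
     ("Author:Name", "Aho"), ("Author:Name", "Sethi"), ("Author:Name", "Ullman")],
]


def join(records, ref_field, targets, key):
    """Yield one output record per reference value found in records.

    ref_field has the form 'Set:Key' (for example 'Author:Name').
    The fields of the matching target record are emitted as 'Set:Field'.
    """
    prefix = ref_field.split(":")[0]
    index = {t[key]: t for t in targets}
    for record in records:
        plain = [(f, v) for f, v in record if f != ref_field]
        for f, v in record:
            if f == ref_field and v in index:
                yield plain + [(f"{prefix}:{name}", val)
                               for name, val in index[v].items()]


# Selection first (Title = 'Dragon Book'), then the join.
selected = [r for r in BOOKS if ("Title", "Dragon Book") in r]
for out in join(selected, "Author:Name", AUTHORS, "Name"):
    for field, value in out:
        print(f"{field}: {value}")
    print()
```

Whether books without authors should be dropped or printed with the
author fields missing is one of the open design questions for the real
feature.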

Ideas are highly welcome.  I CCed bug-recutils for that purpose :D

-- 
Jose E. Marchesi    address@hidden
GNU Project         http://www.gnu.org



