Re: [Anarchdb-devel] Database population
From: Francois Gombault
Subject: Re: [Anarchdb-devel] Database population
Date: Mon, 12 May 2003 22:44:12 +0200
User-agent: KMail/1.5
[redirecting the mail to anarchdb-devel]
Andrew P Vanpernis wrote:
> I placed two text files into the cards directory of ARDB's CVS
> repository.
> One is the complete (minus Anarchs) card list from the White Wolf
> webpage. Unfortunately this file lacks any sort of rarity information.
> The second is a complete (minus Anarchs) list of rarities that can be
> found at http://www.thelasombra.com/cardlists.htm.
Fine. This is the best material we can find, so we'll see what we can do from
this.
> The problem seems to be combining the information in these files and
> organizing them into a CSV file. I was just trying to think of some
> ideas for how to do this, and what the output should look like. Do we
> want multiple listings for a card within a given set? For example, with
> the card Academic Hunting Ground, do we want this:
>
> Academic Hunting Ground, Jyhad, Uncommon, ...
> Academic Hunting Ground, VtES, Uncommon, ...
> Academic Hunting Ground, Camarilla, Preconstructed Tremere, ...
> Academic Hunting Ground, Camarilla, Uncommon, ...
>
> Or this?
>
> Academic Hunting Ground, Jyhad, Uncommon, ...
> Academic Hunting Ground, VtES, Uncommon, ...
> Academic Hunting Ground, Camarilla, Preconstructed Tremere | Uncommon,
> ...
I think I prefer the second one, as we can handle multiple rarities pretty
well with SQL queries. I see no need for duplicating the entries.
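To illustrate why the second form works fine: with rarities stored as a single pipe-separated string, an SQL substring match (LIKE) is enough to find every printing that includes a given rarity. A minimal sketch using SQLite; the table and column names here are made up, not ARDB's actual schema:

```python
import sqlite3

# In-memory sketch: one row per card per set, rarities pipe-separated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE printings (name TEXT, card_set TEXT, rarity TEXT)")
conn.executemany(
    "INSERT INTO printings VALUES (?, ?, ?)",
    [
        ("Academic Hunting Ground", "Jyhad", "Uncommon"),
        ("Academic Hunting Ground", "VtES", "Uncommon"),
        ("Academic Hunting Ground", "Camarilla", "Preconstructed Tremere | Uncommon"),
    ],
)

# Substring match pulls out every printing that includes a given rarity,
# without needing duplicate rows per rarity.
rows = conn.execute(
    "SELECT card_set FROM printings WHERE name = ? AND rarity LIKE ?",
    ("Academic Hunting Ground", "%Uncommon%"),
).fetchall()
print([r[0] for r in rows])  # all three printings match
```

So a single row per card/set pair is queryable with no duplication.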
> Another question is should all of the fields be spelled out (like I did
> above), or should we come up with a set of abbreviations, similar to
> those used in the text files?
I'd vote against abbreviations, because:
1) right now, they're already a pain for newbies to learn;
2) we might some day have so many of them that they become unmanageable
(from an SQL substring-search point of view too).
Now, for the parsing machine:
We have a strong constraint: populating the database now assigns indexes to
card names, etc. These indexes will be used to build decks, export
inventory, and so on.
Rebuilding the database later on (after the release of a new set, for
example) must keep the _same_ indexes for older cards, and assign new ones
to new cards, thus ensuring compatibility.
So, here's the algorithm I imagined, let me know what you think about it:
1. Have a "sets.history" file, indicating the order of publication of the
sets. It will look like:
set: Jyhad J
set: VTES VTES
promo: Marianna Gilbert
promo: Dan Murdock
set: Dark Sovereign DS
...
Or something equivalent.
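A line-oriented parser for such a file could be very small. A sketch, assuming exactly the "set:"/"promo:" syntax proposed above (with the last whitespace-separated token of a set line being its abbreviation); none of these names come from ARDB itself:

```python
def parse_history(lines):
    """Yield ("set", name, abbrev) or ("promo", name) tuples in file order."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        kind, _, rest = line.partition(":")
        rest = rest.strip()
        if kind == "set":
            # Last whitespace-separated token is the abbreviation,
            # so multi-word set names like "Dark Sovereign DS" work.
            name, _, abbrev = rest.rpartition(" ")
            yield ("set", name, abbrev)
        elif kind == "promo":
            yield ("promo", rest)

entries = list(parse_history([
    "set: Jyhad J",
    "set: VTES VTES",
    "promo: Marianna Gilbert",
    "set: Dark Sovereign DS",
]))
```

Keeping the parser this dumb means the file format stays easy to edit by hand.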
2. For each line in "sets.history", parse the card list and the rarity list,
and build two CSV files, like Jyhad_crypt.csv and Jyhad_library.csv. In these
files, cards are ordered by name, A->Z.
Unless it's a promo entry, in which case we don't generate files; we just
store its data.
Feed the CSV files into the database, or insert the promo card data.
3. Loop for the next set/promo.
This way, as long as entries in "sets.history" stay in the same order, and as
long as cards don't get drastically renamed, we should end up with a
compatible database.
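The index-assignment part of the algorithm above can be sketched like this. It's a minimal illustration, not ARDB code: function and data-structure names are made up, and the "index" is just an integer handed out in replay order. The point is that replaying the same "sets.history" order reproduces the same indexes, and a new set appended at the end only ever gets fresh ones:

```python
def build_indexes(history_entries, cards_in_set):
    """Assign a stable integer index to each card name.

    history_entries: set/promo names in publication order (from sets.history).
    cards_in_set: maps a set name to its card names.
    Replaying the same history in the same order reproduces the same
    indexes, so decks and inventories built against an older database
    stay compatible.
    """
    index = {}
    next_id = 1
    for set_name in history_entries:
        for card in sorted(cards_in_set.get(set_name, [])):  # A->Z per set
            if card not in index:  # reprints keep their existing index
                index[card] = next_id
                next_id += 1
    return index

# Rebuild after a new set is appended: old indexes are unchanged.
cards = {
    "Jyhad": ["Academic Hunting Ground", "Blood Doll"],
    "VTES": ["Academic Hunting Ground", "Minion Tap"],
}
idx_v1 = build_indexes(["Jyhad", "VTES"], cards)
cards["Dark Sovereign"] = ["Anarch Revolt"]
idx_v2 = build_indexes(["Jyhad", "VTES", "Dark Sovereign"], cards)
```

Here idx_v2 agrees with idx_v1 on every older card, which is exactly the compatibility property we need.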
Comments? Suggestions?
--
Francois
I WILL NOT SNAP BRAS
Bart Simpson on chalkboard in episode 8F22