[Arx-users] Repo format take II


From: Walter Landry
Subject: [Arx-users] Repo format take II
Date: Tue, 13 Dec 2005 15:58:51 -0800 (PST)

Greetings,

After the last round of comments, I have a new draft of the repo
format.  It is mostly similar to the last one, with skip-deltas
reduced to a limited role.

The repo looks like

repo/
  keys
  README
  dirhash
  pending/
  0/
  256/
  512/
  branch.d/
    dirhash
    subbranch.d/
      dirhash
      0/
      256/
      512/

"keys" has the public keys for the repository, "README" has the repo
format and a note not to touch anything, and "dirhash" has a hash of
all of the subdirectories.  Each dirhash is updated any time
something in its directory or subdirectories changes (except
"pending").  "pending" is a place to put pending transactions.
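To make the dirhash idea concrete, here is a minimal Python sketch.  It
is not ArX code.  The assumptions are mine: SHA-256 stands in for the
unspecified hash, files are hashed along with subdirectories, a
directory is modelled as a nested dict, and the names dirhash() and
changed_dirs() are hypothetical.  A real client would read the stored
dirhash files rather than recompute them.

```python
import hashlib

def dirhash(tree: dict) -> str:
    """Hash a directory given as {name: bytes | dict}, ignoring "pending"."""
    h = hashlib.sha256()  # assumption: stand-in for ArX's 256-bit hash
    for name in sorted(tree):
        if name in ("pending", "dirhash"):
            continue  # "pending" is excluded; "dirhash" is the output itself
        entry = tree[name]
        if isinstance(entry, dict):
            kind, child = b"d", dirhash(entry)
        else:
            kind, child = b"f", hashlib.sha256(entry).hexdigest()
        h.update(kind + name.encode() + b"\0" + child.encode() + b"\n")
    return h.hexdigest()

def changed_dirs(old: dict, new: dict, path: str = "") -> list:
    """List directories whose dirhash differs, descending only where needed."""
    if dirhash(old) == dirhash(new):
        return []  # nothing below this point needs to be listed
    changed = [path or "."]
    for name, entry in new.items():
        if isinstance(entry, dict) and name != "pending":
            old_entry = old.get(name, {})
            if not isinstance(old_entry, dict):
                old_entry = {}
            sub = f"{path}/{name}".lstrip("/")
            changed += changed_dirs(old_entry, entry, sub)
    return changed

old = {"keys": b"k", "pending": {},
       "project.d": {"0": {"log": b"first"}},
       "other.d": {"0": {"log": b"x"}}}
new = {"keys": b"k", "pending": {"txn": b"in flight"},
       "project.d": {"0": {"log": b"second"}},
       "other.d": {"0": {"log": b"x"}}}
print(changed_dirs(old, new))  # ['.', 'project.d', 'project.d/0']
```

Note that "other.d" is never descended into, which is the point of
keeping a hash at every level.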

There can also be "listing" files.  They are simply directory
listings, and are added with "arx update-listing".  If ArX sees a
"listing" file at the top, then it will update "listing" files in all
of the directories it touches.

Within each revision directory, we have several types of files.  When
naming files and directories, we use an abbreviated hash of 15 hex
characters (60 bits).  Everywhere else (in logs, etc.), we use the
full 256 bits of hash.
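As a quick illustration of the two hash lengths, here is a Python
sketch.  The draft does not say which 256-bit hash is used or how it is
abbreviated, so SHA-256 and "take the first 15 hex characters" are both
assumptions, and full_hash() and short_hash() are hypothetical names.

```python
import hashlib

SHORT_HEX_CHARS = 15  # 15 hex characters = 60 bits

def full_hash(data: bytes) -> str:
    # Assumption: SHA-256 stands in for the unspecified 256-bit hash.
    return hashlib.sha256(data).hexdigest()

def short_hash(full_hex: str) -> str:
    # Assumption: the abbreviation is a prefix of the full hash.
    return full_hex[:SHORT_HEX_CHARS]

full = full_hash(b"example revision contents")
short = short_hash(full)
print(len(full) * 4, len(short) * 4)  # 256 60
```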

1) Cached revisions.  These are directories named by the short hash
   and sequence number N.  Inside the directories are the files
   rev.tgz, log, and log.sig.  rev.tgz is the tar'd gzip'd copy of the
   complete project tree.  log is the log message etc. and the hash of
   the revision.  log.sig is the gpg signature of the log message.

   If N!=0, then there is also a URL file containing the complete
   branch name of previous revisions.  If the branches are located in
   the same repo, then the repo location is omitted.

   repo/branch/0/<hash><N>/
   repo/branch/0/<hash><N>/rev.tgz
   repo/branch/0/<hash><N>/log
   repo/branch/0/<hash><N>/log.sig
   repo/branch/0/<hash><N>/URL

2) Ordinary patch revisions. These have the short hashes of the
   beginning <hash A> and finishing <hash B> revisions.  Inside the
   directories are the files patch.tgz, log, and log.sig.  patch.tgz
   is the tar'd, gzipped patch from <hash A> to <hash B>.  log
   contains the initial <hash A> and final <hash B> hashes and the hash
   of the file patch.tgz.  log.sig is again the gpg signature of log.

   repo/branch/0/<hash B><hash A>/
   repo/branch/0/<hash B><hash A>/patch.tgz
   repo/branch/0/<hash B><hash A>/log
   repo/branch/0/<hash B><hash A>/log.sig


3) Skip-delta.  These are one-way patch files (no logs) named by the
   short hashes of the beginning <hash A> and finishing <hash B>
   revisions and a number N indicating how many patches the
   skip-deltas encompass.  The patches encompass 32^N revisions (32,
   1024, 32768, etc.).

   repo/branch/0/<hash B><hash A><N>

4) Terminal revisions.  These are empty files named by the short
   hashes of the terminal revision, followed by the character T.

   repo/branch/0/<hash>T

5) Sequence revisions.  These are empty files named by the short hash
   of the revision, followed by an "N", and then a number N indicating
   the sequence number of that revision.

   repo/branch/0/<hash>N<N>

6) Tag revisions.  These are directories named like patch revisions,
   but inside there is log, log.sig, and URL.  The log file has all of
   the hashes and directory locations for tagged branches.  It does
   not have the hash of the revision.  The hash of the revision is
   actually the hash of the log file itself.  The URL file contains a
   serialized list of complete branch names for the tagged branches.
   If the branch is stored inside the same repo, then the repo part is
   omitted.
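The naming rules above can be sketched as a small classifier.  This is
only an illustration in Python, not ArX code.  It assumes lowercase hex
for the short hashes and decimal digits for N, and the name classify()
is hypothetical.  Patch and tag directories are named the same way, so
the sketch cannot tell them apart without looking inside.

```python
import re

HEX15 = r"[0-9a-f]{15}"  # assumption: short hashes are lowercase hex

# Directories: patch/tag revisions and cached revisions.
DIR_PATTERNS = [
    ("patch_or_tag", re.compile(rf"^(?P<b>{HEX15})(?P<a>{HEX15})$")),
    ("cached", re.compile(rf"^(?P<hash>{HEX15})(?P<n>\d+)$")),
]
# Plain files: terminal markers, sequence markers, skip-deltas.
FILE_PATTERNS = [
    ("terminal", re.compile(rf"^(?P<hash>{HEX15})T$")),
    ("sequence", re.compile(rf"^(?P<hash>{HEX15})N(?P<n>\d+)$")),
    ("skip_delta", re.compile(rf"^(?P<b>{HEX15})(?P<a>{HEX15})(?P<n>\d+)$")),
]

def classify(name: str, is_dir: bool):
    """Return (kind, fields) for an entry in a revision directory."""
    patterns = DIR_PATTERNS if is_dir else FILE_PATTERNS
    for kind, pattern in patterns:
        match = pattern.match(name)
        if match:
            return kind, match.groupdict()
    return "unknown", {}

print(classify("a" * 15 + "0", is_dir=True)[0])              # cached
print(classify("b" * 15 + "a" * 15, is_dir=True)[0])         # patch_or_tag
print(classify("e" * 15 + "a" * 15 + "1", is_dir=False)[0])  # skip_delta
print(classify("abcabcabcabcabcT", is_dir=False)[0])         # terminal
print(classify("aabaabaabaabaabN6", is_dir=False)[0])        # sequence
```

One wrinkle the sketch exposes: decimal digits are also hex digits, so a
cached revision whose sequence number had exactly 15 digits would look
like a patch directory.  That is not a practical concern, but it is why
the patch pattern is tried first and the file/directory distinction is
used.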

==========

In order to make sure that I covered everything, I wrote up what
happens when Don Quixote and his sidekick Sancho Panza work on a
project.

So to start, Quixote creates a repo

  arx make-repo --key "address@hidden" repo

this creates the directories and files

  repo/
  repo/keys
  repo/README
  repo/dirhash
  repo/pending/

Now he imports a project

  cd project_tree
  arx init ../repo,project
  arx commit -m "Initial import"

This creates directories and files

  repo/project.d/
  repo/project.d/dirhash
  repo/project.d/0/
  repo/project.d/0/dirhash
  repo/project.d/0/aaaaaaaaaaaaaaa0/
  repo/project.d/0/aaaaaaaaaaaaaaa0/rev.tgz
  repo/project.d/0/aaaaaaaaaaaaaaa0/log
  repo/project.d/0/aaaaaaaaaaaaaaa0/log.sig

In the project tree, it creates

  _arx/revision
  _arx/history
  _arx/cache/aaaaaaaaaaaaaaa/
  _arx/cache/aaaaaaaaaaaaaaa/...

Where _arx/revision has the complete location and branch name of the
project tree, in this case

  file:///home/quixote/repo,address@hidden

history has the complete graph of all patches ever applied to the
tree.  In this case, it is just the link from nothing to aaa...

He makes changes and commits another revision

  arx commit -m "second"

This creates

  repo/project.d/0/bbbbbbbbbbbbbbbaaaaaaaaaaaaaa/
  repo/project.d/0/bbbbbbbbbbbbbbbaaaaaaaaaaaaaa/patch.tgz
  repo/project.d/0/bbbbbbbbbbbbbbbaaaaaaaaaaaaaa/log
  repo/project.d/0/bbbbbbbbbbbbbbbaaaaaaaaaaaaaa/log.sig

He creates more revisions.  When he commits revision 32, that also
creates a skip-delta back to the first revision

  repo/project.d/0/eeeeeeeeeeeeeeeaaaaaaaaaaaaaaa1

When the number of patches from the beginning gets to 256, ArX creates
the directory

  repo/project.d/256

and starts putting revisions there.  He accidentally commits something
with his password, so he deletes it with

  arx delete-revision @269

This deletes

  repo/project.d/256/eeeeeeeeeeeeeeeddddddddddddddd/

and any skip-deltas encompassing it.  This only works if it is at the
tip of revisions.  He works on some mildly experimental stuff, but it
does not work out.  So he terminates that microbranch

  arx terminate ../repo,address@hidden

which creates an empty file

  repo/project.d/256/abcabcabcabcabcT

and works from an earlier version

  cd ..
  rm -rf project_tree
  arx get repo,address@hidden
  <hack>
  arx commit -m "Better method"

Now when anyone runs "arx get" from his repo, they will only get the
new branch unless they explicitly ask for the terminated branch.  If
someone asks to get a project that has only terminated microbranches,
they must explicitly state what they want.

Don Quixote's trusty sidekick, Sancho Panza, decides to branch his
repo.  At first, he just mirrors the entire repo

  arx make-repo panza_repo
  arx propagate /home/quixote/repo panza_repo

which just copies everything over.  He periodically resyncs, and the
dirhash files mean that he only has to list directories that have
changed.  He hacks by getting revisions out of his own repo,
committing, and merging.

  arx get panza_repo,project project_tree
  cd project_tree
  <hack>
  arx commit -m "cheaper, faster, better"
  arx propagate /home/quixote/repo ../panza_repo
  arx merge

If he decided to merge a tree directly with Don Quixote's repo

  arx merge /home/quixote/repo,project

then that will copy all of the new patches into _arx/pending.  The
next time he commits, all of those patches will go into the repo, and
they will be signed by Sancho, not Don Quixote.  If it was a simple
update, it will reset the tree revision to the updated revision.

But Sancho runs out of space, and decides to only mirror what is in
repo,project.  Since this will be the only branch in his repo, he
makes it the default branch.

  arx propagate ../panza_repo,project ../panza_repo
  arx delete-branch ../panza_repo,project

This creates

  panza_repo/0
  panza_repo/0/dirhash
  panza_repo/0/aaaaaaaaaaaaaaa/
  panza_repo/0/aaaaaaaaaaaaaaa/rev.tgz
  panza_repo/0/aaaaaaaaaaaaaaa/log
  panza_repo/0/aaaaaaaaaaaaaaa/log.sig

and all of the other revisions.  Sancho then tells his project tree
about the branch movement

  arx relocate ../panza_repo

Sancho then continues hacking.  Since this "project" branch is the
default branch for panza_repo, there is no need for a branch name.
For example,

  arx get panza_repo

will get the latest version in that repo.
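For illustration, the "repo,branch" names used throughout could be
mapped to directories roughly like this.  It is a Python sketch under my
own assumptions: the first comma separates the repo from the branch, a
branch "project" lives in "project.d" (as in the listings above), and a
missing branch means the default branch stored at the top of the repo.
The function names are hypothetical, and nested sub-branches are left
out because the draft does not say how they are written.

```python
def split_branch_spec(spec: str):
    """Split "repo,branch" into (repo, branch); branch is None if absent."""
    repo, sep, branch = spec.partition(",")  # assumption: first comma splits
    return repo, (branch if sep and branch else None)

def branch_path(spec: str) -> str:
    """Directory holding the revisions for the named branch."""
    repo, branch = split_branch_spec(spec)
    repo = repo.rstrip("/")
    if branch is None:
        return repo  # default branch: revisions sit directly in the repo
    return f"{repo}/{branch}.d"

print(branch_path("../repo,project"))  # ../repo/project.d
print(branch_path("panza_repo"))       # panza_repo
```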

But Sancho still doesn't have enough space, so he decides to truncate
history.  He deletes his repo and does a fresh checkout from Don
Quixote's repo.

  cd ..
  rm -rf panza_repo
  arx make-repo panza_repo
  arx get /home/quixote/repo,project project_tree
  cd project_tree
  arx relocate ../panza_repo

Now he does some work

  <hack>
  arx commit -m "Non-delusional work"

The commit creates

  panza_repo/256/defdefdefdefdef289/
  panza_repo/256/defdefdefdefdef289/rev.tgz
  panza_repo/256/defdefdefdefdef289/log
  panza_repo/256/defdefdefdefdef289/log.sig
  panza_repo/256/defdefdefdefdef289/URL
  panza_repo/256/aaabbbcccdddeeedefdefdefdefdef/
  panza_repo/256/aaabbbcccdddeeedefdefdefdefdef/patch.tgz
  panza_repo/256/aaabbbcccdddeeedefdefdefdefdef/log
  panza_repo/256/aaabbbcccdddeeedefdefdefdefdef/log.sig
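The 256/ in these paths follows from the sequence number (289 here).
A minimal Python sketch of that bucketing, assuming buckets of 256
revisions named by their first sequence number as in the 0/, 256/, 512/
listing at the top; bucket_dir() is a hypothetical name, not an ArX
function.

```python
BUCKET_SIZE = 256  # from the 0/, 256/, 512/ directories in the layout

def bucket_dir(sequence_number: int) -> str:
    """Name of the directory that holds a revision with this sequence number."""
    if sequence_number < 0:
        raise ValueError("sequence numbers are non-negative")
    return str((sequence_number // BUCKET_SIZE) * BUCKET_SIZE)

for n in (0, 255, 256, 269, 289, 512):
    print(n, bucket_dir(n))
```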

The URL file points to Don Quixote's repo, since that is where the
project tree originally came from.  He could also have it point to a
different place

  arx fix-url @256 http://the.internet,project

Note that Sancho will also be creating skip-deltas in his repo, but at
a different schedule from Don Quixote's.  Sancho will be creating a
level 1 skip-delta when the distance of a patch from def... is 32, a
level 2 skip-delta when the distance from def... is 1024, etc.
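The two schedules can be sketched with one function that takes the
starting point as a parameter.  This is Python for illustration only.
It assumes a level N skip-delta is created whenever the distance from
the base is a positive multiple of 32^N, which is my reading of the
description rather than something the draft spells out, and
skip_delta_levels() is a hypothetical name.

```python
SKIP_BASE = 32  # a level N skip-delta spans 32**N revisions

def skip_delta_levels(sequence_number: int, base: int = 0) -> list:
    """Skip-delta levels to create when committing this revision."""
    distance = sequence_number - base
    if distance <= 0:
        return []
    levels = []
    level = 1
    # Stops once 32**level no longer divides the distance evenly.
    while distance % (SKIP_BASE ** level) == 0:
        levels.append(level)
        level += 1
    return levels

# Don Quixote counts from the first revision (base 0).
print(skip_delta_levels(32))               # [1]
print(skip_delta_levels(1024))             # [1, 2]
# Sancho counts from def..., his cached revision at sequence number 289.
print(skip_delta_levels(289, base=289))    # []
print(skip_delta_levels(321, base=289))    # [1]
print(skip_delta_levels(1313, base=289))   # [1, 2]
```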

When Sancho merges with Don Quixote, only those patches that
topologically come after def... will be put into _arx/pending.  It is
possible that some of these merges will involve patches that increase
the maximum distance to the root revision.  For example,

   aaa          0
    |
   bbb          1
    |
   ccc          2
  /   \
abc    def      3
 |      |
abd     |       4
 |      |
abe     |       5
   \   /
    aab         6

In this case, "abc", "abd", and "abe" are all topologically equal to
"def", so they would not be included in Sancho's repo.  However,
without those revisions, Sancho Panza would think that the revision
count for "aab" is 4.  So in these cases, Sancho's repo would contain
the empty file

  panza_repo/256/aabaabaabaabaabN6

The "N" following the hash tells ArX that this is not a tgz of a
project tree.
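The counts in this example can be checked with a short Python sketch
that treats the sequence number as the longest path back to the root.
The parent maps below are transcribed from the diagram, and
sequence_numbers() is a hypothetical helper, not part of ArX.

```python
from functools import lru_cache

def sequence_numbers(parents: dict) -> dict:
    """Map each revision to its longest distance from a root revision."""
    @lru_cache(maxsize=None)
    def depth(rev):
        ps = parents.get(rev, ())
        return 0 if not ps else 1 + max(depth(p) for p in ps)
    return {rev: depth(rev) for rev in parents}

# Full graph, as Don Quixote's repo sees it.
full = {
    "aaa": (), "bbb": ("aaa",), "ccc": ("bbb",),
    "abc": ("ccc",), "abd": ("abc",), "abe": ("abd",),
    "def": ("ccc",), "aab": ("abe", "def"),
}
# Sancho's view without abc, abd, and abe.
pruned = {
    "aaa": (), "bbb": ("aaa",), "ccc": ("bbb",),
    "def": ("ccc",), "aab": ("def",),
}

print(sequence_numbers(full)["aab"])    # 6
print(sequence_numbers(pruned)["aab"])  # 4

# The marker file records the true count for the truncated repo.
marker = "aab" * 5 + "N" + str(sequence_numbers(full)["aab"])
print(marker)  # aabaabaabaabaabN6
```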

Don Quixote notices the work that Sancho did, but only likes part of
it (the "cheaper, faster, better" patch, not the "Non-delusional work"
patch).  So he cherry-picks it

  arx dopatch /home/panza/address@hidden

The "ffe" part is required to disambiguate that patch from Don
Quixote's own patch #552 in Sancho's repo.  When Don Quixote commits,
that adds that patch to Don Quixote's repo if he doesn't already have
it.  If that revision was a merge, then Don Quixote would have to
completely specify the revision hash and at least partly specify the
previous revision hash.  Only that particular patch would go into Don
Quixote's repo.

Don Quixote tags a release

  arx tag ../repo,release ../repo,project

which creates a tagged revision

  repo/release.d/0/
  repo/release.d/0/log
  repo/release.d/0/log.sig
  repo/release.d/0/URL

Don Quixote decides to change a log

  arx fix-log -m "Initial" @1

which replaces the log and log.sig files, but does not change the
rev.tgz file.  Note that this is not usually propagated.  So be
careful with your mistakes!

He decides to use a different key to sign revisions.  He starts with

  arx sig -d --repo --key "address@hidden"

This removes that public key from repo/keys.

  arx sig -d ../repo,project

This removes all of the log.sig files under repo/project.d

  arx sig -a --repo --key "address@hidden"

This adds that public key to repo/keys

  arx sig -a --key "address@hidden" ../repo,project

and this adds new log.sig files.


Cheers,
Walter



