[Arx-users] Repo format take II
From: Walter Landry
Subject: [Arx-users] Repo format take II
Date: Tue, 13 Dec 2005 15:58:51 -0800 (PST)
Greetings,
After the last round of comments, I have a new draft of the repo
format. It is mostly similar to the last one, with skip-deltas
reduced to a limited role.
The repo looks like
repo/
    keys
    README
    dirhash
    pending/
    0/
    256/
    512/
    branch.d/
        dirhash
        subbranch.d/
            dirhash
            0/
            256/
            512/
"keys" has the public keys for the repository, "README" has the repo
format and a note not to touch anything, and "dirhash" has a hash of
all of the subdirectories. Each dirhash is updated whenever anything
in its directory or any of its subdirectories changes (except
"pending"). "pending" is a place to put pending transactions.
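The email does not say how a dirhash is computed. A minimal sketch, assuming SHA-256 and a Merkle-style scheme in which each directory's hash covers the sorted names and hashes of its entries. Under that scheme a change anywhere below a directory changes every dirhash on the way up, while "pending" is skipped. The in-memory tree model (dicts for directories, bytes for files) and the entry encoding are hypothetical, not ArX's actual format.

```python
import hashlib

# Hypothetical model: a directory is a dict mapping names to entries;
# a file is bytes, a subdirectory is another dict.
SKIPPED = {"pending", "dirhash"}  # pending/ is not covered; dirhash cannot cover itself


def dirhash(directory: dict) -> str:
    """Merkle-style hash over a directory's entries (assumed scheme, SHA-256)."""
    h = hashlib.sha256()
    for name in sorted(directory):
        if name in SKIPPED:
            continue
        entry = directory[name]
        if isinstance(entry, dict):
            kind, digest = "d", dirhash(entry)  # recurse into subdirectories
        else:
            kind, digest = "f", hashlib.sha256(entry).hexdigest()
        h.update(f"{kind} {name} {digest}\n".encode())
    return h.hexdigest()


repo = {"keys": b"...", "README": b"...", "pending": {},
        "project.d": {"0": {"rev": {"log": b"Initial import"}}}}
before = dirhash(repo)
repo["pending"]["tx"] = b"pending transaction"
print(dirhash(repo) == before)  # True: pending/ is ignored
repo["project.d"]["0"]["rev"]["log"] = b"changed"
print(dirhash(repo) == before)  # False: a deep change reaches the top
```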
There can also be "listing" files. They are simply directory
listings, and are added with "arx update-listing". If ArX sees a
"listing" file at the top, then it will update "listing" files in all
of the directories it touches.
Within each revision directory, we have several types of files. When
naming files and directories, we use an abbreviated hash of 15 hex
characters (60 bits). Everywhere else (in logs, etc.), we use the
full 256 bits of hash.
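A sketch of the two hash lengths. It assumes the hash is SHA-256 (the email only says 256 bits) and that the 15-character name form is simply a prefix of the full hex digest. Both assumptions are illustrative.

```python
import hashlib

SHORT_LEN = 15  # hex characters used in file and directory names (15 * 4 = 60 bits)


def full_hash(data: bytes) -> str:
    """256-bit hash as 64 hex characters; SHA-256 is an assumption."""
    return hashlib.sha256(data).hexdigest()


def short_hash(full_hex: str) -> str:
    """Abbreviated hash for names, assumed to be a prefix of the full hash."""
    return full_hex[:SHORT_LEN]


h = full_hash(b"example log")
print(len(h) * 4)              # 256: bits of hash used in logs
print(len(short_hash(h)) * 4)  # 60: bits of hash used in names
```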
1) Cached revisions. These are directories named by the short hash
followed by the sequence number N. Inside each directory are the files
rev.tgz, log, and log.sig. rev.tgz is a tar'd, gzipped copy of the
complete project tree. log contains the log message, among other
things, and the hash of the revision. log.sig is the gpg signature of
the log. If N!=0, then there is also a URL file containing the complete
branch name of the previous revisions. If those branches are located in
the same repo, then the repo location is omitted.
repo/branch/0/<hash><N>/
repo/branch/0/<hash><N>/rev.tgz
repo/branch/0/<hash><N>/log
repo/branch/0/<hash><N>/log.sig
repo/branch/0/<hash><N>/URL
2) Ordinary patch revisions. These are directories named by the short
hashes of the finishing revision <hash B> and the beginning revision
<hash A>. Inside each directory are the files patch.tgz, log, and
log.sig. patch.tgz is the tar'd, gzipped patch from <hash A> to
<hash B>. log contains the initial hash <hash A>, the final hash
<hash B>, and the hash of the file patch.tgz. log.sig is again the gpg
signature of log.
repo/branch/0/<hash B><hash A>/
repo/branch/0/<hash B><hash A>/patch.tgz
repo/branch/0/<hash B><hash A>/log
repo/branch/0/<hash B><hash A>/log.sig
3) Skip-deltas. These are one-way patch files (no logs) named by the
short hashes of the finishing <hash B> and beginning <hash A>
revisions, followed by a level N indicating how many patches the
skip-delta encompasses: a level-N skip-delta spans 32^N revisions (32,
1024, 32768, etc.).
repo/branch/0/<hash B><hash A><N>
4) Terminal revisions. These are empty files named by the short hash
of the terminal revision, followed by the character T.
repo/branch/0/<hash>T
5) Sequence revisions. These are empty files named by the short hash
of the revision, followed by the letter "N" and then the sequence
number N of that revision.
repo/branch/0/<hash>N<N>
6) Tag revisions. These are directories named like patch revisions,
but they contain log, log.sig, and URL. The log file has all of the
hashes and directory locations for the tagged branches. It does not
contain the hash of the revision; the hash of the revision is the hash
of the log file itself. The URL file contains a serialized list of
complete branch names for the tagged branches. If a branch is stored
inside the same repo, then the repo part is omitted.
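Apart from tag revisions, which are named like patch revisions and differ only in their contents (URL instead of patch.tgz), the entry kinds above can be told apart by name alone. A sketch of such a classifier, assuming lower-case hex hashes and decimal numbers; the kind labels and the 14-digit cap on sequence numbers are made up for illustration.

```python
import re

HEX = "[0-9a-f]"

# Names differ by length and by the letters T and N, which never appear in
# lower-case hex, so at most one pattern matches a given name.
PATTERNS = [
    ("terminal", re.compile(rf"({HEX}{{15}})T")),                     # <hash>T
    ("sequence", re.compile(rf"({HEX}{{15}})N(\d+)")),                # <hash>N<N>
    ("patch_or_tag", re.compile(rf"({HEX}{{15}})({HEX}{{15}})")),     # <hash B><hash A>
    ("skip_delta", re.compile(rf"({HEX}{{15}})({HEX}{{15}})(\d+)")),  # <hash B><hash A><N>
    ("cached", re.compile(rf"({HEX}{{15}})(\d{{1,14}})")),            # <hash><N>, N assumed < 15 digits
]


def classify(name: str):
    """Return (kind, fields) for an entry in a revision directory, or None.

    Fields are the short hashes and numbers exactly as they appear in the name.
    """
    for kind, pattern in PATTERNS:
        match = pattern.fullmatch(name)
        if match:
            return kind, match.groups()
    return None


for entry in ["a" * 15 + "0", "b" * 15 + "a" * 15, "e" * 15 + "a" * 15 + "1",
              "abc" * 5 + "T", "aab" * 5 + "N6"]:
    print(entry, classify(entry))
```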
==========
In order to make sure that I covered everything, I wrote up what
happens when Don Quixote and his sidekick Sancho Panza work on a
project.
So to start, Quixote creates a repo
arx make-repo --key "address@hidden" repo
this creates the directories and files
repo/
repo/keys
repo/README
repo/dirhash
repo/pending/
Now he imports a project
cd project_tree
arx init ../repo,project
arx commit -m "Initial import"
This creates directories and files
repo/project.d/
repo/project.d/dirhash
repo/project.d/0/
repo/project.d/0/dirhash
repo/project.d/0/aaaaaaaaaaaaaaa0/
repo/project.d/0/aaaaaaaaaaaaaaa0/rev.tgz
repo/project.d/0/aaaaaaaaaaaaaaa0/log
repo/project.d/0/aaaaaaaaaaaaaaa0/log.sig
In the project tree, it creates
_arx/revision
_arx/history
_arx/cache/aaaaaaaaaaaaaaa/
_arx/cache/aaaaaaaaaaaaaaa/...
Where _arx/revision has the complete location and branch name of the
project tree, in this case
file:///home/quixote/repo,address@hidden
history has the complete graph of all patches ever applied to the
tree. In this case, it is just the link from nothing to aaa...
He makes changes and commits another revision
arx commit -m "second"
This creates
repo/project.d/0/bbbbbbbbbbbbbbbaaaaaaaaaaaaaaa/
repo/project.d/0/bbbbbbbbbbbbbbbaaaaaaaaaaaaaaa/patch.tgz
repo/project.d/0/bbbbbbbbbbbbbbbaaaaaaaaaaaaaaa/log
repo/project.d/0/bbbbbbbbbbbbbbbaaaaaaaaaaaaaaa/log.sig
He creates more revisions. When he commits revision 32, that also
creates a skip-delta back to the first revision
repo/project.d/0/eeeeeeeeeeeeeeeaaaaaaaaaaaaaaa1
When the number of patches from the beginning gets to 256, ArX creates
the directory
repo/project.d/256
and starts putting revisions there. He accidentally commits something
with his password, so he deletes it with
arx delete-revision @269
This deletes
repo/project.d/256/eeeeeeeeeeeeeeeddddddddddddddd/
and any skip-deltas encompassing it. This only works if the revision is
at the tip of the branch. He works on some mildly experimental stuff, but it
does not work out. So he terminates that microbranch
arx terminate ../repo,address@hidden
which creates an empty file
repo/project.d/256/abcabcabcabcabcT
and works from an earlier version
cd ..
rm -rf project_tree
arx get repo,address@hidden
<hack>
arx commit -m "Better method"
Now when anyone runs "arx get" from his repo, they will only get the
new branch unless they explicitly ask for the terminated branch. If
someone asks to get a project that has only terminated microbranches,
they must explicitly state what they want.
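A sketch of that selection rule, using a hypothetical list of branch tips and the set of tips that have a <hash>T marker. The email does not say what happens when several live tips remain; raising an error in that case is an assumption.

```python
def choose_tip(tips, terminated, requested=None):
    """Choose which tip "arx get" would fetch under the rule described above.

    tips: short hashes of the branch tips (hypothetical input)
    terminated: tips that have a <hash>T marker file
    requested: a tip the user named explicitly, if any
    """
    if requested is not None:
        if requested not in tips:
            raise ValueError(f"unknown revision {requested}")
        return requested  # an explicit request may name a terminated microbranch
    live = [tip for tip in tips if tip not in terminated]
    if not live:
        raise ValueError("only terminated microbranches; name a revision explicitly")
    if len(live) > 1:
        # Assumption: the email does not cover this case.
        raise ValueError(f"several live tips: {live}")
    return live[0]


old, new = "abc" * 5, "fed" * 5
print(choose_tip([old, new], {old}))                 # the live tip, fedfed...
print(choose_tip([old, new], {old}, requested=old))  # explicitly requested terminated tip
```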
Don Quixote's trusty sidekick, Sancho Panza, decides to branch his
repo. At first, he just mirrors the entire repo
arx make-repo panza_repo
arx propagate /home/quixote/repo panza_repo
which just copies everything over. He periodically resyncs, and the
dirhash files mean that he only has to list directories that have
changed. He hacks by getting revisions out of his own repo,
committing, and merging.
arx get panza_repo,project project_tree
cd project_tree
<hack>
arx commit -m "cheaper, faster, better"
arx propagate /home/quixote/repo ../panza_repo
arx merge
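During a resync, only directories whose dirhash differs between the two repos need to be listed. A minimal sketch of that pruning, assuming each directory in a hypothetical in-memory model already carries its "dirhash" value; it is not the actual implementation of arx propagate.

```python
def changed_dirs(src: dict, dst, path: str = "") -> list:
    """Directories under src that must be listed to resync the mirror dst.

    Directories are dicts carrying a precomputed "dirhash" entry; a subtree
    whose dirhash matches the mirror's is skipped without being listed.
    """
    if dst is not None and src.get("dirhash") == dst.get("dirhash"):
        return []  # identical subtree: nothing below needs listing
    result = [path or "/"]
    for name, entry in src.items():
        if isinstance(entry, dict):
            child = dst.get(name) if dst is not None else None
            if not isinstance(child, dict):
                child = None  # missing on the mirror: everything below must be copied
            result += changed_dirs(entry, child, f"{path}/{name}")
    return result


src = {"dirhash": "r2", "project.d": {"dirhash": "p2", "0": {"dirhash": "z1"},
                                     "256": {"dirhash": "n2"}}}
mirror = {"dirhash": "r1", "project.d": {"dirhash": "p1", "0": {"dirhash": "z1"},
                                        "256": {"dirhash": "n1"}}}
print(changed_dirs(src, mirror))  # project.d/0 is unchanged, so it is never listed
```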
If he decided to merge a tree directly with Don Quixote's repo
arx merge /home/quixote/repo,project
then that will copy all of the new patches into _arx/pending. The
next time he commits, all of those patches will go into his repo, and
they will be signed by Sancho, not Don Quixote. If the merge was a simple
update, it will just reset the tree's revision to the updated revision.
But Sancho runs out of space, and decides to only mirror what is in
repo,project. Since this will be the only branch in his repo, he
makes it the default branch.
arx propagate ../panza_repo,project ../panza_repo
arx delete-branch ../panza_repo,project
This creates
panza_repo/0
panza_repo/0/dirhash
panza_repo/0/aaaaaaaaaaaaaaa0/
panza_repo/0/aaaaaaaaaaaaaaa0/rev.tgz
panza_repo/0/aaaaaaaaaaaaaaa0/log
panza_repo/0/aaaaaaaaaaaaaaa0/log.sig
and all of the other revisions. Sancho then tells his project tree
about the branch movement
arx relocate ../panza_repo
Sancho then continues hacking. Since this "project" branch is the
default branch for panza_repo, there is no need for a branch name.
For example,
arx get panza_repo
will get the latest version in that repo.
But Sancho still doesn't have enough space, so he decides to truncate
history. He deletes his repo and does a fresh checkout from Don
Quixote's repo.
cd ..
rm -rf panza_repo
arx make-repo panza_repo
arx get /home/quixote/repo,project project_tree
cd project_tree
arx relocate ../panza_repo
Now he does some work
<hack>
arx commit -m "Non-delusional work"
The commit creates
panza_repo/256/defdefdefdefdef289/
panza_repo/256/defdefdefdefdef289/rev.tgz
panza_repo/256/defdefdefdefdef289/log
panza_repo/256/defdefdefdefdef289/log.sig
panza_repo/256/defdefdefdefdef289/URL
panza_repo/256/aaabbbcccdddeeedefdefdefdefdef/
panza_repo/256/aaabbbcccdddeeedefdefdefdefdef/patch.tgz
panza_repo/256/aaabbbcccdddeeedefdefdefdefdef/log
panza_repo/256/aaabbbcccdddeeedefdefdefdefdef/log.sig
The URL file points to Don Quixote's repo, since that is where the
project tree originally came from. He could also have it point to a
different place
arx fix-url @256 http://the.internet,project
Note that Sancho will also be creating skip-deltas in his repo, but at
a different schedule from Don Quixote's. Sancho will be creating a
level 1 skip-delta when the distance of a patch from def... is 32, a
level 2 skip-delta when the distance from def... is 1024, etc.
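A sketch of this schedule, assuming a level-N skip-delta is created whenever the distance from the base revision is a nonzero multiple of 32^N (the email only names the first trigger points, 32 and 1024). The base is revision 0 for Don Quixote and def... (revision 289) for Sancho; the function name and return format are hypothetical.

```python
SPAN = 32  # a level-N skip-delta spans SPAN**N revisions


def skip_deltas_at(seq: int, base: int = 0) -> list:
    """Skip-deltas created when committing revision `seq`.

    Returns (level, start_seq) pairs, where start_seq is the revision the
    skip-delta reaches back to. `base` is where the counting starts.
    """
    distance = seq - base
    created = []
    level = 1
    while distance > 0 and distance % SPAN**level == 0:
        created.append((level, seq - SPAN**level))
        level += 1
    return created


print(skip_deltas_at(32))              # Don Quixote: level 1 back to revision 0
print(skip_deltas_at(320))             # Don Quixote: level 1 back to 288
print(skip_deltas_at(320, base=289))   # Sancho: nothing yet (distance 31)
print(skip_deltas_at(321, base=289))   # Sancho: level 1 back to def... (289)
```

Anchoring the schedule to each repo's own base keeps Sancho's skip-deltas inside the history he actually stores, at the cost of the two repos having different skip-delta files.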
When Sancho merges with Don Quixote, only those patches that
topologically come after def... will be put into _arx/pending. It is
possible that some of these merges will involve patches that increase
the maximum distance to the root revision. For example,
    aaa       0
     |
    bbb       1
     |
    ccc       2
    / \
  abc  def    3
   |    |
  abd   |     4
   |    |
  abe   |     5
    \  /
    aab       6
In this case, "abc", "abd", and "abe" are all topologically parallel to
"def" (none of them comes after it), so they would not be included in
Sancho's repo. However, without those revisions, Sancho Panza would
think that the revision count for "aab" is 4, while its true maximum
distance from the root is 6. So in these cases, Sancho's repo would
contain the empty file
panza_repo/256/aabaabaabaabaabN6
The "N" between the hash and the number tells ArX that this is a
sequence marker, not a cached revision containing a tgz of a project
tree.
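The counting in this example amounts to taking the longest path from the root, and the merge only pulls in revisions that come after def.... A sketch on the graph above, with the three-letter names standing in for short hashes; the dict-of-parents representation is an assumption.

```python
def sequence_numbers(parents: dict) -> dict:
    """Sequence number of each revision: the longest path back to the root."""
    memo = {}

    def seq(rev):
        if rev not in memo:
            ps = parents[rev]
            memo[rev] = 0 if not ps else 1 + max(seq(p) for p in ps)
        return memo[rev]

    return {rev: seq(rev) for rev in parents}


def descendants(parents: dict, ancestor: str) -> set:
    """Revisions that topologically come after `ancestor`."""
    result = set()
    changed = True
    while changed:  # keep sweeping until no new descendant is found
        changed = False
        for rev, ps in parents.items():
            if rev not in result and any(p == ancestor or p in result for p in ps):
                result.add(rev)
                changed = True
    return result


# The example graph: each revision maps to its parents.
full = {"aaa": [], "bbb": ["aaa"], "ccc": ["bbb"], "abc": ["ccc"], "def": ["ccc"],
        "abd": ["abc"], "abe": ["abd"], "aab": ["abe", "def"]}
print(descendants(full, "def"))        # {'aab'}: the only revision after def
print(sequence_numbers(full)["aab"])   # 6

# Sancho's repo lacks abc, abd and abe, so counting locally undercounts.
missing = {"abc", "abd", "abe"}
partial = {r: [p for p in ps if p not in missing] for r, ps in full.items() if r not in missing}
print(sequence_numbers(partial)["aab"])  # 4, hence the aab...N6 marker file
```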
Don Quixote notices the work that Sancho did, but only likes part of
it (the "cheaper, faster, better" patch, not the "Non-delusional work"
patch). So he cherry-picks it
arx dopatch /home/panza/address@hidden
The "ffe" part is required to disambiguate that patch from Don
Quixote's own patch #552 in Sancho's repo. When Don Quixote commits,
that adds that patch to Don Quixote's repo if he doesn't already have
it. If that revision was a merge, then Don Quixote would have to
completely specify the revision hash and at least partly specify the
previous revision hash. Only that particular patch would go into Don
Quixote's repo.
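A sketch of how a hash prefix could pick one revision out of several that share a number, assuming, as the example suggests, that both patches are #552 in Sancho's repo. The index, the made-up hashes, and the function are hypothetical; ArX's real lookup may differ.

```python
def resolve(index: dict, number: int, prefix: str = "") -> str:
    """Return the one full hash with this revision number and hash prefix.

    index: hypothetical map from full revision hash to revision number.
    """
    matches = [h for h, n in index.items() if n == number and h.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"{len(matches)} revisions match #{number} with prefix {prefix!r}")
    return matches[0]


index = {"ffe" + "1" * 61: 552, "0ab" + "2" * 61: 552}  # made-up 64-character hashes
print(resolve(index, 552, "ffe")[:15])  # the prefix singles out one patch
try:
    resolve(index, 552)  # the number alone is ambiguous
except ValueError as err:
    print(err)
```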
Don Quixote tags a release
arx tag ../repo,release ../repo,project
which creates a tagged revision
repo/release.d/0/
repo/release.d/0/log
repo/release.d/0/log.sig
repo/release.d/0/URL
Don Quixote decides to change a log
arx fix-log -m "Initial" @1
which replaces the log and log.sig files, but does not change the
rev.tgz file. Note that such fixes are not usually propagated, so be
careful with your mistakes!
He decides to use a different key to sign revisions. He starts with
arx sig -d --repo --key "address@hidden"
This removes that public key from repo/keys.
arx sig -d ../repo,project
This removes all of the log.sig files under repo/project.d
arx sig -a --repo --key "address@hidden"
This adds that public key to repo/keys
arx sig -a --key "address@hidden" ../repo,project
and this adds new log.sig files.
Cheers,
Walter