Re: [Social-mediagoblin] Tahoe-LAFS as a document-oriented database


From: Christopher Allan Webber
Subject: Re: [Social-mediagoblin] Tahoe-LAFS as a document-oriented database
Date: Sun, 10 Apr 2011 10:02:36 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)

>>  storage_handler.get_unique_filename(['dir1', 'dir2', 'filename.jpg'])
>>  # Possibly returns either:
>>  #  - 'filename.jpg'                     # if no such file yet exists
>>  #  - '%s-filename.jpg' % uuid.uuid4()   # if another file of this name exists
>
> But if two people call this simultaneously then they might both get
> 'filename.jpg' returned to them. Then what?

Hm, good point, that is a race condition.  It's going to be a pretty
rare one, since these files will live under directories named for the
"user" and then the "entry" IDs, like so:

/media/USER_ID/ENTRY_ID/filename.png

...so it will only happen when two conflicting filenames are uploaded
to the same entry at the same time, but it's still a race condition to
account for, probably.

I could use file locking, but maybe I will always prepend the uuid, and
that will solve that problem.  Most people won't be direct-linking to
the file.
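
For reference, the check-then-name race can also be closed without a
separate lock file: on a local filesystem, an exclusive create fails
atomically if the name is already taken.  A minimal sketch, assuming
plain local storage (it would not carry over as-is to Tahoe-LAFS or
another remote backend); `claim_unique_path` is a hypothetical name,
not part of any existing API:

```python
import os
import uuid


def claim_unique_path(directory, filename):
    """Atomically create an empty file and return the path that was claimed.

    Hypothetical helper: tries the plain filename first and falls back to a
    uuid-prefixed name if that one is already taken.
    """
    os.makedirs(directory, exist_ok=True)
    candidate = os.path.join(directory, filename)
    while True:
        try:
            # O_EXCL makes the create fail if the path already exists, so two
            # simultaneous callers can never both claim the same name.
            fd = os.open(candidate, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            candidate = os.path.join(
                directory, "%s-%s" % (uuid.uuid4(), filename))
            continue
        os.close(fd)
        return candidate
```

The caller then writes into the returned path, which it already owns.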

One solution would be:

/media/USER_ID/ENTRY_ID/RANDOM_UUID/filename.png

but that would lead to a lot of messy and superfluous directories.  But
it *would* guarantee non-conflicts.

Or we could do:

/media/USER_ID/ENTRY_ID/${RANDOM_UUID}-filename.png

always, but that means that if someone wants to wget the file, they'll
have a bunch of ugly stuff ahead of it for no good reason, which is sad.

Or we could combine the two and default to:
/media/USER_ID/ENTRY_ID/${RANDOM_UUID}-filename.png
but offer a "pristine filename" option, in which case you'll get:
/media/USER_ID/ENTRY_ID/RANDOM_UUID/filename.png
... but that sounds like code (and UI option) bloat.
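
To make the two layouts concrete, here is a rough sketch of how the
path could be built.  `media_path` and its `pristine` flag are
hypothetical names for illustration only, and it assumes the ids and
the filename have already been sanitized (no slashes):

```python
import posixpath
import uuid


def media_path(user_id, entry_id, filename, pristine=False):
    """Build a storage path that cannot collide with an earlier upload.

    Hypothetical helper.  A fresh uuid4 is generated on every call, so no
    existence check (and therefore no race) is involved.
    """
    unique = str(uuid.uuid4())
    if pristine:
        # /media/USER_ID/ENTRY_ID/RANDOM_UUID/filename.png
        return posixpath.join(
            "/media", str(user_id), str(entry_id), unique, filename)
    # /media/USER_ID/ENTRY_ID/${RANDOM_UUID}-filename.png
    return posixpath.join(
        "/media", str(user_id), str(entry_id),
        "%s-%s" % (unique, filename))
```

Either way the uniqueness comes from the uuid alone, which is why the
choice between the two is mostly about how the URL looks to someone
running wget.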

We can either ignore this rare case, or go with one of these lamer
solutions in the short term.  The nice thing is that we can support
a non-ideal solution for now and it'll still work if we change our
naming scheme in the long term.

But I'd love some help and feedback here.

>> This API is inspired by Django's filesystem API:
>> http://docs.djangoproject.com/en/dev/topics/files/
>
> Oh, look there is already an implementation of Django's filesystem API
> on Tahoe-LAFS:
>
> https://github.com/thraxil/django-tahoestorage/blob/master/tahoestorage/storage.py

Interesting!  Noted for reference.

>>> 1. On what do you rely for the guarantee that the file is uncorrupted?
>>> There are basically two use cases: you store a file yourself and get
>>> it back later, or you share a file with someone else. In the former
>>> you want to be sure that you get back the same file you put in. In the
>>> latter the recipient wants to be sure that they get the same file the
>>> sharer sent.
>>
>> So case 1, I guess we can store sha1 hashes in the database and check
>> against them if necessary?
>
> Sounds good. (I would use sha-256 instead of sha1.)

Also noted.
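
A minimal sketch of what that check might look like, assuming the hex
digest is stored alongside the entry in the database.  The function
names are hypothetical; only `hashlib` from the standard library is
used:

```python
import hashlib


def sha256_hexdigest(fileobj, chunk_size=65536):
    """Hash a binary file object in chunks so large media never has to
    fit in memory all at once."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_uncorrupted(fileobj, expected_hexdigest):
    """Compare the file's current hash with the one recorded at upload."""
    return sha256_hexdigest(fileobj) == expected_hexdigest
```

The digest would be computed once at upload time and re-checked
whenever the file is read back from storage.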

>> So, in case 2, I'm really not sure what kind of problem you're
>> anticipating.  Maybe more examples would be helpful.  Are you talking
>> about like, a cryptographic integrity check to make sure yes, this is
>> the right file, no fooling, nobody's going to goatse.cx me?
>
> If I send you a file, I would not like it if anyone else can cause you
> to see a different file than the one I sent.

If we always tack on a uuid somehow, I guess that will prevent this, at
least when direct-linking to the file.  I'm fine with every "different"
file uploaded always having a unique filename.

But one thing I *do* want to be changeable: say I upload something at

http://mediagobl.in/cwebber/work/cats_pajamas/

I want, and *must* support, the ability to make a change to the image
shown here.  Not necessarily the hotlinked image, but if I realize I
accidentally made my cat's eyes purple, I must have the ability to
correct them to green if I want to.

That leads to an interesting question: what happens to the old file, if
I'm replacing it with a new one?

One solution is for us to support "old revisions" of the main file.  If
I change the file, the user interface might offer:

   ☑ Keep old file revisions

... which will be checked by default.  That way the content of my main
"display" page can change, but no file already in the storage system
ever changes.
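
Roughly, the bookkeeping could look like the sketch below.  `MediaEntry`
and everything in it is hypothetical, not existing code, and it assumes
each revision is stored under its own uuid-prefixed name so that a
stored file is never overwritten:

```python
import uuid


class MediaEntry:
    """Hypothetical entry record that tracks the stored file revisions."""

    def __init__(self, keep_old_revisions=True):
        self.keep_old_revisions = keep_old_revisions
        self.revisions = []  # stored filenames, oldest first

    def add_revision(self, filename):
        """Record a new upload under a fresh unique name.

        Returns (new_name, removed), where removed lists the old names the
        caller may now delete from storage (empty when revisions are kept).
        """
        new_name = "%s-%s" % (uuid.uuid4(), filename)
        removed = []
        if not self.keep_old_revisions:
            removed = self.revisions
            self.revisions = []
        self.revisions.append(new_name)
        return new_name, removed

    @property
    def current(self):
        """The revision the display page should show."""
        return self.revisions[-1]
```

The display page only ever asks for `current`, so old hotlinks keep
pointing at the old file while the page itself shows the fix.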

I could really use some direction on this!  Zooko, Asheesh, Will, and
any others with relevant thoughts: please weigh in.

-- 
𝓒𝓱𝓻𝓲𝓼𝓽𝓸𝓹𝓱𝓮𝓻 𝓐𝓵𝓵𝓪𝓷 𝓦𝓮𝓫𝓫𝓮𝓻


