
Re: [Social-mediagoblin] Tahoe-LAFS as a document-oriented database


From: Christopher Allan Webber
Subject: Re: [Social-mediagoblin] Tahoe-LAFS as a document-oriented database
Date: Sat, 09 Apr 2011 10:20:53 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)

Zooko, thanks for this email.  It's actually incredibly timely: I had
just sketched out an idea of how the storage API should work.  I want to
have a generic storage API so we can plug in multiple backends (from
simple local file storage, to eucalyptus, to tahoe-lafs even!)

So, before I reply to your email directly, let me paste in that API.

#+BEGIN_SRC python
  storage_handler.file_exists(['dir1', 'dir2', 'filename.jpg'])
  # True / False
  
  storage_handler.get_unique_filename(['dir1', 'dir2', 'filename.jpg'])
  # Possibly returns either:
  #  - 'filename.jpg'                     # if no such file yet exists
  #  - '%s-filename.jpg' % uuid.uuid4()   # if another file of this name exists
  #
  # You would then use this to call
  # this_file = storage_handler.get_file(['dir1', 'dir2', our_unique_filename])
  
  storage_handler.get_file(['dir1', 'dir2', 'filename.jpg'])
  # Returns a read/writeable file-like object
  # I guess makes the directory if necessary?
  
  storage_handler.write_file(['dir1', 'dir2', 'filename.jpg'], data)
  # Writes this file; a lazy convenience wrapper.
  # I guess makes the directory if necessary?
  
  storage_handler.delete_file(['dir1', 'dir2', 'filename.jpg'])
  # Deletes the file here
  
  storage_handler.file_path(['dir1', 'dir2', 'filename.jpg'])
  # Only for appropriate local filestores
#+END_SRC

This API is inspired by Django's filesystem API:
http://docs.djangoproject.com/en/dev/topics/files/

...where you get back a Python file-like object that you .read() and
.write() to, but maybe it isn't a file object directly, as possibly
you're actually writing to some remote server, etc.  Note that we're not
using paths like 'dir1/dir2/filename.jpg' but rather a list of
components.  This way we can be sure which directories the author
*intended*, while still stripping evil things out of each component via
werkzeug.utils.secure_filename().
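
As a rough sketch only, here is one way the API above could look for
plain local file storage.  The class name BasicFileStorage, the base_dir
argument, the mode argument to get_file and the clean_component() helper
are all hypothetical; clean_component() is a simplified stand-in for
werkzeug.utils.secure_filename(), not its actual implementation.

```python
import os
import re
import uuid


def clean_component(component):
    """Simplified, hypothetical stand-in for werkzeug's secure_filename()."""
    # Replace path separators so 'a/b' cannot smuggle in a directory.
    name = component.replace('/', '_').replace('\\', '_')
    # Keep only a conservative set of characters.
    name = re.sub(r'[^A-Za-z0-9_.-]', '_', name)
    # Strip leading dots so '..' and hidden files never survive.
    name = name.lstrip('.')
    if not name:
        raise ValueError('unusable path component: %r' % (component,))
    return name


class BasicFileStorage(object):
    """Hypothetical local-disk backend for the storage API."""

    def __init__(self, base_dir):
        self.base_dir = os.path.abspath(base_dir)

    def file_path(self, filepath):
        # Each component is cleaned on its own, then joined under base_dir.
        cleaned = [clean_component(c) for c in filepath]
        return os.path.join(self.base_dir, *cleaned)

    def file_exists(self, filepath):
        return os.path.exists(self.file_path(filepath))

    def get_unique_filename(self, filepath):
        filename = clean_component(filepath[-1])
        if self.file_exists(filepath):
            return '%s-%s' % (uuid.uuid4(), filename)
        return filename

    def get_file(self, filepath, mode='rb'):
        path = self.file_path(filepath)
        directory = os.path.dirname(path)
        # Assumption: the directory is created on demand.
        if not os.path.isdir(directory):
            os.makedirs(directory)
        return open(path, mode)

    def write_file(self, filepath, data):
        with self.get_file(filepath, 'wb') as f:
            f.write(data)

    def delete_file(self, filepath):
        os.remove(self.file_path(filepath))
```

Because every component is cleaned separately, a component such as
'../evil' ends up as an ordinary directory name under base_dir rather
than climbing out of it.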

If anyone has comments on this I'd greatly appreciate hearing them.

Okay, now I'll respond to your email inline:

"Zooko O'Whielacronx" <address@hidden> writes:

> Folks:
>
> There is a discussion on the Tahoe-LAFS mailing list about how
> Tahoe-LAFS could be used for storing assets such as media files.
>
> http://tahoe-lafs.org/pipermail/tahoe-dev/2011-April/006257.html
>
> I want to emphasize that I'm not "pushing" for social-mediagoblin to
> use Tahoe-LAFS instead of using MongoDB or postgresql or whatever. I'm
> sure you folks have good reasons for your choices (Chris has written
> some notes about this issue) and I don't want Tahoe-LAFS to get used
> in ways that it is ill-suited -- that would just cause headaches for
> everybody including me.

Interesting post, thanks for sharing.

And yeah, I don't intend to use tahoe-lafs where mongodb is currently,
as the "database".  If tahoe-lafs ever becomes a backend to mongodb,
maybe, and then I won't even need to write support for that myself! ;)

But as a media storage system I think it might work out well.
In the case of a backend like tahoe-lafs, I figure we can reserve
space in the database to map where these paths are if necessary, but
I'm not sure.

> Rather, I think there are some important fundamental architectural
> issues which are revealed in this conversation on the Tahoe-LAFS
> mailing list. Regardless of which actual technologies we use, we
> should understand and make conscious decisions about these
> architectural issues.
>
> Namely:
>
> 1. On what do you rely for the guarantee that the file is uncorrupted?
> There are basically two use cases: you store a file yourself and get
> it back later, or you share a file with someone else. In the former
> you want to be sure that you get back the same file you put in. In the
> latter the recipient wants to be sure that they get the same file the
> sharer sent.

So for the first case (storing a file yourself and getting it back
later), I guess we can store sha1 hashes in the database and check
against them if necessary?
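
A minimal sketch of that idea, using only hashlib from the standard
library.  The function names are hypothetical and a plain dict stands in
for the database.  This only detects a file that no longer matches its
recorded hash; it does nothing against someone who can change both the
file and the database record.

```python
import hashlib
import io


def sha1_of(fileobj, chunk_size=64 * 1024):
    """Hash a file-like object in chunks so large media need not fit in RAM."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: fileobj.read(chunk_size), b''):
        digest.update(chunk)
    return digest.hexdigest()


def store_with_hash(db, key, data):
    """Record the digest at upload time (db is a dict standing in for the database)."""
    db[key] = sha1_of(io.BytesIO(data))


def is_uncorrupted(db, key, fileobj):
    """Re-hash what the storage backend handed back and compare with the record."""
    return sha1_of(fileobj) == db[key]
```

Swapping in hashlib.sha256 would only change the one constructor call.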

As for the second case (sharing a file with someone else), I'm really
not sure what kind of problem you're anticipating.  Maybe more examples
would be helpful.  Are you talking about, like, a cryptographic
integrity check to make sure that yes, this is the right file, no
fooling, nobody's going to goatse.cx me?

I'd be interested in some cases where you think this will become a
problem.

> 2. Is there a guarantee of confidentiality -- that people who weren't
> intended to see the file can't see it? If so, on what do you rely for
> that guarantee?

Initially I think we're just going to handle the case where everything
is public to everyone.  Eventually I'd like to be able to support
sharing certain files with only family and friends, but I think that's
a way off.

One way of doing this when *not* using something like tahoe-lafs might
be to use the X-Sendfile response header.  This way we can authenticate
that it's okay to send this file and then let apache/nginx/whatever
actually do the serving.
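
Roughly, the X-Sendfile approach could look like the bare WSGI sketch
below.  The user_may_view() check, the /public/ prefix and the
/srv/media root are all made up for illustration.  Note that the header
has to be enabled in the front-end server (for Apache this is typically
mod_xsendfile), and that nginx uses its own X-Accel-Redirect header for
the same purpose.

```python
def user_may_view(environ, media_path):
    """Hypothetical permission check: only /public/ paths, no '..' parts."""
    if '..' in media_path.split('/'):
        return False
    return media_path.startswith('/public/')


def serve_media(environ, start_response):
    """WSGI app: authorize the request, then hand the file to the web server."""
    media_path = environ.get('PATH_INFO', '')
    if not user_may_view(environ, media_path):
        start_response('403 Forbidden', [('Content-Type', 'text/plain')])
        return [b'Forbidden']
    # Hypothetical on-disk location.  The front-end server reads this
    # header and sends the file itself, so the body stays empty.
    start_response('200 OK', [('X-Sendfile', '/srv/media' + media_path)])
    return [b'']
```

The application only ever decides yes or no; the bytes of the media file
never pass through Python.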

> 3. Performance questions about clusters of servers and clients. May or
> may not be relevant to socialmediagoblin.
>
> Thank you for your attention.
>
> Regards,
>
> Zooko
>

Thank you, zooko, for the useful email :)


-- 
Christopher Allan Webber


