
From: Benja Fallenstein
Subject: [Gzz-commits] manuscripts/storm short-paper.rst
Date: Tue, 20 May 2003 14:48:12 -0400

CVSROOT:        /cvsroot/gzz
Module name:    manuscripts
Changes by:     Benja Fallenstein <address@hidden>      03/05/20 14:48:12

Modified files:
        storm          : short-paper.rst 

Log message:
        shorten short paper

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/short-paper.rst.diff?tr1=1.1&tr2=1.2&r1=text&r2=text

Patches:
Index: manuscripts/storm/short-paper.rst
diff -u manuscripts/storm/short-paper.rst:1.1 manuscripts/storm/short-paper.rst:1.2
--- manuscripts/storm/short-paper.rst:1.1       Tue May 20 10:18:35 2003
+++ manuscripts/storm/short-paper.rst   Tue May 20 14:48:12 2003
@@ -2,22 +2,32 @@
 Storm: Supporting data mobility through location-independent identifiers
 ========================================================================
 
+.. Main point of this paper:
+   Location-independent identifiers support data mobility;
+   DHT allows location-independent identifiers
+
 Abstract
 ========
 
-In this paper, we define data mobility as a collective term for the
-movement of documents between computers, between different locations 
-on one computer, and the movement of content between documents.
-We identify dangling links and alternative versions as major
-obstacles for the free movement of data. This paper presents the Storm 
-(STORage Module) design as one possible solution to these problems.
-Storm uses location-independent globally unique 
-identifiers, append-and-delete-only storage and peer-to-peer networking to 
-resolve problems raised by data mobility. Moreover, we discuss some 
-specific use scenarios related to ad hoc networks, unreliable network 
-connections and mobile computing, in which the need for data mobility 
-is obvious. Our current prototype implementation works on a single system;
-peer-to-peer networking is in an early prototype stage.
+- data mobility
+- problems
+- location-independent identifiers such as hashes
+- resolvable through DHT
+- our implementation (Storm) is beginning to be deployed
+
+.. In this paper, we define data mobility as a collective term for the
+   movement of documents between computers, between different locations 
+   on one computer, and the movement of content between documents.
+   We identify dangling links and alternative versions as major
+   obstacles for the free movement of data. This paper presents the Storm 
+   (STORage Module) design as one possible solution to these problems.
+   Storm uses location-independent globally unique 
+   identifiers, append-and-delete-only storage and peer-to-peer networking to 
+   resolve problems raised by data mobility. Moreover, we discuss some 
+   specific use scenarios related to ad hoc networks, unreliable network 
+   connections and mobile computing, in which the need for data mobility 
+   is obvious. Our current prototype implementation works on a single system;
+   peer-to-peer networking is in an early prototype stage.
 
 .. raw:: latex
 
@@ -51,326 +61,13 @@
 This, we believe, may be the most important result of peer-to-peer 
 research with regard to hypermedia.
 
-In this paper, we examine how location-independent identifiers can 
-support *data mobility*. Documents often move quite freely 
-between computers: they are sent as 
-e-mail attachments, carried around on disks, published on the web, moved 
-between desktop and laptop systems, downloaded for off-line reading or 
-copied between computers in a LAN. We use 'data mobility' as a collective 
-term for the movement of documents between computers (or folders!),
-and movement of content between documents (through copy&paste) [#]_.
-
-.. [#] While the physical mobility of e.g. notebooks may affect
-   data mobility (for example due to caching for off-line access), 
-   data mobility is neither the same as, nor limited to the physical
-   movement of devices.
-
-We address two issues raised by data mobility:
-Dangling links and keeping track of alternative versions. 
-Resolvable location-independent identifiers
-make these issues much easier to deal with, since data
-can be identified wherever it is moved [#]_.
-Current systems dealing with these issues
-often do not deal well with many forms of data mobility.
-
-.. [#] It might be more appropriate to speak about *resources*
-   and *references* instead of *documents* and *links*, but
-   in the spirit of [kappe95scalable]_, we stick with
-   the simpler terms for explanation purposes.
-
-*Dangling links* are an issue when documents are moved
-between servers; when no network connection is available,
-but there is a local copy (e.g. on a laptop or dialup system);
-or when the publisher removes a document permanently,
-but there are still copies (e.g. in a public archive such as
-[waybackmachine]_). Dangling links are also an issue
-when a document and a link to it are received independently,
-for example as attachments to independent emails,
-or when a link is sent by mail and the document is available
-from the local intranet. When two people meet e.g. on the train,
-they should be able to form an ad-hoc network and follow links
-to documents stored on either one's computer [thompson01coincidence]_.
-Furthermore, when a document is split into parts, links to
-elements that end up in the new documents should not break.
-
-Advanced hypermedia systems such as Microcosm and Hyper-G
-address dangling links through a notification system [hill94extending-andalso-kappe95scalable]_:
-When a document is moved, a message is sent to servers storing links to it.
-Hyper-G uses an efficient protocol for delivering such notifications
-on the public Internet. 
-
-Location-independent identifiers for documents 
-make such a system unnecessary; a structured peer-to-peer lookup system 
-can find documents wherever they are moved.
-This kind of system also works for data not publicized on the Internet.
-For example, if one email has a document attached to it, and another email
-links to this document, an index of locally stored documents
-by permanent identifier allows the system to follow the link.
-This would be difficult to realize through a
-notification mechanism.
-
-*Tracking alternative versions*, on the other hand,
-is an issue when documents are modified
-on several independent, unconnected systems, for example 
-when a user takes a document home from work on a floppy disk;
-when they keep the same set of documents on their desktop and laptop,
-modifying them on each; when two people collaborate on a document, 
-sending each other versions of the document by email; 
-when someone downloads a document, modifies it, and publishes
-the modified version,
-or when a group of people collaborate on a set of documents,
-synchronizing irregularly with a central server (as in CVS [cvs]_),
-a network of servers (as in Lotus Notes) or directly with each other 
-(as in Groove [groovesurl]_). In each of these cases, a user should be able
-to work on the version at hand and then either merge it with others 
-or fork to a different branch, as well as rollback the current changes
-or look at differences between versions *without network connectivity*. 
-
-The main contribution of this paper is the Storm (for *STORage Module*) design,
-a hypermedia system built to use the emerging 
-peer-to-peer data lookup technologies to enhance data mobility
-by dealing with versioning and dangling links. 
-
-Storm is a library
-for storing and retrieving data as *blocks*, immutable
-byte sequences identified by cryptographic content hashes
-[lukka02guids]_. 
-We address the mobility of documents by block storage
-and versioning.
-Fig. [ref-storm_layers]_ provides an overview of Storm's components.
-
-.. uml:: storm_layers
-    :caption: Components of the Storm model
-
-    package Blocks
-
-    package Indexing
-        use Blocks
-
-    package XuStorage
-        use Indexing
-        use Blocks
-  
-    package Pointers
-        use Indexing
-        use Blocks
-
-    package Diffs
-        use Indexing
-        use Blocks
-
-    ---
-    Blocks.c = (0,0);
-    vertically(55, foo, Blocks, Indexing);
-
-    dx = 80; dy = 60;
-    XuStorage.c = Indexing.c + (-dx, -dy);
-    z1 = Indexing.c + (dx, -dy);
-    Pointers.c = z1 + (15, 25);
-    Diffs.c = z1 - (15, 15);
-
-Additionally, we hope to 
-provide an input to the ongoing discussion about peer-to-peer
-hypermedia systems
-[thompson01coincidence-andalso-bouvin02open-andalso-p2p-hypertext-panel-andalso-lukka02guids]_.
-
-This paper is structured as follows. In the next section, we describe 
-related work. In section 3, we give an overview of the xanalogical storage model.
-In section 4, we introduce the basic storage unit of our 
-system, i.e. file-like blocks identified by cryptographic hashes. In section 5, 
-we discuss application-specific reverse indexing of blocks by their 
-content, essential for many applications. In section 6, we present 
-techniques for efficient versioned storage of mutable data on top of blocks. 
-In section 7, we report on implementation experience and future directions. 
-Section 8 concludes the paper.
+- location-dependent identifiers cause broken links
 
+- alternative versions on independent systems hard to synchronize
 
-Related Work
-============
+- creating a location-independent namespace, resolve through DHT
 
-Dangling links and alternative versions
----------------------------------------
 
-The dangling link problem has received a lot of attention
-in hypermedia research (e.g. [davis98referential]_). As examples, we examine the ways
-in which HTTP, Microcosm [fountain90microcosm]_ and Hyper-G [andrews95hyperg]_ 
-deal with the problem.
-
-In HTTP, servers are able to notify a client that a document
-has been moved, and redirect it accordingly [rfc2068]_. However,
-this is not required, and there are no facilities for
-updating a link automatically when its target is moved.
-The HTTP protocol includes a "LINK" request 
-for creating a relationship between a set of URIs, 
-but this feature has never been commonly implemented [reich-davis99-ohp]_.
-
-In Microcosm, hypermedia functionality is implemented
-using *filters*, which react to arbitrary messages
-(such as 'find links to this anchor') generated by
-a client application. Filters are processes on the local system
-or on a remote host [hill94extending]_. When
-a document is moved or deleted, a message is sent
-to the filters. Linkbases implemented as filters can
-update their links accordingly. A client selects a set
-of remote filters to use. Only links stored by one
-of these filters can be found by the client.
-
-.. [HymEbook?]
-
-.. Microcosm systems can independently choose 
-   whether to import filters from other systems, and whether
-   to host and export own filters; thus, a system can act
-   as both a client and server at the same time,
-   for example in a workgroup.
-
-In Hyper-G, documents are bound to servers, and a link
-between documents on different servers is stored by both servers
-[kappe95scalable]_. This ensures that all links from and to a document
-can always be found, but requires the cooperation 
-of both parties. Hyper-G employs a scalable protocol
-for notifying servers when a document has been moved or removed.
-A server hosting links to this document can then ask
-the link's author to change the link, or at least the link
-can be removed automatically. The *p-flood* algorithm
-employed by Hyper-G guarantees that a message
-is delivered to all interested servers, but requires that each
-interested server keeps a list of all the others.
-
-These approaches share the assumption that it is not possible
-to resolve a location-independent identifier. Otherwise,
-it would not be necessary to update links when a document
-is moved, nor would either of the servers storing two given documents
-need to know the links between them;
-knowing only a document's location-independent identifier,
-it would be possible to find both the document and links to it,
-no matter which peer in the network they are stored on.
-
-In a similar vein, 
-version control systems like CVS or RCS [tichy85rcs]_ generally assume
-a central server hosting a repository. The WebDAV/DeltaV protocols,
-designed for interoperability between version control systems, inherit
-this assumption [rfc2518-andalso-rfc3253]_. 
-On the other hand, Arch [arch]_ places all repositories
-into a global namespace and allows independent developers 
-to branch and merge overlapping repositories without any central control.
-
-Lotus Notes, a popular database sharing and collaboration tool, 
-uses both location-dependent and location-independent
-identifiers [lotus-notes-c-api]_. However, partly due to the age of the system,
-Lotus Notes is limited to a client-server architecture. 
-Groove [groovesurl]_ is an improved design based on Lotus Notes, 
-employing strong security mechanisms and using peer-to-peer functionality 
-as the basis of communication channels among a limited number of participants. 
-
-.. [ref HTML version format proposal] Alternate versions important for
-   authoring process [search refs]. (Note: Keeping track of versions
-   structure is also \*hyper*media. Refs?) (WebDAV!)
-
-.. review: http://citeseer.nj.nec.com/griffiths99contentspec.html ?
-   couldn't find a relevant angle, as it's a storage /protocol/. hm
-   same prob. with http://citeseer.nj.nec.com/millard98reworking.html
-   http://citeseer.nj.nec.com/227358.html that are about OHProtocol.
-
-
-Peer-to-peer systems
---------------------
-
-..  [ref: iris: http://iris.lcs.mit.edu/].
-
-During the last few years, there has been a lot of research
-related to peer-to-peer resource discovery, both in the academia 
-and in the industry [p2pworkinggroup]_.
-There are two main approaches: broadcasting [gnutellaurl-andalso-ripeanu02mappinggnutella-andalso-kazaaurl]_, 
-and distributed hashtables (DHTs) [stoica01chord-andalso-ratnasamy01can-andalso-zhao01tapestry-andalso-rowston01pastry-andalso-maymounkov02kademlia-andalso-malkhi02viceroy]_. Broadcasting systems
-forward queries to all systems reachable in a given number of hops
-(time-to-live). DHTs store (key,value) pairs which can be found given
-the key; a DHT assigns each peer a subset of all possible keys, and
-routes queries for a given key to the peer responsible for it.
-Before a pair can be found, it must be *inserted* in the DHT
-by sending it to the peer responsible for the key. Both approaches
-use an application-level overlay network for routing.
-
-While broadcasting systems' performance can be worse than linear, DHTs' performance
-usually has log-like bounds in the number of peers 
-for *all* internal operations [#]_. This scalability is
-what makes global searches feasible in DHTs. In broadcasting approaches,
-on the other hand, scalability is achieved by forwarding queries 
-only to a limited subset of the peers (bounded by the time-to-live),
-which means that searches in these systems are not truly global.
-
-.. [broadcasting's message population can grow as fast as O(n^2) -Hermanni]
-
-.. [#] It's not clear whether *all* proposed DHT designs can preserve
-   log-like properties when participants are heterogeneous and they 
-   join and leave the system in a dynamic manner. 
-
-A DHT has a *key space*, for example the points on a circle.
-The keys in (key,value) pairs are mapped to points in the key space
-through a hash function. Independently, each peer is assigned
-a point in the space. The DHT defines a distance metric
-between points in the key space (e.g. numeric, XOR); the peer
-responsible for a hashtable key, then, is the one that is *closest*
-to it in the key space, according to the distance metric.
-A DHT peer is roughly analogous to a hashtable bucket.
-Queries are routed in the overlay network, each hop bringing
-them closer to their destination in key space, until they reach
-the responsible peer. A common API that can be supported by current and future DHTs
-is proposed in [zhao03api]_.
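
The mapping from keys to responsible peers can be sketched as follows. This is a toy model, not any particular DHT: the circular key space, the numeric distance metric, and the peer positions are illustrative assumptions only.

```python
import hashlib

SPACE = 2 ** 16  # size of the circular key space (illustrative)

def key_point(key):
    # Hash a key onto a point on the circle.
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % SPACE

def distance(a, b):
    # Numeric distance metric on the circle.
    d = abs(a - b)
    return min(d, SPACE - d)

def responsible_peer(key, peer_points):
    # The peer closest to the key's point acts as the 'hashtable bucket'.
    p = key_point(key)
    return min(peer_points, key=lambda peer: distance(peer, p))

peers = [100, 20000, 40000, 60000]   # hypothetical peer positions
owner = responsible_peer("short-paper.rst", peers)
```

In a real DHT the `min` over all peers is replaced by overlay routing: each hop brings the query closer to the key's point until the closest peer is reached.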
-
-.. Recently, a few DHT-like systems have been developed which employ
-   a key space similarly to a DHT, but in which queries are routed
-   to (key,value) pairs [bonsma02swan-andalso-AspnesS2003]_: A peer 
-   occupies several positions in the key space, one for each 
-   (key,value) pair. In such a system, the indirection of placing
-   close keys in the custody of a 'hashtable bucket' peer is removed
-   at the cost of each peer maintaining one node in the overlay network
-   for each (key,value) pair it publishes.
-
-The basic definition of a distributed hashtable does not indicate
-how large the keys and values used may be. Intuitively, we expect keys
-to be small, maybe a few hundred bytes at most; however, there are different
-approaches to the size of values. Consider a file-sharing application:
-If the keys are keywords from the titles of shared files, are the values
-the files-- or the addresses of peers from which the files may be
-downloaded? Iyer et al [iyer02squirrel]_ call the former approach
-a *home-store* and the latter a *directory* scheme (they call the peer
-responsible for a hashtable item its 'home node,' thus 'home-store').
-The choice between the schemes affects the scalability and reliability
-of the network.
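
The two schemes can be contrasted in a minimal sketch (data structures and names are hypothetical; real systems add replication and routing):

```python
# Home-store scheme: the home node stores the values (the files) itself.
home_store = {}

def home_store_insert(key, file_bytes):
    home_store.setdefault(key, []).append(file_bytes)

# Directory scheme: the home node stores only the addresses of peers
# from which the file may be downloaded.
directory = {}

def directory_insert(key, peer_address):
    directory.setdefault(key, set()).add(peer_address)

home_store_insert("storm", b"...file contents...")
directory_insert("storm", "peer-a.example:5000")
```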
-
-CFS [dabek01widearea]_ and PAST [rowstron01storage]_ 
-are scalable storage systems using the home node approach,
-based on the Chord [stoica01chord]_ and
-Pastry [rowston01pastry]_ DHTs, respectively.
-Freenet [freenet-ieee]_ is a system for anonymous reading
-and publication.
-
-Recently there has been some interest in peer-to-peer hypermedia.
-Thompson and de Roure [thompson01coincidence]_ examine the discovery
-of documents and links available at and relating to
-a user's physical location. An example would be
-a linkbase constructed from links made available by different
-participants of a meeting [thompson00weaving]_. 
-Bouvin [bouvin02open]_ focuses on the scalability and ease of publishing
-in peer-to-peer systems, examining ways in which p2p can serve
-as a basis for Open Hypermedia. Our own work has been 
-in implementing Xanalogical storage [lukka02guids]_.
-
-At the Hypertext'02 panel on peer-to-peer hypertext [p2p-hypertext-panel]_,
-there was a lively discussion on whether the probabilistic access
-to documents offered by peers joining and leaving the network
-would be tolerable for hypermedia publishing. For many documents,
-the answer is probably no; however, for personal links,
-comments, and notes about documents, probabilistic access may be acceptable,
-especially when seen as a trade-off against
-having to set up a webspace account before publication.
-
-In the end, some peers will necessarily be more equal than others:
-Published data will be hosted on servers
-which are permanently on-line, but are otherwise ordinary peers
-in the indexing overlay network.
-   
-   
 Storm block storage
 ===================
 
@@ -522,434 +219,10 @@
 for the experimental Gzz system, a platform explicitly developed
 to overcome the limitations of traditional file-based applications.
 
+- versioning, pointers
 
-Implementation
---------------
+- web integration
 
-Storm blocks are MIME messages [borenstein92mime]_, i.e., objects with
-a header and body as used in Internet mail or HTTP.
-This allows them to carry any metadata that can be carried
-in a MIME header, most importantly a content type.
-
-Collections of existing Storm blocks are called *pools*. Pools provide
-the following interface for injecting and obtaining data::
-
-    add(bytes) -> id
-    getIds() -> list
-    get(id) -> block
-
-and the following methods for moving blocks between pools::
-
-    add(block)
-    delete(id)
-    
-Implementations may store blocks in RAM, in individual files,
-in a Zip archive, in a database, in a p2p network, 
-or through other means.
-We have implemented the first three (using hexadecimal
-representations of the block ids for file names).
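
A minimal in-RAM pool along the lines of the interface above might look like this in Python. It is a sketch, assuming SHA-1 content hashes in hexadecimal as block ids (as the text describes); the class and method names are ours, not the actual implementation's.

```python
import hashlib

class MemoryPool:
    """Sketch of a Storm pool keeping immutable blocks in RAM."""

    def __init__(self):
        self._blocks = {}

    def add(self, data):
        # The id is the content hash, so identical bytes always get the same id.
        block_id = hashlib.sha1(data).hexdigest()
        self._blocks[block_id] = data
        return block_id

    def get_ids(self):
        return list(self._blocks)

    def get(self, block_id):
        return self._blocks[block_id]

    def delete(self, block_id):
        del self._blocks[block_id]

pool = MemoryPool()
bid = pool.add(b"hello, storm")
```

A file-based pool would differ only in using the hexadecimal id as the file name, as described above.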
-
-Many existing peer-to-peer systems could be used to
-find blocks on the network.
-For example, Freenet [freenet-ieee]_, recent Gnutella-based clients 
-(e.g. Shareaza [shareazaurl]_), and Overnet [overneturl]_ 
-also use SHA-1-based identifiers. 
-Implementations on top of a DHT could use both the
-directory and the home store approach as defined by [iyer02squirrel]_.
-
-Unfortunately, we have not put a p2p-based implementation
-into use yet and can therefore only report on our design.
-Currently, we are working on a prototype implementation
-based on UDP, the GISP distributed hashtable [kato02gisp]_,
-and the directory approach (using the DHT to find a peer
-with a copy of the block, then using HTTP to download the block).
-Many practical problems have to be overcome before this
-implementation will be usable (for example seeding the
-table of known peers, and issues with UDP and network
-address translation [rfc3253]_).
-
-.. talk about efficiency, storing big media files only once--
-
-   It is unclear whether this approach is efficient for text
-   in the Storm framework; in the future, we may try storing
-   the characters in the documents themselves, along with their
-   permanent identifiers; however, this makes spoofing
-   possible. For images or video, on the other hand,
-   it is clearly beneficial if content appearing in different
-   documents-- or different versions of a document-- is only
-   stored once, in a block only referred to wherever
-   the data is transcluded. This is similar to different Web pages
-   including the same image.
-
-An important open issue with block storage is
-UI conventions for listing, moving and deleting blocks.
-
-.. Currently, the only interface is a file system directory
-   containing a set of blocks as files with hexadecimal, 
-   random-looking names. In Gzz, we currently trick our way around
-   the problem; at startup time, we simply load the most current
-   version of a document whose identifier is hard-wired into
-   the software (mutable documents are described in section 6.1).
-
-
-Application-specific reverse indexing
-=====================================
-
-Finding links and transclusions in
-Xanalogical storage is an example of *reverse indexing*
-of Storm blocks: finding a block based on its contents.
-(For other examples, see section 6, below.)
-Storm provides a general API for indexing blocks in
-application-specific ways. We have implemented indexing
-on a local machine, but the interface is designed so that
-implementation on top of a distributed hashtable
-will be straightforward. (Again, our GISP-based implementation
-is in a very early stage.)
-
-In Storm, applications are not allowed to put arbitrary
-items into the index. Instead, applications that want 
-to index blocks provide the following callback
-to a Storm pool::
-
-    getItems(block) -> 
-        set of (key, value) pairs
-
-This callback analyzes a block and returns a set of 
-hashtable items (key/value pairs) to be placed into the index. 
-The Storm pool, in turn, provides 
-the following interface to the application::
-
-    get(key) -> set of (block, value) pairs
-
-This function finds all items created by this application
-with a given key, indicating both the application-provided
-value and the block for which the item was created.
-
-We use the ``getItems()`` approach instead of
-allowing applications to put arbitrary items into the database
-because binding items to blocks makes it easy for pools
-to e.g. remove associated items when deleting a block.
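
As a sketch, the callback scheme might be realized as follows; the word-indexing callback at the end is a hypothetical example application, not one from the paper.

```python
import hashlib
from collections import defaultdict

class IndexingPool:
    """Pool sketch that indexes blocks via an application callback."""

    def __init__(self, get_items):
        self.get_items = get_items      # callback: block bytes -> set of (key, value)
        self.blocks = {}
        self.index = defaultdict(set)   # key -> set of (block_id, value)

    def add(self, data):
        block_id = hashlib.sha1(data).hexdigest()
        self.blocks[block_id] = data
        for key, value in self.get_items(data):
            self.index[key].add((block_id, value))
        return block_id

    def delete(self, block_id):
        # Binding items to blocks lets the pool drop them with the block.
        data = self.blocks.pop(block_id)
        for key, value in self.get_items(data):
            self.index[key].discard((block_id, value))

    def get(self, key):
        return set(self.index[key])

# Hypothetical application callback: index a block by the words it contains.
def word_items(data):
    return {(word, True) for word in data.decode().split()}

pool = IndexingPool(word_items)
bid = pool.add(b"storm supports data mobility")
```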
-
-In a networked implementation, each peer is responsible
-for indexing the blocks it stores. Since no peer can
-feasibly know all applications that may need indexing,
-there may be blocks available on the network that have
-not been indexed by a particular application.
-We do not see this as a problem --- it's just like a block
-being available only if there's a peer which wants it to be ---
-but applications must be prepared to deal with it.
-
-Locally, on the other hand, it is guaranteed that
-all blocks in a pool are indexed by all applications
-known by the pool. To ensure this, we check that all blocks
-are indexed when a pool is loaded, and add missing items to the index.
-
-One indexing application that may seem obvious is keyword-based
-full-text search. However, no research has been done
-in this direction; it is not clear whether the current
-interface is well suited to this, or whether current implementations
-are able to scale to the load of storing an item for each word
-occurring in a document.
-
-.. [There are two refs about keywords in DHTs-- should we ref these ? 
-Hermanni]
-
-   Not sure how applicable they are: our system is *not*
-   as general or performant as a DHT (as explained above).
-   Should read & find out whether they could be implemented
-   through our index system at all... -b
-
-
-Versioning
-==========
-
-Mutable documents can be implemented on top of block storage
-using a combination of two mechanisms, *pointers* and *diffs*.
-A *pointer* is an updatable reference to a block,
-and a *diff* is a set of differences between versions,
-similar to what is stored e.g. by version control systems such as CVS.
-
-
-Pointers: implementing mutable resources
-----------------------------------------
-
-A Storm pointer is a globally unique identifier (usually created randomly)
-that can refer to different blocks over time. A block a pointer
-points to is called the pointer's *target* (Fig. [ref-storm_pointers]_).
-
-To assign a target to a pointer, we create a special kind of block,
-a *pointer block*, representing an assertion like *pointer P targets
-block B*. To find the target of pointer P, Storm searches for
-blocks of this form. This is one application of Storm
-indexing (Section 5), using P as the index key.
-
-.. uml:: storm_pointers
-    :caption: The Storm pointer system. A pointer is implemented
-             by a collection of pointer blocks. Each pointer block
-             gives a single target for the pointer and can obsolete
-             other pointer blocks.
-
-    class Pointer
-
-    class PointerBlock
-        assoc multi(*) - multi(1) Pointer
-        assoc multi(*) - multi(1) role(target) Target
-        
-    ring = assoc PointerBlock multi(1) - multi(*) role(obsoleted) PointerBlock
-
-    class Target
-
-    ---
-
-    Pointer.c = (0, 0);
-    horizontally(100, foo, Pointer, PointerBlock);
-    vertically(50, bar, PointerBlock, Target);
-    ring.p = PointerBlock.e{right} .. PointerBlock.n{down};
-    
-
-In addition to the pointer and the target, pointer blocks contain
-a list of zero or more *obsoleted* pointer blocks. When a new version
-is created, it usually supersedes one older version; 
-the corresponding pointer block then 'obsoletes'
-the pointer block targeting the superseded version.
-Only the new, non-obsoleted block will be considered when
-loading the document (although the pointer blocks pointing to
-past versions remain accessible for tracing the document's history) [#]_.
-
-.. [#] All known pointer blocks for a pointer are still loaded
-   when the pointer is resolved. Storm then discards
-   the obsoleted ones.
-
-If, on the other hand, two people collaborate on a document
-and produce two independent versions, neither will obsolete
-the other. When they synchronize their pools by copying
-all new blocks in either to the other, both versions will be
-considered 'current' by the system. The users can then take
-appropriate action, by consolidating the changes in both versions
-(manually or through an automatic merge algorithm),
-or by treating the two versions as alternatives. After the
-alternative versions have been consolidated, a pointer block
-obsoleting both consolidated previous versions is created.
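
Resolution can be sketched as follows, with pointer blocks modeled as plain dicts; the field names are illustrative, not the implementation's.

```python
def current_targets(pointer_blocks):
    """Load all pointer blocks for a pointer, discard the obsoleted ones."""
    obsoleted = set()
    for pb in pointer_blocks:
        obsoleted.update(pb["obsoletes"])
    return {pb["target"] for pb in pointer_blocks if pb["id"] not in obsoleted}

# Two users independently supersede version v1, then consolidate.
blocks = [
    {"id": "pb1", "target": "v1", "obsoletes": []},
    {"id": "pb2", "target": "v2a", "obsoletes": ["pb1"]},
    {"id": "pb3", "target": "v2b", "obsoletes": ["pb1"]},
]
# After synchronization both alternatives are 'current': {"v2a", "v2b"}.
merged = blocks + [{"id": "pb4", "target": "v3", "obsoletes": ["pb2", "pb3"]}]
# After consolidation only the merge result is current: {"v3"}.
```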
-
-Currently, the pointer mechanism
-works only between trusted Storm pools, e.g.
-in a workgroup collaborating on a set of documents.
-In a multi-user environment, we usually want only one user
-or group to be able to publish official versions of a document.
-It is not yet clear how to do this,
-but digital signatures of pointer blocks seem promising.
-For long-term publishing, one-time signatures have been
-found useful [anderson98erl]_. 
-
-.. digital signatures require a public key infrastructure
-   and a trusted timestamping mechanism, which
-   are hardly feasible for a system intended to be used
-   for off-line as well as on-line work.
-
-The ability to retain multiple 'current' versions of a document
-can be useful, for example when there is no time to consolidate
-changes at the time of synchronization. However, we need
-to choose one such version when loading the document.
-For example, we could open an official or original version automatically
-if one exists.
-
-While we think that alternative current versions are useful for
-asynchronous collaboration, they aren't well suited to Web-like publishing.
-For this, a different system may be practical, where digitally signed pointer blocks
-store a target and a timestamp; when resolving a pointer, the newest
-pointer block for that pointer would then be selected.
-
-In summary, the current pointer system seems promising, but 
-there are a number of unresolved issues with it: 
-authenticating pointer blocks; the user interface for choosing
-between alternative current versions; and the suitability
-for Web-like publishing. More research is needed in this area.
-
-.. authenticating -> [possible refs: ConChord, SDSI/SPKI ? -Hermanni]
-
-
-Diffs: storing alternative versions efficiently
------------------------------------------------
-
-.. [Hm, should we move/remove 'Additionally, many versioning'
-   paragraph into related work ? -Hermanni]
-
-The pointer system suggests that for each version of a document,
-we store an independent block containing this version. This
-obviously doesn't scale well when we keep a lot of versions
-each with only small changes. Instead, we use the well-known
-technique of storing only the differences between versions.
-
-We still refer to a version by the id of a block containing it.
-However, we do not necessarily *store* this block,
-even though we refer to it. Instead, we may create a *diff block*,
-containing the ids of two versions and the differences between them.
-When we want to load a version
-and do not have the block, we use Storm indexing to find
-all diff blocks from or to that version, trying to find
-a chain of differences starting at a known version. Then,
-we can apply the differences in order, and arrive at the version
-we seek.
-
-When we have reconstructed this version, we create a Storm block
-from it and check that it matches the id of the version
-we are seeking. This way, we do not need to place any trust
-in the diff blocks we are using. While anybody can create
-a diff block pretending to give us version X even though it really
-gives us version Y, we can still retrieve diff blocks from
-an untrusted network source because we can check whether a block
-has given us version X or Y by checking the cryptographic hash.
-
-.. In this scheme, we can easily drop a previous version
-   by merging differences: If we have stored the differences
-   from version ``A`` to ``B``, and ``B`` to ``C``, 
-   to drop version ``B``, we compute
-   the difference from ``A`` to ``C``, and replace the two
-   previous differences by it. If we also store a difference
-   between version ``C`` and ``D``, it does not need
-   to be altered, because it refers to *version* ``C`` and not
-   the difference to ``C`` from ``B`` (as in the simplistic scheme).
-
-   We can also store the block containing version ``D``
-   in addition to storing the versions above. Then, we can reconstruct
-   version ``C`` in two ways: By using the diffs from ``A`` to ``B``
-   and ``B`` to ``C``, or, more efficiently, by applying the inverse
-   of the diff from ``C`` to ``D`` to version ``D`` [#]_.
-
-   .. [#] Of course, in reality the number of differences
-      that can be 'skipped' will have to be much higher
-      for this mechanism to be useful.
-
-Our current implementation is a layer above Storm block storage
-and indexing. This layer implements a ``load(id) -> version``
-interface through the following simplified algorithm:
-
-1. If the block for ``version-id`` is in the pool, return it.
-2. Else, search for diff blocks storing the difference
-   between ``version-id`` and any other version.
-3. For each of these blocks, attempt to ``load()`` the *other* version.
-4. If successful, apply the difference.
-5. Check the hash of the resulting version. If correct, return it;
-   if incorrect, try the next diff block (step 3).
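The recursive search these steps describe can be sketched as follows. This is a toy in-memory model, not Storm's code: dicts stand in for the block pool and the diff index, and all names are hypothetical.

```python
import hashlib

def block_id(data: bytes) -> str:
    # Sketch assumption: an id is the SHA-1 hash of the block's bytes.
    return hashlib.sha1(data).hexdigest()

def load(version_id, pool, diff_index, seen=None):
    """Return the bytes of the requested version, or None if unreachable.

    pool       -- dict: id -> stored full-version bytes
    diff_index -- dict: id -> list of (other_id, apply_fn), where apply_fn
                  maps the other version's bytes to this version's bytes
    """
    seen = set() if seen is None else seen
    if version_id in pool:                        # step 1: block is stored
        return pool[version_id]
    seen.add(version_id)
    for other_id, apply_diff in diff_index.get(version_id, []):   # step 2
        if other_id in seen:
            continue
        other = load(other_id, pool, diff_index, seen)            # step 3
        if other is None:
            continue
        candidate = apply_diff(other)             # step 4: apply the diff
        if block_id(candidate) == version_id:     # step 5: verify the hash
            return candidate
    return None

# Example: version 2 is stored only as a diff against version 1.
v1, v2 = b"hello", b"hello world"
pool = {block_id(v1): v1}
diff_index = {block_id(v2): [(block_id(v1), lambda old: old + b" world")]}
assert load(block_id(v2), pool, diff_index) == v2
```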
-
-As computing differences is file-format dependent, so is our system
-for storing versions. In our implementation, applications need to
-provide a callback interface for reading and writing versions
-and computing and applying differences.
-
-.. uml:: version_interfaces
-    :caption: Diff interfaces
-
-    class Version "interface"
-        methods
-            getDiffFrom(:Version): Diff
-
-    class Diff "interface"
-        methods
-            applyTo(:Version): Version
-            inverse(): Diff
-
-    class VersionFormat "interface"
-        methods
-            readVersion(:InputStream): Version
-            readDiff(:InputStream): Diff        
-
-            writeVersion(:OutputStream, :Version)
-            writeDiff(:OutputStream, :Diff)
-
-    ---
-    Version.c = (100,0);
-    Diff.c = (250, 0);
-    VersionFormat.c = (175, -80);
-    %horizontally(100, foo, Version, Diff);
-    %vertically(120, bar, foo, VersionFormat);
-
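In Python terms, the interfaces above might look like the following. This is only an illustrative mirror of the diagram: the method names are direct translations, and the toy ``TextVersion``/``TextDiff`` classes are ours, not the paper's.

```python
from abc import ABC, abstractmethod

class Version(ABC):
    @abstractmethod
    def get_diff_from(self, other: "Version") -> "Diff": ...

class Diff(ABC):
    @abstractmethod
    def apply_to(self, version: "Version") -> "Version": ...
    @abstractmethod
    def inverse(self) -> "Diff": ...

class VersionFormat(ABC):
    # Serialization callbacks: the application, not Storm, knows the format.
    @abstractmethod
    def read_version(self, stream) -> "Version": ...
    @abstractmethod
    def read_diff(self, stream) -> "Diff": ...
    @abstractmethod
    def write_version(self, stream, version: "Version"): ...
    @abstractmethod
    def write_diff(self, stream, diff: "Diff"): ...

# Toy implementation: a "diff" simply records the old and new text.
class TextVersion(Version):
    def __init__(self, text: str):
        self.text = text
    def get_diff_from(self, other: "TextVersion") -> "TextDiff":
        return TextDiff(other.text, self.text)

class TextDiff(Diff):
    def __init__(self, old: str, new: str):
        self.old, self.new = old, new
    def apply_to(self, version: TextVersion) -> TextVersion:
        assert version.text == self.old
        return TextVersion(self.new)
    def inverse(self) -> "TextDiff":
        return TextDiff(self.new, self.old)

a, b = TextVersion("one"), TextVersion("one two")
d = b.get_diff_from(a)                    # the diff *from* a *to* b
assert d.apply_to(a).text == "one two"
assert d.inverse().apply_to(b).text == "one"
```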
-
-The diff system is more complicated than simple block storage,
-and therefore more liable to bugs. However, saving is still 
-purely additive: New diffs
-are added, but old diffs aren't changed. Therefore, when a save
-goes wrong, again only the changes after the previous save are lost.
-
-.. With backward diffing, we remove the cached full version,
-   but we can reconstruct it using the diffs. We believe that
-   diff-based Storm storage is still more reliable than file storage,
-   where a simple application bug can lose all previous work
-   on a document.
-
-To protect against buggy ``Diff`` or ``VersionFormat``
-implementations, before storing a diff, we always check
-that we can reconstruct the appropriate version block from it;
-if this fails for some reason, we store the full version block
-instead. At the cost of some storage space, this protects
-the user's data.
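As a sketch, this save-time safeguard amounts to a round-trip check. The helper names are hypothetical; versions and diffs are plain bytes here, and a dict stands in for the Storm block pool.

```python
import hashlib

def block_id(data: bytes) -> str:
    # Sketch assumption: an id is the SHA-1 hash of the block's bytes.
    return hashlib.sha1(data).hexdigest()

def save_version(new, base, make_diff, apply_diff, store):
    """Store a diff only if it reconstructs `new` exactly; else store `new` whole.

    make_diff / apply_diff -- the application's (possibly buggy) callbacks
    store                  -- dict standing in for the Storm block pool
    """
    vid = block_id(new)
    try:
        diff = make_diff(base, new)
        reconstructed = apply_diff(base, diff)
    except Exception:
        reconstructed = None                  # a crash also triggers the fallback
    if reconstructed is not None and block_id(reconstructed) == vid:
        store[("diff", block_id(base), vid)] = diff   # space-saving path
    else:
        store[("full", vid)] = new                    # safe fallback
    return vid

v1, v2 = b"chapter one", b"chapter one, revised"

good = {}   # well-behaved append-only diff callbacks
save_version(v2, v1, lambda b_, n: n[len(b_):], lambda b_, d: b_ + d, good)
assert ("diff", block_id(v1), block_id(v2)) in good

bad = {}    # buggy make_diff: produces garbage
save_version(v2, v1, lambda b_, n: b"garbage", lambda b_, d: b_ + d, bad)
assert ("full", block_id(v2)) in bad   # bug detected, full block stored instead
```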
-
-.. [this would be relevant, but is cut because of space constraints -b]
-   Currently, Storm pool implementations do not know anything about diffs;
-   all the functionality described here is implemented on top of them.
-   For a networked system, however, it would be useful if a server
-   could recreate version blocks before sending them to a client.
-   Then, instead of transferring all the diffs, only the full version
-   would have to be sent through the network.
-
-
-Discussion
-==========
-
-To evaluate the design, we revisit the issues raised by data mobility.
-For the two issues addressed, *dangling links* and *tracking
-alternative versions*, we go through each use case identified in the
-introduction to illustrate how Storm deals with it.
-
-*Dangling links*. With Storm, moving documents between servers does not
-affect links to them, because the identifiers are location-independent.
-In a peer-to-peer implementation, a lookup by id returns a location where
-the data is currently available. If the publisher removes a document
-permanently but it is archived elsewhere, the archives simply act as
-additional peers. When no network connection is available but a local
-copy exists, Storm can find it. Likewise, if a document and a link to it
-are received independently, e.g. as attachments in separate e-mails, or
-if a link to a document on the local intranet is e-mailed, the link
-still works. When people meet in person, e.g. on a train, and form an
-ad hoc network, a peer-to-peer implementation of Storm lets them see
-each other's public documents and follow links to them.
- 
-*Tracking alternative versions*. Because Storm uses immutable blocks,
-each modification to a document creates a new block. When a document is
-modified on several independent, unconnected systems (i.e., with no
-synching in between), several versions of it come to exist. Using diffs,
-each version is in effect a collection of changes to the original. What
-happens then is outside the scope of Storm: the authors may decide to
-merge the changes into a new joint version, but how that is done is
-file-format and hence application specific. If some of the new versions
-are not merged but forked into separate branches, they simply continue
-to exist (they may be assigned different names, which is again outside
-the scope of Storm).
-
-.. some things from the earlier treatment left here as notes: 
-
-   B sets the document public (how that is done depends on UI
-   implementation), i.e.  putting in a published pool, which may reside
-   locally or externally e.g.  on a (dedicated) server that is "always-on".
-   These public/private pools are an area where future research is needed,
-   possibly related to rights and permissions etc. too.
-   
-   Comments may be new entities(?) linking to it
-
-   [At the end of this section ? -Hermanni]
-   When Xanalogical storage is not applied, using Storm as a
-   replacement/equivalent of a conventional file and versioning system is
-   trivial?
-
-.. Besides the selected issues discussed above, a few remarks about further
-   evaluation of Storm follow. From a security point of view, the fact that all
-   data is stored in immutable blocks has obvious benefits for reliability (data
-   is never overwritten). By using SHA-1 cryptographic content hashes
-   as identifiers, verifiability is another benefit. As for usability, the ease
-   of replication and caching combined with location-independent identifiers
-   enables several improvements, including the possibility to keep working on
-   shared documents even when the network connection is absent or unreliable.
-   
 
 Conclusions
 ===========
@@ -966,16 +239,7 @@
 Currently, we are working on a GISP-based peer-to-peer
 implementation.
 
-No work on integrating Storm with current programs (in the spirit of Open
-Hypermedia) has been done yet. 
-This makes Storm a rather monolithic approach at present.
-
-One possibility is to take an existing system
-(with features outside the focus of Gzz)
-which implements strict versioning, and to modify it to use Storm for storage. 
-A candidate is the object-oriented Web publishing environment Zope [zope]_, 
-which is Free Software. The
-open hypermedia protocol (OHP) may be another possibility [reich-davis99-ohp]_.
+We have written an HTTP gateway and plan integration with KDE.
 
 Work is also needed on user interfaces for Storm.
 



