From: Paul Sander
Subject: Re: How to treat XML files checked into CVS
Date: Wed, 16 Apr 2008 01:07:21 -0700
On Apr 15, 2008, at 8:34 PM, Arthur Barrett wrote:
The thread began on September 14, 2001, with the subject "giving up
CVS". A patch was posted with the subject " Demo of extensible merge
(was Re: giving up CVS)".
Hmmm, interesting... Not really sure if that covers all the possible
places a merge is initiated, but still interesting.
At the time I posted the patch, CVS had a wrapper function around the
3-way merge. That wrapper function was called from every place where
a merge could be initiated. (At least, it applied to every merge that
affected user-visible artifacts, i.e. files located in the user's
sandbox.) So that was the proper place to insert the extensions.
Keep in mind that the patch was done as a proof of concept, and it
was not intended to be a model for production code. Relying on
naming conventions to ascertain data types is not robust enough, in
my opinion. Reading the file for magic numbers or other identifying
traits is better, and it might even be the best way given that CVS
does not guarantee that every version stored in an RCS container has
the same data type. (If it did, then something like a MIME type
stored in the RCS file's admin phrase would be best. But making that
guarantee in CVS would require a redesign from the ground up, and
would reopen other well-worn arguments.)
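To illustrate the point above, here is a minimal sketch (not the original patch) of type detection by magic bytes followed by dispatch to a type-specific merge strategy. The handler names and the overall structure are hypothetical illustrations, not CVS code:

```python
# Identify a file's data type from its leading bytes rather than
# trusting its name, then choose a merge strategy accordingly.
# All strategy names below are hypothetical placeholders.

MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"<?xml": "xml",
}

def sniff_type(data: bytes) -> str:
    """Return a coarse data type based on leading magic bytes."""
    for magic, kind in MAGIC.items():
        if data.startswith(magic):
            return kind
    return "text"  # fall back to treating the file as plain text

def choose_merge(ancestor: bytes) -> str:
    """Pick a merge strategy from the sniffed type of the ancestor."""
    handlers = {
        "xml": "hierarchical-merge",  # e.g. a tree-structured 3-way merge
        "text": "diff3",              # the classic line-based 3-way merge
    }
    return handlers.get(sniff_type(ancestor), "interactive-selection")
```

Sniffing the ancestor version works even when, as noted above, CVS cannot guarantee that every version in an RCS container shares one data type.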
Do you still see this as a requirement?
I most certainly do. This is one of several missing features that I
consider to be essential for anything other than small toy projects.
Over time, members of this forum have raised the issue of merging
many types of data, including: document formats like MS Word and
FrameMaker; mark-up languages such as XML and HTML; image formats
including GIF, JPEG, and PNG; motion picture formats such as MPEG;
and composite data types like those used by NeXTSTEP and VLSI design
tools. Even those who have opposed adding such extensibility have
claimed to wish for better merge capability for their chosen
programming languages than is possible with a diff3-based tool like
the one supplied with CVS, by somehow bringing the tool "closer to"
the language.
I will go so far as to claim that differencing and merging algorithms
can be developed for every type of data, including those lost causes
listed above. The degenerate case for merging is a simple selection,
but even image files such as JPEG can have a meaningful merge if
someone could design a proper user experience. I imagine a merge
tool with four tiled images; three allow lasso-style selections and
represent the contributor, ancestor, and working versions; the final
one has all of the editing capabilities of, say, Photoshop, and into
which selections from the other three images can be pasted. The
final image is the one that replaces the working version and is
eventually committed at the completion of a larger merge across the
project.
But no one has built tools for this purpose. Apparently there just
hasn't been much demand for them. But if we implement the hooks to
our version control tools to enable this capability, the demand may
slowly follow.
Do you have any response to the arguments I raised about how people
use merge tools?
I disagree that setting up an external application to perform a
merge is a complex and messy proposition. I agree that the merge
tools for many data types would probably rely on a GUI. I also
believe that some adjustment of the existing CVS user interface may
be desirable; most data types don't lend themselves to the kind of
conflict mark-ups that we're used to in ASCII file formats. So the
-kb style of handling might be necessary for most data types, along
with an additional "cvs merge" command that invokes the proper merge
tool to resolve conflicts detected during past updates and remembered
in CVS' sandbox metadata, using the working copy and the fetched
ancestor and contributor copies. In situations where the mark-ups
are useful, the merge tool might simply be the user's favorite text
editor.
For situations in which merges are initiated by wrapper tools (such
as WinMerge), such tools should embrace the full capability of the
underlying tool to the extent that is practical. If using WinMerge
causes merge history to be lost, then there's something wrong with
the integration: Either there aren't enough hooks in the lower-level
tool to give access to that level of detail, or the higher-level tool
lacks the ability to invoke the lower functions properly. In either
case, at least one of the tools wasn't thought out well enough to
permit the kind of tight integration that is really needed.
CVSNT certainly already has an alternative 'diff' mechanism used
(optionally) to create the deltas for binary files (-kB), and I can
see it as a relatively painless proposition to add '-kE' to use an
extensible method if this is still relevant.
There's a big difference between the diff algorithm used to compute
the deltas stored in the version containers, versus the diff
algorithm used to present differences to the user. I claim that
these can, and indeed should, be different in the typical case.
RCS uses a longest common substring algorithm as a form of
compression to minimize the size of the deltas between adjacent
versions. This makes the storage of version history efficient, and
small deltas make for fast reconstruction of versions. This is as it
should be.
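A rough illustration of this storage-side behavior, using Python's difflib rather than the actual RCS algorithm (the two differ in detail, but the point is the same): the stored delta records only the changed hunks, not the whole file.

```python
import difflib

# A 100-line file in which exactly one line changes between versions.
old = ["line %d\n" % i for i in range(100)]
new = list(old)
new[50] = "changed line\n"

# Record only the non-equal hunks, the way a delta-based store would.
sm = difflib.SequenceMatcher(a=old, b=new)
delta = [(op, old[i1:i2], new[j1:j2])
         for op, i1, i2, j1, j2 in sm.get_opcodes() if op != "equal"]
# The delta holds one small hunk instead of 100 lines.
```

The delta is one hunk of one line each way, so storing a long chain of such versions stays cheap, and applying the hunks in sequence reconstructs any version quickly.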
However, suppose a Java programmer wants to see differences between
two versions of his source file. It turns out that Java is a fairly
well behaved hierarchical language (unlike C or C++, due to the macro
preprocessor). This user would rather have his deltas presented to
him in a way that reflects the structure of his program: The
insertion, deletion, or modification of control structures and
expressions without regard to cosmetic formatting or the history of
how such control structures and expressions came into being.
Differencing algorithms such as the one published by Sudarshan S.
Chawathe are good for this, if they could be fitted with a good user
interface. (See "Comparing Hierarchical Data in External Memory",
Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, 1999
for the published hierarchical diff algorithm. It might also be
available for download from the University of Maryland.) This could
be extended to implement a 3-way merge tool, too.
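To make the contrast concrete, here is a toy structural comparison of two XML trees. This is emphatically not Chawathe's algorithm (which matches subtrees optimally, handles moves, and works in external memory); it is only a naive sketch of the kind of output a hierarchical diff produces, i.e. edits against the structure rather than against lines:

```python
import xml.etree.ElementTree as ET

def tree_diff(a, b, path=""):
    """Naive structural diff: report changed text and inserted/deleted
    children, keyed by tag. A toy illustration only; it assumes at most
    one child per tag and cannot detect moved subtrees."""
    edits = []
    here = path + "/" + a.tag
    if (a.text or "").strip() != (b.text or "").strip():
        edits.append(("update-text", here))
    a_kids = {c.tag: c for c in a}
    b_kids = {c.tag: c for c in b}
    for tag in a_kids.keys() - b_kids.keys():
        edits.append(("delete", here + "/" + tag))
    for tag in b_kids.keys() - a_kids.keys():
        edits.append(("insert", here + "/" + tag))
    for tag in a_kids.keys() & b_kids.keys():
        edits.extend(tree_diff(a_kids[tag], b_kids[tag], here))
    return edits

old = ET.fromstring("<doc><title>Draft</title><body>text</body></doc>")
new = ET.fromstring(
    "<doc><title>Final</title><body>text</body><note/></doc>")
```

Running tree_diff(old, new) reports a text update at /doc/title and an insertion at /doc/note, with no regard for how the serialized lines happen to be wrapped or indented.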
The bottom line here is that, given a working copy of the user's
data, and the identities of the ancestor and contributing versions,
the RCS differencing and patching algorithms would efficiently
construct complete copies of the ancestor and contributing versions.
Then a 3-way merge tool that is specific to the type of data would
be applied to the complete copies to give the user the view he wants
of his data.
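The last step can be sketched as follows. This is a grossly simplified stand-in for whatever type-specific tool applies (it merges line-wise and assumes the three copies have equal line counts), but it shows the core 3-way decision: keep each side's changes, and flag only the lines both sides changed differently.

```python
def merge3(ancestor, ours, theirs):
    """Minimal line-wise 3-way merge over aligned line lists.
    Returns the merged lines and the indices of true conflicts."""
    merged, conflicts = [], []
    for i, (base, a, b) in enumerate(zip(ancestor, ours, theirs)):
        if a == b:            # both sides agree (or neither changed)
            merged.append(a)
        elif a == base:       # only "theirs" changed this line
            merged.append(b)
        elif b == base:       # only "ours" changed this line
            merged.append(a)
        else:                 # both changed it differently: conflict
            merged.append(a)
            conflicts.append(i)
    return merged, conflicts
```

With complete copies of the ancestor and both contributors in hand, the same decision procedure applies whether the units being compared are lines, XML subtrees, or lassoed image regions; only the notion of "unit" and "equal" changes.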
Getting back to the original topic, XML is also a well-behaved
hierarchical data format. The type of modification that I propose
for CVS would apply equally well to it.