From: Paul Sander
Subject: Re: How to treat XML files checked into CVS
Date: Wed, 16 Apr 2008 01:07:21 -0700
On Apr 15, 2008, at 8:34 PM, Arthur Barrett wrote:
The thread began on September 14, 2001, with the subject "giving up
CVS". A patch was posted with the subject " Demo of extensible merge
(was Re: giving up CVS)".
Hmmm, interesting... Not really sure if that covers all the possible
places a merge is initiated, but still interesting.
At the time I posted the patch, CVS had a wrapper function around the
3-way merge. That wrapper function was called from every place where
a merge could be initiated. (At least, it applied to every merge that
affected user-visible artifacts, i.e. files located in the user's
sandbox.) So that was the proper place to insert the extensions.
Keep in mind that the patch was done as a proof of concept, and it
was not intended to be a model for production code. Relying on
naming conventions to ascertain data types is not robust enough, in
my opinion. Reading the file for magic numbers or other identifying
traits is better, and it might even be the best way given that CVS
does not guarantee that every version stored in an RCS container has
the same data type. (If it did, then something like a MIME type
stored in the RCS file's admin phrase would be best. But making that
guarantee in CVS would require a redesign from the ground up, and
would reopen other well-worn arguments.)
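To illustrate the point above, here is a minimal sketch (not the original patch) of type detection by magic bytes followed by dispatch to a type-specific merge strategy. The handler names and the overall structure are hypothetical illustrations, not CVS code:

```python
# Identify a file's data type from its leading bytes rather than
# trusting its name, then choose a merge strategy accordingly.
# All strategy names below are hypothetical placeholders.

MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"<?xml": "xml",
}

def sniff_type(data: bytes) -> str:
    """Return a coarse data type based on leading magic bytes."""
    for magic, kind in MAGIC.items():
        if data.startswith(magic):
            return kind
    return "text"  # fall back to treating the file as plain text

def choose_merge(ancestor: bytes) -> str:
    """Pick a merge strategy from the sniffed type of the ancestor."""
    handlers = {
        "xml": "hierarchical-merge",  # e.g. a tree-structured 3-way merge
        "text": "diff3",              # the classic line-based 3-way merge
    }
    return handlers.get(sniff_type(ancestor), "interactive-selection")
```

Sniffing the ancestor version works even when, as noted above, CVS cannot guarantee that every version in an RCS container shares one data type.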
Do you still see this as a requirement?
I most certainly do. This is one of several missing features that I
consider to be essential for anything other than small toy projects.
Over time, members of this forum have raised the issue of merging
many types of data, including: document formats like MS Word and
FrameMaker; mark-up languages such as XML and HTML; image formats
including GIF, JPEG, and PNG; motion picture formats such as MPEG;
and composite data types like those used by NeXTSTEP and VLSI design
tools. Even those who have opposed adding such extensibility have
claimed to wish for better merge capability for their chosen
programming languages than is possible with a diff3-based tool like
the one supplied with CVS, by somehow bringing the tool "closer to"
the language.
I will go so far as to claim that differencing and merging algorithms
can be developed for every type of data, including those lost causes
listed above. The degenerate case for merging is a simple selection,
but even image files such as JPEG can have a meaningful merge if
someone could design a proper user experience. I imagine a merge
tool with four tiled images; three allow lasso-style selections and
represent the contributor, ancestor, and working versions; the final
one has all of the editing capabilities of, say, Photoshop, and into
which selections from the other three images can be pasted. The
final image is the one that replaces the working version and is
eventually committed at the completion of a larger merge across the
project.
But no one has built tools for this purpose. Apparently there just
hasn't been much demand for them. But if we implement the hooks to
our version control tools to enable this capability, the demand may
slowly follow.
Do you have any response to the arguments I raised about how people
use merge tools?
I disagree that setting up an external application to perform a
merge is a complex and messy proposition. I agree that the merge
tools for many data types would probably rely on a GUI. I also
believe that some adjustment of the existing CVS user interface may
be desirable; most data types don't lend themselves to the kind of
conflict mark-ups that we're used to in ASCII file formats. So the
-kb style of handling might be necessary for most data types, along
with an additional "cvs merge" command that invokes the proper merge
tool to resolve conflicts detected during past updates and remembered
in CVS' sandbox metadata, using the working copy and the fetched
ancestor and contributor copies. In situations where the mark-ups
are useful, the merge tool might simply be the user's favorite text
editor.
For situations in which merges are initiated by wrapper tools (such
as WinMerge), such tools should embrace the full capability of the
underlying tool to the extent that is practical. If using WinMerge
causes merge history to be lost, then there's something wrong with
the integration: Either there aren't enough hooks in the lower-level
tool to give access to that level of detail, or the higher-level tool
lacks the ability to invoke the lower functions properly. In either
case, at least one of the tools wasn't thought out well enough to
permit the kind of tight integration that is really needed.
CVSNT certainly already has an alternative 'diff' mechanism used
(optionally) to create the deltas for binary files (-kB), and I can
see it as a relatively painless proposition to add '-kE' to use an
extensible method if this is still relevant.
There's a big difference between the diff algorithm used to compute
the deltas stored in the version containers, versus the diff
algorithm used to present differences to the user. I claim that
these can, and indeed should, be different in the typical case.
RCS uses a longest common substring algorithm as a form of
compression to minimize the size of the deltas between adjacent
versions. This makes the storage of version history efficient, and
small deltas make for fast reconstruction of versions. This is as it
should be.
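A rough illustration of this storage-side behavior, using Python's difflib rather than the actual RCS algorithm (the two differ in detail, but the point is the same): the stored delta records only the changed hunks, not the whole file.

```python
import difflib

# A 100-line file in which exactly one line changes between versions.
old = ["line %d\n" % i for i in range(100)]
new = list(old)
new[50] = "changed line\n"

# Record only the non-equal hunks, the way a delta-based store would.
sm = difflib.SequenceMatcher(a=old, b=new)
delta = [(op, old[i1:i2], new[j1:j2])
         for op, i1, i2, j1, j2 in sm.get_opcodes() if op != "equal"]
# The delta holds one small hunk instead of 100 lines.
```

The delta is one hunk of one line each way, so storing a long chain of such versions stays cheap, and applying the hunks in sequence reconstructs any version quickly.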
However, suppose a Java programmer wants to see differences between
two versions of his source file. It turns out that Java is a fairly
well behaved hierarchical language (unlike C or C++, due to the macro
preprocessor). This user would rather have his deltas presented to
him in a way that reflects the structure of his program: The
insertion, deletion, or modification of control structures and
expressions without regard to cosmetic formatting or the history of
how such control structures and expressions came into being.
Differencing algorithms such as the one published by Sudarshan S.
Chawathe are good for this, if they could be fitted with a good user
interface. (See "Comparing Hierarchical Data in External Memory",
Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, 1999
for the published hierarchical diff algorithm. It might also be
available for download from the University of Maryland.) This could
be extended to implement a 3-way merge tool, too.
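To make the contrast concrete, here is a toy structural comparison of two XML trees. This is emphatically not Chawathe's algorithm (which matches subtrees optimally, handles moves, and works in external memory); it is only a naive sketch of the kind of output a hierarchical diff produces, i.e. edits against the structure rather than against lines:

```python
import xml.etree.ElementTree as ET

def tree_diff(a, b, path=""):
    """Naive structural diff: report changed text and inserted/deleted
    children, keyed by tag. A toy illustration only; it assumes at most
    one child per tag and cannot detect moved subtrees."""
    edits = []
    here = path + "/" + a.tag
    if (a.text or "").strip() != (b.text or "").strip():
        edits.append(("update-text", here))
    a_kids = {c.tag: c for c in a}
    b_kids = {c.tag: c for c in b}
    for tag in a_kids.keys() - b_kids.keys():
        edits.append(("delete", here + "/" + tag))
    for tag in b_kids.keys() - a_kids.keys():
        edits.append(("insert", here + "/" + tag))
    for tag in a_kids.keys() & b_kids.keys():
        edits.extend(tree_diff(a_kids[tag], b_kids[tag], here))
    return edits

old = ET.fromstring("<doc><title>Draft</title><body>text</body></doc>")
new = ET.fromstring(
    "<doc><title>Final</title><body>text</body><note/></doc>")
```

Running tree_diff(old, new) reports a text update at /doc/title and an insertion at /doc/note, with no regard for how the serialized lines happen to be wrapped or indented.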
The bottom line here is that, given a working copy of the user's
data, and the identities of the ancestor and contributing versions,
the RCS differencing and patching algorithms would efficiently
construct complete copies of the ancestor and contributing versions.
Then a 3-way merge tool that is specific to the type of data would
be applied to the complete copies to give the user the view he wants
of his data.
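The last step can be sketched as follows. This is a grossly simplified stand-in for whatever type-specific tool applies (it merges line-wise and assumes the three copies have equal line counts), but it shows the core 3-way decision: keep each side's changes, and flag only the lines both sides changed differently.

```python
def merge3(ancestor, ours, theirs):
    """Minimal line-wise 3-way merge over aligned line lists.
    Returns the merged lines and the indices of true conflicts."""
    merged, conflicts = [], []
    for i, (base, a, b) in enumerate(zip(ancestor, ours, theirs)):
        if a == b:            # both sides agree (or neither changed)
            merged.append(a)
        elif a == base:       # only "theirs" changed this line
            merged.append(b)
        elif b == base:       # only "ours" changed this line
            merged.append(a)
        else:                 # both changed it differently: conflict
            merged.append(a)
            conflicts.append(i)
    return merged, conflicts
```

With complete copies of the ancestor and both contributors in hand, the same decision procedure applies whether the units being compared are lines, XML subtrees, or lassoed image regions; only the notion of "unit" and "equal" changes.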
Getting back to the original topic, XML is also a well-behaved
hierarchical data format. The type of modification that I propose
for CVS would apply equally well to it.