patch to speed up slow rcs patch processing.

bug-cvs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

patch to speed up slow rcs patch processing.

From:	Michael Smith
Subject:	patch to speed up slow rcs patch processing.
Date:	Thu, 06 Oct 2005 12:58:58 -0500
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031014 Thunderbird/0.3

Attached is the diff for the file src/rcs.c in the version 1.12.12 ofcvs I downloaded from the cvs site. This single file is sufficient tofix the problem I detected.

This patch addresses a problem we were having with a cvs file that was300K lines long with a patch diff with 200K entries. The update pastthe head (i.e. version 1.2 where 1.5 is head) entry took upwards of 2-4minutes per file because the complexity of the patch code was O(m*n)where n=# of lines and m=# of updates. The patch provided here changesthis to O(n) complexity (which completes the same operation in 4-6 seconds).


The changes I made from the pre-existing code were to:

1. reverse the order of the fragments to a in-order instead of a reverseorder queue2. perform a single pass to copy the non-deleted and new elements into anew linevector structure

3. copy the result back to the original line structure.

Please consider adding this patch to the release as it can make a hugedifference in functionality for projects tracking files which can havethis occur (this file was XML with new data values).

I ran the unit tests with this patch and the changes appear to pass alltests. I did receive some sort of IO error during the sanity.sh -ptests, but it appear to be innocuous.

I have not added any tests for this patch because there are nofunctional input/output relationship changes from this modification.


As per the HACKING document, I am including the consent to use this patch:

"I grant permission to distribute this patch under the terms of the GNUPublic License"

Please let me know if you have any questions about how/why something wasdone. I would also be curious if/when you might be adding the patch --although I realize that this isn't generally your policy.


--

Michael J. Smith
Software Engineer
SAIC, IDE site
Work: (321) 235-7613
mailto: msmith@ideorlando.org

7035d7034
<     struct deltafrag *dftail;
7038,7039d7036
<       int numlines, lastmodline, offset;
<     struct linevector *orig_lines = lines;
7042d7038
<     numlines = lines->nlines; // start with init # of lines
7052,7060c7048,7049
<     df->next = NULL;
<     if (dfhead == NULL) {
<             dfhead = df;
<     }
<     else {
<         dftail->next = df;
<     }
<     dftail = df;
< 
---
>       df->next = dfhead;
>       dfhead = df;
7101d7089
<         numlines+=df->nlines;
7111d7098
<         numlines-=df->nlines;
7115,7128d7101
<       // new temp data structure to hold new org before
<       // copy back into original structure.
<     lines = xmalloc (sizeof (struct linevector));
<     lines->nlines = lines->lines_alloced = numlines;
<     lines->vector = xnrealloc(NULL,
<                  numlines, sizeof(*lines->vector));
< 
< 
<     // we changed the list order to first to last -- so the
<     // list never gets larger than the size numlines
<     lastmodline = 0; 
<     offset = 0; // offset created when adding/removing lines
<                 // between new and original structure
< 
7132,7142c7105
<         unsigned int ln;
< 
<         // here we need to get to the line where the next insert will
<         // begin which is <df->pos> we will fill up to df->pos with
<         // original items
<         unsigned int deltaend = df->pos-offset;
<         
<         for(;lastmodline<deltaend;lastmodline++) {
<             // we need to copy from the orig structure into new one
<             lines->vector[lastmodline] = 
orig_lines->vector[lastmodline+offset];
<         }
---
>       unsigned int ln;
7151,7200c7114,7116
<                     {
< 
<     const char *textend, *p;
<     const char *nextline_text;
<     struct line *q;
<     int nextline_newline;
<     size_t nextline_len;
<     int online;
< 
<     textend = df->new_lines + df->len;
<     nextline_newline = 0;
<     nextline_text = df->new_lines;
<     online = 0; // which line we are currently adding
<     for (p = df->new_lines; p < textend; ++p) {
<       if (*p == '\n')
<       {
<           nextline_newline = 1;
<           if (p + 1 == textend)
<                   /* If there are no characters beyond the last newline, we
<                       don't consider it another line.  */
<                   break;
< 
<           nextline_len = p - nextline_text;
<           q = xmalloc (sizeof (struct line) + nextline_len);
<           q->vers = addvers;
<           q->text = (char *)q + sizeof (struct line);
<           q->len = nextline_len;
<           q->has_newline = nextline_newline;
<           q->refcount = 1;
<           memcpy (q->text, nextline_text, nextline_len);
<             lines->vector[lastmodline++] = q;
<             offset--;
<     
<           nextline_text = (char *)p + 1;
<           nextline_newline = 0;
<       }
<     }
<     nextline_len = p - nextline_text;
<     q = xmalloc (sizeof (struct line) + nextline_len);
<     q->vers = addvers;
<     q->text = (char *)q + sizeof (struct line);
<     q->len = nextline_len;
<     q->has_newline = nextline_newline;
<     q->refcount = 1;
<     memcpy (q->text, nextline_text, nextline_len);
<     lines->vector[lastmodline++] = q;
<     offset--; // for each line we add the offset between the #'s increases
< 
<                     }
< 
---
>               if (! linevector_add (lines, df->new_lines, df->len, addvers,
>                                     df->pos))
>                   err = 1;
7203,7205c7119,7120
<         offset+=df->nlines; // we are removing this many lines from the source
<               if (df->pos > orig_lines->nlines
<                   || df->pos + df->nlines > orig_lines->nlines)
---
>               if (df->pos > lines->nlines
>                   || df->pos + df->nlines > lines->nlines)
7209c7124,7125
<                       orig_lines->vector[ln]->vers = delvers;
---
>                       lines->vector[ln]->vers = delvers;
>               linevector_delete (lines, df->pos, df->nlines);
7218,7239d7133
<       // add the rest of the remaining lines to the data vector
<     for(;lastmodline<numlines;lastmodline++) {
<             // we need to copy from the orig structure into new one
<         lines->vector[lastmodline] = orig_lines->vector[lastmodline+offset];
<     }
< 
<     // we didn't make the lines structure fully -- so we can't
<     // use the linevector_copy without issues -- so we do it inline
<       // make the original vector larger if necessary
<     if (numlines > orig_lines->nlines) {
<         orig_lines->vector = xnrealloc (orig_lines->vector,
<                               numlines,
<                               sizeof (*orig_lines->vector));
<               orig_lines->lines_alloced = numlines;
<     }
<     memcpy (orig_lines->vector, lines->vector,
<               xtimes (lines->nlines, sizeof (*orig_lines->vector)));
<     orig_lines->nlines = lines->nlines;
< 
<     free(lines->vector);
<     free(lines); // we don't need it any longer
<

[Prev in Thread]

Current Thread

[Next in Thread]

patch to speed up slow rcs patch processing., Michael Smith <=
- Re: patch to speed up slow rcs patch processing., Bernd Jendrissek, 2005/10/07
  - Re: patch to speed up slow rcs patch processing., msmith, 2005/10/07
  - Re: patch to speed up slow rcs patch processing., Larry Jones, 2005/10/07

Prev by Date: [bug #14721] pam_open_session fails on Solaris (sparc) for CVS >= 1.12.10
Next by Date: cvs-1.12.13 autoconf doesn't verify typedef socklen_t ?
Previous by thread: problem compile static (apparatly between libc.a and regex_internal.c)
Next by thread: Re: patch to speed up slow rcs patch processing.
Index(es):
- Date
- Thread