[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
patch to speed up slow rcs patch processing.
From: |
Michael Smith |
Subject: |
patch to speed up slow rcs patch processing. |
Date: |
Thu, 06 Oct 2005 12:58:58 -0500 |
User-agent: |
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031014 Thunderbird/0.3 |
Attached is the diff for the file src/rcs.c in the version 1.12.12 of
cvs I downloaded from the cvs site. This single file is sufficient to
fix the problem I detected.
This patch addresses a problem we were having with a cvs file that was
300K lines long with a patch diff with 200K entries. The update past
the head (i.e. version 1.2 where 1.5 is head) entry took upwards of 2-4
minutes per file because the complexity of the patch code was O(m*n)
where n=# of lines and m=# of updates. The patch provided here changes
this to O(n) complexity (which completes the same operation in 4-6 seconds).
The changes I made from the pre-existing code were to:
1. reverse the order of the fragments to a in-order instead of a reverse
order queue
2. perform a single pass to copy the non-deleted and new elements into a
new linevector structure
3. copy the result back to the original line structure.
Please consider adding this patch to the release as it can make a huge
difference in functionality for projects tracking files which can have
this occur (this file was XML with new data values).
I ran the unit tests with this patch and the changes appear to pass all
tests. I did receive some sort of IO error during the sanity.sh -p
tests, but it appear to be innocuous.
I have not added any tests for this patch because there are no
functional input/output relationship changes from this modification.
As per the HACKING document, I am including the consent to use this patch:
"I grant permission to distribute this patch under the terms of the GNU
Public License"
Please let me know if you have any questions about how/why something was
done. I would also be curious if/when you might be adding the patch --
although I realize that this isn't generally your policy.
--
Michael J. Smith
Software Engineer
SAIC, IDE site
Work: (321) 235-7613
mailto: msmith@ideorlando.org
7035d7034
< struct deltafrag *dftail;
7038,7039d7036
< int numlines, lastmodline, offset;
< struct linevector *orig_lines = lines;
7042d7038
< numlines = lines->nlines; // start with init # of lines
7052,7060c7048,7049
< df->next = NULL;
< if (dfhead == NULL) {
< dfhead = df;
< }
< else {
< dftail->next = df;
< }
< dftail = df;
<
---
> df->next = dfhead;
> dfhead = df;
7101d7089
< numlines+=df->nlines;
7111d7098
< numlines-=df->nlines;
7115,7128d7101
< // new temp data structure to hold new org before
< // copy back into original structure.
< lines = xmalloc (sizeof (struct linevector));
< lines->nlines = lines->lines_alloced = numlines;
< lines->vector = xnrealloc(NULL,
< numlines, sizeof(*lines->vector));
<
<
< // we changed the list order to first to last -- so the
< // list never gets larger than the size numlines
< lastmodline = 0;
< offset = 0; // offset created when adding/removing lines
< // between new and original structure
<
7132,7142c7105
< unsigned int ln;
<
< // here we need to get to the line where the next insert will
< // begin which is <df->pos> we will fill up to df->pos with
< // original items
< unsigned int deltaend = df->pos-offset;
<
< for(;lastmodline<deltaend;lastmodline++) {
< // we need to copy from the orig structure into new one
< lines->vector[lastmodline] =
orig_lines->vector[lastmodline+offset];
< }
---
> unsigned int ln;
7151,7200c7114,7116
< {
<
< const char *textend, *p;
< const char *nextline_text;
< struct line *q;
< int nextline_newline;
< size_t nextline_len;
< int online;
<
< textend = df->new_lines + df->len;
< nextline_newline = 0;
< nextline_text = df->new_lines;
< online = 0; // which line we are currently adding
< for (p = df->new_lines; p < textend; ++p) {
< if (*p == '\n')
< {
< nextline_newline = 1;
< if (p + 1 == textend)
< /* If there are no characters beyond the last newline, we
< don't consider it another line. */
< break;
<
< nextline_len = p - nextline_text;
< q = xmalloc (sizeof (struct line) + nextline_len);
< q->vers = addvers;
< q->text = (char *)q + sizeof (struct line);
< q->len = nextline_len;
< q->has_newline = nextline_newline;
< q->refcount = 1;
< memcpy (q->text, nextline_text, nextline_len);
< lines->vector[lastmodline++] = q;
< offset--;
<
< nextline_text = (char *)p + 1;
< nextline_newline = 0;
< }
< }
< nextline_len = p - nextline_text;
< q = xmalloc (sizeof (struct line) + nextline_len);
< q->vers = addvers;
< q->text = (char *)q + sizeof (struct line);
< q->len = nextline_len;
< q->has_newline = nextline_newline;
< q->refcount = 1;
< memcpy (q->text, nextline_text, nextline_len);
< lines->vector[lastmodline++] = q;
< offset--; // for each line we add the offset between the #'s increases
<
< }
<
---
> if (! linevector_add (lines, df->new_lines, df->len, addvers,
> df->pos))
> err = 1;
7203,7205c7119,7120
< offset+=df->nlines; // we are removing this many lines from the source
< if (df->pos > orig_lines->nlines
< || df->pos + df->nlines > orig_lines->nlines)
---
> if (df->pos > lines->nlines
> || df->pos + df->nlines > lines->nlines)
7209c7124,7125
< orig_lines->vector[ln]->vers = delvers;
---
> lines->vector[ln]->vers = delvers;
> linevector_delete (lines, df->pos, df->nlines);
7218,7239d7133
< // add the rest of the remaining lines to the data vector
< for(;lastmodline<numlines;lastmodline++) {
< // we need to copy from the orig structure into new one
< lines->vector[lastmodline] = orig_lines->vector[lastmodline+offset];
< }
<
< // we didn't make the lines structure fully -- so we can't
< // use the linevector_copy without issues -- so we do it inline
< // make the original vector larger if necessary
< if (numlines > orig_lines->nlines) {
< orig_lines->vector = xnrealloc (orig_lines->vector,
< numlines,
< sizeof (*orig_lines->vector));
< orig_lines->lines_alloced = numlines;
< }
< memcpy (orig_lines->vector, lines->vector,
< xtimes (lines->nlines, sizeof (*orig_lines->vector)));
< orig_lines->nlines = lines->nlines;
<
< free(lines->vector);
< free(lines); // we don't need it any longer
<
- patch to speed up slow rcs patch processing.,
Michael Smith <=