

From: Peter Schuller
Subject: Re: [Duplicity-talk] [patch] massive performance fix for large volume sizes
Date: Mon, 10 Sep 2007 20:15:29 +0200
User-agent: Mutt/1.5.16 (2007-06-09)

>       def get_data_block(self, fp, max_size):
>               """Return pair (next data block, boolean last data block)"""
> -             buf = fp.read(max_size)
> +             buf = fp.read(min(max_size, 64*1024))
>               if len(buf) < max_size:
>                       if fp.close(): raise DiffDirException("Error closing file")
>                       return (buf, 1)

This is broken. I just noticed that Python, being Python, has unusual
read() semantics: a read(n) is actually guaranteed to return n bytes
except on EOF, and the duplicity code is written to exploit that. As a
result, none of the code is written to handle multiple reads being
required to fill a block. I will have to look at it properly to come
up with a proper fix; in the meantime, don't apply this one. I patched
it like this because in any normal API a read() is pretty much always
allowed to return short unless you explicitly ask for other behavior,
so capping the read felt like a very safe change... apparently not.
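
For reference, the proper fix will presumably have to loop: cap each
individual read, but keep reading until the block is full or EOF, so
that callers can still treat a short return as end-of-file. A minimal
sketch of that idea (not actual duplicity code; the 64 KiB cap and the
non-EOF return value are assumptions based on the snippet quoted
above):

    def get_data_block(self, fp, max_size):
        """Return pair (next data block, boolean last data block)"""
        # Read in capped chunks, but keep going until the block is
        # full or EOF, preserving the short-read-means-EOF invariant
        # that the callers rely on.
        chunks = []
        remaining = max_size
        while remaining > 0:
            chunk = fp.read(min(remaining, 64 * 1024))
            if not chunk:
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        buf = "".join(chunks)
        if len(buf) < max_size:
            if fp.close(): raise DiffDirException("Error closing file")
            return (buf, 1)
        return (buf, 0)  # assumed non-EOF return; not shown in the quote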

Note, however, that the performance improvement is not bogus; the
reason I discovered this to begin with was that it took seconds to
process even tiny files of a couple of kilobytes. In other words, the
difference does not all lie in the fact that you end up reading less
data.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <address@hidden>'
Key retrieval: Send an E-Mail to address@hidden
E-Mail: address@hidden Web: http://www.scode.org


