bug-coreutils

bug#10877: Wimpy external files.


From: Rogier Wolff
Subject: bug#10877: Wimpy external files.
Date: Sun, 26 Feb 2012 21:18:25 +0100
User-agent: Mutt/1.5.20 (2009-06-14)

On Sat, Feb 25, 2012 at 11:26:01PM -0800, Paul Eggert wrote:
> On 02/25/2012 11:14 PM, Rogier Wolff wrote:
> 
> > Many modern operating systems do "lazy" allocation.
> 
> Sure, that's an old trick.  But this has its own problems:
> it can mean a process *thinks* it has memory allocated, but it
> doesn't *really* have the memory; which means when it tries to
> actually *use* its memory it can get killed.  This is not a direction
> we want 'sort' to head.

Hmm. Ok. 

> > a slight change in the codebase might be in order
> > for "unknown sort size".
> 
> Sorry, I didn't follow the rest of that comment.  Perhaps you
> could suggest a patch?  That might explain things better.
> "diff -u" format is typically best.

This one is more work than 10 minutes. Before I put in the effort I
would like to know if this is something that stands a chance...

Maybe some pseudocode helps explain:

Currently there is a 

    bufsize = .... 
    buffer = malloc (bufsize); 

and then during the sorting something like: 

    if (data_in_buffer + new_data_len > bufsize) {
       write_data_from_buffer ();
    }

I propose to make that: 

    bufsize = .... ; // this returns a negative number to indicate it is a 
                     // wild guess, but an upper limit. 

    if (bufsize < 0) {
        curbufsize = MINBUFSIZE; 
        bufsize = -bufsize;
    } else {
        curbufsize = bufsize;
    }
    buffer = malloc (curbufsize); 

and then during the sorting: 

    if (data_in_buffer + new_data_len > curbufsize) {
       curbufsize *= 2;
       if (curbufsize > bufsize) curbufsize = bufsize;
       buffer = realloc (buffer, curbufsize);
       if (data_in_buffer + new_data_len > curbufsize) {
          // still too small, even after growing: flush to disk
          write_data_from_buffer ();
       }
    }

i.e. we determine an upper limit at "guessing time", and increase the
memory buffer up to that limit when the small default buffer ends up
being too small.
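
To make the idea above a bit more concrete, here is a minimal C sketch of
such a growing buffer. It is not taken from sort.c: the names `sortbuf`,
`sortbuf_init` and `sortbuf_reserve`, and the value of `MINBUFSIZE`, are
assumptions made up for illustration. It also differs from the pseudocode
in two ways: it keeps doubling until the new data fits or the limit is
reached (instead of doubling once), and it falls back to flushing when
realloc fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum { MINBUFSIZE = 4096 };   /* hypothetical starting size for a guessed limit */

struct sortbuf {
    char  *data;
    size_t used;      /* bytes currently held in the buffer */
    size_t cursize;   /* bytes currently allocated */
    size_t maxsize;   /* upper limit determined at "guessing time" */
};

/* A negative size_hint means "-size_hint is only a guessed upper limit":
   start small and grow on demand.  Otherwise allocate the full size.  */
static bool sortbuf_init(struct sortbuf *b, long long size_hint)
{
    if (size_hint < 0) {
        b->maxsize = (size_t) -size_hint;
        b->cursize = b->maxsize < MINBUFSIZE ? b->maxsize : MINBUFSIZE;
    } else {
        b->maxsize = b->cursize = (size_t) size_hint;
    }
    b->used = 0;
    b->data = malloc(b->cursize);
    return b->data != NULL;
}

/* Try to make room for `need` more bytes, doubling the allocation up to
   maxsize.  Returns true if the bytes fit now; false means the caller
   has to write the buffer out before adding more data.  */
static bool sortbuf_reserve(struct sortbuf *b, size_t need)
{
    assert(b->used <= b->cursize);
    while (need > b->cursize - b->used && b->cursize < b->maxsize) {
        /* Double, but clamp to maxsize without overflowing.  */
        size_t next = b->cursize > b->maxsize / 2 ? b->maxsize
                                                  : b->cursize * 2;
        char *p = realloc(b->data, next);
        if (p == NULL)
            break;    /* old buffer is still valid; caller will flush */
        b->data = p;
        b->cursize = next;
    }
    return need <= b->cursize - b->used;
}
```

When `sortbuf_reserve` returns false, the caller would do the
`write_data_from_buffer ()` step from the pseudocode and reset `used` to 0
before continuing.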

        Roger.

-- 
** address@hidden ** http://www.BitWizard.nl/ ** +31-15-2600998 **
**    Delftechpark 26 2628 XH  Delft, The Netherlands. KVK: 27239233    **
*-- BitWizard writes Linux device drivers for any device you may have! --*
The plan was simple, like my brother-in-law Phil. But unlike
Phil, this plan just might work.




