Re: Taking advantage of L1 and L2 cache in sort
From: Chen Guo
Subject: Re: Taking advantage of L1 and L2 cache in sort
Date: Tue, 2 Mar 2010 11:37:57 -0800 (PST)

Forgot to CC the list:
> I did a quick /usr/bin/time -v run and found that sorting a 96M file
> incurred 36358 minor page faults with -S500M, but only 5380 with -S10M.
>
> Wow.
>
> So with the smaller buffer, system time goes up but user time goes
> down. It seems odd that user time would go down, but I believe the
> cause is in how the output of the merging is written.
>
> In internal sort, the output occurs after all the merging has finished,
> while in external merge the output occurs as each line is being
> merged. With my group working on parallel sort, we noticed a ~14%
> speedup when we wrote output at the top level of merging, as opposed
> to all at once after the sort is completed.
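
To make the two output strategies concrete, here is a minimal Python sketch.
It is only an illustration and not the code in GNU sort or in the parallel
sort work: the function names sort_then_output and output_while_merging are
made up for this example, and the runs are assumed to be already-sorted lists
of lines.

```python
import heapq
import io

def sort_then_output(runs, out):
    """Finish the whole merge first, then write the result in one pass."""
    merged = list(heapq.merge(*runs))  # full result is held in memory
    for line in merged:
        out.write(line)

def output_while_merging(runs, out):
    """Write each line as soon as the top-level merge produces it."""
    for line in heapq.merge(*runs):    # lazy iterator, no intermediate list
        out.write(line)

# Hypothetical pre-sorted runs standing in for sorted chunks of the input.
runs = [["a\n", "d\n"], ["b\n", "e\n"], ["c\n", "f\n"]]

first, second = io.StringIO(), io.StringIO()
sort_then_output(runs, first)
output_while_merging(runs, second)
```

Both functions produce the same bytes; the difference is only whether a
complete merged copy exists before the first write.
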
>
> bash-3.2$ /usr/bin/time -v sort -S10M randL > /dev/null
> Command being timed: "sort -S10M randL"
> User time (seconds): 4.74
> System time (seconds): 0.57
> Percent of CPU this job got: 99%
> Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.32
> Average shared text size (kbytes): 0
> Average unshared data size (kbytes): 0
> Average stack size (kbytes): 0
> Average total size (kbytes): 0
> Maximum resident set size (kbytes): 0
> Average resident set size (kbytes): 0
> Major (requiring I/O) page faults: 0
> Minor (reclaiming a frame) page faults: 5380
> Voluntary context switches: 14
> Involuntary context switches: 11
> Swaps: 0
> File system inputs: 0
> File system outputs: 0
> Socket messages sent: 0
> Socket messages received: 0
> Signals delivered: 0
> Page size (bytes): 4096
> Exit status: 0
> bash-3.2$ /usr/bin/time -v sort -S500M randL > /dev/null
> Command being timed: "sort -S500M randL"
> User time (seconds): 5.27
> System time (seconds): 0.28
> Percent of CPU this job got: 99%
> Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.56
> Average shared text size (kbytes): 0
> Average unshared data size (kbytes): 0
> Average stack size (kbytes): 0
> Average total size (kbytes): 0
> Maximum resident set size (kbytes): 0
> Average resident set size (kbytes): 0
> Major (requiring I/O) page faults: 0
> Minor (reclaiming a frame) page faults: 36358
> Voluntary context switches: 3
> Involuntary context switches: 11
> Swaps: 0
> File system inputs: 0
> File system outputs: 0
> Socket messages sent: 0
> Socket messages received: 0
> Signals delivered: 0
> Page size (bytes): 4096
> Exit status: 0
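
As a rough sanity check on the two runs above, the minor fault counts can be
turned into an estimate of how much memory each run touched for the first
time. This is a sketch that assumes every minor fault maps exactly one page
of the reported 4096-byte size; the helper name first_touched_mib is invented
for this example.

```python
PAGE_SIZE = 4096  # bytes, from "Page size (bytes)" in the time -v output

def first_touched_mib(minor_faults, page_size=PAGE_SIZE):
    """Estimate first-touched memory, assuming one page per minor fault."""
    return minor_faults * page_size / (1024 * 1024)

# Minor fault counts reported for the two runs.
for label, faults in (("-S10M", 5380), ("-S500M", 36358)):
    print(f"{label}: {first_touched_mib(faults):.2f} MiB")
```

Under that assumption the -S10M run touched about 21 MiB and the -S500M run
about 142 MiB, for the same 96M input.
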
>
> ----- Original Message ----
> > From: Philip Rowlands
> > To: Pádraig Brady
> > Cc: Report bugs to ; Joey Degges
> > Sent: Tue, March 2, 2010 5:21:15 AM
> > Subject: Re: Taking advantage of L1 and L2 cache in sort
> >
> > On Tue, 2 Mar 2010, Pádraig Brady wrote:
> >
> > > Currently when sorting we take advantage of the RAM vs disk
> > > speed bump by using a large mem buffer dependent on the size of RAM.
> > > However we don't take advantage of the cache layer in the
> > > memory hierarchy which has an increasing importance in modern
> > > systems given the disparity between CPU and RAM speed increases.
> > [snip data]
> >
> > Interesting results; this type of analysis might also benefit from
> > running the various tests under cachegrind, which would give detailed
> > results about L1/L2 cache miss rates.
> >
> >
> > Cheers,
> > Phil