
Re: [Sks-devel] SKS Memory pattern anomaly


From: Jonathon Weiss
Subject: Re: [Sks-devel] SKS Memory pattern anomaly
Date: Wed, 13 Mar 2019 17:02:24 -0400 (EDT)
User-agent: Alpine 2.20 (DEB 67 2015-01-07)

Jeremy,

When I applied the recommended configuration (in particular "command_timeout:
600", to allow enough time to merge some of the really large keys that became an
issue earlier this year), I noticed a significant memory spike; I think SKS was
actively merging one of those large keys at the time.  I suspect that SKS is
memory-inefficient in that operation, but my experience is probably worth only a
little more than any random anecdata.  I ended up throwing some more RAM at my
server, and yes, the merge literally took a few minutes.
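
For the record, the relevant sksconf entry looks like this (a sketch from
memory; the file's path varies by distro, e.g. /etc/sks/sksconf on Debian):

    # allow 'sks db' up to 10 minutes per command, so merges of the
    # very large keys don't get killed partway through
    command_timeout: 600

Both sks processes need a restart afterwards for the setting to take effect.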

        Jonathon

        Jonathon Weiss <address@hidden>
        MIT/IS&T/Cloud Platforms


On Tue, 5 Mar 2019, Jeremy T. Bouse wrote:


So I have all my nodes synced at around 5445343 keys after disabling all
external peering and letting the 4 nodes catch up with each other. I then added
a single external peer, and the SKS db process goes into a funky state once it
begins peering externally. I installed strace, and running 'strace -q -c'
against the 'sks db' process on the 3 secondary nodes produces something along
these lines:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.004000         500         8           select
  0.00    0.000000           0         5           read
  0.00    0.000000           0         7           write
  0.00    0.000000           0         5           close
  0.00    0.000000           0         2           stat
  0.00    0.000000           0        10        10 lseek
  0.00    0.000000           0         3           brk
  0.00    0.000000           0        16           alarm
  0.00    0.000000           0         5           accept
------ ----------- ----------- --------- --------- ----------------
100.00    0.004000                    61        10 total
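
The invocation behind those numbers, for anyone wanting to reproduce it, was
roughly this (a sketch; the pgrep pattern is an assumption and may need
adjusting to match your process names):

    # attach to the running 'sks db' process and count syscalls;
    # detach with Ctrl-C to print the summary table
    strace -q -c -p "$(pgrep -o -f 'sks db')"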

These appear to be running normally, at under 10% CPU and 10% memory according
to ps. My primary node, on the other hand, is another story entirely. For
almost half an hour now it has been using 80% of memory, its CPU usage varies,
and strace shows a totally different pattern:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.77    0.026483           9      2865           pread64
  0.23    0.000060           0      2563           pwrite64
  0.00    0.000000           0       141           ftruncate
------ ----------- ----------- --------- --------- ----------------
100.00    0.026543                  5569           total
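
To see which files all that pread64/pwrite64 traffic is actually hitting,
something along these lines should work (a sketch, assuming the database lives
under /var/lib/sks and the same pgrep pattern applies):

    # map the process's open file descriptors to file names
    lsof -p "$(pgrep -o -f 'sks db')" | grep /var/lib/sks
    # or watch the individual reads and writes, fd numbers included
    strace -q -f -e trace=pread64,pwrite64 -p "$(pgrep -o -f 'sks db')"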

I don't know enough about the internal operations to know exactly what's going
on, but given the high memory usage, and the fact that my node has one CPU core
sitting 100% idle while the other is in a nearly 100% I/O-wait state, it looks
caught in some sort of loop it can't get out of. If I look under
/var/lib/sks/DB I've got multiple 100MB log files building up, but no other
files appear to have their timestamps updating or show any other sign of
modification, except the __db.00[123] files.
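
The __db.00[123] files are the Berkeley DB environment's shared regions, and
the log files are its transaction log, so one thing I'm considering is asking
the BDB tools what the environment is doing. A sketch, assuming the utilities
matching SKS's libdb version are installed (binary names may be versioned
depending on distro, e.g. db5.3_stat):

    # cache (mpool) statistics for the environment under /var/lib/sks/DB
    db_stat -h /var/lib/sks/DB -m
    # lock statistics; many waiting lockers would suggest contention
    db_stat -h /var/lib/sks/DB -c
    # list transaction log files no longer needed, i.e. safe to archive
    db_archive -h /var/lib/sks/DB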

Anyone have any thoughts for next steps?

On 3/5/2019 1:09 AM, Jeremy T. Bouse wrote:

    Has anyone else been monitoring the memory pattern for SKS and
noticed exceedingly high memory usage? My secondary nodes generally show
< 11% of the instance memory used, but for some reason my primary node is
using nearly 100% of memory, and CPU for that matter. My primary node is
the only one peering outside my network and has a limited number of peers,
while the secondary nodes peer only with each other and the primary. I've
applied the NGINX short-circuit hack for the bad keys that have been
mentioned, which has lowered CPU usage overall, but nothing has seemed to
improve the primary node. I see my primary spend much of its time at 100%
CPU and 50-90% memory while it's in recon, and it only appears to dip down
when it recalculates its stats.
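
For reference, the short-circuit hack I mean is roughly an NGINX rule in
front of /pks/lookup that refuses lookups for the known-poisoned key IDs
before they ever reach SKS. A sketch only; the fingerprints below are
placeholders, not the real reported IDs:

    # in the server block that proxies to SKS
    location /pks/lookup {
        # POISONED_FPR_1 etc. stand in for the reported key fingerprints
        if ($args ~* "(POISONED_FPR_1|POISONED_FPR_2)") {
            return 444;   # close the connection without a response
        }
        proxy_pass http://127.0.0.1:11371;
    }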


