Re: [Sks-devel] SKS Memory pattern anomaly
From: Jonathon Weiss
Subject: Re: [Sks-devel] SKS Memory pattern anomaly
Date: Wed, 13 Mar 2019 17:02:24 -0400 (EDT)
User-agent: Alpine 2.20 (DEB 67 2015-01-07)
Jeremy,
When I applied the recommended configuration (especially "command_timeout:
600") to allow sufficient time to merge some of the really large keys that
became an issue earlier this year, I noticed a significant memory spike (I
think SKS was actively merging one of those large keys at the time). I
suspect SKS is memory-inefficient in that operation, but my experience is
probably worth little more than random anecdata. I ended up throwing some
more RAM at my server, and yes, the merge literally took a few minutes.
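
For anyone applying the same change: the setting lives in the sksconf file.
A minimal sketch, with the hostname and path as placeholders rather than
anything from this thread:

  # /etc/sks/sksconf (path varies by distribution)
  hostname: keys.example.com
  # give merges of very large keys enough time to complete
  command_timeout: 600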
Jonathon
Jonathon Weiss <address@hidden>
MIT/IS&T/Cloud Platforms
On Tue, 5 Mar 2019, Jeremy T. Bouse wrote:
So I have all my nodes synced at around 5445343 keys after disabling all
external peering and letting the 4 nodes catch up with each other. I then
added a single external peer, and my SKS db process goes into a funky state
once it begins peering externally. I installed strace, and when I run
'strace -q -c' against the 'sks db' process on the 3 secondary nodes it
comes out with something along the lines of:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.004000         500         8           select
  0.00    0.000000           0         5           read
  0.00    0.000000           0         7           write
  0.00    0.000000           0         5           close
  0.00    0.000000           0         2           stat
  0.00    0.000000           0        10        10 lseek
  0.00    0.000000           0         3           brk
  0.00    0.000000           0        16           alarm
  0.00    0.000000           0         5           accept
------ ----------- ----------- --------- --------- ----------------
100.00    0.004000                    61        10 total
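
For anyone wanting to reproduce this: the summary above comes from attaching
to the running process and interrupting after a while. A sketch, assuming
pgrep can find the 'sks db' command line:

  # attach to the running 'sks db' process and count syscalls;
  # Ctrl-C after ~30 seconds prints the summary table shown above
  strace -q -c -p "$(pgrep -f 'sks db')"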
These appear to be running normally, with under 10% CPU and 10% MEM
according to ps. My primary node, on the other hand, is another story
entirely. For almost half an hour now it has been showing 80% MEM, its
CPU % varies, and strace shows a totally different pattern:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.77    0.026483           9      2865           pread64
  0.23    0.000060           0      2563           pwrite64
  0.00    0.000000           0       141           ftruncate
------ ----------- ----------- --------- --------- ----------------
100.00    0.026543                  5569           total
I don't know enough about the internal operations to know exactly what's
going on, but given the high memory usage, and the fact that one CPU core is
100% idle while the other sits in a nearly 100% I/O-wait state, it looks
like the process is caught in some sort of loop it can't get out of. If I
look under /var/lib/sks/DB I've got multiple 100 MB log files starting to
build up, but no other files show updated timestamps or any other sign of
modification except the __db.00[123] files.
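
If it is the log.* files that are piling up, the stock Berkeley DB utilities
can show whether they are still referenced. A sketch, assuming the
environment really is /var/lib/sks/DB and the matching db-util package is
installed (stop sks before removing anything):

  db_checkpoint -1 -h /var/lib/sks/DB    # force a checkpoint
  db_archive -h /var/lib/sks/DB          # list log files no longer needed
  db_archive -d -h /var/lib/sks/DB       # remove them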
Anyone have any thoughts for next steps?
On 3/5/2019 1:09 AM, Jeremy T. Bouse wrote:
Has anyone else been monitoring SKS and noticed an exceedingly high
memory usage pattern? My secondary nodes generally show < 11% of the
instance memory used, but for some reason my primary node is using
nearly 100% of memory, and CPU for that matter. My primary node is the
only one peering outside my network and has a limited number of peers,
while the secondary nodes only peer with each other and the primary.
I've put the NGINX short-circuit hack for the bad keys that have been
mentioned in place (sketched below), which has lowered CPU usage
overall, but nothing has seemed to improve the primary node. I see my
primary spend much of its time at 100% CPU and 50-90% memory while
it's in recon mode, and it only appears to dip down when it
recalculates its stats.
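
The NGINX short-circuit referred to above was a community workaround rather
than part of SKS itself. A minimal sketch of the idea, with a made-up
fingerprint standing in for the actual poisoned keys:

  # drop lookups for known-poisoned keys before they reach sks
  location /pks/lookup {
      if ($args ~ "0xDEADBEEFDEADBEEFDEADBEEFDEADBEEFDEADBEEF") {
          return 444;  # close the connection without a response
      }
      proxy_pass http://127.0.0.1:11371;
  }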
_______________________________________________
Sks-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/sks-devel