Re: [Sks-devel] SKS Performance oddity


From: Jeremy T. Bouse
Subject: Re: [Sks-devel] SKS Performance oddity
Date: Sat, 9 Mar 2019 13:30:38 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.5.3

On 3/9/2019 5:29 AM, Michiel van Baak wrote:
> 
> Hey,
> 
> I have exactly the same problem.
> Several times in the last month I have done the following steps:
> 
> - Stop all nodes
> - Destroy the datasets (both db and ptree)
> - Load in a new dump that is at most 2 days old
> - Create the ptree database
> - Start sks on the primary node, without peering configured (comment out
>   all peers)
> - Give it some time to start
> - Check the stats page and run a couple of searches
> # Up until here everything works fine #
> - Add the outside peers on the primary node and restart it
> - After 5 minutes the machine takes 100% CPU, is stuck in I/O most of
>   the time and falls off the grid
> 
> It doesn't matter if I enable peering with the internal nodes or not.
> Just having 1 SKS instance running, and peering it with the network is
> enough to basically render this instance unusable.
> 
> Like you, I tried in a VM first, and also on a physical machine (dual
> 6-core Xeon E5-2620 0 @ 2.00GHz with 96GB RAM and 2 Samsung EVO 840 Pro
> SSDs for storage).
> I see exactly the same every time I follow the steps outlined above.
> 
> The systems I tried are Debian Linux and FreeBSD, with the same result on both.
> 

I've been trying to narrow it down and zero in on something to fix,
though I admittedly don't know that much about the internals of the
process flow. I have noticed that the issue is not the recon service
itself, even though it shows up so blatantly while recon is running;
from my observation it is actually the DB service.

At this point I have 5 nodes: sks01 - sks04 are my original 4 VM nodes,
all with 2 vCPU/4GB except sks01, which has 4 vCPU/8GB, and then sks0,
which is my physical server with a 4-core Xeon and 4GB RAM. Currently
sks0 is set up as my external peering point; originally it was sks01. I
have just finished re-importing the keydump into sks0 and sks01 from the
mattrude.com daily dumps for 2019-03-08 and 2019-03-09 respectively.
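
For reference, the rebuild on each node looks roughly like this (a
sketch only: the /var/lib/sks base directory, the KDB/PTree directory
names, the debian-sks user, and the systemd unit name are assumptions
based on the stock Debian layout, and the dump files are whatever was
downloaded into dump/ beforehand):

    # stop the daemon and clear out the existing databases
    systemctl stop sks
    cd /var/lib/sks
    rm -rf KDB PTree

    # rebuild the key database from the dump files, then the prefix tree
    # (options follow the suggested invocations in the SKS README)
    sks build dump/*.pgp -n 10 -cache 100
    sks pbuild -cache 20 -ptree_cache 70

    # hand ownership back to the service user and start it up again
    chown -R debian-sks: KDB PTree
    systemctl start sks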

I'm running the following command from another machine to check on things:

    for I in $(seq 50 54); do
        echo .${I}
        ssh 172.16.20.${I} \
            'uptime; ps aux | grep sks | grep -v grep; time curl -sf localhost:11371/pks/lookup?op=stats | grep keys:'
        echo
    done

.50
 18:14:26 up 1 day, 11:30,  7 users,  load average: 0.10, 0.69, 1.31
debian-+ 24595 17.5 13.5 605012 540968 ?       Ss   15:32  28:32
/usr/sbin/sks -stdoutlog db
debian-+ 24596  0.3  0.8  72528 32740 ?        Ss   15:32   0:37
/usr/sbin/sks -stdoutlog recon
<h2>Statistics</h2><p>Total number of keys: 5448526</p>

real    0m0.014s
user    0m0.004s
sys     0m0.004s

.51
 18:14:28 up 1 day, 14:03,  4 users,  load average: 1.30, 1.65, 1.49
debian-+  5166 32.4 36.0 3059044 2950716 ?     Ss   15:37  51:01
/usr/sbin/sks -stdoutlog db
debian-+  5167  0.5  4.0 603644 331260 ?       Ss   15:37   0:48
/usr/sbin/sks -stdoutlog recon
<h2>Statistics</h2><p>Total number of keys: 5448005</p>

real    0m0.022s
user    0m0.012s
sys     0m0.000s

.52
 18:14:30 up 7 days, 19:21,  4 users,  load average: 0.98, 0.38, 0.31
debian-+  6234  0.5 38.6 1609044 1565612 ?     Rs   Mar06  30:33
/usr/sbin/sks -stdoutlog db
debian-+  6235  0.0  3.8 356328 156708 ?       Ss   Mar06   0:51
/usr/sbin/sks -stdoutlog recon
<h2>Statistics</h2><p>Total number of keys: 5447149</p>

real    1m46.269s
user    0m0.012s
sys     0m0.000s

.53
 18:16:17 up 7 days, 19:28,  4 users,  load average: 2.01, 1.55, 0.85
debian-+  5754  0.6 13.6 590840 551360 ?       Ds   Mar05  37:20
/usr/sbin/sks -stdoutlog db
debian-+  5755  0.0  3.1 266908 126064 ?       Ss   Mar05   1:59
/usr/sbin/sks -stdoutlog recon
<h2>Statistics</h2><p>Total number of keys: 5447523</p>

real    0m46.400s
user    0m0.008s
sys     0m0.004s

.54
 18:17:05 up 7 days, 19:28,  4 users,  load average: 1.88, 0.87, 0.41
debian-+  5994  0.6 18.5 791456 752596 ?       Ss   Mar05  35:24
/usr/sbin/sks -stdoutlog db
debian-+  5995  0.0  3.0 260224 122112 ?       Ds   Mar05   1:45
/usr/sbin/sks -stdoutlog recon
<h2>Statistics</h2><p>Total number of keys: 5447788</p>

real    0m0.015s
user    0m0.008s
sys     0m0.000s


For stability's sake I'd removed sks0 and sks01 from my NGINX upstreams;
the one exception is that I have

    location /pks/hashquery {
        proxy_method POST;
        proxy_pass http://127.0.0.1:11371;
    }

so that /pks/hashquery doesn't go through the server pool but hits the
local SKS instance. So sks0 only sees all traffic on 11370/tcp plus the
/pks/hashquery URI on 11371/tcp; all other /pks requests go to the
backend pool and hit sks02 - sks04.
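
In rough outline the NGINX side looks like the following (a sketch only;
the upstream name, listen address, and server addresses here are
illustrative rather than my exact config, and SKS itself is assumed to
be bound to 127.0.0.1:11371):

    # pool of backend SKS nodes (sks02 - sks04 in my numbering)
    upstream sks_backends {
        server 172.16.20.52:11371;
        server 172.16.20.53:11371;
        server 172.16.20.54:11371;
    }

    server {
        listen 172.16.20.50:11371;   # external address only

        # recon partners fetch key material here, so keep it on the
        # local SKS instance instead of the pool
        location /pks/hashquery {
            proxy_method POST;
            proxy_pass http://127.0.0.1:11371;
        }

        # everything else under /pks is load-balanced across the pool
        location /pks {
            proxy_pass http://sks_backends;
        }
    }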

I have found some improvement with changes to the *pagesize settings
before re-importing the keydump. Currently all my nodes have had their
data re-imported using the following settings:

pagesize:          128
keyid_pagesize:    64
meta_pagesize:     1
subkeyid_pagesize: 128
time_pagesize:     128
tqueue_pagesize:   1
ptree_pagesize:    8
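
These page sizes only take effect when the database files are created,
which is why the data was re-imported afterwards. To confirm what a
database actually ended up with, something like this works (a sketch: it
assumes Berkeley DB's db_stat tool, possibly installed under a versioned
name such as db5.3_stat, and the stock KDB/PTree layout under
/var/lib/sks):

    # "Underlying database page size" should reflect the sksconf value
    # times 512, if I read the sample config right, e.g. pagesize: 128
    # giving 65536 bytes for the key database
    db_stat -h /var/lib/sks/KDB -d key | grep -i 'page size'

    # and ptree_pagesize: 8 giving 4096 bytes for the prefix tree
    db_stat -h /var/lib/sks/PTree -d ptree | grep -i 'page size'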

I also have the hack to short-circuit the bad-actor keys that have been
mentioned on the list, using:

        if ( $arg_search ~* "(0x1013D73FECAC918A0A25823986CE877469D2EAD9|0x2016349F5BC6F49340FCCAF99F9169F4B33B4659|0xB33B4659|0x69D2EAD9)" ) {
            return 444;
        }

This has resulted in me no longer seeing the flood of requests for them
in my SKS log. The key difference between my config and what I'd seen
others mention on the list is that I'm matching only the 'search' query
argument rather than the full query string, so I catch these regardless
of 'op', 'options', or any other argument given.
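
A quick way to confirm the block is to request one of those key IDs
through the front end (the hostname here is a placeholder); since
NGINX's 444 closes the connection without sending anything, curl should
report an empty reply rather than an HTTP status:

    curl -v 'http://keyserver.example.net/pks/lookup?op=get&options=mr&search=0xB33B4659'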


