
[Sks-devel] sks-peer.spodhuis.org outage - apology


From: Phil Pennock
Subject: [Sks-devel] sks-peer.spodhuis.org outage - apology
Date: Mon, 16 Jun 2014 21:12:58 -0400

Sorry folks, I goofed up yesterday and didn't notice until just now.
Almost 20 hours of outage.

After sks-peer.spodhuis.org moved to new hardware, with a newer OS, I
set it up running in a FreeBSD Jail (they're _much_ nicer to create with
ZFS available).

Over the weekend, I set up poudriere to provide packages locally and
used an overlay to create meta-packages for each jail.  Late last night,
when very tired, I audited and decided I wasn't using daemontools
anywhere and didn't know why it was even installed, since I normally use
runit, so I nuked it.  I suspect that when I checked all the services, I
just checked that http://sks.spodhuis.org/ loaded in a browser instead
of checking the actual SKS service.

I just noticed that SKS wasn't running on my box (yes, I need real
monitoring for it) and investigated; turns out, there was one jail where
daemontools was used instead of runit, since I'd just migrated my
previous setup across.

    # pkg -j sks install daemontools
    # service jail restart sks
    # fgrep daemontools /jails/sks/var/log/messages 
    May 11 17:47:19 sks pkg: daemontools-0.76_16 installed
    Jun 16 01:01:46 sks pkg: daemontools-0.76_16 deinstalled
    Jun 16 21:00:38 sks pkg: daemontools-0.76_16 installed

Those timestamps are created by pkg and so have picked up a $TZ of EDT,
GMT-4.  The outage began about 20 minutes after the deinstall, when I
restarted the jail to test everything.

I think a simple HTTP prober for my monitoring, which checks
`:11371/pks/lookup?op=stats`, is in my near future.
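
A minimal sketch of what such a prober could look like, assuming Python 3
and only the standard library.  The function name `sks_is_up`, the
10-second timeout and the "HTTP 200 means healthy" rule are illustrative
assumptions, not anything actually deployed:

```python
import http.client
import urllib.error
import urllib.request


def sks_is_up(host, port=11371, timeout=10.0):
    """Return True if the SKS stats page answers with HTTP 200.

    Hypothetical helper: probes http://<host>:<port>/pks/lookup?op=stats
    and treats any error, timeout, or non-200 status as "down".
    """
    url = f"http://{host}:{port}/pks/lookup?op=stats"
    # Ignore any proxy environment variables so the probe hits the
    # keyserver directly rather than a cached or proxied response.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers connection refused, DNS failure, timeouts, HTTP 4xx/5xx
        # and malformed responses.
        return False
```

Calling `sks_is_up("sks-peer.spodhuis.org")` from cron, or from whatever
monitoring framework is in use, and alerting on `False` would have caught
this outage; unlike loading the web front page, it talks to the SKS
service itself.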

-Phil


