[Sks-devel] sks-peer.spodhuis.org outage - apology
From: Phil Pennock
Subject: [Sks-devel] sks-peer.spodhuis.org outage - apology
Date: Mon, 16 Jun 2014 21:12:58 -0400
Sorry folks, I goofed up yesterday and didn't notice until just now.
Almost 20 hours of outage.
After sks-peer.spodhuis.org moved to new hardware, with a newer OS, I
set it up running in a FreeBSD Jail (they're _much_ nicer to create with
ZFS available).
Over the weekend, I set up poudriere to provide packages locally and
used an overlay to create meta-packages for each jail. Late last night,
when very tired, I audited and decided I wasn't using daemontools
anywhere and didn't know why it was even installed, since I normally use
runit, so I nuked it. I suspect that when I checked all the services, I
just checked that http://sks.spodhuis.org/ loaded in a browser instead
of checking the actual SKS service.
I just noticed that SKS wasn't running on my box (yes, I need real
monitoring for it) and investigated; turns out, there was one jail where
daemontools was used instead of runit, since I'd just migrated my
previous setup across.
# pkg -j sks install daemontools
# service jail restart sks
# fgrep daemontools /jails/sks/var/log/messages
May 11 17:47:19 sks pkg: daemontools-0.76_16 installed
Jun 16 01:01:46 sks pkg: daemontools-0.76_16 deinstalled
Jun 16 21:00:38 sks pkg: daemontools-0.76_16 installed
Those timestamps are created by pkg and so have picked up a $TZ of EDT,
GMT-4. The outage began 20 minutes after deinstall with a restart of
the jail, to test everything.
I think a simple HTTP prober for my monitoring, which checks
`:11371/pks/lookup?op=stats`, is in my near future.
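A prober along those lines could be sketched roughly as below. This is only
an illustration, not the monitoring that was actually deployed: the function
names, the 10-second timeout, and the "HTTP 200 means healthy" rule are all
assumptions. Only the port and the `/pks/lookup?op=stats` path come from the
message above.

```python
import http.client
import urllib.request

SKS_PORT = 11371  # port named in the message above


def build_stats_url(host: str, port: int = SKS_PORT) -> str:
    """Return the stats URL to probe (hypothetical helper)."""
    return f"http://{host}:{port}/pks/lookup?op=stats"


def sks_is_up(host: str, port: int = SKS_PORT, timeout: float = 10.0) -> bool:
    """Return True only if the stats page answers with HTTP 200.

    Assumption: any connection failure, timeout, or non-200 response
    counts as "down". A stricter check might also inspect the body.
    """
    url = build_stats_url(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

Calling `sks_is_up("sks-peer.spodhuis.org")` from cron or an existing
monitoring system and alerting on `False` would test the SKS daemon itself,
rather than only the web page on port 80.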
-Phil