monit-general

Re: Steadying monitoring processes without starting them


From: Wind Over Water
Subject: Re: Steadying monitoring processes without starting them
Date: Tue, 17 May 2016 02:03:06 -0400

Hi Russ,

Thanks for the reply.

A couple of clarifications:

1) tserver == slave, so when I write one I mean the other.  They are 
effectively the same thing.

2) It is not a goal to have a slave (a tserver) start the master (a namenode); 
it is only a goal to have the master start the slaves.  AFAIK, we don't want a 
slave to start before a master, but that is still to be confirmed.  If a 
slave can start independently and before the accumulo master, then for accumulo 
this issue goes away.  But the issue may still be present for other 
master/slave apps such as hadoop, or in other stacks such as ELK 
(Elasticsearch, Logstash, Kibana), tomcat-based stacks, etc.

For example, this ordering constraint does not apply to zookeeper, as multiple 
zookeeper instances can (apparently) start in parallel and independently.

Hope this helps clarify.

Thanks,
-sandy

> On May 16, 2016, at 10:44, Russell Simpkins <address@hidden> wrote:
> 
> Sandy,
> 
> Your situation is a little confusing. I'm not sure how monit, monitoring a 
> slave process, would know to start the master when the slave dies. In any 
> event, consider creating a script/program to monitor your process. In the 
> script, you can test if the tserver is up and then test the slave. If the 
> tserver is down, you can kill the slave. You can do whatever logic is 
> required in the script. Make sure the script always exits with 0, and it 
> will be executed every check cycle.
> 
> Russ
> 
> 
> On Sat, May 14, 2016 at 1:17 PM, Sandy C <address@hidden> wrote:
> Hi Dominic,
> 
> Thanks for the reply and for decoding the typos; I typed the email on a 
> phone.
> 
> But the suggestion doesn't quite address the use case we are trying to get to 
> if I am understanding it correctly.
> 
> Maybe I am thinking about this wrong.  Here is a short background/outer use 
> case:
> 
> We are orchestrating a cluster (a bunch of machines) with various app stacks 
> that have all the colors of the rainbow when it comes to 
> start/stop/management.  Some app stacks can start on N nodes independently, 
> while others are very sequential and start only from a single master, such as 
> hadoop or accumulo.
> 
> So for this latter camp, say accumulo, we can use monit to 
> start/monitor/manage the master node which will start the slave apps 
> (tservers) on the slave nodes.
> 
> The difficulty is figuring out how to start monit on the slave nodes such 
> that it will NOT start the tserver process but will only monitor it once it 
> is up, and then restart it if it goes down.
> 
> But we do not want the monit on the slave to bring up the tserver until the 
> master brings it up, which could be days, weeks, or longer.
> 
> The goal was to do this without having to manage monit itself (as in taking 
> monit down or having monit reload new config files when/if the accumulo master 
> starts).
> 
> But I can't yet see how to make that happen.
> 
> Hope this helps.  Thanks in advance for additional replies (from anyone).
> 
> -sandy
> 
> > On May 13, 2016, at 14:13, Dominic Harkness <address@hidden> wrote:
> >
> > You may be able to add a condition like "if does not exist" that will 
> > override the restart if the process isn't running. I'm not sure if you can 
> > avoid monit logging every time it sees the process does not exist, though.
> >
> > For example, if you want monit to double check before issuing a restart you 
> > could say: "if does not exist for 2 cycles then restart". It will log that 
> > the process was not running in the first cycle and then restart the process 
> > if it's not running in the second cycle as well.
> >
> > Hope that helps!
> > Dominic
> 
> 
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general
> 
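
Russ's suggestion in the thread above (a script that monit runs every cycle and
that holds the "do not start it first" logic) could be sketched roughly as
follows. This is only an illustration under stated assumptions, not something
posted to the list: it assumes the tserver writes a pidfile, and the pidfile
path, the marker file, and the start command are all hypothetical values passed
on the command line. The script remembers, via the marker file, whether the
tserver has ever been seen running. Until then it only waits; afterwards it
restarts the tserver when it is found down.

```python
#!/usr/bin/env python3
"""Guard script: restart a process only after it has been seen running once.

Assumptions (not from the thread): the tserver writes a pidfile, and the
pidfile path, marker path, and start command are supplied as arguments.
"""
import os
import subprocess
import sys


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if it is missing or invalid."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    if pid is None:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def decide(running, seen_before):
    """Map the observed state to one of: ok, mark, restart, wait."""
    if running:
        return "ok" if seen_before else "mark"
    return "restart" if seen_before else "wait"


def main(pidfile, marker, start_cmd):
    """Run one check cycle and return the action that was taken."""
    action = decide(is_running(read_pid(pidfile)), os.path.exists(marker))
    if action == "mark":
        # First sighting: the master has started the tserver at least once.
        open(marker, "w").close()
    elif action == "restart":
        subprocess.call(start_cmd)  # hypothetical start command
    return action


# Usage: guard.py PIDFILE MARKER START_CMD [ARGS...]
# Falling off the end gives exit status 0, as Russ recommends.
if __name__ == "__main__" and len(sys.argv) > 3:
    main(sys.argv[1], sys.argv[2], sys.argv[3:])
```

A script like this would presumably be hooked in through monit's "check
program" service type rather than "check process", so that monit itself never
issues the first start. One design consequence is that whatever stops the
tserver on purpose also has to delete the marker file, otherwise the next cycle
would restart it.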



