monit-general

Re: Any Ping Sensitivity Adjust


From: Martin Pala
Subject: Re: Any Ping Sensitivity Adjust
Date: Wed, 03 Aug 2005 10:46:42 +0200
User-agent: Mozilla Thunderbird 1.0.6 (Windows/20050716)

Oops, thanks :) The checksum was computed from the first array member, so it was incorrect for the second and later echo requests, which have different sequence ids.

Here is the updated patch (it is fixed for Solaris as well); it should be fine now.

If the developers agree, I can commit it to CVS.

Martin


Daniel wrote:
M. D. Parker wrote:

Sorry, the system is a 266 MHz Pentium II.

==========================================

M. D. Parker
Systems Administrator
General Atomics / Electromagnetic Systems
+1 858 455 2877
address@hidden


-----Original Message-----
From: M. D. Parker [mailto:address@hidden]
Sent: Tuesday, August 02, 2005 4:49 PM
To: 'This is the general mailing list for monit'
Cc: 'Martin Pala'
Subject: RE: Any Ping Sensitivity Adjust

I compiled and tried the suggested patch against the CVS head.
During the test, as soon as monit started, ALL hosts using the ICMP
test were flagged as down (and alerts were sent accordingly), both with and
without the COUNT option.

The package was compiled on a 32-bit Intel machine running Fedora Core 3 with
all current patches.


I also patched it against 4.5.1 after stripping out the .pod hunks, and it applied cleanly.

Ethereal shows that the ICMP layer checksum on all the echo request packets is incorrect (and therefore the packets are ignored/dropped by firewalls). The ICMP checksum also does not appear to change from the second packet onward.

It looks like this call
icmphdrout[i]->checksum= checksum_ip((unsigned char *)icmphdrout, ICMP_SIZE)

and the one below it in net.c (line ~654) need to be changed to

icmphdrout[i]->checksum= checksum_ip((unsigned char *)icmphdrout[i], ICMP_SIZE)


With that change, the hosts are at least marked as working now, and I see the multiple ping requests being sent.
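
To illustrate why each echo request needs a checksum computed over its own header, here is a minimal standalone sketch. It does not use monit's checksum_ip or the system ICMP structs: echo_hdr, inet_checksum, fill_echo and echo_checksum_ok are hypothetical names invented for this example, and the checksum routine is a plain RFC 1071 implementation standing in for whatever checksum_ip does internally.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte ICMP echo header; field names are illustrative only. */
struct echo_hdr {
  uint8_t  type;
  uint8_t  code;
  uint16_t checksum;
  uint16_t id;
  uint16_t sequence;
};

/* RFC 1071 Internet checksum over len bytes (assumed stand-in for checksum_ip). */
uint16_t inet_checksum(const void *data, size_t len) {
  const uint8_t *p = data;
  uint32_t sum = 0;

  while (len > 1) {
    uint16_t word;
    memcpy(&word, p, sizeof word);   /* avoids unaligned access */
    sum += word;
    p += 2;
    len -= 2;
  }
  if (len == 1) {                    /* odd trailing byte, padded with zero */
    uint16_t word = 0;
    memcpy(&word, p, 1);
    sum += word;
  }
  while (sum >> 16)                  /* fold carries back into 16 bits */
    sum = (sum & 0xffff) + (sum >> 16);
  return (uint16_t)~sum;
}

/* Fill one echo request and checksum that same header, not a neighbour. */
void fill_echo(struct echo_hdr *h, uint16_t id, uint16_t seq) {
  memset(h, 0, sizeof *h);
  h->type = 8;                       /* ICMP echo request */
  h->code = 0;
  h->id = id;
  h->sequence = seq;
  h->checksum = 0;                   /* field must be zero while summing */
  h->checksum = inet_checksum(h, sizeof *h);
}

/* A header verifies when the checksum over the whole header folds to zero. */
int echo_checksum_ok(const struct echo_hdr *h) {
  return inet_checksum(h, sizeof *h) == 0;
}
```

Because the sequence number is part of the summed data, two requests that differ only in sequence get different checksums, so a checksum taken from any other buffer will not verify on the receiving side.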

Regards

Daniel

diff -x CVS -Naur monit/CHANGES.txt monit-icmp/CHANGES.txt
--- monit/CHANGES.txt   2005-08-03 08:38:09.688794000 +0000
+++ monit-icmp/CHANGES.txt      2005-08-03 08:36:59.826855000 +0000
@@ -4,6 +4,14 @@
 
 Version 4.5.2
 
+NEW FEATURES AND FUNCTIONS:
+*  Monit now sends three icmp echo requests in one cycle by default.
+   It is possible to customize the echo requests count using the
+   count parameter of icmp test, for example:
+     check host myserver with address 192.168.1.1
+       if failed icmp type echo count 5 with timeout 3 seconds
+       then alert
+
 BUGFIXES:
 *  In the case that kvm access on FreeBSD failed (for example
    because of environment restricted by virtual server), process
diff -x CVS -Naur monit/http/cervlet.c monit-icmp/http/cervlet.c
--- monit/http/cervlet.c        2005-08-03 08:38:12.139059000 +0000
+++ monit-icmp/http/cervlet.c   2005-08-03 08:08:53.419008000 +0000
@@ -1409,9 +1409,9 @@
       a= i->action;
       out_print(res,
         "<tr><td>ICMP</td><td>"
-        "If failed %s with timeout %d seconds then %s else if recovered then %s"
+        "If failed %s count %d with timeout %d seconds then %s else if recovered then %s"
         "</td></tr>",
-        icmpnames[i->type], i->timeout, actionnames[a->failed->id],
+        icmpnames[i->type], i->count, i->timeout, actionnames[a->failed->id],
         actionnames[a->passed->id]);
     }
   }
diff -x CVS -Naur monit/l.l monit-icmp/l.l
--- monit/l.l   2005-08-03 08:38:09.789579000 +0000
+++ monit-icmp/l.l      2005-08-03 08:05:33.958207000 +0000
@@ -255,6 +255,7 @@
 content           { return CONTENT; }
 pid               { return PID; }
 ppid              { return PPID; }
+count             { return COUNT; }
 {byte}            { return BYTE; }
 {kilobyte}        { return KILOBYTE; }
 {megabyte}        { return MEGABYTE; }
diff -x CVS -Naur monit/monit.pod monit-icmp/monit.pod
--- monit/monit.pod     2005-08-03 08:38:10.532337000 +0000
+++ monit-icmp/monit.pod        2005-08-03 08:12:25.577456000 +0000
@@ -1758,20 +1758,32 @@
 is as follows (keywords are in capital and optional statements in
 [brackets]):
 
-  IF FAILED ICMP TYPE ECHO [WITH] [TIMEOUT number SECONDS] 
+  IF FAILED ICMP TYPE ECHO
+     [COUNT number] [WITH] [TIMEOUT number SECONDS]
      THEN action
      [ELSE IF RECOVERED THEN action]
 
 The rules for action and timeout are the same as those mentioned
-above in the CONNECTION TESTING section. An icmp ping test is
-useful for testing if a host is up, before testing ports at the
-host. If an icmp ping test is used in a check host entry, this
-test is run first and if the ping test should fail we assume that
-the connection to the host is down and monit does I<not> continue
-to test any ports. Here's an example:
+above in the CONNECTION TESTING section. The count parameter
+specifies how many consecutive echo requests will be sent to the
+host in one cycle. If no reply arrives within the timeout
+frame, monit reports an error. When at least one reply is received,
+the test passes. By default, monit sends three echo requests in
+one cycle to prevent random packet loss from generating a false
+alarm (i.e. up to 66% packet loss is tolerated). You can set the
+count option to a different value, which can serve as an error ratio.
+For example, if you require 100% ping success, you
+can set the count to 1 (i.e. just one attempt will be sent, and if
+the packet is lost, an error will be reported).
+
+An icmp ping test is useful for testing if a host is up, before
+testing ports at the host. If an icmp ping test is used in a check
+host entry, this test is run first and if the ping test should fail
+we assume that the connection to the host is down and monit does
+I<not> continue to test any ports. Here's an example:
 
  check host xyzzy with address xyzzy.org
-       if failed icmp type echo with timeout 15 seconds 
+       if failed icmp type echo count 5 with timeout 15 seconds 
           then alert
        if failed port 80 proto http then alert
        if failed port 443 type TCPSSL proto http then alert
@@ -2672,10 +2684,10 @@
 
 I<if>, I<then>, I<else>, I<set>, I<daemon>, I<logfile>,
 I<syslog>, I<address>, I<httpd>, I<ssl>, I<enable>, I<disable>,
-I<pemfile>, I<allow>, I<read-only>, I<check>, I<init>,
+I<pemfile>, I<allow>, I<read-only>, I<check>, I<init>, I<count>,
 I<pidfile>, I<statefile>, I<group>, I<start>, I<stop>, I<uid>,
 I<gid>, I<connection>, I<port(number)>, I<unix(socket)>, I<type>,
-I<proto(col)>, I<tcp>, I<tcpssl>, I<udp>, I<alert>,
+I<proto(col)>, I<tcp>, I<tcpssl>, I<udp>, I<alert>, I<icmp>,
 I<mail-format>, I<restart>, I<timeout>, I<checksum>, I<resource>,
 I<expect>, I<send>, I<mailserver>, I<every>, I<mode>, I<active>,
 I<passive>, I<manual>, I<depends>, I<host>, I<default>, I<http>,
@@ -2829,7 +2841,7 @@
 if not send an alert:
 
  check host www.tildeslash.com with address www.tildeslash.com
-       if failed icmp type echo with timeout 15 seconds
+       if failed icmp type echo count 5 with timeout 15 seconds
           then alert
        alert address@hidden
 
diff -x CVS -Naur monit/monitor.h monit-icmp/monitor.h
--- monit/monitor.h     2005-08-03 08:38:10.840937000 +0000
+++ monit-icmp/monitor.h        2005-08-03 08:05:34.021584000 +0000
@@ -161,6 +161,8 @@
 #define LEVEL_NAME_FULL    "full"
 #define LEVEL_NAME_SUMMARY "summary"
 
+#define ATTEMPT_COUNT      3
+
 /** ------------------------------------------------- Special purpose macros */
 
 
@@ -480,6 +482,7 @@
 /** Defines a ICMP object */
 typedef struct myicmp {
   int type;                                              /**< ICMP type used */
+  int count;                                   /**< ICMP echo requests count */
   int timeout;              /**< The timeout in seconds to wait for response */
   int is_available;                     /**< TRUE if the server is available */
   double response;                              /**< ICMP ECHO response time */
diff -x CVS -Naur monit/monitrc monit-icmp/monitrc
--- monit/monitrc       2005-08-03 08:38:10.913160000 +0000
+++ monit-icmp/monitrc  2005-08-03 08:05:34.052390000 +0000
@@ -364,7 +364,7 @@
 #
 #
 #  check host myserver with address 192.168.1.1
-#    if failed icmp type echo with timeout 3 seconds then alert
+#    if failed icmp type echo count 3 with timeout 3 seconds then alert
 #    if failed port 3306 protocol mysql then alert
 #    if failed port 80 protocol http then alert
 #    if failed port 443 type tcpssl protocol http
diff -x CVS -Naur monit/net.c monit-icmp/net.c
--- monit/net.c 2005-08-03 08:38:10.929578000 +0000
+++ monit-icmp/net.c    2005-08-03 08:24:21.104477000 +0000
@@ -579,11 +579,13 @@
 
 /**
  * Create a ICMP socket against hostname, send echo and wait for response.
+ * 'count' echo requests are sent and we expect at least one reply.
  * @param hostname The host to open a socket at
  * @param timeout If response will not come within timeout seconds abort
+ * @param count How many pings to send
  * @return response time on succes, -1 on error
  */
-double icmp_echo(const char *hostname, int timeout) {
+double icmp_echo(const char *hostname, int timeout, int count) {
 
   struct hostent *hp;
   struct sockaddr_in sin;
@@ -591,21 +593,22 @@
 #ifdef HAVE_SOL_IP
   struct iphdr *iphdrin;
   struct icmphdr *icmphdrin= NULL;
-  struct icmphdr *icmphdrout= NULL;
+  struct icmphdr *icmphdrout[count];
 #else
   struct ip *iphdrin;
   struct icmp *icmphdrin= NULL;
-  struct icmp *icmphdrout= NULL;
+  struct icmp *icmphdrout[count];
 #endif
   size_t size;
   fd_set rset;
+  int i;
   int s;
   int n= 0;
   int sol_ip;
   unsigned ttl= 255;
   char buf[STRLEN];
   struct timeval tv;
-  struct timeval t1;
+  struct timeval t1[count];
   struct timeval t2;
   double response= -1;
   
@@ -635,31 +638,33 @@
   tv.tv_sec= timeout;
   tv.tv_usec= 0;
 
-  NEW(icmphdrout);
+  for(i=0; i<count; i++) {
+    NEW(icmphdrout[i]);
 #ifdef HAVE_SOL_IP
-  icmphdrout->code= 0;
-  icmphdrout->type= ICMP_ECHO;
-  icmphdrout->un.echo.id= getpid();
-  icmphdrout->un.echo.sequence= 0;
-  icmphdrout->checksum= checksum_ip((unsigned char *)icmphdrout, ICMP_SIZE);
+    icmphdrout[i]->code= 0;
+    icmphdrout[i]->type= ICMP_ECHO;
+    icmphdrout[i]->un.echo.id= getpid();
+    icmphdrout[i]->un.echo.sequence= i;
+    icmphdrout[i]->checksum= checksum_ip((unsigned char *)icmphdrout[i], ICMP_SIZE);
 #else
-  icmphdrout->icmp_code= 0;
-  icmphdrout->icmp_type= ICMP_ECHO;
-  icmphdrout->icmp_id= getpid();
-  icmphdrout->icmp_seq= 0;
-  icmphdrout->icmp_cksum= checksum_ip((unsigned char *)icmphdrout, ICMP_SIZE);
+    icmphdrout[i]->icmp_code= 0;
+    icmphdrout[i]->icmp_type= ICMP_ECHO;
+    icmphdrout[i]->icmp_id= getpid();
+    icmphdrout[i]->icmp_seq= i;
+    icmphdrout[i]->icmp_cksum= checksum_ip((unsigned char *)icmphdrout[i], ICMP_SIZE);
 #endif
-  sout.sin_family= AF_INET;
-  sout.sin_port= 0;
-  memcpy(&sout.sin_addr, hp->h_addr, hp->h_length);
+    sout.sin_family= AF_INET;
+    sout.sin_port= 0;
+    memcpy(&sout.sin_addr, hp->h_addr, hp->h_length);
 
-  /* Get time of connection attempt beginning */
-  gettimeofday(&t1, NULL);
+    /* Get time of particular connection attempt beginning */
+    gettimeofday(&t1[i], NULL);
 
-  do {
-    n= sendto(s, (char *)icmphdrout, ICMP_SIZE, 0,
+    do {
+      n= sendto(s, (char *)icmphdrout[i], ICMP_SIZE, 0,
              (struct sockaddr *)&sout, sizeof(struct sockaddr));
-  } while(n == -1 && errno == EINTR);
+    } while(n == -1 && errno == EINTR);
+  }
   
   do {
 
@@ -680,36 +685,40 @@
     } while(n == -1 && errno == EINTR);
     
     if(n < 0)
-       goto error;
+           goto error;
     
+    for(i=0; i<count; i++) {
 #ifdef HAVE_SOL_IP
-    iphdrin= (struct iphdr *)buf;
-    icmphdrin= (struct icmphdr *)(buf + iphdrin->ihl * 4);
-    if( (icmphdrin->un.echo.id == icmphdrout->un.echo.id) &&
-        (icmphdrin->type == ICMP_ECHOREPLY) &&
-        (icmphdrin->un.echo.sequence == icmphdrout->un.echo.sequence) ) {
+      iphdrin= (struct iphdr *)buf;
+      icmphdrin= (struct icmphdr *)(buf + iphdrin->ihl * 4);
+      if( (icmphdrin->un.echo.id == icmphdrout[i]->un.echo.id) &&
+          (icmphdrin->type == ICMP_ECHOREPLY) &&
+          (icmphdrin->un.echo.sequence == icmphdrout[i]->un.echo.sequence) ) {
 #else
-    iphdrin= (struct ip *)buf;
-    icmphdrin= (struct icmp *)(buf + iphdrin->ip_hl * 4);
-    if( (icmphdrin->icmp_id == icmphdrout->icmp_id) &&
-        (icmphdrin->icmp_type == ICMP_ECHOREPLY) &&
-        (icmphdrin->icmp_seq == icmphdrout->icmp_seq) ) {
+      iphdrin= (struct ip *)buf;
+      icmphdrin= (struct icmp *)(buf + iphdrin->ip_hl * 4);
+      if( (icmphdrin->icmp_id == icmphdrout[i]->icmp_id) &&
+          (icmphdrin->icmp_type == ICMP_ECHOREPLY) &&
+          (icmphdrin->icmp_seq == icmphdrout[i]->icmp_seq) ) {
 #endif
 
-      /* Get time of connection attempt finish */
-      gettimeofday(&t2, NULL);
+        /* Get time of connection attempt finish */
+        gettimeofday(&t2, NULL);
 
-      /* Get the response time */
-      response= (double)(t2.tv_sec  - t1.tv_sec) +
-                (double)(t2.tv_usec - t1.tv_usec)/1000000;
-      break;
+        /* Get the response time */
+        response= (double)(t2.tv_sec  - t1[i].tv_sec) +
+                  (double)(t2.tv_usec - t1[i].tv_usec)/1000000;
 
+        goto done;
+      }
     }
-
   } while(TRUE);
 
   error:
-  FREE(icmphdrout);
+  done:
+  for(i=0; i<count; i++) {
+    FREE(icmphdrout[i]);
+  }
   close_socket(s);
 
   return response;
diff -x CVS -Naur monit/net.h monit-icmp/net.h
--- monit/net.h 2005-08-03 08:38:11.029242000 +0000
+++ monit-icmp/net.h    2005-08-03 08:05:34.181389000 +0000
@@ -190,10 +190,12 @@
 
 /**
  * Create a ICMP socket against hostname, send echo and wait for response.
+ * 'count' echo requests are sent and we expect at least one reply.
  * @param hostname The host to open a socket at
  * @param timeout If response will not come within timeout seconds abort
+ * @param count How many pings to send
  * @return response time on succes, -1 on error
  */
-double icmp_echo(const char *hostname, int timeout);
+double icmp_echo(const char *hostname, int timeout, int count);
 
 #endif
diff -x CVS -Naur monit/p.y monit-icmp/p.y
--- monit/p.y   2005-08-03 08:38:11.441527000 +0000
+++ monit-icmp/p.y      2005-08-03 08:05:34.226268000 +0000
@@ -235,7 +235,7 @@
 %token SET LOGFILE FACILITY DAEMON SYSLOG MAILSERVER HTTPD ALLOW ADDRESS INIT
 %token READONLY CLEARTEXT MD5HASH SHA1HASH CRYPT
 %token PEMFILE ENABLE DISABLE HTTPDSSL CLIENTPEMFILE ALLOWSELFCERTIFICATION
-%token STATEFILE SEND EXPECT CYCLE
+%token STATEFILE SEND EXPECT CYCLE COUNT
 %token PIDFILE START STOP PATHTOK
 %token HOST PORT TYPE UDP TCP TCPSSL PROTOCOL CONNECTION
 %token ALERT MAILFORMAT UNIXSOCKET SIGNATURE
@@ -304,10 +304,10 @@
                 | timeout
                 | alert
                 | every
-               | mode
-               | group
+                | mode
+                | group
                 | depend
-               | resourcesystem
+                | resourcesystem
                 ;
 
 optfilelist      : /* EMPTY */
@@ -316,17 +316,17 @@
 
 optfile         : start
                 | stop
-               | timestamp
+                | timestamp
                 | timeout
                 | every
                 | alert
-               | permission
-               | uid
-               | gid
-               | checksum
+                | permission
+                | uid
+                | gid
+                | checksum
                 | size
-               | mode
-               | group
+                | mode
+                | group
                 | depend
                 ;
 
@@ -339,11 +339,11 @@
                 | timeout
                 | every
                 | alert
-               | permission
-               | uid
-               | gid
-               | mode
-               | group
+                | permission
+                | uid
+                | gid
+                | mode
+                | group
                 | depend
                 | inode
                 | space
@@ -355,15 +355,15 @@
 
 optdir          : start
                 | stop
-               | timestamp
+                | timestamp
                 | timeout
                 | every
                 | alert
-               | permission
-               | uid
-               | gid
-               | mode
-               | group
+                | permission
+                | uid
+                | gid
+                | mode
+                | group
                 | depend
                 ;
 
@@ -374,12 +374,12 @@
 opthost         : start
                 | stop
                 | connection
-               | icmp
+                | icmp
                 | timeout
                 | alert
                 | every
-               | mode
-               | group
+                | mode
+                | group
                 | depend
                 ;
 
@@ -716,10 +716,11 @@
                   }
                 ;
 
-icmp            : IF FAILED ICMP icmptype nettimeout THEN action1 recovery {
+icmp            : IF FAILED ICMP icmptype count nettimeout THEN action1 recovery {
                    icmpset.type= $<number>4;
-                   icmpset.timeout= $<number>5;
-                   addeventaction(&(icmpset).action, $<number>7, $<number>8);
+                   icmpset.count= $<number>5;
+                   icmpset.timeout= $<number>6;
+                   addeventaction(&(icmpset).action, $<number>8, $<number>9);
                    addicmp(&icmpset);
                   }
                 ;
@@ -906,12 +907,20 @@
                   }
                 ;
 
+count           : /* EMPTY */ {
+                   $<number>$= ATTEMPT_COUNT;
+                  }
+                | COUNT NUMBER {
+                   $<number>$= $2;
+                  }
+                ;
+
 nettimeout      : /* EMPTY */ {
                    $<number>$= NET_TIMEOUT;
                   }
                 | TIMEOUT NUMBER SECOND {
-                  $<number>$= $2;
-                 }
+                   $<number>$= $2;
+                  }
                 ;
 
 timeout         : TIMEOUT NUMBER NUMBER {
@@ -1970,6 +1979,7 @@
 
     NEW(icmp);
     icmp->type= is->type;
+    icmp->count= is->count;
     icmp->timeout= is->timeout;
     icmp->action= is->action;
     icmp->is_available= FALSE;
@@ -2766,6 +2776,7 @@
  */
 static void reset_icmpset() {
   icmpset.type= ICMP_ECHO;
+  icmpset.count= ATTEMPT_COUNT;
   icmpset.timeout= NET_TIMEOUT;
   icmpset.action= NULL;
 }
diff -x CVS -Naur monit/util.c monit-icmp/util.c
--- monit/util.c        2005-08-03 08:38:11.629470000 +0000
+++ monit-icmp/util.c   2005-08-03 08:09:10.235624000 +0000
@@ -685,10 +685,10 @@
     for(i= s->icmplist; i; i= i->next) {
       EventAction_T a= i->action;
 
-      printf(" %-20s = if failed %s with timeout %d seconds then %s "
+      printf(" %-20s = if failed %s count %d with timeout %d seconds then %s "
         "else if recovered then %s\n",
-        "ICMP", icmpnames[i->type], i->timeout, actionnames[a->failed->id],
-        actionnames[a->passed->id]);
+        "ICMP", icmpnames[i->type], i->count, i->timeout,
+        actionnames[a->failed->id], actionnames[a->passed->id]);
     }
 
   if(s->portlist) {
diff -x CVS -Naur monit/validate.c monit-icmp/validate.c
--- monit/validate.c    2005-08-03 08:38:11.808999000 +0000
+++ monit-icmp/validate.c       2005-08-03 08:05:34.307604000 +0000
@@ -394,7 +394,7 @@
       switch(icmp->type) {
       case ICMP_ECHO:
 
-        icmp->response= icmp_echo(s->path, icmp->timeout);
+        icmp->response= icmp_echo(s->path, icmp->timeout, icmp->count);
 
         if(icmp->response < 0) {
           icmp->is_available= FALSE;
diff -x CVS -Naur monit/web/doc/next.php monit-icmp/web/doc/next.php
--- monit/web/doc/next.php      2005-08-03 08:38:12.287946000 +0000
+++ monit-icmp/web/doc/next.php 2005-08-03 08:37:19.408761000 +0000
@@ -29,11 +29,10 @@
 <div style="background: #EFF7FF; padding: 10px;">
 <b>Done</b>
 <ul style="list-style-type: square;">
-  <li>Currently none</li>
+  <li><a href="#22">Soft failure tolerance for ICMP echo test (ping)</a></li>
 </ul>
 <b>In progress</b>
 <ul style="list-style-type: square;">
-  <li><a href="#22">Soft failure tolerance for ICMP echo test (ping)</a></li>
   <li><a href="#31">Monitor a (log) file using regex</a></li>
 </ul>
 <b>Planned</b>
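
The count semantics described in the monit.pod hunk (the test passes if at least one of count requests is answered) and the response time calculation in net.c can be sketched in isolation as follows. This is only an illustration under stated assumptions: ping_test_passes and tolerated_loss_percent are hypothetical helpers that do not exist in monit, and elapsed_seconds simply mirrors the arithmetic the patch applies to t1[i] and t2.

```c
#include <sys/time.h>

/* Elapsed seconds between two timestamps, same arithmetic as the patch. */
double elapsed_seconds(const struct timeval *start, const struct timeval *end) {
  return (double)(end->tv_sec - start->tv_sec) +
         (double)(end->tv_usec - start->tv_usec) / 1000000.0;
}

/* Hypothetical helper: 1 if at least one of count requests got a reply. */
int ping_test_passes(const int *replied, int count) {
  int i;
  for (i = 0; i < count; i++) {
    if (replied[i])
      return 1;
  }
  return 0;
}

/* Hypothetical helper: largest packet loss (in percent) that still passes. */
double tolerated_loss_percent(int count) {
  if (count < 1)
    return 0.0;
  return 100.0 * (count - 1) / count;
}
```

With the default of three requests, two of the three may be lost, which is where the "up to 66%" figure in the documentation comes from; with count 1 nothing may be lost.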
