monit – monitoring for grown-ups

When you look after several servers, services on them often drift into a nirvana state for one reason or another without you even noticing. Sometimes a bad configuration keeps a service from starting at all, sometimes a service's child processes eat up all the resources. Either way, they simply stop serving what they should.

Why Monit?

Nagios is surely the best-known and most widely used monitoring service out there, but it is also not exactly easy to configure.

With Monit the idea is simple: if $x happens, send me mail at $y and restart the service with the command $z.
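To make the $x / $y / $z idea concrete, here is a minimal sketch of a single check. The service name `myservice`, its pidfile, init script, port 8080 and the address `admin@example.com` are all placeholders invented for illustration, not part of any real setup.

```
# Hypothetical service; every name, path and port below is a placeholder.
set alert admin@example.com                        # $y: where alert mail goes

check process myservice with pidfile /var/run/myservice.pid
    start program = "/etc/init.d/myservice start"  # $z: with no restart program set,
    stop program = "/etc/init.d/myservice stop"    #     restart runs stop, then start
    if failed port 8080 then restart               # $x: the condition being watched
```

Running `monit -t` checks the control file for syntax errors before you load it.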

For a standard LAMPP setup, monitrc would look roughly like this (Google will tell you how to install monit on your favorite distro):

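# Global settings: poll every 60 seconds, log to syslog, send mail through the local MTA
set daemon 60
set logfile syslog facility log_daemon
set mailserver localhost
set mail-format {
    from: monit@nula.ba
    subject: $SERVICE $EVENT at $DATE
    message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION
}
# Default recipient for all alerts
set alert amar.cosic@gmail.com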

check process sshd with pidfile /var/run/sshd.pid
    start program = "/etc/init.d/ssh start"
    stop program = "/etc/init.d/ssh stop"
    if failed port 22 protocol ssh then restart
    if 5 restarts within 5 cycles then timeout

check process mysql with pidfile /var/run/mysqld/mysqld.pid
    group database
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
    if mem usage > 80% for 5 cycles then alert
    if cpu usage > 80% for 5 cycles then alert
    if failed host 127.0.0.1 port 3306 then restart
    if 5 restarts within 5 cycles then timeout

check process apache with pidfile /var/run/apache2.pid
    group www
    start program = "/etc/init.d/apache2 start"
    stop program = "/etc/init.d/apache2 stop"
    if cpu is greater than 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if totalmem > 500 MB for 5 cycles then restart
    if children > 250 then restart
    if loadavg(5min) greater than 10 for 8 cycles then stop
    if failed host 127.0.0.1 port 80 then restart
    if 3 restarts within 5 cycles then timeout

check process postfix with pidfile /var/spool/postfix/pid/master.pid
    group mail
    start program = "/etc/init.d/postfix start"
    stop program = "/etc/init.d/postfix stop"
    if failed port 25 protocol smtp then restart
    if 5 restarts within 5 cycles then timeout

check process postgres with pidfile /var/lib/postgresql/8.4/main/postmaster.pid
    group database
    start program = "/etc/init.d/postgresql start"
    stop program = "/etc/init.d/postgresql stop"
    if failed unixsocket /var/run/postgresql/.s.PGSQL.5432 protocol pgsql
        then restart
    if failed host 127.0.0.1 port 5432 protocol pgsql then restart
    if 5 restarts within 5 cycles then timeout

check process vsftpd with pidfile /var/run/vsftpd/vsftpd.pid
    start program = "/etc/init.d/vsftpd start"
    stop program = "/etc/init.d/vsftpd stop"
    if failed port 21 protocol ftp then restart
    if 5 restarts within 5 cycles then timeout

 

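Everything is more or less self-explanatory. For the processes that usually consume the most resources (apache, mysql) we add a bit of monitoring of child processes, memory and CPU usage; in all other cases the service is restarted when a connection to its port or socket fails. With "set daemon 60", one cycle is 60 seconds, so "for 5 cycles" means the condition has to hold for about five minutes.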
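Load average describes the whole machine rather than a single process. Assuming a newer Monit release, where that test is expected in a `check system` block, a host-wide counterpart to the per-process limits might be sketched like this. The name `myhost` and all thresholds are placeholders, so adjust them to your own hardware.

```
# Hypothetical host-level check; "myhost" and the thresholds are placeholders.
check system myhost
    if loadavg (5min) > 10 for 8 cycles then alert
    if memory usage > 80% for 5 cycles then alert
    if cpu usage (user) > 80% for 5 cycles then alert
```

A system check has no start or stop program, so this sketch only alerts and leaves the decision to a human.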

 

Monit

Monit configuration examples