monit – monitoring for grown-ups
When you look after several servers, services on them often slip into a "nirvana state" for one reason or another without you even noticing. Sometimes a misconfiguration keeps a service from starting at all, sometimes a service's child processes eat up all the resources. Either way, the services stop doing what they are supposed to do.
Why Monit?
Nagios is surely the best-known and most widely used monitoring service out there, but it is also not exactly easy to configure.
With Monit the idea is simple: if $x happens, send me a mail at $y and restart the service with the command $z.
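To make that mapping concrete, here is a minimal sketch of a single monitrc stanza. The service name `myservice`, its pidfile and init script, port 8080, and the address `admin@example.com` are all hypothetical placeholders, not taken from any real setup.

```
# $y: where alerts are mailed (hypothetical address)
set alert admin@example.com

# Hypothetical service, pidfile, and init script
check process myservice with pidfile /var/run/myservice.pid
    # $z: restart runs the stop program, then the start program
    start program = "/etc/init.d/myservice start"
    stop program  = "/etc/init.d/myservice stop"
    # $x: the event that triggers the restart (and the alert mail)
    if failed port 8080 then restart
```

With a global `set alert` in place, Monit mails that address when the test fails and the restart action is taken, so one rule covers both the notification and the recovery.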
For a fairly standard LAMPP setup, monitrc would look roughly like this (Google will tell you how to install monit on your favorite distro):
# Global settings: logging, mail server, alert format and recipient
set logfile syslog facility log_daemon
set mailserver localhost
set mail-format {
    from: monit@nula.ba
    subject: $SERVICE $EVENT at $DATE
    message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION
}
set alert amar.cosic@gmail.com

# SSH: restart if the SSH protocol test on port 22 fails
check process sshd with pidfile /var/run/sshd.pid
    start program = "/etc/init.d/ssh start"
    stop program = "/etc/init.d/ssh stop"
    if failed port 22 protocol ssh then restart
    if 5 restarts within 5 cycles then timeout

# MySQL: alert on high memory/CPU usage, restart if port 3306 is unreachable
check process mysql with pidfile /var/run/mysqld/mysqld.pid
    group database
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
    if mem usage > 80% for 5 cycles then alert
    if cpu usage > 80% for 5 cycles then alert
    if failed host 127.0.0.1 port 3306 then restart
    if 5 restarts within 5 cycles then timeout

# Apache: resource limits plus a connection test on port 80
check process apache with pidfile /var/run/apache2.pid
    group www
    start program = "/etc/init.d/apache2 start"
    stop program = "/etc/init.d/apache2 stop"
    if cpu is greater than 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if totalmem > 500 MB for 5 cycles then restart
    if children > 250 then restart
    if loadavg(5min) greater than 10 for 8 cycles then stop
    if failed host 127.0.0.1 port 80 then restart
    if 3 restarts within 5 cycles then timeout

# Postfix: restart if the SMTP protocol test on port 25 fails
check process postfix with pidfile /var/spool/postfix/pid/master.pid
    group mail
    start program = "/etc/init.d/postfix start"
    stop program = "/etc/init.d/postfix stop"
    if failed port 25 protocol smtp then restart
    if 5 restarts within 5 cycles then timeout

# PostgreSQL: test both the Unix socket and the TCP port
check process postgres with pidfile /var/lib/postgresql/8.4/main/postmaster.pid
    group database
    start program = "/etc/init.d/postgresql start"
    stop program = "/etc/init.d/postgresql stop"
    if failed unixsocket /var/run/postgresql/.s.PGSQL.5432 protocol pgsql
        then restart
    if failed host 127.0.0.1 port 5432 protocol pgsql then restart
    if 5 restarts within 5 cycles then timeout

# vsftpd: restart if the FTP protocol test on port 21 fails
check process vsftpd with pidfile /var/run/vsftpd/vsftpd.pid
    start program = "/etc/init.d/vsftpd start"
    stop program = "/etc/init.d/vsftpd stop"
    if failed port 21 protocol ftp then restart
    if 5 restarts within 5 cycles then timeout
Everything is more or less self-explanatory. For the processes that usually consume the most resources (apache, mysql) we add some light monitoring of child process count, memory, and CPU usage. In all other cases the service is restarted when a connection to its port or socket fails.
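One thing the listing leaves implicit is how long a "cycle" is. Monit counts cycles in poll intervals, and the interval is set with `set daemon`. The fragment below is a sketch under an assumed interval of 60 seconds. That value is an illustration and is not part of the configuration above.

```
# Assumption: poll every 60 seconds. With this value,
# "for 5 cycles" means the condition held for roughly 5 minutes, and
# "5 restarts within 5 cycles" means 5 restarts in roughly 5 minutes.
set daemon 60
```

The `timeout` action in the restart rules tells Monit to give up and stop monitoring the service, so a service that keeps failing is not restarted in an endless loop. After changing monitrc, running `monit -t` checks the control file for syntax errors before the daemon is reloaded.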