linux-kernel - Re: processes/threads monitor

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <13755.1233458060@turing-police.cc.vt.edu>
Date:	Sat, 31 Jan 2009 22:14:20 -0500
From:	Valdis.Kletnieks@...edu
To:	Angelo Borsotti <angelo.borsotti@...il.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: processes/threads monitor

On Sat, 31 Jan 2009 22:57:36 +0100, Angelo Borsotti said:

> The need occurs on Linux installations in which there are some process
> control applications,
> or some continuous sever applications that need some resilience
> towards process failures.
> When a process providing an essential service terminates, come
> "process monitor" would
> create it again, restoring the service automatically (and without the
> need for human intervention).

Usually, this is easily done with any sane /sbin/init, by sticking a
line in /etc/inittab that looks like:

vip:12345:respawn:/usr/bin/my-important-process

It terminates, it gets respawned.

The *tricky* case isn't dealing with a process that manages to terminate,
the real uglyness starts when you have a process that becomes wedged up
but failing to terminate (which can happen for any number of reasons - it
went into an infinite loop, or two threads managed to deadlock, or....)

And unfortunately, there's no clean general-purpose way to do *that*, mostly
because there's no good config language that allows you to code stuff like
"if Apache wedges up and fails to respond in 0.5 seconds or less, shoot it
and restart it, unless you have detected that the failure is actually due to
unrelated cause A, in which case do X, or if B happened, then do Y, or if
C happened, then do Z, or...."

If you have real requirements for all-the-time continuous operations, then
you don't *really* want a monitor anyhow.  What you *want* is redundant
hardware with some good high-availability clustering software to do hot failover.

Because if your server comes to a screeching halt because a memory card or
other hardware just bit the dust, that monitor isn't going to do *anything* for you.

Plus - if you can't do failover, how do you do a system upgrade if needed?

Content of type "application/pgp-signature" skipped