Message-ID: <20080316230346.GA379@tv-sign.ru>
Date: Mon, 17 Mar 2008 02:03:46 +0300
From: Oleg Nesterov <oleg@...sign.ru>
To: Roland McGrath <roland@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Davide Libenzi <davidel@...ilserver.org>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Ingo Molnar <mingo@...e.hu>,
Laurent Riffard <laurent.riffard@...e.fr>,
Pavel Emelyanov <xemul@...nvz.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/5] don't panic if /sbin/init exits or killed
On 03/16, Roland McGrath wrote:
>
(re-ordered)
> Have you tested how recoverable it really is? I wonder what happens
> with init having exited when things get reparented to it. Don't the
> zombies just pile up?
Yes, sure: we leak the re-parented zombies, and nothing is left to act on
/etc/inittab. As expected.
But otherwise the system runs fine.
> BUG() does not seem right to me. This does not diagnose any kernel bug.
> The kernel source location and backtrace are not useful. In fact, they
> are likely to mislead the user into reporting the bug to the wrong place
> (because it will look like a kernel bug).
But is panic() any better? It doesn't provide any useful info either.
> I gather your motivation is to get something "recoverable" rather than
> always rebooting. This might be useful for developers like you and me.
> I suspect that conservative administrators of production systems prefer
> the current behavior. If the boot init dies, that is reasonably likely
> to be a "catastrophic" failure of the system as a whole as far as the
> proprietor of a production system is concerned. That is, the system may
> no longer behave as expected in ways essential for its normal operation.
> If it sticks around in that condition, appearing to be available but not
> doing everything it should, that is usually worse than a quick and
> orderly crash (which the installation's procedures and monitoring
> infrastructure are often prepared to handle).
Well, I think the generic "if we have a chance to survive, we should try
to survive" rule is good.
If the boot init dies, at least the admin has a chance to figure out what
has happened, and to do mount -o remount,ro /.
Every BUG/BUG_ON in fact means the system is not usable, but still the
kernel does not panic(); it tries to proceed.
In short, I can't see why panic() is better, except that we have
panic_timeout. But we can take that into account when init exits.
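As a rough illustration of the policy under discussion (a user-space
model only, not code from the patch): the sketch below assumes a
hypothetical boolean knob, panic_on_init_exit, standing in for the sysctl
suggested in this thread, and treats a positive panic_timeout as "reboot
after that many seconds". The function and enum names are invented for
the example.

```python
from enum import Enum


class Action(Enum):
    PANIC = "panic"
    REBOOT_AFTER_TIMEOUT = "reboot after panic_timeout seconds"
    KEEP_RUNNING = "keep running without init"


def on_init_exit(panic_on_init_exit: bool, panic_timeout: int) -> Action:
    """Model of what could happen when the boot init exits.

    panic_on_init_exit is a hypothetical knob (assumption, not an
    existing sysctl name); panic_timeout mirrors the kernel variable
    of the same name, where a positive value means "reboot after N
    seconds".
    """
    if panic_on_init_exit:
        # Current default behaviour: init dying is fatal.
        return Action.PANIC
    if panic_timeout > 0:
        # Honour the admin's reboot timeout even though we do not panic.
        return Action.REBOOT_AFTER_TIMEOUT
    # No timeout configured: limp along so the admin can investigate.
    return Action.KEEP_RUNNING


if __name__ == "__main__":
    for knob, timeout in [(True, 0), (False, 30), (False, 0)]:
        print(knob, timeout, "->", on_init_exit(knob, timeout).value)
```

With the knob set, the model keeps today's behaviour; with it cleared, a
configured panic_timeout still leads to a reboot, and only the
"no timeout" case leaves the system running without init.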
> panic is a bit extreme for the situation, where we have no reason yet to
> think kernel data structures are inconsistent. A sync+reboot or sync+crash
> without bust_spinlocks et al might be better.
>
> For letting init die and calling it recoverable for hacking purposes, a
> sysctl to disable the panic/crash makes sense. But I don't think we
> should change the default setting.
OK, I won't argue (not that I agree ;).
Oleg.