linux-kernel - Re: additional oom-killer tuneable worth submitting?


Message-ID: <9a8748490612071425g41eabc1fnf2683e85bde262a8@mail.gmail.com>
Date:	Thu, 7 Dec 2006 23:25:07 +0100
From:	"Jesper Juhl" <jesper.juhl@...il.com>
To:	"Chris Friesen" <cfriesen@...tel.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: additional oom-killer tuneable worth submitting?

On 07/12/06, Chris Friesen <cfriesen@...tel.com> wrote:
> Jesper Juhl wrote:
> >> Jesper Juhl wrote:
>
> >> > What happens in the case where the OOM killer really, really needs to
> >> > kill one or more processes since there is not a single drop of memory
> >> > available, but all processes are below their configured thresholds?
>
> > I realize that if this case happens the system is misconfigured as far
> > as oomthresh goes, but if this is a knob that we put in the mainline
> > kernel then I believe there should be some sort of emergency handling
> > code that takes this situation into account.  Perhaps throw some very
> > nasty looking log messages and then fall back to the classic OOM
> > killer behaviour..?
>
> Yeah, I can see that the reboot might be a bit drastic for mainline.  I
> think the fallback to classic behaviour might work okay.
>
> Anyway, the chances of hitting that case are likely pretty slim.  The
> way we've been using this is to only set the threshold for fairly
> important long-lived daemons.  Much of the "standard" stuff (shell, cat,
> cp, mv, etc.) is left unprotected.
>
Sure, that's sensible, to only protect the important stuff.
But even if the chances of hitting this are slim, we still need a way
out. For most people anything is better than a hung box.

Some examples:

 For a desktop (where people may be experimenting with the feature) -
seeing your firefox process evaporate due to the OOM killer and then
finding a message explaining what happened in dmesg is a lot less
frustrating than a hang or sudden reboot.

 For a server - If you mis-configure the new feature you may be in for
a long drive to reboot a box whereas falling back to the classic OOM
killer (+ nasty messages in dmesg) will likely save you the trip and
clue you in as to what you mis-configured.

 For an embedded box - triggering a reboot would probably be better
than either a hang or a classic OOM kill in many cases (better to have
the device reboot and come back working than to hang or start
malfunctioning due to a missing process).

So maybe what's needed is an additional knob for people to tweak - one
that selects what should happen in this rare case: 1) fall back to
classic OOM (default), 2) reboot, 3) hang.  In all cases messages
should be logged explaining what happened.
Or is that overkill?  If so I'd personally prefer just falling back to
classic OOM kill in this case.
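
To make the three choices concrete, here is a rough userspace model of
the decision, not kernel code. Everything in it is made up for
illustration: the enum values, struct task_model, oom_decide(), and the
assumptions that a task is protected while its usage is below its
oomthresh and that "classic" simply means killing the largest task.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values for the proposed knob; no such interface exists. */
enum oom_fallback_policy {
	OOM_FALLBACK_CLASSIC = 1,	/* default: ignore thresholds, kill */
	OOM_FALLBACK_REBOOT  = 2,
	OOM_FALLBACK_HANG    = 3,
};

enum oom_action { OOM_ACT_KILL, OOM_ACT_REBOOT, OOM_ACT_HANG };

/* Simplified stand-in for a task; not the kernel's task_struct. */
struct task_model {
	int pid;
	unsigned long rss_pages;	/* current memory use */
	unsigned long oomthresh_pages;	/* 0 means unprotected */
};

struct oom_decision {
	enum oom_action action;
	int victim_pid;		/* -1 unless action is OOM_ACT_KILL */
	int used_fallback;	/* nonzero: log the nasty messages */
};

/*
 * Return the largest task. With honour_thresh set, tasks still below
 * their threshold are skipped (assumed meaning of oomthresh).
 */
const struct task_model *
pick_largest(const struct task_model *tasks, size_t n, int honour_thresh)
{
	const struct task_model *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct task_model *t = &tasks[i];

		if (honour_thresh && t->rss_pages < t->oomthresh_pages)
			continue;
		if (!best || t->rss_pages > best->rss_pages)
			best = t;
	}
	return best;
}

struct oom_decision
oom_decide(const struct task_model *tasks, size_t n,
	   enum oom_fallback_policy policy)
{
	struct oom_decision d = { OOM_ACT_KILL, -1, 0 };
	const struct task_model *victim;

	assert(tasks != NULL || n == 0);

	/* Normal case: some task is at or above its threshold. */
	victim = pick_largest(tasks, n, 1);
	if (victim) {
		d.victim_pid = victim->pid;
		return d;
	}

	/* Rare case: OOM, but every task is below its threshold. */
	d.used_fallback = 1;
	switch (policy) {
	case OOM_FALLBACK_REBOOT:
		d.action = OOM_ACT_REBOOT;
		break;
	case OOM_FALLBACK_HANG:
		d.action = OOM_ACT_HANG;
		break;
	case OOM_FALLBACK_CLASSIC:
	default:
		victim = pick_largest(tasks, n, 0);
		if (victim)
			d.victim_pid = victim->pid;
		else
			d.action = OOM_ACT_HANG; /* model only: nothing to kill */
		break;
	}
	return d;
}
```

The caller would log whenever used_fallback is set, whichever of the
three actions was chosen, so the admin can see what was mis-configured.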

Give me a way out for the "OOM but all processes below threshold" case,
perhaps coupled with oomthresh applying to process groups instead of
just processes, and I personally start to like this feature.

Let's see some code...

-- 
Jesper Juhl <jesper.juhl@...il.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html