Date:	Wed, 13 Jan 2016 17:26:10 +0100
From:	Michal Hocko <mhocko@...nel.org>
To:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc:	rientjes@...gle.com, akpm@...ux-foundation.org, mgorman@...e.de,
	torvalds@...ux-foundation.org, oleg@...hat.com, hughd@...gle.com,
	andrea@...nel.org, riel@...hat.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm,oom: Re-enable OOM killer using timers.

On Wed 13-01-16 21:11:30, Tetsuo Handa wrote:
[...]
> Those who use panic_on_oom = 1 expect that the system triggers kernel panic
> rather than stall forever. This is a translation of administrator's wish that
> "Please press SysRq-c on behalf of me if the memory exhausted. In that way,
> I don't need to stand by in front of the console twenty-four seven."
> 
> Those who use panic_on_oom = 0 expect that the OOM killer solves OOM condition
> rather than stall forever. This is a translation of administrator's wish that
> "Please press SysRq-f on behalf of me if the memory exhausted. In that way,
> I don't need to stand by in front of the console twenty-four seven."

I think you are missing an important point. There is _no reliable_ way
to resolve the OOM condition in general except to panic the system. Even
killing all user space tasks might not be sufficient in general because
they might be blocked by an unkillable context (e.g. kernel thread).
So if you need a reliable behavior then either use panic_on_oom=1 or
provide a measure to panic after fixed timeout if the OOM cannot get
resolved. We have seen patches in that regard, but there was no general
interest in merging them.
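
A minimal sketch of checking which of these behaviors a machine is
configured for, assuming a Linux host that exposes the vm.panic_on_oom
sysctl at /proc/sys/vm/panic_on_oom. The helper names are hypothetical
and the value descriptions paraphrase the kernel's sysctl documentation:

```python
from pathlib import Path

# procfs path of the vm.panic_on_oom sysctl on Linux.
PANIC_ON_OOM = Path("/proc/sys/vm/panic_on_oom")

# Paraphrased meanings of the documented values (illustrative wording).
POLICIES = {
    0: "invoke the OOM killer (best effort, may not resolve the OOM)",
    1: "panic on system-wide OOM (cpuset/mempolicy-limited OOMs still use the OOM killer)",
    2: "always panic on OOM",
}


def read_panic_on_oom(path: Path = PANIC_ON_OOM) -> int:
    """Return the integer stored in the sysctl file at `path`."""
    return int(path.read_text().strip())


def describe_policy(value: int) -> str:
    """Map a panic_on_oom value to a short description."""
    return POLICIES.get(value, f"unknown value {value}")


if __name__ == "__main__":
    # Only attempt the read where the sysctl file actually exists.
    if PANIC_ON_OOM.exists():
        print(describe_policy(read_panic_on_oom()))
```

Only the nonzero settings give the deterministic outcome described
above; with 0 the result depends on whether the chosen victim can
actually exit.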

All we can do is take a best-effort approach which tries to reduce the
impact of an unexpected SIGKILL sent to a "random" task. And this is a
reasonable objective IMHO. This works well in 99% of cases. You can
argue that you do care about that 1%, and I sympathize with you, but
mitigating those cases shouldn't involve steps which bring another
level of non-determinism into an already complicated system. This was
the biggest issue of the early OOM killer.

-- 
Michal Hocko
SUSE Labs
