Date:	Wed, 10 Jun 2015 17:56:47 +0200
From:	Michal Hocko <mhocko@...e.cz>
To:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc:	linux-mm@...ck.org, rientjes@...gle.com, hannes@...xchg.org,
	tj@...nel.org, akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC] panic_on_oom_timeout

On Wed 10-06-15 16:28:01, Michal Hocko wrote:
> On Wed 10-06-15 21:20:58, Tetsuo Handa wrote:
[...]
> > Since my version uses a per-"struct task_struct" variable (memdie_start),
> > the 5-second timeout is checked for each memory cgroup individually. It
> > can avoid unnecessary panic() calls if nobody needs to call out_of_memory()
> > again (probably because somebody freed memory) when the OOM victim cannot
> > be terminated for some reason. If we want a distinction between "the entire
> > system is under OOM" and "some memory cgroup is under OOM" because the
> > former is urgent but the latter is less urgent, it can be modified to
> > allow different timeout periods for system-wide OOM and cgroup OOM.
> > Finally, it can give a hint about "in what sequence threads got stuck" and
> > "which thread took 5 seconds" when analyzing a vmcore.
> 
> I will have a look at how you have implemented that, but separate timeouts
> sound like major over-engineering. Also note that global vs. memcg OOM
> is not sufficient, because there are other OOM domains, as mentioned above.

Your patch is doing way too many things at once :/ So let me just focus
on the "panic if a task is stuck with TIF_MEMDIE for too long" part. It
looks like an alternative to the approach I've chosen. It doesn't consider
the allocation restriction, so a locked-up cpuset or set of NUMA nodes
might panic the whole system, which doesn't sound like a good idea, but
that is easily fixable. Could you tear just this part out and repost it
so that we can compare the two approaches?

panic_on_oom=2 would still be weird because some nodes might stay in an
OOM condition without triggering the panic, but maybe that is acceptable.

Thanks!
-- 
Michal Hocko
SUSE Labs