linux-kernel - Re: [patch v3] mm, oom: fix unnecessary killing of additional processes

Message-ID: <20180710100735.GF14284@dhcp22.suse.cz>
Date:   Tue, 10 Jul 2018 13:01:49 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     David Rientjes <rientjes@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        kbuild test robot <fengguang.wu@...el.com>,
        Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [patch v3] mm, oom: fix unnecessary killing of additional
 processes

On Mon 09-07-18 13:30:10, David Rientjes wrote:
> On Mon, 9 Jul 2018, Michal Hocko wrote:
> 
> > > Blockable mmu notifiers and mlocked memory is not the extent of the 
> > > problem, if a process has a lot of virtual memory we must wait until 
> > > free_pgtables() completes in exit_mmap() to prevent unnecessary oom 
> > > killing.  For implementations such as tcmalloc, which does not release 
> > > virtual memory, this is important because, well, it releases this only at 
> > > exit_mmap().  Of course we cannot do that with only the protection of 
> > > mm->mmap_sem for read.
> > 
> > And how exactly a timeout helps to prevent from "unnecessary killing" in
> > that case?
> 
> As my patch does, it becomes mandatory to move MMF_OOM_SKIP to after 
> free_pgtables() in exit_mmap() and then repurpose MMF_UNSTABLE to 
> indicate that the oom reaper should not operate on a given mm.  In the 
> event we cannot reach MMF_OOM_SKIP, we need to ensure forward progress and 
> that is possible with a timeout period in the very rare instance where 
> additional memory freeing is needed, and without unnecessary oom killing 
> when it is not needed.

But such a timeout doesn't really know how long to wait, so it is more
a hack than anything else. The only reason why we set MMF_OOM_SKIP so
early in the exit path now is the inability to reap mlocked memory. That
is something fundamentally solvable. In fact we can really postpone
MMF_OOM_SKIP to after free_pgtables. It would require extending the
current handover between the oom reaper and the exit path but it is
doable AFAICS. Only the exit path can call free_pgtables but the oom
reaper doesn't have to set MMF_OOM_SKIP if it _knows_ that exit_mmap
is already past any point of blocking.
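
To illustrate the handover described above, here is a minimal user-space
model in C. It is only a sketch under stated assumptions, not kernel code
and not a proposed patch: MMF_EXIT_NONBLOCK is a hypothetical flag name,
and struct mm_model, exit_mmap_model() and oom_reap_model() are
hypothetical stand-ins for the real mm, exit_mmap() and the oom reaper.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* User-space model only; names other than MMF_OOM_SKIP are hypothetical. */
enum {
    MMF_OOM_SKIP      = 1u << 0, /* oom killer may move on to another victim */
    MMF_EXIT_NONBLOCK = 1u << 1, /* HYPOTHETICAL: exit path can no longer block */
};

struct mm_model {
    atomic_uint flags;
    bool pgtables_freed;
};

/* Model of the exit path: the only place that frees page tables. */
static void exit_mmap_model(struct mm_model *mm)
{
    /* Potentially blocking work (notifiers, munlock) would finish here. */
    atomic_fetch_or(&mm->flags, MMF_EXIT_NONBLOCK);

    assert(!mm->pgtables_freed);
    mm->pgtables_freed = true; /* stands in for free_pgtables() */

    /* Only now report that nothing more will be freed from this mm. */
    atomic_fetch_or(&mm->flags, MMF_OOM_SKIP);
}

/* Model of the oom reaper. Returns true if it set MMF_OOM_SKIP itself. */
static bool oom_reap_model(struct mm_model *mm)
{
    /* Reaping of the victim's memory would happen here. */
    if (atomic_load(&mm->flags) & MMF_EXIT_NONBLOCK)
        return false; /* exit path will finish; leave MMF_OOM_SKIP to it */

    atomic_fetch_or(&mm->flags, MMF_OOM_SKIP);
    return true;
}
```

The model leaves out locking entirely. A real implementation would have
to close the race between the reaper's check and the exit path's flag
update, which is the part of the handover that would need extending.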

Btw, I am quite surprised you are now worried about oom victims with
basically no memory mapped and a huge amount of memory in page tables.
We have never handled that case properly IIRC. So oom_reaper hasn't
added anything new here.

That being said, I haven't heard any bug reports of an overeager oom
killer just because of the oom reaper except your rather non-specific
claims about millions of pointless oom invocations. So I am not really
convinced we have to rush into a solution. I would much rather work
on a proper and comprehensible solution than put one band aid over
another. This has been the case in the oom proper for many years and
we have ended up with subtle code which is way too easy to break and
a nightmare to maintain. Let's not repeat that again please.

So do not rush into the first idea and let's do proper development
here. This means a proper analysis of the problem, finding the solution
space and choosing the one which is the most reasonable long term.
-- 
Michal Hocko
SUSE Labs