linux-kernel - Re: [PATCH] mm,oom_kill: Close race window of needlessly selecting new victims.

Message-ID: <20170621131850.GA27494@dhcp22.suse.cz>
Date:   Wed, 21 Jun 2017 15:18:50 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     David Rientjes <rientjes@...gle.com>
Cc:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        akpm@...ux-foundation.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm,oom_kill: Close race window of needlessly selecting
 new victims.

On Tue 20-06-17 15:12:55, David Rientjes wrote:
[...]
> This doesn't prevent serial oom killing for either the system oom killer 
> or for the memcg oom killer.
> 
> The oom killer cannot detect tsk_is_oom_victim() if the task has either 
> been removed from the tasklist or has already done cgroup_exit(). For 
> memcg oom killings in particular, cgroup_exit() is usually called very 
> shortly after the oom killer has sent the SIGKILL.  If the oom reaper does 
> not fail (for example by failing to grab mm->mmap_sem) before another 
> memcg charge after cgroup_exit(victim), additional processes are killed 
> because the iteration does not view the victim.
> 
> This easily kills all processes attached to the memcg with no memory 
> freeing from any victim.

It took me some time to decipher the above, but you are right. Pinning
mm_users will prevent the exit path from reaching exit_mmap, and that can
indeed cause another premature oom killing because the task might be
unhashed or removed from the memcg before the oom reaper has a chance to
reap it. Thanks for pointing this out. This means that we either have to
reimplement the unhashing/cgroup_exit for oom victims or go back to
allowing the oom reaper to race with exit_mmap. The latter sounds much
easier to me.
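
To make the failure mode concrete, here is a toy userspace model of the
race described above. It is only a sketch under simplifying assumptions,
not kernel code: struct toy_task, toy_oom_select(), toy_victim_exits() and
toy_count_kills() are hypothetical names invented for this illustration,
and the model assumes that every victim reaches cgroup_exit() before the
next memcg charge fails and that the oom reaper has not yet reaped it.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TASKS 16

/* Hypothetical toy model; none of these names are kernel APIs. */
struct toy_task {
    bool in_memcg;    /* still visible to the memcg task iteration */
    bool oom_victim;  /* already selected and sent SIGKILL */
    bool mm_freed;    /* memory released by the exit path */
};

/*
 * One OOM invocation. Returns the index of the newly selected victim, or
 * -1 if a visible victim already exists (back off and wait) or if no
 * candidate is left in the memcg.
 */
static int toy_oom_select(struct toy_task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tasks[i].in_memcg && tasks[i].oom_victim)
            return -1;  /* existing victim is visible: do not kill again */

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].in_memcg) {
            tasks[i].oom_victim = true;
            return (int)i;
        }
    }
    return -1;
}

/* The victim runs its exit path shortly after receiving SIGKILL. */
static void toy_victim_exits(struct toy_task *t, bool mm_pinned)
{
    t->in_memcg = false;    /* models cgroup_exit(): hidden from iteration */
    if (!mm_pinned)
        t->mm_freed = true; /* models exit_mmap() actually freeing memory */
}

/* Count how many tasks get killed before any memory is freed. */
static int toy_count_kills(size_t n, bool mm_pinned)
{
    struct toy_task tasks[TOY_MAX_TASKS] = {0};
    int kills = 0;

    if (n > TOY_MAX_TASKS)
        n = TOY_MAX_TASKS;
    for (size_t i = 0; i < n; i++)
        tasks[i].in_memcg = true;

    for (;;) {
        int victim = toy_oom_select(tasks, n);
        if (victim < 0)
            break;              /* nothing left to kill */
        kills++;
        toy_victim_exits(&tasks[victim], mm_pinned);
        if (tasks[victim].mm_freed)
            break;              /* memory came back, the OOM is resolved */
    }
    return kills;
}
```

In this model, toy_count_kills(4, true) kills all four tasks because each
victim disappears from the iteration without freeing anything, while
toy_count_kills(4, false) stops after the first kill.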

I was offline for the last two days, but I will revisit my original idea ASAP.

-- 
Michal Hocko
SUSE Labs