Message-ID: <20170621131850.GA27494@dhcp22.suse.cz>
Date: Wed, 21 Jun 2017 15:18:50 +0200
From: Michal Hocko <mhocko@...nel.org>
To: David Rientjes <rientjes@...gle.com>
Cc: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm,oom_kill: Close race window of needlessly selecting new victims.
On Tue 20-06-17 15:12:55, David Rientjes wrote:
[...]
> This doesn't prevent serial oom killing for either the system oom killer
> or for the memcg oom killer.
>
> The oom killer cannot detect tsk_is_oom_victim() if the task has either
> been removed from the tasklist or has already done cgroup_exit(). For
> memcg oom killings in particular, cgroup_exit() is usually called very
> shortly after the oom killer has sent the SIGKILL. If the oom reaper does
> not succeed (for example, by failing to grab mm->mmap_sem) before another
> memcg charge occurs after cgroup_exit(victim), additional processes are
> killed because the iteration no longer sees the victim.
>
> This easily kills all processes attached to the memcg without any victim
> freeing memory.
It took me some time to decrypt the above, but you are right. Pinning
mm_users will prevent the exit path from reaching exit_mmap, and that can
indeed cause another premature oom kill because the task might be unhashed
or removed from the memcg before the oom reaper has a chance to reap it.
Thanks for pointing this out. This means that we either have to
reimplement the unhashing/cgroup_exit for oom victims or go back to
allowing the oom reaper to race with exit_mmap. The latter sounds much
easier to me.
I was offline for the last two days, but I will revisit my original idea ASAP.
--
Michal Hocko
SUSE Labs