Message-ID: <20170623124550.GX5308@dhcp22.suse.cz>
Date: Fri, 23 Jun 2017 14:45:51 +0200
From: Michal Hocko <mhocko@...nel.org>
To: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc: David Rientjes <rientjes@...gle.com>, akpm@...ux-foundation.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm,oom_kill: Close race window of needlessly selecting
new victims.
On Thu 22-06-17 09:53:48, Tetsuo Handa wrote:
> David Rientjes wrote:
> > On Wed, 21 Jun 2017, Tetsuo Handa wrote:
> > > Umm... So, you are pointing out that select_bad_process() aborts based on
> > > TIF_MEMDIE or MMF_OOM_SKIP is broken because victim threads can be removed
> > > from global task list or cgroup's task list. Then, the OOM killer will have to
> > > wait until all mm_struct of interested OOM domain (system wide or some cgroup)
> > > is reaped by the OOM reaper. Simplest way is to wait until all mm_struct are
> > > reaped by the OOM reaper, for currently we are not tracking which memory cgroup
> > > each mm_struct belongs to, are we? But that can cause needless delay when
> > > multiple OOM events occurred in different OOM domains. Do we want to (and can we)
> > > make it possible to tell whether each mm_struct queued to the OOM reaper's list
> > > belongs to the thread calling out_of_memory() ?
> > >
> >
> > I am saying that taking mmget() in mark_oom_victim() and then only
> > dropping it with mmput_async() after it can grab mm->mmap_sem, which the
> > exit path itself takes, or the oom reaper happens to schedule, causes
> > __mmput() to be called much later and thus we remove the process from the
> > tasklist or call cgroup_exit() earlier than the memory can be unmapped
> > with your patch. As a result, subsequent calls to the oom killer kills
> > everything before the original victim's mm can undergo __mmput() because
> > the oom reaper still holds the reference.
>
> Here is "wait for all mm_struct are reaped by the OOM reaper" version.
Well, this is getting more and more hairy. I think we should explore the
possibility of oom_reaper vs. exit_mmap working together after all.
Yes, I've said that a solution fully within the oom proper would be
preferable, but this just grows into a complex hairy mess. Maybe we will
find out that oom_reaper vs. exit_mmap is just not feasible and we will
reconsider this approach in the end, but let's try a clean solution
first. As I've said, there is nothing fundamentally hard about parallel
unmapping; MADV_DONTNEED does that already. We just have to iron out
those tiny details.
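
For readers less familiar with the MADV_DONTNEED reference, here is a
minimal userspace sketch of its observable effect. It is not kernel code
and not part of any patch in this thread. It assumes Linux and a private
anonymous mapping, and the function name dontneed_zeroes_page() is made
up for illustration. The pages are dropped while the mapping itself stays
in place, so the next access faults in fresh zero-filled pages.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS and madvise() */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only).
 * Returns 1 if a dirtied private anonymous page reads back as zeroes
 * after madvise(MADV_DONTNEED), 0 if it does not, -1 on error.
 */
int dontneed_zeroes_page(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	assert(page_size > 0);
	size_t len = (size_t)page_size;

	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Touch the page so that it is actually backed by memory. */
	memset(p, 0xAA, len);

	/* Drop the backing page; the mapping (vma) itself remains valid. */
	if (madvise(p, len, MADV_DONTNEED) != 0) {
		munmap(p, len);
		return -1;
	}

	/* The next access faults in a fresh zero-filled page. */
	int all_zero = 1;
	for (size_t i = 0; i < len; i++) {
		if (p[i] != 0) {
			all_zero = 0;
			break;
		}
	}

	munmap(p, len);
	return all_zero;
}
```

Calling dontneed_zeroes_page() from a small main() is enough to try it.
The sketch only shows that memory can be torn down without removing the
mapping; it says nothing about the locking details discussed above.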
--
Michal Hocko
SUSE Labs