Message-ID: <20160408113425.GF29820@dhcp22.suse.cz>
Date: Fri, 8 Apr 2016 13:34:25 +0200
From: Michal Hocko <mhocko@...nel.org>
To: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc: linux-mm@...ck.org, rientjes@...gle.com, akpm@...ux-foundation.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] mm, oom_reaper: clear TIF_MEMDIE for all tasks
queued for oom_reaper
On Thu 07-04-16 20:55:34, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > The first obvious one is when the oom victim clears its mm and gets
> > stuck later on. oom_reaper would back off on find_lock_task_mm returning
> > NULL. We can safely try to clear TIF_MEMDIE in this case because such a
> > task would be ignored by the oom killer anyway. The flag would be
> > cleared by that time already most of the time anyway.
>
> I don't understand what this is trying to say. The OOM victim will clear
> TIF_MEMDIE as soon as it sets current->mm = NULL.
No, it clears the flag _after_ it returns from mmput. There is no
guarantee it won't get stuck somewhere on the way there, e.g. exit_aio
waits for completion, and who knows what else might get stuck.
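To make the ordering concrete, here is a minimal user-space sketch of the window described above. It is only a toy model: struct toy_task, toy_mmput and toy_exit_mm are hypothetical names, not kernel code, and the teardown_blocks parameter simply stands in for "mmput got stuck somewhere, e.g. in exit_aio".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's mm_struct or task_struct. */
struct toy_mm {
	int users;
};

struct toy_task {
	struct toy_mm *mm;	/* NULL once the task has detached its mm */
	bool memdie;		/* models TIF_MEMDIE */
};

/*
 * Models mmput(). Returns false when teardown "gets stuck"
 * (assumption: represents e.g. exit_aio waiting for completion).
 */
static bool toy_mmput(struct toy_mm *mm, bool teardown_blocks)
{
	if (teardown_blocks)
		return false;
	mm->users--;
	return true;
}

/*
 * Models the ordering under discussion: detach the mm first, drop the
 * reference, and clear the flag only after mmput has returned.
 * Returns true if the whole sequence completed.
 */
static bool toy_exit_mm(struct toy_task *t, bool teardown_blocks)
{
	struct toy_mm *mm = t->mm;

	if (!mm)
		return true;

	t->mm = NULL;			/* the mm is cleared here ... */
	if (!toy_mmput(mm, teardown_blocks))
		return false;		/* ... stuck: flag is still set */
	t->memdie = false;		/* flag cleared only after mmput */
	return true;
}
```

If toy_exit_mm() is called with teardown_blocks set, the task ends up with mm == NULL while memdie is still true, which is the state the patch is meant to handle.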
> Even if the oom victim
> clears its mm and gets stuck later on (e.g. at exit_task_work()),
> TIF_MEMDIE was already cleared by that moment by the OOM victim.
>
> >
> > The less obvious one is when the oom reaper fails due to mmap_sem
> > contention. Even if we clear TIF_MEMDIE for this task then it is not
> > very likely that we would select another task too easily because
> > we haven't reaped the last victim and so it would be still the #1
> > candidate. There is a rare race condition possible when the current
> > victim terminates before the next select_bad_process but considering
> > that oom_reap_task had retried several times before giving up then
> > this sounds like a borderline thing.
>
> Is it helpful? Allowing the OOM killer to select the same thread again
> simply makes the kernel log buffer flooded with the OOM kill messages.
I am trying to be as conservative as possible here. The likelihood of
mmap_sem contention will be reduced considerably once my
down_write_killable series gets merged. If this turns out to be a
problem (trivial to spot as the same task will be killed again) then we
can think about a fix for that (e.g. ignore the task if it has been
selected more than N times).
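As a rough illustration of that last idea, a user-space sketch could look like the following. Everything in it is an assumption made for the example: struct toy_candidate, the times_selected counter, toy_select_victim and the TOY_MAX_SELECTIONS value are hypothetical and do not exist in the kernel.

```c
#include <stddef.h>

/* The "N" from above; the value is arbitrary and purely illustrative. */
#define TOY_MAX_SELECTIONS 3

/* Hypothetical candidate record; not a kernel structure. */
struct toy_candidate {
	unsigned long points;		/* badness score, higher is worse */
	unsigned int times_selected;	/* how often it was already chosen */
};

/*
 * Pick the candidate with the highest score, ignoring any candidate
 * that has already been selected more than TOY_MAX_SELECTIONS times.
 * Returns NULL if no eligible candidate is left.
 */
static struct toy_candidate *toy_select_victim(struct toy_candidate *c,
					       size_t n)
{
	struct toy_candidate *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (c[i].times_selected > TOY_MAX_SELECTIONS)
			continue;	/* selected too often, skip it */
		if (!best || c[i].points > best->points)
			best = &c[i];
	}

	if (best)
		best->times_selected++;
	return best;
}
```

With N = 3 the top candidate is returned four times in a row; on the fifth call it has been selected more than N times, so the next best candidate is chosen instead.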
> I think we should not allow the OOM killer to select the same thread again
> by e.g. doing tsk->signal->oom_score_adj = OOM_SCORE_ADJ_MIN regardless of
> whether reaping that thread's memory succeeded or not.
I think this comes with some risk and so it should go as a separate
patch with a full justification of why the outcome is better. Especially
once the mmap_sem contention has been reduced by other means.
--
Michal Hocko
SUSE Labs