[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20151002135201.GA28533@redhat.com>
Date: Fri, 2 Oct 2015 15:52:01 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc: mhocko@...nel.org, akpm@...ux-foundation.org, rientjes@...gle.com,
kwalker@...hat.com, skozina@...hat.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH -mm v2 1/3] mm/oom_kill: remove the wrong
fatal_signal_pending() check in oom_kill_process()
Tetsuo, sorry, I don't understand your question...
On 10/02, Tetsuo Handa wrote:
>
> Oleg Nesterov wrote:
> > On 10/01, Michal Hocko wrote:
> > >
> > > zap_process will add SIGKILL to all threads but the
> > > current which will go on without being killed and if this is not a
> > > thread group leader then we would miss it.
> >
> > Yes. And note that de_thread() does the same. Speaking of oom-killer
> > this is mostly fine, the execing thread is going to release its old
> > ->mm and it has already passed the copy_strings() stage which can use
> > a lot more memory.
>
> So, we have the same wrong fatal_signal_pending() check in out_of_memory()
Yes, sure, it is not right too. Again, this is even documented in
d003f371b27016354c:
fatal_signal_pending() can be true because of SIGNAL_GROUP_COREDUMP so
out_of_memory() and mem_cgroup_out_of_memory() shouldn't blindly trust it.
This is off-topic in a sense that this series only tries to ensure that
if we are going to kill a memory hog we can't miss a process which shares
the same mm (ignoring the OOM_SCORE_ADJ_MIN condition below).
> /*
> * If current has a pending SIGKILL or is exiting, then automatically
> * select it. The goal is to allow it to allocate so that it may
> * quickly exit and free its memory.
> *
> * But don't select if current has already released its mm and cleared
> * TIF_MEMDIE flag at exit_mm(), otherwise an OOM livelock may occur.
> */
> if (current->mm &&
> (fatal_signal_pending(current) || task_will_free_mem(current))) {
> mark_oom_victim(current);
> return true;
> }
>
> because it is possible that T starts the coredump, T sends SIGKILL to P,
> P calls out_of_memory() on GFP_FS allocation,
yes, and since fatal_signal_pending() == T we do not even check
task_will_free_mem().
> P misses to set SIGKILL on T?
>
> Since T sends SIGKILL to all clone(CLONE_VM) tasks upon coredump, P needs
> to do
>
> [...snip...]
> after mark_oom_victim(current) in case T is not in the same thread group?
I do not see how this depends on "not in the same thread group". This
fatal_signal_pending() doesn't look right in any case.
> If yes, what happens if some task failed to receive SIGKILL due to
> p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN condition?
Oh. This is another issue. I already tried to suggest to remove this
check. But this needs more discussion, hopefully we can do this later.
Oleg.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists