Message-ID: <20170726163928.GB29716@redhat.com>
Date: Wed, 26 Jul 2017 18:39:28 +0200
From: Andrea Arcangeli <aarcange@...hat.com>
To: Michal Hocko <mhocko@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
David Rientjes <rientjes@...gle.com>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
Oleg Nesterov <oleg@...hat.com>,
Hugh Dickins <hughd@...gle.com>, linux-mm@...ck.org,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
On Wed, Jul 26, 2017 at 07:45:33AM +0200, Michal Hocko wrote:
> Yes, exit_aio is the only blocking call I know of currently. But I would
> like this to be as robust as possible and so I do not want to rely on
> the current implementation. This can change in future and I can
> guarantee that nobody will think about the oom path when adding
> something to the final __mmput path.
I think ksm_exit may block too, waiting for allocations; the general
point is that those calls before exit_mmap can cause a problem, yes.
> > exit_mmap would have no issue: if there was enough CPU time in the
> > task's lifetime to allocate the memory, surely the memory will also
> > be freed in a finite amount of time by exit_mmap.
>
> I am not sure I understand. Say that any call prior to unmap_vmas blocks
> on a lock which is held by another call path which cannot proceed with
> the allocation...
What I meant was: if there was no call prior to exit_mmap->unmap_vmas.
> I really do not want to rely on any timing. This just too fragile. Once
> we have killed a task then we shouldn't pick another victim until it
> passed exit_mmap or the oom_reaper did its job. Otherwise we just risk
> false positives while we have already disrupted the workload.
On smaller systems lack of parallelism in OOM killing surely isn't a
problem.
> This will work more or less the same to what we have currently.
>
> [victim]        [oom reaper]                      [oom killer]
> do_exit         __oom_reap_task_mm
>   mmput
>     __mmput
>                 mmget_not_zero
>                 test_and_set_bit(MMF_OOM_SKIP)
>                                                   oom_evaluate_task
>                                                   # select next victim
>                 # reap the mm
>       unmap_vmas
>
> so we can select a next victim while the current one is still not
> completely torn down.
How could oom_evaluate_task possibly run at the same time as the
test_and_set_bit in __oom_reap_task_mm, considering both run under
the oom_lock? It's hard to see how what you describe above could
materialize, as the second and third columns cannot run in parallel
because of the oom_lock.
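
To make sure we're talking about the same thing, this is the
serialization as I read it with my patch applied (hand-simplified,
function bodies trimmed, so take it as a sketch rather than the
actual code):

	/*
	 * Reaper side (simplified): MMF_OOM_SKIP is set with the
	 * oom_lock held.
	 */
	static void __oom_reap_task_mm(struct task_struct *tsk,
				       struct mm_struct *mm)
	{
		mutex_lock(&oom_lock);
		/* ... unmap_page_range() over the victim's vmas ... */
		test_and_set_bit(MMF_OOM_SKIP, &mm->flags);
		mutex_unlock(&oom_lock);
	}

	/*
	 * Killer side (simplified): out_of_memory() is only entered
	 * with the oom_lock held, so oom_evaluate_task() and the
	 * "select next victim" step cannot interleave with the
	 * critical section above.
	 */
	mutex_lock(&oom_lock);
	out_of_memory(oc);	/* -> select_bad_process()
				   -> oom_evaluate_task() */
	mutex_unlock(&oom_lock);
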
I don't think there was any issue, but then you pointed out the
locking on signal->oom_mm, which is protected by task_lock, vs the
current->mm NULL check, so in my patch I can replace the
test_and_set_bit with a set_bit on one side and the task_lock
protected access to oom_mm on the other side. This way I can put back
a plain set_bit in the __mmput fast path (instead of test_and_set_bit)
and it's even more efficient. With such a change, I'll also stop
depending on the oom_lock to prevent the second and third columns from
running in parallel.
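
Something along these lines (untested, just to illustrate the
pairing; the field and helper names match the current code but the
combination is hypothetical until I post the updated patch):

	/*
	 * Victim side, __mmput fast path: a plain set_bit is enough
	 * once the reader is serialized by task_lock below.
	 */
	set_bit(MMF_OOM_SKIP, &mm->flags);

	/*
	 * OOM killer side, e.g. in oom_evaluate_task(): read
	 * signal->oom_mm and the flag under task_lock so the check
	 * cannot race with the victim's exit path.
	 */
	struct mm_struct *mm;
	bool skip = false;

	task_lock(task);
	mm = task->signal->oom_mm;
	if (mm && test_bit(MMF_OOM_SKIP, &mm->flags))
		skip = true;
	task_unlock(task);
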
I still didn't remove the oom_lock outright, as that seems an
orthogonal change unrelated to this issue, but now you could remove it
as far as the above is concerned.
> I hope 3f70dc38cec2 ("mm: make sure that kthreads will not refault oom
> reaped memory") will clarify this code. If not please start a new thread
> so that we do not conflate different things together.
I'll look into that, thanks.
Andrea