Message-ID: <20150429183135.GH31341@dhcp22.suse.cz>
Date: Wed, 29 Apr 2015 20:31:36 +0200
From: Michal Hocko <mhocko@...e.cz>
To: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
david@...morbit.com
Cc: hannes@...xchg.org, akpm@...ux-foundation.org, aarcange@...hat.com,
rientjes@...gle.com, vbabka@...e.cz, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/9] mm: improve OOM mechanism v2
On Thu 30-04-15 02:27:44, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 29-04-15 08:55:06, Johannes Weiner wrote:
> > > What we can do to mitigate this is tie the timeout to the setting of
> > > TIF_MEMDIE so that the wait is not 5s from the point of calling
> > > out_of_memory() but from the point of where TIF_MEMDIE was set.
> > > Subsequent allocations will then go straight to the reserves.
> >
> > That would deplete the reserves very easily. Shouldn't we rather
> > go the other way around? Allow the OOM killer context to dive into memory
> > reserves some more (ALLOC_OOM on top of current ALLOC flags and
> > __zone_watermark_ok would allow an additional 1/4 of the reserves) and
> > start waiting for the victim after that reserve is depleted. We would
> > still have some room for TIF_MEMDIE to allocate, the reserves consumption
> > would be throttled somehow and the holders of resources would have some
> > chance to release them and allow the victim to die.
>
> Does OOM killer context mean memory allocations which can call out_of_memory()?
Yes, that was the idea, because the other contexts will not reclaim any
memory. Even those which invoke out_of_memory() will not kill a new task
while a victim is still exiting, but that one killed task should
compensate for the ALLOC_OOM part of the memory reserves.
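
To make the arithmetic concrete, here is a rough sketch of what I mean.
ALLOC_OOM and its value are made up for illustration, and the function
below condenses the watermark check down to just the reserve handling
(the ALLOC_HIGH/ALLOC_HARDER fractions follow the existing code):

/*
 * Condensed model of the watermark check, keeping only the reserve
 * arithmetic relevant to this discussion.
 */
#include <stdbool.h>

#define ALLOC_HIGH	0x20	/* __GFP_HIGH: may use up to half of the reserves */
#define ALLOC_HARDER	0x10	/* atomic/rt: may dig another 1/4 deeper */
#define ALLOC_OOM	0x100	/* hypothetical: context which invoked the OOM killer */

static bool watermark_ok(long free_pages, long mark, int alloc_flags)
{
	long min = mark;

	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;
	/*
	 * The proposal: let the OOM-killing context consume an additional
	 * 1/4 of the reserves and start waiting for the victim only once
	 * even that is depleted.
	 */
	if (alloc_flags & ALLOC_OOM)
		min -= min / 4;

	return free_pages >= min;
}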
> If yes, there is no guarantee that such a memory reserve is used by the threads
> which the OOM victim is waiting for, because they might do only !__GFP_FS allocations.
OK, so we are back to GFP_NOFS. Right, those are your main pain point
because you can see i_mutex deadlocks. But really, those allocations
should simply fail, because looping in the allocator and relying on
others to make progress is simply broken.
I thought that Dave was quite explicit that they do not strictly need
the nofail behavior of GFP_NOFS but rather a GFP flag which would allow
diving into the reserves some more for specific contexts
(http://marc.info/?l=linux-mm&m=142897087230385&w=2). I also do not
remember him or anybody else saying that _every_ GFP_NOFS allocation
should get access to the reserves automatically.
Dave, could you clarify/confirm, please?
We have been going back and forth about GFP_NOFS without any progress
for a very long time now, and it seems that one class of issues could
already be handled by this change.
I mean, we should eventually fail all allocation types, but GFP_NOFS
comes from _carefully_ handled code paths, which makes it an easier
starting point than random code paths elsewhere in the kernel and in
drivers. So can we finally move at least in this direction?
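
As an illustration of why NOFS call sites are the easier starting point,
here is a hypothetical helper of the kind I have in mind; the function
name and error path are made up, the point is only that such callers
already sit in transaction-style code which knows how to back out:

#include <linux/slab.h>
#include <linux/errno.h>

/* hypothetical helper somewhere in a filesystem's transaction path */
static int fs_reserve_record(size_t size, void **out)
{
	void *p;

	/*
	 * GFP_NOFS because we hold fs locks and the allocator must not
	 * recurse into the filesystem.  If the allocation fails we return
	 * the error and the caller aborts the transaction, instead of the
	 * allocator looping forever on our behalf.
	 */
	p = kmalloc(size, GFP_NOFS);
	if (!p)
		return -ENOMEM;

	*out = p;
	return 0;
}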
> Likewise, there is a possibility that such a memory reserve is used by threads
> which the OOM victim is not waiting for, because malloc() + memset() causes
> __GFP_FS allocations.
We cannot be certain without complete dependency tracking. This is
just a heuristic.
--
Michal Hocko
SUSE Labs