Message-ID: <alpine.DEB.2.00.0906020016060.24915@chino.kir.corp.google.com>
Date: Tue, 2 Jun 2009 00:26:50 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
cc: Nick Piggin <npiggin@...e.de>, Rik van Riel <riel@...hat.com>,
Mel Gorman <mel@....ul.ie>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Christoph Lameter <cl@...ux-foundation.org>,
Dave Hansen <dave@...ux.vnet.ibm.com>,
linux-kernel@...r.kernel.org
Subject: Re: [patch 3/3 -mmotm] oom: invoke oom killer for __GFP_NOFAIL

On Mon, 1 Jun 2009, Andrew Morton wrote:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1547,7 +1547,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> >  		goto out;
> >
> >  	/* The OOM killer will not help higher order allocs */
> > -	if (order > PAGE_ALLOC_COSTLY_ORDER)
> > +	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_NOFAIL))
> >  		goto out;
> >
> >  	/* Exhausted what can be done so it's blamo time */
> > @@ -1765,11 +1765,13 @@ rebalance:
> >  				goto got_pg;
> >
> >  			/*
> > -			 * The OOM killer does not trigger for high-order allocations
> > -			 * but if no progress is being made, there are no other
> > -			 * options and retrying is unlikely to help
> > +			 * The OOM killer does not trigger for high-order
> > +			 * ~__GFP_NOFAIL allocations so if no progress is being
> > +			 * made, there are no other options and retrying is
> > +			 * unlikely to help.
> >  			 */
> > -			if (order > PAGE_ALLOC_COSTLY_ORDER)
> > +			if (order > PAGE_ALLOC_COSTLY_ORDER &&
> > +						!(gfp_mask & __GFP_NOFAIL))
> >  				goto nopage;
> >
> >  			goto restart;
>
> I really think/hope/expect that this is unneeded.
>
> Do we know of any callsites which do greater-than-order-0 allocations
> with GFP_NOFAIL? If so, we should fix them.
>
> Then just ban order>0 && GFP_NOFAIL allocations.
>
That seems like a different topic. Banning higher-order __GFP_NOFAIL
allocations, or deprecating __GFP_NOFAIL altogether and slowly
switching users over, is a worthwhile effort, but it is unrelated to
this patch.

This patch is necessary because we explicitly prevent the oom killer
from being invoked when the order is greater than
PAGE_ALLOC_COSTLY_ORDER, on the assumption that it won't help. That
assumption isn't always true, for example when a large memory-hogging
task has mlocked large chunks of contiguous memory. The only thing we
do know is that direct reclaim has not made any progress, so we're
unlikely to see a substantial amount of memory freed in the immediate
future. In that case a __GFP_NOFAIL allocation will simply loop forever
without the rogue task ever being killed.
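
To make the changed condition concrete, here is a minimal user-space
sketch of the order check in __alloc_pages_may_oom() before and after
the patch. It is not the kernel code: oom_killer_allowed() is a
hypothetical helper, SKETCH_GFP_NOFAIL is an illustrative stand-in for
__GFP_NOFAIL, PAGE_ALLOC_COSTLY_ORDER is assumed to be 3, and every
other condition in that function is ignored.

```c
#include <stdbool.h>

/* Stand-in values for illustration; the real definitions are in the kernel headers. */
#define PAGE_ALLOC_COSTLY_ORDER 3	/* assumed value */
#define SKETCH_GFP_NOFAIL 0x800u	/* hypothetical stand-in for __GFP_NOFAIL */

/*
 * Hypothetical helper: returns true if the order check would let the
 * request reach the oom killer. "patched" selects the old or new test.
 */
static bool oom_killer_allowed(unsigned int gfp_mask, unsigned int order,
			       bool patched)
{
	bool costly = order > PAGE_ALLOC_COSTLY_ORDER;
	bool nofail = (gfp_mask & SKETCH_GFP_NOFAIL) != 0;

	/* Old test: every costly-order request skips the oom killer. */
	if (!patched)
		return !costly;

	/* New test: costly-order requests skip it only without NOFAIL. */
	return !costly || nofail;
}
```

Under these assumptions, oom_killer_allowed(SKETCH_GFP_NOFAIL, 4, false)
is false while oom_killer_allowed(SKETCH_GFP_NOFAIL, 4, true) is true.
An order-4 request without the flag is refused in both cases, so only
the can't-fail callers see a change in behavior.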

So while it's better in the long term to deprecate the flag as much as
possible and perhaps someday remove it from the page allocator entirely,
the current choice is between looping endlessly and freeing memory so
that the kernel allocation may succeed once direct reclaim has failed.
That also makes this a rare instance where the oom killer will never
needlessly kill a task.