[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.00.1111161438160.16596@chino.kir.corp.google.com>
Date: Wed, 16 Nov 2011 14:48:12 -0800 (PST)
From: David Rientjes <rientjes@...gle.com>
To: Minchan Kim <barrioskmc@...il.com>
cc: Pekka Enberg <penberg@...nel.org>,
Minchan Kim <minchan.kim@...il.com>,
Colin Cross <ccross@...roid.com>, Mel Gorman <mgorman@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Pekka Enberg <penberg@...helsinki.fi>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Andrea Arcangeli <aarcange@...hat.com>,
LKML <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>, "Rafael J. Wysocki" <rjw@...k.pl>
Subject: Re: [PATCH] mm: avoid livelock on !__GFP_FS allocations
On Thu, 17 Nov 2011, Minchan Kim wrote:
> >> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> >> > index fdd4263..01aa9b5 100644
> >> > --- a/kernel/power/suspend.c
> >> > +++ b/kernel/power/suspend.c
> >> > @@ -297,9 +297,11 @@ int enter_state(suspend_state_t state)
> >> > goto Finish;
> >> >
> >> > pr_debug("PM: Entering %s sleep\n", pm_states[state]);
> >> > + oom_killer_disable();
> >> > pm_restrict_gfp_mask();
> >> > error = suspend_devices_and_enter(state);
> >> > pm_restore_gfp_mask();
> >> > + oom_killer_enable();
> >> >
> >> > Finish:
> >> > pr_debug("PM: Finishing wakeup.\n");
> >> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> > index 6e8ecb6..d8c31b7 100644
> >> > --- a/mm/page_alloc.c
> >> > +++ b/mm/page_alloc.c
> >> > @@ -2177,9 +2177,9 @@ rebalance:
> >> > * running out of options and have to consider going OOM
> >> > */
> >> > if (!did_some_progress) {
> >> > - if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
> >> > - if (oom_killer_disabled)
> >> > + if (oom_killer_disabled)
> >> > goto nopage;
> >
> > You're allowing __GFP_NOFAIL allocations to fail.
> >
> >> > + if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
> >> > page = __alloc_pages_may_oom(gfp_mask, order,
> >> > zonelist, high_zoneidx,
> >> > nodemask, preferred_zone,
> >> >
> >>
> >> I'd prefer something like this. The whole 'gfp_allowed_flags' thing was
> >> designed to make GFP_KERNEL work during boot time where it's obviously safe to
> >> do that. I really don't think that's going to work suspend cleanly.
> >>
> >
> > Adding Rafael to the cc.
> >
> > This has been done since 2.6.34 and presumably has been working quite
> > well. I don't have a specific objection to gfp_allowed_flags to be used
> > outside of boot since it seems plausible that there are system-level
> > contexts that would need different behavior in the page allocator and this
> > does it effectively without major surgery or a slower fastpath. Suspend
> > is using it just like boot does before irqs are enabled, so I don't have
> > an objection to it.
> >
>
> My point isn't using gfp_allowed_flags(maybe it's Pekka's concern) but
> why adding new special case handling code like pm_suspended_storage.
> I think we can handle the issue with oom_killer_disabled(but the naming is bad)
>
Ignore the name of the function that Mel is introducing, it's only related
to suspend because that's the only thing that (currently) alters
gfp_allowed_mask after boot. If something else were to clear __GFP_FS and
__GFP_IO in the future, we'd simply need to rename the function. I was
going to ask for a comment specifically about that, but I think it's
proximity to the function that allows gfp_allowed_mask to be altered is
sufficient.
We'd really like to avoid the loop if __GFP_FS and __GFP_IO are not set.
If oom_killer_disabled is set anytime that gfp_allowed_mask does not allow
them for _all_ page allocations, then we could certainly replace the new
pm_suspended_storage() check in should_alloc_retry() with a check on
oom_killer_disabled.
I'll let you and Rafael work out whether that can be done or not, I just
see pm_restrict_gfp_mask() being called in kernel/power/hibernate.c and
kernel/power/suspend.c whereas oom_killer_disable() is only called in
kernel/power/process.c. oom_killer_disable() would be unnecessary if
__GFP_FS were already cleared, so it would require some changes in the
suspend code.
I like Mel's patch because it's easily maintainable and only depends on
the state of reclaim and the gfp flags being passed in, which is the
direct cause of the infinite loop.
Powered by blists - more mailing lists