Message-Id: <20101018140116.3AE8.A69D9226@jp.fujitsu.com>
Date: Mon, 18 Oct 2010 14:04:57 +0900 (JST)
From: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To: Neil Brown <neilb@...e.de>, Wu Fengguang <fengguang.wu@...el.com>
Cc: kosaki.motohiro@...fujitsu.com, Rik van Riel <riel@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
Li Shaohua <shaohua.li@...el.com>
Subject: Re: Deadlock possibly caused by too_many_isolated.
> On Wed, 15 Sep 2010 18:44:34 +1000
> Neil Brown <neilb@...e.de> wrote:
>
> > On Wed, 15 Sep 2010 16:28:43 +0800
> > Wu Fengguang <fengguang.wu@...el.com> wrote:
> >
> > > Neil,
> > >
> > > Sorry for the rushed, speculative ideas this morning.
> > >
> > > > @@ -1101,6 +1101,12 @@ static unsigned long shrink_inactive_lis
> > > > int lumpy_reclaim = 0;
> > > >
> > > > while (unlikely(too_many_isolated(zone, file, sc))) {
> > > > + if ((sc->gfp_mask & GFP_IOFS) != GFP_IOFS)
> > > > + /* Not allowed to do IO, so mustn't wait
> > > > + * on processes that might try to
> > > > + */
> > > > + return SWAP_CLUSTER_MAX;
> > > > +
> > >
> > > The above patch should behave like this: it returns SWAP_CLUSTER_MAX
> > > to fool every caller up the stack into believing "enough pages have
> > > been reclaimed". So __alloc_pages_direct_reclaim() sees a non-zero
> > > *did_some_progress and goes on to call get_page_from_freelist(). That
> > > normally fails because the task didn't actually scan the LRU lists.
> > > However, it can succeed: when many processes are doing concurrent
> > > direct reclaim, it may get lucky and grab a free page reclaimed by
> > > other tasks. What's more, if it does fail to get a free page, the
> > > upper layer __alloc_pages_slowpath() will call
> > > __alloc_pages_direct_reclaim() again and again. So, sooner or later it
> > > will succeed in "stealing" a free page reclaimed by other tasks.
> > >
> > > In summary, the patch behavior for !__GFP_IO/FS is
> > > - won't do any page reclaim
> > > - won't fail the page allocation (unexpected)
> > > - will wait and steal one free page from others (unreasonable)
> > >
> > > So it will address the problem you encountered; however, that is
> > > pretty unexpected and illogical behavior, right?
> > >
> > > I believe this patch will address the problem equally well.
> > > What do you think?
> >
> > Thank you for the detailed explanation. I agree with your reasoning and
> > now understand why your patch is sufficient.
> >
> > I will get it tested and let you know how that goes.
>
> (sorry this has taken a month to follow up).
>
> Testing shows that this patch seems to work.
> The test load (essentially kernbench) doesn't deadlock any more, though it
> does get bogged down thrashing in swap so it doesn't make a lot more
> progress :-) I guess that is to be expected.
>
> One observation is that the kernbench generated 10%-20% more context switches
> with the patch than without. Is that to be expected?
>
> Do you have plans for sending this patch upstream?
Wow, I had thought this patch had already been merged. Wu, can you please
repost it, and add my and Neil's ack tags?
Thanks.