Message-ID: <1304431982.2576.5.camel@mulgrave.site>
Date: Tue, 03 May 2011 09:13:02 -0500
From: James Bottomley <James.Bottomley@...e.de>
To: Mel Gorman <mgorman@...ell.com>
Cc: Mel Gorman <mgorman@...e.de>, Jan Kara <jack@...e.cz>,
colin.king@...onical.com, Chris Mason <chris.mason@...cle.com>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
linux-ext4 <linux-ext4@...r.kernel.org>
Subject: Re: [BUG] fatal hang untarring 90GB file, possibly writeback related.
On Tue, 2011-05-03 at 10:13 +0100, Mel Gorman wrote:
> On Thu, Apr 28, 2011 at 05:43:48PM -0500, James Bottomley wrote:
> > On Thu, 2011-04-28 at 16:12 -0500, James Bottomley wrote:
> > > On Thu, 2011-04-28 at 14:59 -0500, James Bottomley wrote:
> > > > Actually, talking to Chris, I think I can get the system up using
> > > > init=/bin/bash without systemd, so I can try the no cgroup config.
> > >
> > > OK, so a non-PREEMPT non-CGROUP kernel has survived three back to back
> > > runs of untar without locking or getting kswapd pegged, so I'm pretty
> > > certain this is cgroups related. The next steps are to turn cgroups
> > > back on but try disabling the memory and IO controllers.
> >
> > I tried non-PREEMPT CGROUP but disabled GROUP_MEM_RES_CTLR.
> >
> > The results are curious: the tar does complete (I've done three back to
> > back). However, I did get one soft lockup in kswapd (below). But the
> > system recovers instead of halting I/O and hanging like it did
> > previously.
> >
> > The soft lockup is in shrink_slab, so perhaps it's a combination of slab
> > shrinker and cgroup memory controller issues?
> >
>
> So, kswapd is still looping in reclaim and spending a lot of time in
> shrink_slab but it must not be the shrinker itself or that debug patch
> would have triggered. It's curious that cgroups are involved with
> systemd considering that one would expect those groups to be fairly
> small. I still don't have a new theory but will get hold of a Fedora 15
> install CD and see can I reproduce it locally.
I've got an ftrace output of kswapd ... it's 500k compressed, so I'll
send it under separate cover.
> One last thing, what is the value of /proc/sys/vm/zone_reclaim_mode? Two
> of the reporting machines could be NUMA and if that proc file reads as
> 1, I'd be interested in hearing the results of a test with it set to 0.
> Thanks.
It's zero, I'm afraid
James
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/