linux-ext4 - Re: [BUG] fatal hang untarring 90GB file, possibly writeback related.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <1304431982.2576.5.camel@mulgrave.site>
Date:	Tue, 03 May 2011 09:13:02 -0500
From:	James Bottomley <James.Bottomley@...e.de>
To:	Mel Gorman <mgorman@...ell.com>
Cc:	Mel Gorman <mgorman@...e.de>, Jan Kara <jack@...e.cz>,
	colin.king@...onical.com, Chris Mason <chris.mason@...cle.com>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	linux-ext4 <linux-ext4@...r.kernel.org>
Subject: Re: [BUG] fatal hang untarring 90GB file, possibly writeback
 related.

On Tue, 2011-05-03 at 10:13 +0100, Mel Gorman wrote:
> On Thu, Apr 28, 2011 at 05:43:48PM -0500, James Bottomley wrote:
> > On Thu, 2011-04-28 at 16:12 -0500, James Bottomley wrote:
> > > On Thu, 2011-04-28 at 14:59 -0500, James Bottomley wrote:
> > > > Actually, talking to Chris, I think I can get the system up using
> > > > init=/bin/bash without systemd, so I can try the no cgroup config.
> > > 
> > > OK, so a non-PREEMPT non-CGROUP kernel has survived three back to back
> > > runs of untar without locking or getting kswapd pegged, so I'm pretty
> > > certain this is cgroups related.  The next steps are to turn cgroups
> > > back on but try disabling the memory and IO controllers.
> > 
> > I tried non-PREEMPT CGROUP but disabled GROUP_MEM_RES_CTLR.
> > 
> > The results are curious:  the tar does complete (I've done three back to
> > back).  However, I did get one soft lockup in kswapd (below).  But the
> > system recovers instead of halting I/O and hanging like it did
> > previously.
> > 
> > The soft lockup is in shrink_slab, so perhaps it's a combination of slab
> > shrinker and cgroup memory controller issues?
> > 
> 
> So, kswapd is still looping in reclaim and spending a lot of time in
> shrink_slab but it must not be the shrinker itself or that debug patch
> would have triggered. It's curious that cgroups are involved with
> systemd considering that one would expect those groups to be fairly
> small. I still don't have a new theory but will get hold of a Fedora 15
> install CD and see can I reproduce it locally.

I've got a ftrace output of kswapd ... it's 500k compressed, so I'll
send under separate cover.

> One last thing, what is the value of /proc/sys/vm/zone_reclaim_mode? Two
> of the reporting machines could be NUMA and if that proc file reads as
> 1, I'd be interested in hearing the results of a test with it set to 0.
> Thanks.

It's zero, I'm afraid

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html