Date:	Wed, 25 Jun 2014 10:45:14 -0400
From:	Theodore Ts'o <tytso@....edu>
To:	Matthew Wilcox <willy@...ux.intel.com>
Cc:	Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org,
	linux-fsdevel@...r.kernel.org
Subject: Re: Livelock when running xfstests generic/127 on ext4 with 3.15

On Wed, Jun 25, 2014 at 10:17:56AM -0400, Matthew Wilcox wrote:
> 
> Okay ... but why is it so much worse in 3.15 than 3.14?
> 
> And does ext4 think of "running out of space" as a percentage
> free, or an absolute number of blocks remaining?  From the code in
> ext4_nonda_switch(), it seems to be the former, although maybe excessive
> fragmentation has caused ext4 to think it's running out of space?

When the blocks that were allocated using delayed allocation exceed
50% of the free space, we initiate writeback.  When delalloc blocks
exceed 66% of the free space, we fall back to nodelalloc, which among
other things means blocks are allocated for each write system call,
and we also have to add and remove the inode from the orphan inode
list so that if we crash in the middle of the write system call, we
don't end up exposing stale data.

We did have a change to the orphan inode code to improve scalability,
so that could have been a possible cause; but that change happened
after 3.15, so that can't be it.  The other possibility is that
there's simply a change in the writeback code that is altering how
aggressively we start writeback when we exceed the 50% threshold, so
that we end up switching into nonda mode more often.

Any chance you can run generic/127 under perf so we can see where
we're spending all of our CPU time?  The other thing I can imagine
doing is to add a tracepoint whenever we drop into nonda mode, so we
can see if that's happening more often under 3.15 versus 3.14.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html