lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20091230053720.GK4429@thunk.org>
Date:	Wed, 30 Dec 2009 00:37:20 -0500
From:	tytso@....edu
To:	Alexander Beregalov <a.beregalov@...il.com>,
	Dmitry Monakhov <dmonakhov@...nvz.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-ext4@...r.kernel.org, Jens Axboe <jens.axboe@...cle.com>,
	dmitry.torokhov@...il.com, Jan Kara <jack@...e.cz>,
	aanisimov@...ox.ru, pl4nkton@...glemail.com
Subject: Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc)

On Sun, Dec 27, 2009 at 10:51:59PM -0500, tytso@....EDU wrote:
> OK, i've been able to reproduce the problem using xfsqa test #74
> (fstest) when an ext3 file system is mounted the ext4 file system
> driver.  I was then able to bisect it down to commit d21cd8f6, which
> was introduced between 2.6.33-rc1 and 2.6.33-rc2, as part of
> quota/ext4 patch series pushed by Jan.

OK, here's a patch which I think should avoid the BUG in
fs/ext4/inode.c.  It should fix the regression, but in the long run we
need to pretty seriously rethink how we account for the need for
potentially new meta-data blocks when doing delayed allocation.

The remaining problem with this machinery is that
ext4_da_update_reserve_space() and ext4_da_release_space() is that
they both try to calculate how many metadata blocks will potentially
required by calculating ext4_calc_metadata_amount() based on the
number of delayed allocation blocks found in i_reserved_data_blocks.
The problem is that ext4_calc_metadata_amount() assumes that the
number of blocks passed to it is contiguous, and what might be left
remaining to be written in the page cache could be anything but
contiguous.  This is a problem which has always been there, so it's
not a regression per se; just a design flaw.

The patch below should fixes the regression caused by commit d21cd8f,
but we need to look much more closely to find a better way of
accounting for the potential need for metadata for inodes facing
delayed allocation.  Could people who are having problems with the BUG
in line 1063 of fs/ext4/inode.c try this patch?

Thanks!!

    	      	       		    	   - Ted


commit 48b71e562ecd35ab12f6b6420a92fb3c9145da92
Author: Theodore Ts'o <tytso@....edu>
Date:   Wed Dec 30 00:04:04 2009 -0500

    ext4: Patch up how we claim metadata blocks for quota purposes
    
    Commit d21cd8f triggered a BUG in the function
    ext4_da_update_reserve_space() found in fs/ext4/inode.c, which was
    caused by fact that ext4_calc_metadata_amount() can over-estimate how
    many metadata blocks will be needed, especially when using direct
    block-mapped files.  Work around this by not claiming any excess
    metadata blocks than we are prepared to claim at this point.
    
    Signed-off-by: "Theodore Ts'o" <tytso@....edu>

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3e3b454..d6e84b4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1058,14 +1058,23 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
 	mdb_free = EXT4_I(inode)->i_reserved_meta_blocks - mdb;
 
 	if (mdb_free) {
-		/* Account for allocated meta_blocks */
+		/* 
+		 * Account for allocated meta_blocks; it is possible
+		 * for us to have allocated more meta blocks than we
+		 * are prepared to free at this point.  This is
+		 * because ext4_calc_metadata_amount can over-estimate
+		 * how many blocks are still needed.  So we may not be
+		 * able to claim all of the allocated meta blocks
+		 * right away.  The accounting will work out in the end...
+		 */
 		mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
-		BUG_ON(mdb_free < mdb_claim);
+		if (mdb_free < mdb_claim)
+			mdb_claim = mdb_free;
 		mdb_free -= mdb_claim;
 
 		/* update fs dirty blocks counter */
 		percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
-		EXT4_I(inode)->i_allocated_meta_blocks = 0;
+		EXT4_I(inode)->i_allocated_meta_blocks -= mdb_claim;
 		EXT4_I(inode)->i_reserved_meta_blocks = mdb;
 	}
 
@@ -1845,7 +1854,7 @@ repeat:
 static void ext4_da_release_space(struct inode *inode, int to_free)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
-	int total, mdb, mdb_free, release;
+	int total, mdb, mdb_free, mdb_claim, release;
 
 	if (!to_free)
 		return;		/* Nothing to release, exit */
@@ -1874,6 +1883,16 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
 	BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks);
 	mdb_free = EXT4_I(inode)->i_reserved_meta_blocks - mdb;
 
+	if (mdb_free) {
+		/* Account for allocated meta_blocks */
+		mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
+		if (mdb_free < mdb_claim)
+			mdb_claim = mdb_free;
+		mdb_free -= mdb_claim;
+
+		EXT4_I(inode)->i_allocated_meta_blocks -= mdb_claim;
+	}
+
 	release = to_free + mdb_free;
 
 	/* update fs dirty blocks counter for truncate case */
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ