linux-ext4 - Re: error in ext4_mb_release_inode


Message-ID: <20080708074907.GB23723@skywalker>
Date:	Tue, 8 Jul 2008 13:19:07 +0530
From:	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
To:	Andreas Dilger <adilger@....com>
Cc:	Eric Sandeen <sandeen@...hat.com>,
	ext4 development <linux-ext4@...r.kernel.org>
Subject: Re: error in ext4_mb_release_inode_pa

On Mon, Jul 07, 2008 at 05:20:22PM -0600, Andreas Dilger wrote:
> On Jul 07, 2008  10:29 -0500, Eric Sandeen wrote:
> > This was on #linuxfs last night, and I think I've seen at least one
> > other report of it:
> > 
> > [22:44]  <shehjart> any ideas why i get the following two lines on the
> > serial console when writing to ext4 over software raid0:
> > [22:44]  <shehjart> pa e00001004112d450: logic 11928, phys. 47003288,
> > len 360
> > [22:45]  <shehjart> EXT4-fs error (device md0):
> > ext4_mb_release_inode_pa: free 176, pa_free 174
> 
> The bug that I recalled from Lustre is unlikely to be the same.  It is
> https://bugzilla.lustre.org/show_bug.cgi?id=15932
> 
> 	"error: N blocks in bitmap, M in gd"

The first part of the fix is not needed; I guess we are initializing the
block bitmap properly. The second part, which states "We cannot trust
find_next_bit() to return a value < max. So we must check its
return for overflow.", can be done in the mb_find_next_* helpers as below.

How about this?

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a1e58fb..d2c61eb 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -381,22 +381,28 @@ static inline void mb_clear_bit_atomic(spinlock_t *lock, int bit, void *addr)
 
 static inline int mb_find_next_zero_bit(void *addr, int max, int start)
 {
-	int fix = 0;
+	int fix = 0, ret, tmpmax;
 	addr = mb_correct_addr_and_bit(&fix, addr);
-	max += fix;
+	tmpmax = max + fix;
 	start += fix;
 
-	return ext4_find_next_zero_bit(addr, max, start) - fix;
+	ret = ext4_find_next_zero_bit(addr, tmpmax, start) - fix;
+	if (ret > max)
+		return max;
+	return ret;
 }
 
 static inline int mb_find_next_bit(void *addr, int max, int start)
 {
-	int fix = 0;
+	int fix = 0, ret, tmpmax;
 	addr = mb_correct_addr_and_bit(&fix, addr);
-	max += fix;
+	tmpmax = max + fix;
 	start += fix;
 
-	return ext4_find_next_bit(addr, max, start) - fix;
+	ret = ext4_find_next_bit(addr, tmpmax, start) - fix;
+	if (ret > max)
+		return max;
+	return ret;
 }
 
 static void *mb_find_buddy(struct ext4_buddy *e4b, int order, int *max)
@@ -3633,8 +3639,6 @@ ext4_mb_release_inode_pa(struct ext4_buddy *e4b, struct buffer_head *bitmap_bh,
 		if (bit >= end)
 			break;
 		next = mb_find_next_bit(bitmap_bh->b_data, end, bit);
-		if (next > end)
-			next = end;
 		start = group * EXT4_BLOCKS_PER_GROUP(sb) + bit +
 				le32_to_cpu(sbi->s_es->s_first_data_block);
 		mb_debug("    free preallocated %u/%u in group %u\n",
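
To illustrate why the clamp belongs in the wrapper, here is a minimal
userspace sketch. It is not the kernel code: sketch_find_next_bit() is a
hypothetical stand-in for a search that scans whole words and, when it
runs past the requested limit, can report a position larger than max.
The clamped variant applies the same "ret > max" check as the patch.

```c
#include <limits.h>

/*
 * Hypothetical stand-in (assumption, not the kernel helper): scans whole
 * words, so it may return a set bit beyond `size`, or a value rounded up
 * to the next word boundary when no bit is set at all.
 */
static int sketch_find_next_bit(const unsigned long *addr, int size, int start)
{
	const int bpl = (int)(sizeof(unsigned long) * CHAR_BIT);
	int words = (size + bpl - 1) / bpl;
	int bit;

	for (bit = start; bit < words * bpl; bit++) {
		if (addr[bit / bpl] & (1UL << (bit % bpl)))
			return bit;
	}
	return words * bpl;
}

/* Same check as in the patch: never hand the caller a value above max. */
static int sketch_find_next_bit_clamped(const unsigned long *addr, int max,
					int start)
{
	int ret = sketch_find_next_bit(addr, max, start);

	if (ret > max)
		return max;
	return ret;
}
```

With the clamp in the helper, a caller such as
ext4_mb_release_inode_pa() no longer needs its own "next > end" check.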


> 
> There was a second bug in ext3_mb_use_best_found() hit on > 8TB filesystems:
> https://bugzilla.lustre.org/show_bug.cgi?id=16101
> 
> 	BUG_ON(ac->ac_b_ex.fe_group != e3b->bd_group);
> 

This fix is not needed, I guess, because we use ext4_group_t for the
group. I don't know why the bd_blkbits change is needed:

-+	__u16 bd_blkbits;
-+	__u16 bd_group;
++	unsigned bd_group;
++	unsigned bd_blkbits;
 +};
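
For reference, one plausible reason a 16-bit bd_group breaks above 8TB
(an assumption, the Lustre bug text is not quoted here): with 4 KiB
blocks and 32768 blocks per group, an 8 TiB filesystem has exactly 65536
groups, so any larger filesystem has group numbers that no longer fit in
16 bits. The geometry constants and helper names below are illustrative
only.

```c
#include <stdint.h>

/* Assumed geometry: 4 KiB blocks, 8 * 4096 = 32768 blocks per group. */
#define SKETCH_BLOCK_SIZE       4096ULL
#define SKETCH_BLOCKS_PER_GROUP (8ULL * SKETCH_BLOCK_SIZE)

/* Number of block groups needed to cover fs_bytes. */
static uint64_t sketch_group_count(uint64_t fs_bytes)
{
	uint64_t group_bytes = SKETCH_BLOCK_SIZE * SKETCH_BLOCKS_PER_GROUP;

	return (fs_bytes + group_bytes - 1) / group_bytes;
}

/* What a 16-bit field keeps of a group number: only the low 16 bits. */
static uint16_t sketch_store_group_u16(uint32_t group)
{
	return (uint16_t)group;
}

/* What a 32-bit field keeps: the group number unchanged. */
static uint32_t sketch_store_group_u32(uint32_t group)
{
	return group;
}
```

Under these assumptions group 65536 would be stored as 0 in a 16-bit
field, and a comparison like the BUG_ON() quoted above would then see
mismatched group numbers.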


-aneesh