lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 15 Nov 2006 21:45:34 -0800
From:	Andrew Morton <akpm@...l.org>
To:	Hugh Dickins <hugh@...itas.com>
Cc:	Mel Gorman <mel@...net.ie>, "Martin J. Bligh" <mbligh@...igh.org>,
	linux-kernel@...r.kernel.org,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	Mingming Cao <cmm@...ibm.com>
Subject: Re: Boot failure with ext2 and initrds

On Wed, 15 Nov 2006 14:17:01 +0000 (GMT)
Hugh Dickins <hugh@...itas.com> wrote:

> On Tue, 14 Nov 2006, Hugh Dickins wrote:
> > On Tue, 14 Nov 2006, Andrew Morton wrote:
> > > 
> > > The below might help.
> > 
> > Indeed it does (with Martin's E2FSBLK warning fix),
> > seems to be running well on all machines now.
> 
> i386 and ppc64 still doing builds, but after an hour on x86_64,
> an ld got stuck in a loop under ext2_try_to_allocate_with_rsv,
> alternating between ext2_rsv_window_add and rsv_window_remove.
> Send me a patch and I'll try it...
> 
> ext2_try_to_allocate_with_rsv+0x288
> ext2_new_blocks+0x21e
> ext2_get_blocks+0x398
> ext2_get_block+0x46
> __block_prepare_write+0x171
> block_prepare_write+0x39
> ext2_prepare_write+0x2c
> generic_file_buffered_write+0x2b0
> __generic_file_aio_write_nolock+0x4bc
> generic_file_aio_write+0x6d
> do_sync_write+0xf9
> vfs_write+0xc8
> sys_write+0x51

OK, I have a theory.

This must have been the seventeenth damn time I've stared at
find_next_zero_bit() wondering what the damn return value is and wondering
how any even slightly non-sadistic person could write a damn function like
that and not damn well document it.

int find_next_zero_bit(const unsigned long *addr, int size, int offset)

It returns the offset of the first zero bit relative to addr.  

ext3's bitmap_search_next_usable_block() assumed that find_next_zero_bit()
returns the offset of the first zero bit relative to (addr+offset).

The while loop in ext3's bitmap_search_next_usable_block() serendipitously
covered that bug up.

ext2's bitmap_search_next_usable_block() doesn't need that while loop, so
ext3's benign bug became ext2's fatal bug.

So...

--- a/fs/ext2/balloc.c~a
+++ a/fs/ext2/balloc.c
@@ -524,7 +524,7 @@ bitmap_search_next_usable_block(ext2_grp
 	ext2_grpblk_t next;
 
 	next = ext2_find_next_zero_bit(bh->b_data, maxblocks, start);
-	if (next >= maxblocks)
+	if (next >= start + maxblocks)
 		return -1;
 	return next;
 }
_

Anyway, I think that's the bug.  Or a bug, at least.  If so, the cause of
this bug is inadequate code commenting, pure and simple.  And ext3 and ext4
need fixing.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ