lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20101108011919.GN13830@dastard>
Date:	Mon, 8 Nov 2010 12:19:19 +1100
From:	Dave Chinner <david@...morbit.com>
To:	Ted Ts'o <tytso@....edu>, linux-kernel@...r.kernel.org,
	linux-ext4@...r.kernel.org
Subject: Re: [PATCH -BUGFIX] Re: BUG in ext4 with 2.6.37-rc1

On Sun, Nov 07, 2010 at 04:21:43PM -0500, Ted Ts'o wrote:
> On Wed, Nov 03, 2010 at 04:53:06PM -0400, Nick Bowler wrote:
> > 
> > OK, it's 100% reproducible: the kernel BUGs, without fail, every time I
> > do 'make install' in the gcc build tree.  After applying the patch, it
> > seems that the original BUG is gone, but now there's a new one:
> 
> OK, I think I have a new version of the patch that should fix both the
> original and the second BUG_ON which you reported.  Unfortunately,
> I've not been able to reproduce the BUG_ON, even by downloading gcc
> 4.5.1, configuring to build just gcc, and then doing a "make install".
> I'm not sure what languages you had enabled, or how much memory you
> have on your machine, etc., all of which might have made it harder for
> me to repro the bug.  Still, I think I now understand what's causing
> it.
> 
> Can you give this a try and let me know if this fixes things for you?
> 
>     	     	    	    	   	   - Ted
> 
> From 94acf883b316f50b38018f710341a943cdd38d0d Mon Sep 17 00:00:00 2001
> From: Theodore Ts'o <tytso@....edu>
> Date: Sun, 7 Nov 2010 16:11:42 -0500
> Subject: [PATCH -v2] ext4: handle writeback of inodes which are being freed
> 
> The following BUG can occur when an inode which is getting freed when
> it still has dirty pages outstanding, and it gets deleted (in this
> because it was the target of a rename).  In ordered mode, we need to
> make sure the data pages are written just in case we crash before the
> rename (or unlink) is committed.  If the inode is being freed then
> when we try to igrab the inode, we end up tripping the BUG_ON at
> fs/ext4/page-io.c:146.
> 
> In this case, don't need to do any of the endio handling, so we can
> drop the BUG_ON and then arrange to make ext4_bio_end() handle this
> case appropriately by releasing the pages and ending the writeback on
> those pages, and then returning early without queueing the io_end
> structure on the workqueue.
> 
>   kernel BUG at /scratch_space/linux-2.6/fs/ext4/page-io.c:146!
>   Call Trace:
>    [<ffffffff811075b1>] ext4_bio_write_page+0x172/0x307
>    [<ffffffff811033a7>] mpage_da_submit_io+0x2f9/0x37b
>    [<ffffffff811068d7>] mpage_da_map_and_submit+0x2cc/0x2e2
>    [<ffffffff811069b3>] mpage_add_bh_to_extent+0xc6/0xd5
>    [<ffffffff81106c66>] write_cache_pages_da+0x2a4/0x3ac
>    [<ffffffff81107044>] ext4_da_writepages+0x2d6/0x44d
>    [<ffffffff81087910>] do_writepages+0x1c/0x25
>    [<ffffffff810810a4>] __filemap_fdatawrite_range+0x4b/0x4d
>    [<ffffffff810815f5>] filemap_fdatawrite_range+0xe/0x10
>    [<ffffffff81122a2e>] jbd2_journal_begin_ordered_truncate+0x7b/0xa2
>    [<ffffffff8110615d>] ext4_evict_inode+0x57/0x24c
>    [<ffffffff810c14a3>] evict+0x22/0x92
>    [<ffffffff810c1a3d>] iput+0x212/0x249
>    [<ffffffff810bdf16>] dentry_iput+0xa1/0xb9
>    [<ffffffff810bdf6b>] d_kill+0x3d/0x5d
>    [<ffffffff810be613>] dput+0x13a/0x147
>    [<ffffffff810b990d>] sys_renameat+0x1b5/0x258
>    [<ffffffff81145f71>] ? _atomic_dec_and_lock+0x2d/0x4c
>    [<ffffffff810b2950>] ? cp_new_stat+0xde/0xea
>    [<ffffffff810b29c1>] ? sys_newlstat+0x2d/0x38
>    [<ffffffff810b99c6>] sys_rename+0x16/0x18
>    [<ffffffff81002a2b>] system_call_fastpath+0x16/0x1b
> 
> Reported-by: Nick Bowler <nbowler@...iptictech.com>
> Signed-off-by: "Theodore Ts'o" <tytso@....edu>
> ---
>  fs/ext4/page-io.c |   45 ++++++++++++++++++++++-----------------------
>  1 files changed, 22 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
> index 46a7d6a..94bbdb4 100644
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -143,7 +143,6 @@ ext4_io_end_t *ext4_init_io_end(struct inode *inode, gfp_t flags)
>  	if (io) {
>  		memset(io, 0, sizeof(*io));
>  		io->inode = igrab(inode);
> -		BUG_ON(!io->inode);
>  		INIT_WORK(&io->work, ext4_end_io_work);
>  		INIT_LIST_HEAD(&io->list);
>  	}
> @@ -171,35 +170,15 @@ static void ext4_end_bio(struct bio *bio, int error)
>  	struct workqueue_struct *wq;
>  	struct inode *inode;
>  	unsigned long flags;
> -	ext4_fsblk_t err_block;
>  	int i;
>  
>  	BUG_ON(!io_end);
> -	inode = io_end->inode;
>  	bio->bi_private = NULL;
>  	bio->bi_end_io = NULL;
>  	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
>  		error = 0;
> -	err_block = bio->bi_sector >> (inode->i_blkbits - 9);
>  	bio_put(bio);
>  
> -	if (!(inode->i_sb->s_flags & MS_ACTIVE)) {
> -		pr_err("sb umounted, discard end_io request for inode %lu\n",
> -			io_end->inode->i_ino);
> -		ext4_free_io_end(io_end);
> -		return;
> -	}
> -
> -	if (error) {
> -		io_end->flag |= EXT4_IO_END_ERROR;
> -		ext4_warning(inode->i_sb, "I/O error writing to inode %lu "
> -			     "(offset %llu size %ld starting block %llu)",
> -			     inode->i_ino,
> -			     (unsigned long long) io_end->offset,
> -			     (long) io_end->size,
> -			     (unsigned long long) err_block);
> -	}
> -
>  	for (i = 0; i < io_end->num_io_pages; i++) {
>  		struct page *page = io_end->pages[i]->p_page;
>  		struct buffer_head *bh, *head;
> @@ -254,9 +233,30 @@ static void ext4_end_bio(struct bio *bio, int error)
>  		if (!partial_write)
>  			SetPageUptodate(page);
>  	}
> -
>  	io_end->num_io_pages = 0;
>  
> +	if ((inode = io_end->inode) == NULL)
> +		goto no_work;
> +
> +	if (!(inode->i_sb->s_flags & MS_ACTIVE)) {
> +		pr_err("sb umounted, discard end_io request for inode %lu\n",
> +			io_end->inode->i_ino);
> +	no_work:
> +		ext4_free_io_end(io_end);
> +		return;
> +	}

Doesn't this mean you are tossing away unwritten extent conversion
processing when IO is issued on any inode in the I_WILL_FREE/I_FREEING
state, or completing IO after the unmount is in progress?

Cheers,

Dave.
> +
> +	if (error) {
> +		io_end->flag |= EXT4_IO_END_ERROR;
> +		ext4_warning(inode->i_sb, "I/O error writing to inode %lu "
> +			     "(offset %llu size %ld starting block %llu)",
> +			     inode->i_ino,
> +			     (unsigned long long) io_end->offset,
> +			     (long) io_end->size,
> +			     (unsigned long long)
> +			     bio->bi_sector >> (inode->i_blkbits - 9));
> +	}
> +
>  	/* Add the io_end to per-inode completed io list*/
>  	spin_lock_irqsave(&EXT4_I(inode)->i_completed_io_lock, flags);
>  	list_add_tail(&io_end->list, &EXT4_I(inode)->i_completed_io_list);
> @@ -305,7 +305,6 @@ static int io_submit_init(struct ext4_io_submit *io,
>  	bio->bi_private = io->io_end = io_end;
>  	bio->bi_end_io = ext4_end_bio;
>  
> -	io_end->inode = inode;
>  	io_end->offset = (page->index << PAGE_CACHE_SHIFT) + bh_offset(bh);
>  
>  	io->io_bio = bio;
> -- 
> 1.7.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ