linux-kernel - Re: Clear PG_error before reading a page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20070515101144.f7072476.akpm@linux-foundation.org>
Date:	Tue, 15 May 2007 10:11:44 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Johann Lombardi <johann@...sterfs.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: Clear PG_error before reading a page

On Tue, 15 May 2007 16:37:26 +0200 Johann Lombardi <johann@...sterfs.com> wrote:

> We've observed that transient disk errors can result in persistent I/O issues
> at the filesystem level. I can at least put forward the problem with ext2/ext3
> and scsci_debug.
> 
> Here are the steps to reproduce:
> * format an ext3 fs on a SCSI disk simulated by scsci_debug
> * mount the fs and create a 10MB file
> * unmount/remount the fs
> * enable the medium error flag in scsi_debug (please note that I've slightly
>   modified scsi_debug to return a medium error indication when any sector >=
>   0x1234 is read)
> * try to read the file (as expected, got -EIO)
> * disable medium error injection in scsi_debug
> * try to read the file (keep getting -EIO)
> 
> In fact, when the underlying device experiences errors, block_read_full_page()
> calls ext3_get_block() which correctly returns an I/O error.
> Consequently, block_read_full_page() set the PG_error flag.
> However, the problem is that afterwards, when the device no longer returns
> any errors, PG_error is never cleared and, as a consequence, we are still
> getting -EIO at the filesystem level.
> 
> Granted that block_read_full_page() always tries to fill a full page,
> I think it should first clear the PG_error flag and set it back if a call
> to get_block() fails. Besides, we should handle this in block_read_full_page
> but also in other places that do full page reads. Any comments?
> I am not able to reproduce the problem with the patch below.
> 
> Johann
> 
> Signed-off-by: Johann Lombardi <johann@...sterfs.com>
> --
> 
> --- linux-2.6.12.6.orig/fs/buffer.c	2005-08-29 18:55:27.000000000 +0200
> +++ linux-2.6.12.6/fs/buffer.c	2007-05-11 15:51:03.000000000 +0200
> @@ -2078,6 +2078,7 @@ int block_read_full_page(struct page *pa
>  	int fully_mapped = 1;
>  
>  	BUG_ON(!PageLocked(page));
> +	ClearPageError(page);
>  	blocksize = 1 << inode->i_blkbits;
>  	if (!page_has_buffers(page))
>  		create_empty_buffers(page, blocksize, 0);

We need to make sure that this page has PG_uptodate cleared, so
that a re-read is forced.  And the affected buffer_head, if any, should have
buffer_uptodate() cleared.

This change might have horrid interactions with readahead and various
application access patterns: if for some reason a dud sector takes 30
seconds of driver futzing to return -EIO and someone (application or
kernel) tries to read the same sector tens or hundreds of times, suckiness
ensues.  This will be hard to test for.

Most reads don't (or shouldn't) go through block_read_full_page(). 
mpage_readpages() does the heavy lifting.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/