linux-ext4 - Re: [PATCH 00/10] ext4: fix inconsistency since reading old metadata from disk

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200611165523.GQ1347934@mit.edu>
Date:   Thu, 11 Jun 2020 12:55:23 -0400
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     Jan Kara <jack@...e.cz>
Cc:     "zhangyi (F)" <yi.zhang@...wei.com>, linux-ext4@...r.kernel.org,
        adilger.kernel@...ger.ca, zhangxiaoxu5@...wei.com
Subject: Re: [PATCH 00/10] ext4: fix inconsistency since reading old metadata
 from disk

On Thu, Jun 11, 2020 at 10:21:03AM +0200, Jan Kara wrote:
> > I have thought about this solution, we could add a hook in 'struct super_operations'
> > and call it in blkdev_writepage() like blkdev_releasepage() does, and pick out a
> > wrapper from block_write_full_page() to pass our endio handler in, something like
> > this.
> > 
> > static const struct super_operations ext4_sops = {
> > ...
> > 	.bdev_write_page = ext4_bdev_write_page,
> > ...
> > };
> > 
> > static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
> > {
> > 	struct super_block *super = BDEV_I(page->mapping->host)->bdev.bd_super;
> > 
> > 	if (super && super->s_op->bdev_write_page)
> > 		return super->s_op->bdev_write_page(page, blkdev_get_block, wbc);
> > 
> > 	return block_write_full_page(page, blkdev_get_block, wbc);
> > }
> > 
> > But I'm not sure it's a optimal ieda. So I continue to realize the "wb_err"
> > solution now ?
> 
> The above idea looks good to me. I'm fine with either that solution or
> "wb_err" idea so maybe let's leave it for Ted to decide...

My preference would be to be able to get the (error from the callback
right away.  My reasoning behind that is (a) it allows the file system
to be notified about the problem right away, (b) in the case of a file
system resize, we _really_ want to know about the failure ASAP, so we
can fail the resize before we start allocating inodes and blocks to
use the new space, and (c) over time, we might be able to add some
more intelligence handling of some write errors.

For example, we already have a way of handling CRC errors when we are
reading an allocation bitmap; we simply avoid allocating blocks and
inodes from that blockgroup.  Over time, we could theoretically do
other things to try to recover from some write errors --- for example,
we could try allocating a new block for an extent tree block, and try
writing it, and if that succeeds, updating its parent node to point at
the new location.  Is it worth it to try to add that kind of
complexity?  I'm really not sure; at the end of the day, it might be
simpler to just call ext4_error() and abort using the entire file
system until a system administrator can sort out the mess.  But I
think (a) and (b) are still reasons for doing this by intercepting the
writeback error from the buffer head.

Cheers,

						- Ted