lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Tue, 20 Apr 2021 22:05:35 +0800 From: yebin <yebin10@...wei.com> To: Theodore Ts'o <tytso@....edu> CC: <adilger.kernel@...ger.ca>, <linux-ext4@...r.kernel.org>, <linux-kernel@...r.kernel.org> Subject: Re: [PATCH] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed On 2021/4/9 9:47, Theodore Ts'o wrote: > On Wed, Apr 07, 2021 at 09:41:57AM +0800, yebin wrote: >>>> If call ext4_ext_insert_extent failed but new extent already inserted, we just >>>> update "ex->ee_len = orig_ex.ee_len", this will lead to extent overlap, then >>>> cause bug on when cache extent. >>> How did this happen in the first place? It sounds like if the extent >>> was already inserted, that would be casue there was an on-disk file >>> system corruption, no? >>> >>> In that case, shouldn't we call ext4_error() to declare the file >>> system has an inconsistency, so it can be fixed by fsck? >> We inject IO fault when runing fsstress, JBD detect IO error then trigger >> JBD abort. At the same time, >> if ext4_ext_insert_extent already insert new exntent then call >> ext4_ext_dirty to dirty metadata , but >> JBD already aborted , ext4_ext_dirty will return error. >> In ext4_ext_dirty function call ext4_ext_check_inode check extent if ok, if >> not, trigger BUG_ON and >> also print extent detail information. > In this particular case, skipping the "ex->ee_len = orig_ex.ee_len" > may avoid the BUG_ON. But it's not clear that this is always the > right thing to do. The fundamental question is what should we do we > run into an error while we are in the middle of making changes to > on-disk and in-memory data structures? > > In the ideal world, we should undo the changes that we were in the > middle of making before we return an error. That way, the semantics > are very clear; on success, the function has made the requested change > to the file system. If the function returns an error, then no changes > should be made. > > That was the reasoning behind resetting ex->ee_len to orig_ex.ee_len > in the fix_extent_len inside ext4_split_extent_at(). Unofrtunately, > ext4_ext_insert_extent() does *not* always follow this convention, and > that's because it would be extremely difficult for it to do so --- the > mutations that it makes can be quite complex, including potentially > increasing the height of the extent tree. > > However, I don't think your fix is by any means the ideal one, because > the much more common way that ext4_ext_insert_extent() is when it > needs to insert a new leaf node, or need to increase the height of the > extent tree --- and in it returns an ENOSPC failure. In that case, it > won't have made any changes changes in the extent tree, and so having > ext4_split_extent_at() undo the change to ex->ee_len is the right > thing to do. > > Having blocks get leaked when there is an ENOSPC failure, requiring > fsck to be run --- and without giving the user any warning that this > has happened is *not* a good way to fail. So I don't think the > proposed patch is the right way to go. > > A better way to go would be to teach ext4_ext_insert_extent() so if > there is a late failure, that it unwinds the leaf node back to its > original state (at least from a semantic value). Since the extent > leaf node could have been split, and/or adjacent extent entries may > have been merged, what it would need to do is to remember the starting > block number and length, and make whatever changes are necessaries to > the extent entries in that leaf node corresponding to that starting > block number and length. > > If you don't want to do that, then a "do no harm" fix would be > something like this: > > ... > } else if (err == -EROFS) { > return err; > } else if (err) > goto fix_extent_len; Thanks for your advice. I will send v2 patch. > > So in the journal abort case, when err is set to EROFS, we don't try > to reset the length, since in theory the file system is read-only > already anyway. However, in the ENOSPC case, we won't end up silently > leaking blocks that will be lost until the user somehow decides to run > fsck. > > There are still times when this doesn't get things completely right > (e.g., what if we get a late ENOMEM error versus an early ENOMEM > failure), where the only real fix is to make ext4_ext_insert_extent() > obey the convention that if it returns an error, it must not result in > any user-visible state change. > > Cheers, > > - Ted > .
Powered by blists - more mailing lists