linux-ext4 - Re: [PATCH] ext4: Ensure zeroout blocks have no dirty metadata

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <6601abe90912101001o1f6f6ed6jc6ecb6e4c26d2463@mail.gmail.com>
Date:	Thu, 10 Dec 2009 10:01:21 -0800
From:	Curt Wohlgemuth <curtw@...gle.com>
To:	Eric Sandeen <sandeen@...hat.com>
Cc:	ext4 development <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH] ext4: Ensure zeroout blocks have no dirty metadata

On Thu, Dec 10, 2009 at 9:44 AM, Eric Sandeen <sandeen@...hat.com> wrote:
> Curt Wohlgemuth wrote:
>> This fixes a bug in which new blocks returned from an extent created with
>> ext4_ext_zeroout() can have dirty metadata still associated with them.
>
> Do you have a testcase or at least end result details for this corruption?

<sigh>  I wish I had a testcase.  I do know some facts about the
workload(s) involved:

1. No journal
2. Nearly full ext2 partition, mounted as ext4 -- tune2fs used to turn
on "extent dir_index"
3. Existing non-extent based files are removed, new (extent-based)
files are created.
4. Files created with O_DIRECT, ~8MB, fallocate(KEEP_SIZE) used,
writes are generally in increasing offset order, ~64K usually.
5. Some (~1%) writes are to holes in the fallocate'd file; we're not
yet using Mingming's patches, so these writes fall back to buffered
writes.
6. Lots of processes, lots of threads.
7. End result is a block (or sometimes 2 blocks) of all zeros, at a
block-aligned offset.  This zero'ed block is *always* somewhere in a
14-block extent created from using ext4_ext_zeroout().

Lots and lots of tracing showed that these blocks were originally
dirtied metadata blocks from unlinked (hence truncated) non-extent
based files.  These indirect blocks in ext4 have their direct block
pointers turned to zero as the blocks are truncated, and these
indirect blocks are marked as dirty.  Even though bforget() is called
on these metadata blocks, bforget() won't wait on the buffer.  Waiting
on the buffer is meant to happen at allocate time, for "new" buffers.

>
>>       Signed-off-by: Curt Wohlgemuth <curtw@...gle.com>
>> ---
>>
>> This is for the problem I reported on 23 Nov ("Bug in extent zeroout: blocks
>> not marked as new").  I'm not seeing the corruption with this fix that I was
>> seeing without it.
>>
>> diff -uprN orig/fs/ext4/extents.c new/fs/ext4/extents.c
>> --- orig/fs/ext4/extents.c    2009-12-09 15:09:25.000000000 -0800
>> +++ new/fs/ext4/extents.c     2009-12-09 15:09:37.000000000 -0800
>> @@ -2474,9 +2474,21 @@ static int ext4_ext_zeroout(struct inode
>>               submit_bio(WRITE, bio);
>>               wait_for_completion(&event);
>>
>> -             if (test_bit(BIO_UPTODATE, &bio->bi_flags))
>> +             /* On success, we need to insure all metadata associated
>
> nitpick, "ensure" I think, although I guess they're mostly synonymous
> today so do with that what you will :)

I'll let Ted decide :-)

>
> -Eric
>
>> +              * with each of these blocks is unmapped. */
>> +             if (test_bit(BIO_UPTODATE, &bio->bi_flags)) {
>> +                     sector_t block = ee_pblock;
>> +
>>                       ret = 0;
>> -             else {
>> +                     done = 0;
>> +                     while (done < len) {
>> +                             unmap_underlying_metadata(inode->i_sb->s_bdev,
>> +                                                       block);
>> +
>> +                             done++;
>> +                             block++;
>> +                     }
>> +             } else {
>>                       ret = -EIO;
>>                       break;
>>               }
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@...r.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html