Message-Id: <274AFF41-0B92-4486-8175-39AEE11D3C5A@dilger.ca>
Date: Tue, 3 Jul 2018 11:16:32 -0600
From: Andreas Dilger <adilger@...ger.ca>
To: "Theodore Y. Ts'o" <tytso@....edu>
Cc: Lukas Czerner <lczerner@...hat.com>, linux-ext4@...r.kernel.org
Subject: Re: [PATCH] e2fsck: do not allow initialized blocks pass i_size
On Jul 2, 2018, at 2:30 PM, Theodore Y. Ts'o <tytso@....edu> wrote:
>
> On Fri, Jun 29, 2018 at 01:35:41PM -0600, Andreas Dilger wrote:
>>>>> Right. So there are two choices:
>>>>>
>>>>> 1) Keep the blocks beyond i_size marked as uninitialized. You
>>>>> transfer and write the full PAGE_SIZE of data, but it simply will
>>>>> never be available to the user.
>>>
>>> Yes, that's for extent mapped files.
>>>
>>>>> 2) Zero the page, write it out to the file, and then extend i_size and
>>>>> mark the extents as uninitialized.
>>>
>>> Except at that point you do not really need to mark the extent as
>>> uninitialized, the blocks are allocated and written to and i_size is
>>> extended. That's how it needs to be done for indirect block mapped
>>> files.
>
>>>>> Why is it that Lustre is choosing to keep i_size where it is, but to
>>>>> mark the blocks beyond it as initialized?
>>>>
>>>> This isn't about initialized vs. uninitialized extents. It is only about
>>>> allocated vs. unallocated blocks, possibly with block-mapped files. There
>>>> is no way to have uninitialized blocks with a block-mapped file.
>
> Does Lustre really support block-mapped files today? If so, why?

We used to support block-mapped files on the data servers, and we
can't say for sure that all such files are gone. Also, we recently
added a feature to support small files on the metadata servers, which
are formatted without extents because they are < 16TB and it is more
efficient to use block-mapped dirs than extent-mapped dirs.

> And if it must support block-mapped files and not just
> extent-mapped files, is there any reason why Lustre can't just make sure
> (a) there are no blocks allocated past i_size --- ext4 can handle this
> case just fine, even if that means there are parts of the page which
> are not mapped to a block. Alternatively, (b) if (a) is impossible,
> to simply make sure i_size is moved to page_size boundary and all of
> the allocated blocks are zero'ed if they haven't been written yet?

I would have to see how hard (a) is to implement, but it was definitely
implemented in this way for a reason in the first place.

I don't see how (b) is possible, since i_size will not be correct
in that case. We definitely zero the end of the page beyond i_size
so that the data is correct if the file is truncated to a larger
size, or blocks are written beyond i_size.
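To make the block arithmetic under discussion concrete, here is a rough
sketch. It is not the e2fsck or Lustre code; the function names and the
4 KiB block / 64 KiB page sizes are assumptions chosen for illustration.
It only shows how far past i_size the last allocated block can be when a
full page is written and the page is larger than the block size.

```python
def last_data_block(i_size: int, block_size: int) -> int:
    """Index of the last logical block holding data below i_size (-1 if empty)."""
    return (i_size - 1) // block_size if i_size > 0 else -1


def last_block_in_eof_page(i_size: int, block_size: int, page_size: int) -> int:
    """Index of the last logical block in the page that contains EOF (-1 if empty).

    Assumes page_size is a multiple of block_size.
    """
    if i_size == 0:
        return -1
    page_end = -(-i_size // page_size) * page_size  # round i_size up to a page
    return page_end // block_size - 1


def classify(i_size: int, last_allocated_block: int,
             block_size: int = 4096, page_size: int = 65536) -> str:
    """Describe where the last allocated block sits relative to i_size.

    The default sizes are illustrative assumptions, not values from the thread.
    """
    if last_allocated_block <= last_data_block(i_size, block_size):
        return "ok"
    if last_allocated_block <= last_block_in_eof_page(i_size, block_size, page_size):
        return "past i_size, within EOF page"
    return "past EOF page"
```

With these assumed sizes, a 5000-byte file needs blocks 0-1, but a full
page write allocates blocks 0-15, so classify(5000, 15) reports
"past i_size, within EOF page". When the page and block sizes are equal,
no such slack exists.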

>> Like I said previously, this is done with Lustre, which has a
>> different IO submission path than stock ext4. I don't think
>> there is any requirement that this only be in upstream ext4,
>> since e2fsprogs also has code to support running on BSD, Windows,
>> even Hurd.
>
> If neither (a) nor (b) is possible, I'm willing to entertain this. If
> we have to go down that path, then it should be something that
> is configurable, perhaps via /etc/e2fsck.conf. The reason for
> this is Lustre really is a minority use case; and it is *useful* for
> e2fsck to flag cases where there are initialized blocks past i_size,
> since it should never happen with the Linux stack. And if it does,
> it's a bug, and we should (for example) flag it when running xfstests.
>
> So I think what I'm going to do for 1.44.3 is to take Lukas's patch.
>
> We can possibly put it back under some kind of conditional, either via
> e2fsck.conf, or via some kind of superblock flag. Or it can be
> something that can be patched back in for the Lustre fork of
> e2fsprogs.
>
> - Ted

Cheers, Andreas