[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <BB31D81D-F4A9-490E-8F9D-2BC6350CE6B0@gmail.com>
Date: Fri, 19 Feb 2021 16:53:05 +0300
From: Alexey Lyashkov <alexey.lyashkov@...il.com>
To: Lukas Czerner <lczerner@...hat.com>
Cc: Andreas Dilger <adilger@...ger.ca>,
Artem Blagodarenko <artem.blagodarenko@...il.com>,
linux-ext4 <linux-ext4@...r.kernel.org>,
Eric Sandeen <sandeen@...hat.com>
Subject: Re: [PATCH v2] mmp: do not use O_DIRECT when working with regular
file
> 19 февр. 2021 г., в 16:34, Lukas Czerner <lczerner@...hat.com> написал(а):
>
> On Fri, Feb 19, 2021 at 02:49:16PM +0300, Alexey Lyashkov wrote:
>> Lukas,
>>
>> because e2fsprogs have an bad assumption about IO size for the O_DIRECT case.
>> and because library uses a code like
>>>>
>> set_block_size(1k);
>> seek(fs, 1);
>> read_block();
>>>>>
>> which caused an 1k read inside of 4k disk block size not aligned by block size, which is prohibited and caused an error report.
>>
>> Reference to patch.
>> https://patchwork.ozlabs.org/project/linux-ext4/patch/20201023112659.1559-1-artem.blagodarenko@gmail.com/
>
> Alright, I skimmed through your patch proposal and I am not sure I
> completely understand the problem because you have not provided the code
> adding O_DIRECT support for e2image.
debugfs -D … will hit same problem.
>
> However I think that it is a reasonable assumption to make that there is
> not going to be a file system on a block device such that the fs blocksize
> is smaller than device sector size. You can't create such fs with mke2fs
> and you can't mount such file system either.
>
This is don’t need to be create this FS, calling an ext2_open2 is enough.
> All that said I can now see that there is a problem in case of mke2fs
> and debugfs when used with O_DIRECT (-D) on a file system image with 1k
> block size stored on a file in the host file system on the block device
> with sector size larger than 1k (...I am getting Inception flashbacks now)
if you have open a large (>256T) device with debugfs without -D you will be see a large swap.
once this FS want to consume a ~10G for bitmaps and other parts.
with cached read you memory consumption increased by two.
btw.
you can easy replicate it with losetup which able to specify a block size.
>
> In fact I can confirm that indeed, both mke2fs and debugfs will fail in
> such scenario. The question is whether we care enough to support
> O_DIRECT in such situations. Personally I don't care enough about this.
> However it would be nice to at least have a check (probably in
> ext2fs_open2, unix_open_channel or such) and notify user about the
> problem.
it’s not a tools problem. It’s problem of e2fsprogs library as ext2_open2 affected by this bug.
But this is not a single function where bug lives.
>
> Note that this conversation does not affect my patch since
> ext2fs_mmp_read() does not use the unix_io infrastructure.
>
It’s good to convert to use it.
Alex
> -Lukas
>
>>
>> Alex
>>
>>> 19 февр. 2021 г., в 13:57, Lukas Czerner <lczerner@...hat.com> написал(а):
>>>
>>> On Fri, Feb 19, 2021 at 01:08:17PM +0300, Alexey Lyashkov wrote:
>>>> Andreas,
>>>>
>>>> What about to disable a O_DIRECT global on any block devices in the e2fsprogs library as this don’t work on 4k disk drives at all ?
>>>> Instead of fixing an O_DIRECT access with patches sends early.
>>>
>>> Why would it not work at all ? This is a fix for a specific problem and
>>> I am not currently aware of ony other problems e2fsprogs should have
>>> with 4k sector size drives. Do you have a specific problem in mind ?
>>>
>>> Thanks!
>>> -Lukas
>>>
>>>>
>>>>
>>>> Alex
>>>>
>>>>> 19 февр. 2021 г., в 1:20, Andreas Dilger <adilger@...ger.ca> написал(а):
>>>>>
>>>>> On Feb 18, 2021, at 2:51 AM, Lukas Czerner <lczerner@...hat.com> wrote:
>>>>>>
>>>>>> Currently the mmp block is read using O_DIRECT to avoid any caching that
>>>>>> may be done by the VM. However when working with regular files this
>>>>>> creates alignment issues when the device of the host file system has
>>>>>> sector size larger than the blocksize of the file system in the file
>>>>>> we're working with.
>>>>>>
>>>>>> This can be reproduced with t_mmp_fail test when run on the device with
>>>>>> 4k sector size because the mke2fs fails when trying to read the mmp
>>>>>> block.
>>>>>>
>>>>>> Fix it by disabling O_DIRECT when working with regular files. I don't
>>>>>> think there is any risk of doing so since the file system layer, unlike
>>>>>> shared block device, should guarantee cache consistency.
>>>>>>
>>>>>> Signed-off-by: Lukas Czerner <lczerner@...hat.com>
>>>>>> Reviewed-by: Eric Sandeen <sandeen@...hat.com>
>>>>>
>>>>> Reviewed-by: Andreas Dilger <adilger@...ger.ca>
>>>>>
>>>>>> ---
>>>>>> v2: Fix comment - it avoids problems when the sector size is larger not
>>>>>> smaller than blocksize
>>>>>>
>>>>>> lib/ext2fs/mmp.c | 22 +++++++++++-----------
>>>>>> 1 file changed, 11 insertions(+), 11 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/ext2fs/mmp.c b/lib/ext2fs/mmp.c
>>>>>> index c21ae272..cca2873b 100644
>>>>>> --- a/lib/ext2fs/mmp.c
>>>>>> +++ b/lib/ext2fs/mmp.c
>>>>>> @@ -57,21 +57,21 @@ errcode_t ext2fs_mmp_read(ext2_filsys fs, blk64_t mmp_blk, void *buf)
>>>>>> * regardless of how the io_manager is doing reads, to avoid caching of
>>>>>> * the MMP block by the io_manager or the VM. It needs to be fresh. */
>>>>>> if (fs->mmp_fd <= 0) {
>>>>>> + struct stat st;
>>>>>> int flags = O_RDWR | O_DIRECT;
>>>>>>
>>>>>> -retry:
>>>>>> + /*
>>>>>> + * There is no reason for using O_DIRECT if we're working with
>>>>>> + * regular file. Disabling it also avoids problems with
>>>>>> + * alignment when the device of the host file system has sector
>>>>>> + * size larger than blocksize of the fs we're working with.
>>>>>> + */
>>>>>> + if (stat(fs->device_name, &st) == 0 &&
>>>>>> + S_ISREG(st.st_mode))
>>>>>> + flags &= ~O_DIRECT;
>>>>>> +
>>>>>> fs->mmp_fd = open(fs->device_name, flags);
>>>>>> if (fs->mmp_fd < 0) {
>>>>>> - struct stat st;
>>>>>> -
>>>>>> - /* Avoid O_DIRECT for filesystem image files if open
>>>>>> - * fails, since it breaks when running on tmpfs. */
>>>>>> - if (errno == EINVAL && (flags & O_DIRECT) &&
>>>>>> - stat(fs->device_name, &st) == 0 &&
>>>>>> - S_ISREG(st.st_mode)) {
>>>>>> - flags &= ~O_DIRECT;
>>>>>> - goto retry;
>>>>>> - }
>>>>>> retval = EXT2_ET_MMP_OPEN_DIRECT;
>>>>>> goto out;
>>>>>> }
>>>>>> --
>>>>>> 2.26.2
>>>>>>
>>>>>
>>>>>
>>>>> Cheers, Andreas
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
Powered by blists - more mailing lists