[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <9f0f6f1e-01e9-3256-d486-3a877d4e26a0@huawei.com>
Date: Mon, 11 Apr 2022 18:50:59 +0800
From: Zhang Yi <yi.zhang@...wei.com>
To: Jan Kara <jack@...e.cz>
CC: <linux-ext4@...r.kernel.org>, <tytso@....edu>,
<adilger.kernel@...ger.ca>, <yukuai3@...wei.com>,
<yebin10@...wei.com>
Subject: Re: [RFC PATCH] ext4: convert symlink external data block mapping to
bdev
On 2022/4/11 15:55, Jan Kara wrote:
> On Fri 08-04-22 13:45:24, Zhang Yi wrote:
>> On 2022/4/7 21:55, Jan Kara wrote:
>>> On Thu 07-04-22 16:14:24, Zhang Yi wrote:
>>>> On 2022/4/7 1:17, Jan Kara wrote:
>>>>> On Wed 06-04-22 16:45:03, Zhang Yi wrote:
>>>>>> Symlink's external data block is one kind of metadata block, and now
>>>>>> that almost all ext4 metadata block's page cache (e.g. directory blocks,
>>>>>> quota blocks...) belongs to bdev backing inode except the symlink. It
>>>>>> is essentially worked in data=journal mode like other regular file's
>>>>>> data block because probably in order to make it simple for generic VFS
>>>>>> code handling symlinks or some other historical reasons, but the logic
>>>>>> of creating external data block in ext4_symlink() is complicated. and it
>>>>>> also make things confused if user do not want to let the filesystem
>>>>>> worked in data=journal mode. This patch convert the final exceptional
>>>>>> case and make things clean, move the mapping of the symlink's external
>>>>>> data block to bdev like any other metadata block does.
>>>>>>
>>>>>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
>>>>>> ---
>>>>>> This RFC patch follow the talking of whether if we could unify the
>>>>>> journal mode of ext4 metadata blocks[1], it stop using the data=journal
>>>>>> mode for the final exception case of symlink's external data block. Any
>>>>>> comments are welcome, thanks.
>>>>>>
>>>>>> [1]. https://lore.kernel.org/linux-ext4/20220321151141.hypnhr6o4vng2sa6@quack3.lan/T/#m84b942a6bb838ba60ae8afd906ebbb987a577488
>>>>>>
>>>>>> fs/ext4/inode.c | 9 +---
>>>>>> fs/ext4/namei.c | 123 +++++++++++++++++++++-------------------------
>>>>>> fs/ext4/symlink.c | 44 ++++++++++++++---
>>>>>> 3 files changed, 93 insertions(+), 83 deletions(-)
>>>>>
>>>>> Hum, we don't save on code but I'd say the result is somewhat more
>>>>> standard. So I guess this makes some sense. Let's see what Ted thinks...
>>>>>
>>>>> Otherwise I've found just one small bug below.
>>>>>
>>>>>> @@ -3270,26 +3296,8 @@ static int ext4_symlink(struct user_namespace *mnt_userns, struct inode *dir,
>>>>>> if (err)
>>>>>> return err;
>>>>>>
>>>>>> - if ((disk_link.len > EXT4_N_BLOCKS * 4)) {
>>>>>> - /*
>>>>>> - * For non-fast symlinks, we just allocate inode and put it on
>>>>>> - * orphan list in the first transaction => we need bitmap,
>>>>>> - * group descriptor, sb, inode block, quota blocks, and
>>>>>> - * possibly selinux xattr blocks.
>>>>>> - */
>>>>>> - credits = 4 + EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb) +
>>>>>> - EXT4_XATTR_TRANS_BLOCKS;
>>>>>> - } else {
>>>>>> - /*
>>>>>> - * Fast symlink. We have to add entry to directory
>>>>>> - * (EXT4_DATA_TRANS_BLOCKS + EXT4_INDEX_EXTRA_TRANS_BLOCKS),
>>>>>> - * allocate new inode (bitmap, group descriptor, inode block,
>>>>>> - * quota blocks, sb is already counted in previous macros).
>>>>>> - */
>>>>>> - credits = EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
>>>>>> - EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3;
>>>>>> - }
>>>>>> -
>>>>>> + credits = EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
>>>>>> + EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3;
>>>>>
>>>>> This does not seem like enough credits - we may need to allocate inode, add
>>>>> entry to directory, allocate & initialize symlink block. So I think you
>>>>> need to add 4 for block allocation + init in case of non-fast symlink. And
>>>>> please keep the comment explaining what is actually counted in the number
>>>>> of credits...
>>>>>
>>>>
>>>> Thanks for pointing this out, and ext4_mkdir() seems has the same problem
>>>> too because we also need to allocate one more block to store '.' and '..'
>>>> entries for a new created empty directory.
>>>
>>> OK, I was thinking a bit more about this and the comment was actually a bit
>>> misleading AFAICT. So we have:
>>>
>>> EXT4_INDEX_EXTRA_TRANS_BLOCKS for addition of entry into the directory.
>>> +3 for inode, inode bitmap, group descriptor allocation
>>> EXT4_DATA_TRANS_BLOCKS for the data block allocation and modification.
>>>
>>> So things actually look OK, just the comment was wrong and in the old code
>>> the credits were overestimated (because we've allocated the data block in a
>>> separate transaction).
>>>
>>
>> Yes,I will update the comments in my v2 iteration.
>>
>>>> BTW, look the credits calculation in depth, the definition of
>>>> EXT4_DATA_TRANS_BLOCKS is weird, the '-2' subtraction looks wrong.
>>>>
>>>>> #define EXT4_DATA_TRANS_BLOCKS(sb) (EXT4_SINGLEDATA_TRANS_BLOCKS(sb) + \
>>>>> EXT4_XATTR_TRANS_BLOCKS - 2 + \
>>>>> EXT4_MAXQUOTAS_TRANS_BLOCKS(sb))
>>>>
>>>> I see the history log, before commit[1], the '-2' subtract the 2 more duplicate
>>>> counted super block in '3 * EXT3_SINGLEDATA_TRANS_BLOCKS', but after this commit,
>>>> it seems buggy because we have only count the super block once. It's a long time
>>>> ago, I'm not sure am I missing something?
>>>
>>> Yes, -2 looks strange but at the same time I fail to see why
>>> EXT4_XATTR_TRANS_BLOCKS would need to be accounted for normal data
>>> operation and why we're reserving 6 blocks there. I'll raise it on today's
>>> ext4 call if someone remembers...
>>>
>>
>> I guess the 6 blocks were:
>>
>> 1. Ref count update on old xattr block
>> 2. new xattr block
>> 3. block bitmap update for new xattr block
>> 4. group descriptor for new xattr block
>> 5. block bitmap update for old xattr block
>> 6. group descriptor for old block
>>
>> The 5 and 6 looks like were overestimated in cases of 1) we just update the
>> old ref count to no zero, 2) we free the old xattr block and the credits has
>> already counted in the default revoke credits.
>
> Yes, your explanation of 6 blocks in EXT4_XATTR_TRANS_BLOCKS looks good.
> But I still wonder why we count with modification of xattrs for each data
> block write. EXT4_XATTR_TRANS_BLOCKS was added to EXT4_DATA_TRANS_BLOCKS
> back at the times when it was still ext3 and we have added xattr support to
> ext3. Looking at places where EXT4_DATA_TRANS_BLOCKS is used (mostly in
> fs/ext4/namei.c when adding entry into a directory), this was probably to
> account for a fact that when we create new inode, we may be cloning or
> otherwise modifying xattr block as well. So it seems that
> EXT4_DATA_TRANS_BLOCKS has somewhat misleading name (it should rather be
> called EXT4_INODE_CREATE_BLOCKS or something like that) but in principle we
> indeed need to count with xattr block modifications. Anyway, that's for a
> separate cleanup.
>
Indeed.
Thanks,
Yi.
Powered by blists - more mailing lists