lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 8 Apr 2022 13:45:24 +0800
From:   Zhang Yi <yi.zhang@...wei.com>
To:     Jan Kara <jack@...e.cz>
CC:     <linux-ext4@...r.kernel.org>, <tytso@....edu>,
        <adilger.kernel@...ger.ca>, <yukuai3@...wei.com>,
        <yebin10@...wei.com>
Subject: Re: [RFC PATCH] ext4: convert symlink external data block mapping to
 bdev

On 2022/4/7 21:55, Jan Kara wrote:
> On Thu 07-04-22 16:14:24, Zhang Yi wrote:
>> On 2022/4/7 1:17, Jan Kara wrote:
>>> On Wed 06-04-22 16:45:03, Zhang Yi wrote:
>>>> Symlink's external data block is one kind of metadata block, and now
>>>> that almost all ext4 metadata block's page cache (e.g. directory blocks,
>>>> quota blocks...) belongs to bdev backing inode except the symlink. It
>>>> is essentially worked in data=journal mode like other regular file's
>>>> data block because probably in order to make it simple for generic VFS
>>>> code handling symlinks or some other historical reasons, but the logic
>>>> of creating external data block in ext4_symlink() is complicated. and it
>>>> also make things confused if user do not want to let the filesystem
>>>> worked in data=journal mode. This patch convert the final exceptional
>>>> case and make things clean, move the mapping of the symlink's external
>>>> data block to bdev like any other metadata block does.
>>>>
>>>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
>>>> ---
>>>> This RFC patch follow the talking of whether if we could unify the
>>>> journal mode of ext4 metadata blocks[1], it stop using the data=journal
>>>> mode for the final exception case of symlink's external data block. Any
>>>> comments are welcome, thanks.
>>>>
>>>> [1]. https://lore.kernel.org/linux-ext4/20220321151141.hypnhr6o4vng2sa6@quack3.lan/T/#m84b942a6bb838ba60ae8afd906ebbb987a577488
>>>>
>>>>  fs/ext4/inode.c   |   9 +---
>>>>  fs/ext4/namei.c   | 123 +++++++++++++++++++++-------------------------
>>>>  fs/ext4/symlink.c |  44 ++++++++++++++---
>>>>  3 files changed, 93 insertions(+), 83 deletions(-)
>>>
>>> Hum, we don't save on code but I'd say the result is somewhat more
>>> standard. So I guess this makes some sense. Let's see what Ted thinks...
>>>
>>> Otherwise I've found just one small bug below.
>>>
>>>> @@ -3270,26 +3296,8 @@ static int ext4_symlink(struct user_namespace *mnt_userns, struct inode *dir,
>>>>  	if (err)
>>>>  		return err;
>>>>  
>>>> -	if ((disk_link.len > EXT4_N_BLOCKS * 4)) {
>>>> -		/*
>>>> -		 * For non-fast symlinks, we just allocate inode and put it on
>>>> -		 * orphan list in the first transaction => we need bitmap,
>>>> -		 * group descriptor, sb, inode block, quota blocks, and
>>>> -		 * possibly selinux xattr blocks.
>>>> -		 */
>>>> -		credits = 4 + EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb) +
>>>> -			  EXT4_XATTR_TRANS_BLOCKS;
>>>> -	} else {
>>>> -		/*
>>>> -		 * Fast symlink. We have to add entry to directory
>>>> -		 * (EXT4_DATA_TRANS_BLOCKS + EXT4_INDEX_EXTRA_TRANS_BLOCKS),
>>>> -		 * allocate new inode (bitmap, group descriptor, inode block,
>>>> -		 * quota blocks, sb is already counted in previous macros).
>>>> -		 */
>>>> -		credits = EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
>>>> -			  EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3;
>>>> -	}
>>>> -
>>>> +	credits = EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
>>>> +		  EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3;
>>>
>>> This does not seem like enough credits - we may need to allocate inode, add
>>> entry to directory, allocate & initialize symlink block. So I think you
>>> need to add 4 for block allocation + init in case of non-fast symlink. And
>>> please keep the comment explaining what is actually counted in the number
>>> of credits...
>>>
>>
>> Thanks for pointing this out, and ext4_mkdir() seems has the same problem
>> too because we also need to allocate one more block to store '.' and '..'
>> entries for a new created empty directory.
> 
> OK, I was thinking a bit more about this and the comment was actually a bit
> misleading AFAICT. So we have:
> 
> EXT4_INDEX_EXTRA_TRANS_BLOCKS for addition of entry into the directory.
> +3 for inode, inode bitmap, group descriptor allocation
> EXT4_DATA_TRANS_BLOCKS for the data block allocation and modification.
> 
> So things actually look OK, just the comment was wrong and in the old code
> the credits were overestimated (because we've allocated the data block in a
> separate transaction).
> 

Yes´╝îI will update the comments in my v2 iteration.

>> BTW, look the credits calculation in depth, the definition of
>> EXT4_DATA_TRANS_BLOCKS is weird, the '-2' subtraction looks wrong.
>>
>>> #define EXT4_DATA_TRANS_BLOCKS(sb)	(EXT4_SINGLEDATA_TRANS_BLOCKS(sb) + \
>>> 					 EXT4_XATTR_TRANS_BLOCKS - 2 + \
>>> 					 EXT4_MAXQUOTAS_TRANS_BLOCKS(sb))
>>
>> I see the history log, before commit[1], the '-2' subtract the 2 more duplicate
>> counted super block in '3 * EXT3_SINGLEDATA_TRANS_BLOCKS', but after this commit,
>> it seems buggy because we have only count the super block once. It's a long time
>> ago, I'm not sure am I missing something?
> 
> Yes, -2 looks strange but at the same time I fail to see why
> EXT4_XATTR_TRANS_BLOCKS would need to be accounted for normal data
> operation and why we're reserving 6 blocks there. I'll raise it on today's
> ext4 call if someone remembers...
> 

I guess the 6 blocks were:

1. Ref count update on old xattr block
2. new xattr block
3. block bitmap update for new xattr block
4. group descriptor for new xattr block
5. block bitmap update for old xattr block
6. group descriptor for old block

The 5 and 6 looks like were overestimated in cases of 1) we just update the
old ref count to no zero, 2) we free the old xattr block and the credits has
already counted in the default revoke credits.

Thanks,
Yi.

>> [1]. https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=2df2c24aa6d2cd56777570d96494b921568b4405

Powered by blists - more mailing lists