Message-ID: <b5549a88-6805-99a8-4b0a-3bbf49da794c@huawei.com>
Date:   Tue, 6 Aug 2019 09:36:36 +0800
From:   Chao Yu <yuchao0@...wei.com>
To:     Jaegeuk Kim <jaegeuk@...nel.org>
CC:     <linux-f2fs-devel@...ts.sourceforge.net>,
        <linux-kernel@...r.kernel.org>, <chao@...nel.org>
Subject: Re: [PATCH v2] f2fs: separate NOCoW and pinfile semantics

On 2019/8/6 8:37, Jaegeuk Kim wrote:
> On 08/02, Chao Yu wrote:
>> On 2019/8/2 6:27, Jaegeuk Kim wrote:
>>> On 08/01, Chao Yu wrote:
>>>> On 2019/8/1 12:14, Jaegeuk Kim wrote:
>>>>> On 07/31, Chao Yu wrote:
>>>>>> On 2019/7/31 2:02, Jaegeuk Kim wrote:
>>>>>>> On 07/29, Chao Yu wrote:
>>>>>>>> On 2019/7/29 13:57, Jaegeuk Kim wrote:
>>>>>>>>> On 07/23, Chao Yu wrote:
>>>>>>>>>> On 2019/7/23 10:36, Jaegeuk Kim wrote:
>>>>>>>>>>> On 07/19, Chao Yu wrote:
>>>>>>>>>>>> Pinning a file is heavy, because skipping pinned files makes GC
>>>>>>>>>>>> run under heavy load or have no effect.
>>>>>>>>>>>
>>>>>>>>>>> A pinned file is one kind of NOCOW file, so I don't think we can simply drop it,
>>>>>>>>>>> for backward compatibility.
>>>>>>>>>>
>>>>>>>>>> Yes,
>>>>>>>>>>
>>>>>>>>>> But what concerns me is that pinning a file is too heavy, so in order to satisfy the
>>>>>>>>>> demand below, how about introducing a pin_file_2 flag that triggers IPU only during
>>>>>>>>>> flush/writeback?
>>>>>>>>>
>>>>>>>>> That can be done by cold files?
>>>>>>>>
>>>>>>>> Then it may inherit the properties of a cold-type file, e.g. a) it goes into the cold
>>>>>>>> area; b) it is updated with very low frequency.
>>>>>>>>
>>>>>>>> Actually pin_file_2 could be used for db-wal/log files, which are updated
>>>>>>>> frequently and should go to the hot/warm area; that does not match the two properties above.
>>>>>>>
>>>>>>> How about considering another name like "IPU-only mode"?
>>>>>>>
>>>>>>>               fallocate         write    Flag         GC
>>>>>>> Pin_file:     preallocate       IPU      FS_NOCOW_FL  Not allowed
>>>>>>> IPU_file:     Not preallocate   IPU      N/A          Default by temperature
>>>>>>
>>>>>> One question: do we need to preallocate physical block addresses for IPU_file as for
>>>>>> Pin_file? It could enhance a db file's sequential read performance, but I'm not sure
>>>>>> whether db can handle random data in preallocated blocks.
>>>>>
>>>>> db file will do atomic writes, which can not be used with this. -wal may be able
>>>>
>>>> WAL mode is now set by default in Android, so most db files are of -wal type now.
>>>
>>> Will be back again tho.
>>
>> R?
> 
> Q.
> 
>>
>>>
>>>>
>>>>> to preallocate blocks, but it can eat disk space unnecessarily.
>>>>
>>>> I meant .db-wal file rather than .db.
>>>>
>>>> Yes, that's ext4 style, and it would bring better performance due to fewer holes in
>>>> block distribution.
>>>>
>>>> I don't think we need to worry about the space issue for db-wal files. I tracked
>>>> .db-wal file updates before:
>>>> - truncation and deletion are very frequent, which means the preallocated
>>>> blocks won't exist for a long time.
>>>> - append writes are also very frequent, so I suspect very few preallocated
>>>> blocks are left unwritten.
>>>> - the total number of db-wal files is small.
>>>
>>> Sometimes it can be large enough for system.
>>
>> For this, it's a trade-off:
>> - lose a little disk space at the very beginning of the db-wal lifecycle, or
>> - face fragmentation and read performance degradation.
>>
>>> If it's from user apps and short lived, why do we need preallocation?
>>
>> It triggers sequential reads on the db-wal file during checkpoint, so even though it's
>> short lived, it can still affect performance.
>>
>> What do you think of running some performance tests on WAL files to decide the
>> preallocation policy?
> 
> Good idea. Can we?

Let me test for numbers later.

Thanks,

> 
>>
>> Thanks,
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> Other behaviors look good to me. :)
>>>>>>
>>>>>> I plan to use last bit in inode.i_inline to store this flag.
>>>>>
>>>>> Why not use i_flag like FS_NOCOW_FL?
>>>>
>>>> Oops, as you listed in your last email, I can see you don't want to break
>>>> FS_NOCOW_FL's semantics for backward compatibility.
>>>>
>>>> 			Flag
>>>> IPU_file		N/A			
>>>>
>>>> If we plan to use FS_NOCOW_FL, that's what this patch already does; you can
>>>> merge it directly... :P
>>>>
>>>>>
>>>>>>
>>>>>>> Cold_file:    Not preallocate   IPU      N/A          Move in cold area
>>>>>>> Hot_file:     Not preallocate   IPU/OPU  N/A          Move in hot area
>>>>>>
>>>>>> Should a hot file be GCed to the hot area? That would mix new hot data with old 'hot'
>>>>>> data which has actually become cold.
>>>>>
>>>>> But, user explicitly specified this is hot.
>>>>
>>>> With current implementation, GC will migrate data from hot/warm/cold area to
>>>> cold area.
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> So this patch proposes to separate NOCoW and pinfile semantics:
>>>>>>>>>>>> - The NOCoW flag can only be set on a regular file.
>>>>>>>>>>>> - A NOCoW file will only trigger IPU at common writeback/flush.
>>>>>>>>>>>> - A NOCoW file will do OPU during GC.
>>>>>>>>>>>>
>>>>>>>>>>>> For the demands of 1) avoiding fragmentation of a file's physical blocks and
>>>>>>>>>>>> 2) userspace not caring about the file's specific physical address,
>>>>>>>>>>>> tagging a file as NOCoW will be cheaper than pinning it.
>>>>>>>>>>
>>>>>>>>>> ^^^
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Chao Yu <yuchao0@...wei.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>> v2:
>>>>>>>>>>>> - rebase code to fix compile error.
>>>>>>>>>>>>  fs/f2fs/data.c |  3 ++-
>>>>>>>>>>>>  fs/f2fs/f2fs.h |  1 +
>>>>>>>>>>>>  fs/f2fs/file.c | 22 +++++++++++++++++++---
>>>>>>>>>>>>  3 files changed, 22 insertions(+), 4 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>>>>>>>>>> index a2a28bb269bf..15fb8954c363 100644
>>>>>>>>>>>> --- a/fs/f2fs/data.c
>>>>>>>>>>>> +++ b/fs/f2fs/data.c
>>>>>>>>>>>> @@ -1884,7 +1884,8 @@ static inline bool check_inplace_update_policy(struct inode *inode,
>>>>>>>>>>>>  
>>>>>>>>>>>>  bool f2fs_should_update_inplace(struct inode *inode, struct f2fs_io_info *fio)
>>>>>>>>>>>>  {
>>>>>>>>>>>> -	if (f2fs_is_pinned_file(inode))
>>>>>>>>>>>> +	if (f2fs_is_pinned_file(inode) ||
>>>>>>>>>>>> +			F2FS_I(inode)->i_flags & F2FS_NOCOW_FL)
>>>>>>>>>>>>  		return true;
>>>>>>>>>>>>  
>>>>>>>>>>>>  	/* if this is cold file, we should overwrite to avoid fragmentation */
>>>>>>>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>>>>>>>>> index 596ab3e1dd7b..f6c5a3d2e659 100644
>>>>>>>>>>>> --- a/fs/f2fs/f2fs.h
>>>>>>>>>>>> +++ b/fs/f2fs/f2fs.h
>>>>>>>>>>>> @@ -2374,6 +2374,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
>>>>>>>>>>>>  #define F2FS_NOATIME_FL			0x00000080 /* do not update atime */
>>>>>>>>>>>>  #define F2FS_INDEX_FL			0x00001000 /* hash-indexed directory */
>>>>>>>>>>>>  #define F2FS_DIRSYNC_FL			0x00010000 /* dirsync behaviour (directories only) */
>>>>>>>>>>>> +#define F2FS_NOCOW_FL			0x00800000 /* Do not cow file */
>>>>>>>>>>>>  #define F2FS_PROJINHERIT_FL		0x20000000 /* Create with parents projid */
>>>>>>>>>>>>  
>>>>>>>>>>>>  /* Flags that should be inherited by new inodes from their parent. */
>>>>>>>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>>>>>>>>>> index 7ca545874060..ae0fec54cac6 100644
>>>>>>>>>>>> --- a/fs/f2fs/file.c
>>>>>>>>>>>> +++ b/fs/f2fs/file.c
>>>>>>>>>>>> @@ -1692,6 +1692,7 @@ static const struct {
>>>>>>>>>>>>  	{ F2FS_NOATIME_FL,	FS_NOATIME_FL },
>>>>>>>>>>>>  	{ F2FS_INDEX_FL,	FS_INDEX_FL },
>>>>>>>>>>>>  	{ F2FS_DIRSYNC_FL,	FS_DIRSYNC_FL },
>>>>>>>>>>>> +	{ F2FS_NOCOW_FL,	FS_NOCOW_FL },
>>>>>>>>>>>>  	{ F2FS_PROJINHERIT_FL,	FS_PROJINHERIT_FL },
>>>>>>>>>>>>  };
>>>>>>>>>>>>  
>>>>>>>>>>>> @@ -1715,7 +1716,8 @@ static const struct {
>>>>>>>>>>>>  		FS_NODUMP_FL |		\
>>>>>>>>>>>>  		FS_NOATIME_FL |		\
>>>>>>>>>>>>  		FS_DIRSYNC_FL |		\
>>>>>>>>>>>> -		FS_PROJINHERIT_FL)
>>>>>>>>>>>> +		FS_PROJINHERIT_FL |	\
>>>>>>>>>>>> +		FS_NOCOW_FL)
>>>>>>>>>>>>  
>>>>>>>>>>>>  /* Convert f2fs on-disk i_flags to FS_IOC_{GET,SET}FLAGS flags */
>>>>>>>>>>>>  static inline u32 f2fs_iflags_to_fsflags(u32 iflags)
>>>>>>>>>>>> @@ -1753,8 +1755,6 @@ static int f2fs_ioc_getflags(struct file *filp, unsigned long arg)
>>>>>>>>>>>>  		fsflags |= FS_ENCRYPT_FL;
>>>>>>>>>>>>  	if (f2fs_has_inline_data(inode) || f2fs_has_inline_dentry(inode))
>>>>>>>>>>>>  		fsflags |= FS_INLINE_DATA_FL;
>>>>>>>>>>>> -	if (is_inode_flag_set(inode, FI_PIN_FILE))
>>>>>>>>>>>> -		fsflags |= FS_NOCOW_FL;
>>>>>>>>>>>>  
>>>>>>>>>>>>  	fsflags &= F2FS_GETTABLE_FS_FL;
>>>>>>>>>>>>  
>>>>>>>>>>>> @@ -1794,6 +1794,22 @@ static int f2fs_ioc_setflags(struct file *filp, unsigned long arg)
>>>>>>>>>>>>  	if (ret)
>>>>>>>>>>>>  		goto out;
>>>>>>>>>>>>  
>>>>>>>>>>>> +	if ((fsflags ^ old_fsflags) & FS_NOCOW_FL) {
>>>>>>>>>>>> +		if (!S_ISREG(inode->i_mode)) {
>>>>>>>>>>>> +			ret = -EINVAL;
>>>>>>>>>>>> +			goto out;
>>>>>>>>>>>> +		}
>>>>>>>>>>>> +
>>>>>>>>>>>> +		if (f2fs_should_update_outplace(inode, NULL)) {
>>>>>>>>>>>> +			ret = -EINVAL;
>>>>>>>>>>>> +			goto out;
>>>>>>>>>>>> +		}
>>>>>>>>>>>> +
>>>>>>>>>>>> +		ret = f2fs_convert_inline_inode(inode);
>>>>>>>>>>>> +		if (ret)
>>>>>>>>>>>> +			goto out;
>>>>>>>>>>>> +	}
>>>>>>>>>>>> +
>>>>>>>>>>>>  	ret = f2fs_setflags_common(inode, iflags,
>>>>>>>>>>>>  			f2fs_fsflags_to_iflags(F2FS_SETTABLE_FS_FL));
>>>>>>>>>>>>  out:
>>>>>>>>>>>> -- 
>>>>>>>>>>>> 2.18.0.rc1
>>>>>>>>>>> .
>>>>>>>>>>>
>>>>>>>>> .
>>>>>>>>>
>>>>>>> .
>>>>>>>
>>>>> .
>>>>>
>>> .
>>>
> .
> 
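
For readers following the quoted table, the four proposed file types
(Pin_file, IPU_file, Cold_file, Hot_file) can be written down as a small
lookup table. This is only a sketch of the semantics as discussed in the
thread, not f2fs code: every identifier below (file_kind, gc_policy,
file_policy, lookup_policy) is hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; they mirror the rows of the table in the thread. */
enum file_kind { PIN_FILE, IPU_FILE, COLD_FILE, HOT_FILE, FILE_KIND_COUNT };

/* The "GC" column of the table. */
enum gc_policy {
	GC_NOT_ALLOWED,     /* Pin_file: GC must not move the blocks */
	GC_BY_TEMPERATURE,  /* IPU_file: default, chosen by temperature */
	GC_TO_COLD_AREA,    /* Cold_file */
	GC_TO_HOT_AREA      /* Hot_file */
};

struct file_policy {
	const char *name;
	bool preallocate;      /* "fallocate" column: blocks are preallocated */
	bool allows_opu;       /* "write" column: IPU only (false) or IPU/OPU (true) */
	bool uses_nocow_flag;  /* "Flag" column: FS_NOCOW_FL or N/A */
	enum gc_policy gc;
};

static const struct file_policy policy_table[FILE_KIND_COUNT] = {
	[PIN_FILE]  = { "Pin_file",  true,  false, true,  GC_NOT_ALLOWED },
	[IPU_FILE]  = { "IPU_file",  false, false, false, GC_BY_TEMPERATURE },
	[COLD_FILE] = { "Cold_file", false, false, false, GC_TO_COLD_AREA },
	[HOT_FILE]  = { "Hot_file",  false, true,  false, GC_TO_HOT_AREA },
};

/* Return the table row for one file kind. */
const struct file_policy *lookup_policy(enum file_kind kind)
{
	assert((unsigned int)kind < FILE_KIND_COUNT);
	return &policy_table[kind];
}
```

With this model, lookup_policy(IPU_FILE) describes a file that is written
in place but is neither preallocated nor excluded from GC, which is the
difference from Pin_file that the thread is debating.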

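
The check that the quoted patch adds to f2fs_ioc_setflags() can also be
modeled outside the kernel. The sketch below is a simplified userspace
model under stated assumptions: model_inode, model_setflags and the
forces_outplace field are hypothetical stand-ins (the last one for the
result of f2fs_should_update_outplace()), and only the bit value
0x00800000 is taken from the patch itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit value as F2FS_NOCOW_FL in the quoted patch. */
#define MODEL_NOCOW_FL 0x00800000u

/* Hypothetical, heavily simplified stand-in for an inode. */
struct model_inode {
	bool is_regular;       /* models S_ISREG(inode->i_mode) */
	bool forces_outplace;  /* models f2fs_should_update_outplace() */
	bool has_inline_data;  /* cleared where the patch converts inline data */
	uint32_t flags;
};

/*
 * Apply new_flags to the inode. Extra checks run only when the NOCoW bit
 * actually changes, which is what the XOR with the old flags detects.
 */
int model_setflags(struct model_inode *inode, uint32_t new_flags)
{
	if ((new_flags ^ inode->flags) & MODEL_NOCOW_FL) {
		if (!inode->is_regular)
			return -EINVAL;  /* NOCoW only on regular files */
		if (inode->forces_outplace)
			return -EINVAL;  /* file must be able to do IPU */
		/* stand-in for f2fs_convert_inline_inode() */
		inode->has_inline_data = false;
	}
	inode->flags = new_flags;
	return 0;
}
```

In this model, toggling the NOCoW bit on a directory fails with -EINVAL and
leaves the flags untouched, while changing any other flag bit on the same
inode skips the extra checks entirely.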