[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190801222746.GA27597@jaegeuk-macbookpro.roam.corp.google.com>
Date: Thu, 1 Aug 2019 15:27:46 -0700
From: Jaegeuk Kim <jaegeuk@...nel.org>
To: Chao Yu <yuchao0@...wei.com>
Cc: linux-f2fs-devel@...ts.sourceforge.net,
linux-kernel@...r.kernel.org, chao@...nel.org
Subject: Re: [PATCH v2] f2fs: separate NOCoW and pinfile semantics
On 08/01, Chao Yu wrote:
> On 2019/8/1 12:14, Jaegeuk Kim wrote:
> > On 07/31, Chao Yu wrote:
> >> On 2019/7/31 2:02, Jaegeuk Kim wrote:
> >>> On 07/29, Chao Yu wrote:
> >>>> On 2019/7/29 13:57, Jaegeuk Kim wrote:
> >>>>> On 07/23, Chao Yu wrote:
> >>>>>> On 2019/7/23 10:36, Jaegeuk Kim wrote:
> >>>>>>> On 07/19, Chao Yu wrote:
> >>>>>>>> Pinning a file is heavy, because skipping pinned files make GC
> >>>>>>>> running with heavy load or no effect.
> >>>>>>>
> >>>>>>> Pinned file is a part of NOCOW files, so I don't think we can simply drop it
> >>>>>>> for backward compatibility.
> >>>>>>
> >>>>>> Yes,
> >>>>>>
> >>>>>> But what I concerned is that pin file is too heavy, so in order to satisfy below
> >>>>>> demand, how about introducing pin_file_2 flag to triggering IPU only during
> >>>>>> flush/writeback.
> >>>>>
> >>>>> That can be done by cold files?
> >>>>
> >>>> Then it may inherit property of cold type file, e.g. a) goes into cold area; b)
> >>>> update with very low frequency.
> >>>>
> >>>> Actually pin_file_2 could be used by db-wal/log file, which are updated
> >>>> frequently, and should go to hot/warm area, it does not match above two property.
> >>>
> >>> How about considering another name like "IPU-only mode"?
> >>>
> >>> fallocate write Flag GC
> >>> Pin_file: preallocate IPU FS_NOCOW_FL Not allowed
> >>> IPU_file: Not preallocate IPU N/A Default by temperature
> >>
> >> One question, do we need preallocate physical block address for IPU_file as
> >> Pin_file? since it can enhance db file's sequential read performance, not sure,
> >> db can handle random data in preallocated blocks.
> >
> > db file will do atomic writes, which can not be used with this. -wal may be able
>
> Now WAL mode were set by default in Android, so most of db file are -wal type now.
Will be back again tho.
>
> > to preallocate blocks, but it can eat disk space unnecessarily.
>
> I meant .db-wal file rather than .db.
>
> Yes, that's ext4 style, that would bring better performance due to less holes in
> block distribution.
>
> I don't think we need to worry about space issue for db-wal file. I tracked
> .db-wal file's update before:
> - there are very frequently truncation and deletion, that means the preallocated
> blocks won't exist for long time.
> - and also there are very frequently append writes, I suspect there almost very
> few preallocate block are not written.
> - total db-wal file number is less.
Sometimes it can be large enough for system. If it's from user apps and short
lived, why do we need preallocation?
>
> >
> >>
> >> Other behaviors looks good to me. :)
> >>
> >> I plan to use last bit in inode.i_inline to store this flag.
> >
> > Why not using i_flag like FS_NOCOW_FL?
>
> Oops, as you listed in last email, I can see you don't want to break
> FS_NOCOW_FL's semantics for backward compatibility.
>
> Flag
> IPU_file N/A
>
> If we plan to use FS_NOCOW_FL, that's what this patch has already did, you can
> merge it directly... :P
>
> >
> >>
> >>> Cold_file: Not preallocate IPU N/A Move in cold area
> >>> Hot_file: Not preallocate IPU/OPU N/A Move in hot area
> >>
> >> Should hot file be gced to hot area? That would mix new hot data with old 'hot'
> >> data which actually become cold.
> >
> > But, user explicitly specified this is hot.
>
> With current implementation, GC will migrate data from hot/warm/cold area to
> cold area.
>
> Thanks,
>
> >
> >>
> >> Thanks,
> >>
> >>>
> >>>>
> >>>> Thank,
> >>>>
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> So that this patch propose to separate nocow and pinfile semantics:
> >>>>>>>> - NOCoW flag can only be set on regular file.
> >>>>>>>> - NOCoW file will only trigger IPU at common writeback/flush.
> >>>>>>>> - NOCow file will do OPU during GC.
> >>>>>>>>
> >>>>>>>> For the demand of 1) avoid fragment of file's physical block and
> >>>>>>>> 2) userspace don't care about file's specific physical address,
> >>>>>>>> tagging file as NOCoW will be cheaper than pinned one.
> >>>>>>
> >>>>>> ^^^
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>>>>
> >>>>>>>> Signed-off-by: Chao Yu <yuchao0@...wei.com>
> >>>>>>>> ---
> >>>>>>>> v2:
> >>>>>>>> - rebase code to fix compile error.
> >>>>>>>> fs/f2fs/data.c | 3 ++-
> >>>>>>>> fs/f2fs/f2fs.h | 1 +
> >>>>>>>> fs/f2fs/file.c | 22 +++++++++++++++++++---
> >>>>>>>> 3 files changed, 22 insertions(+), 4 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> >>>>>>>> index a2a28bb269bf..15fb8954c363 100644
> >>>>>>>> --- a/fs/f2fs/data.c
> >>>>>>>> +++ b/fs/f2fs/data.c
> >>>>>>>> @@ -1884,7 +1884,8 @@ static inline bool check_inplace_update_policy(struct inode *inode,
> >>>>>>>>
> >>>>>>>> bool f2fs_should_update_inplace(struct inode *inode, struct f2fs_io_info *fio)
> >>>>>>>> {
> >>>>>>>> - if (f2fs_is_pinned_file(inode))
> >>>>>>>> + if (f2fs_is_pinned_file(inode) ||
> >>>>>>>> + F2FS_I(inode)->i_flags & F2FS_NOCOW_FL)
> >>>>>>>> return true;
> >>>>>>>>
> >>>>>>>> /* if this is cold file, we should overwrite to avoid fragmentation */
> >>>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>>>>>>> index 596ab3e1dd7b..f6c5a3d2e659 100644
> >>>>>>>> --- a/fs/f2fs/f2fs.h
> >>>>>>>> +++ b/fs/f2fs/f2fs.h
> >>>>>>>> @@ -2374,6 +2374,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
> >>>>>>>> #define F2FS_NOATIME_FL 0x00000080 /* do not update atime */
> >>>>>>>> #define F2FS_INDEX_FL 0x00001000 /* hash-indexed directory */
> >>>>>>>> #define F2FS_DIRSYNC_FL 0x00010000 /* dirsync behaviour (directories only) */
> >>>>>>>> +#define F2FS_NOCOW_FL 0x00800000 /* Do not cow file */
> >>>>>>>> #define F2FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
> >>>>>>>>
> >>>>>>>> /* Flags that should be inherited by new inodes from their parent. */
> >>>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> >>>>>>>> index 7ca545874060..ae0fec54cac6 100644
> >>>>>>>> --- a/fs/f2fs/file.c
> >>>>>>>> +++ b/fs/f2fs/file.c
> >>>>>>>> @@ -1692,6 +1692,7 @@ static const struct {
> >>>>>>>> { F2FS_NOATIME_FL, FS_NOATIME_FL },
> >>>>>>>> { F2FS_INDEX_FL, FS_INDEX_FL },
> >>>>>>>> { F2FS_DIRSYNC_FL, FS_DIRSYNC_FL },
> >>>>>>>> + { F2FS_NOCOW_FL, FS_NOCOW_FL },
> >>>>>>>> { F2FS_PROJINHERIT_FL, FS_PROJINHERIT_FL },
> >>>>>>>> };
> >>>>>>>>
> >>>>>>>> @@ -1715,7 +1716,8 @@ static const struct {
> >>>>>>>> FS_NODUMP_FL | \
> >>>>>>>> FS_NOATIME_FL | \
> >>>>>>>> FS_DIRSYNC_FL | \
> >>>>>>>> - FS_PROJINHERIT_FL)
> >>>>>>>> + FS_PROJINHERIT_FL | \
> >>>>>>>> + FS_NOCOW_FL)
> >>>>>>>>
> >>>>>>>> /* Convert f2fs on-disk i_flags to FS_IOC_{GET,SET}FLAGS flags */
> >>>>>>>> static inline u32 f2fs_iflags_to_fsflags(u32 iflags)
> >>>>>>>> @@ -1753,8 +1755,6 @@ static int f2fs_ioc_getflags(struct file *filp, unsigned long arg)
> >>>>>>>> fsflags |= FS_ENCRYPT_FL;
> >>>>>>>> if (f2fs_has_inline_data(inode) || f2fs_has_inline_dentry(inode))
> >>>>>>>> fsflags |= FS_INLINE_DATA_FL;
> >>>>>>>> - if (is_inode_flag_set(inode, FI_PIN_FILE))
> >>>>>>>> - fsflags |= FS_NOCOW_FL;
> >>>>>>>>
> >>>>>>>> fsflags &= F2FS_GETTABLE_FS_FL;
> >>>>>>>>
> >>>>>>>> @@ -1794,6 +1794,22 @@ static int f2fs_ioc_setflags(struct file *filp, unsigned long arg)
> >>>>>>>> if (ret)
> >>>>>>>> goto out;
> >>>>>>>>
> >>>>>>>> + if ((fsflags ^ old_fsflags) & FS_NOCOW_FL) {
> >>>>>>>> + if (!S_ISREG(inode->i_mode)) {
> >>>>>>>> + ret = -EINVAL;
> >>>>>>>> + goto out;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + if (f2fs_should_update_outplace(inode, NULL)) {
> >>>>>>>> + ret = -EINVAL;
> >>>>>>>> + goto out;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + ret = f2fs_convert_inline_inode(inode);
> >>>>>>>> + if (ret)
> >>>>>>>> + goto out;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> ret = f2fs_setflags_common(inode, iflags,
> >>>>>>>> f2fs_fsflags_to_iflags(F2FS_SETTABLE_FS_FL));
> >>>>>>>> out:
> >>>>>>>> --
> >>>>>>>> 2.18.0.rc1
> >>>>>>> .
> >>>>>>>
> >>>>> .
> >>>>>
> >>> .
> >>>
> > .
> >
Powered by blists - more mailing lists