linux-kernel - Re: [f2fs-dev] [PATCH v2] f2fs: fix to read source block before invalidating it

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <01d0c12b-e749-42a0-66c2-4922b482030f@huawei.com>
Date:   Mon, 29 Jul 2019 15:14:14 +0800
From:   Chao Yu <yuchao0@...wei.com>
To:     Jaegeuk Kim <jaegeuk@...nel.org>
CC:     <linux-kernel@...r.kernel.org>,
        <linux-f2fs-devel@...ts.sourceforge.net>
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: fix to read source block before
 invalidating it

On 2019/7/29 13:54, Jaegeuk Kim wrote:
> On 07/23, Chao Yu wrote:
>> On 2019/7/23 9:27, Jaegeuk Kim wrote:
>>> On 07/18, Chao Yu wrote:
>>>> On 2019/7/18 12:00, Jaegeuk Kim wrote:
>>>>> On 07/18, Chao Yu wrote:
>>>>>> On 2019/7/18 11:12, Jaegeuk Kim wrote:
>>>>>>> f2fs_allocate_data_block() invalidates old block address and enable new block
>>>>>>> address. Then, if we try to read old block by f2fs_submit_page_bio(), it will
>>>>>>> give WARN due to reading invalid blocks.
>>>>>>>
>>>>>>> Let's make the order sanely back.
>>>>>>
>>>>>> Hmm.. to avoid WARM, we may suffer one more memcpy, I suspect this can reduce
>>>>>> online resize or foreground gc ioctl performance...
>>>>>
>>>>> I worried about performance tho, more concern came to me that there may exist a
>>>>> chance that other thread can allocate and write something in old block address.
>>>>
>>>> Me too, however, previous invalid block address should be reused after a
>>>> checkpoint, and checkpoint should have invalidated meta cache already, so there
>>>> shouldn't be any race here.
>>>
>>> I think SSR can reuse that before checkpoint.
>>
>> Yes, I should have considered that when I introduced readahead feature for
>> migration of block, we've kept invalidating meta page cache in old block address
>> whenever the block address is not valid.
>>
>> quoted from ("f2fs: readahead encrypted block during GC")
>>
>> "Note that for OPU, truncation, deletion, we need to invalid meta
>> page after we invalid old block address, to make sure we won't load
>> invalid data from target meta page during encrypted block migration."
>>
>> But to avoid potential issue, how about just enable meta page cache during GC?
>> that is saying we should truncate all valid meta cache after one section has
>> been moved.
> 
> We may need to invalidate metapage when writing new data, which looks like
> different issue.

Yeah, we have already did this as below:

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 7dcfe38..30779aa 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2079,6 +2079,8 @@ void f2fs_invalidate_blocks(struct f2fs_sb_info *sbi,
block_t addr)
 	if (addr == NEW_ADDR)
 		return;

+	invalidate_mapping_pages(META_MAPPING(sbi), addr, addr);
+
 	/* add it into sit main buffer */
 	down_write(&sit_i->sentry_lock);

@@ -2978,6 +2980,9 @@ static void do_write_page(struct f2fs_summary *sum, struct
f2fs_io_info *fio)
 reallocate:
 	f2fs_allocate_data_block(fio->sbi, fio->page, fio->old_blkaddr,
 			&fio->new_blkaddr, sum, type, fio, true);
+	if (GET_SEGNO(fio->sbi, fio->old_blkaddr) != NULL_SEGNO)
+		invalidate_mapping_pages(META_MAPPING(fio->sbi),
+					fio->old_blkaddr, fio->old_blkaddr);

 	/* writeout dirty page into bdev */
 	f2fs_submit_page_write(fio);
@@ -3132,8 +3137,11 @@ void f2fs_do_replace_block(struct f2fs_sb_info *sbi,
struct f2fs_summary *sum,

 	if (!recover_curseg || recover_newaddr)
 		update_sit_entry(sbi, new_blkaddr, 1);
-	if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO)
+	if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO) {
+		invalidate_mapping_pages(META_MAPPING(sbi),
+					old_blkaddr, old_blkaddr);
 		update_sit_entry(sbi, old_blkaddr, -1);
+	}

 	locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr));
 	locate_dirty_segment(sbi, GET_SEGNO(sbi, new_blkaddr));

> 
>>
>> One more concern is whether below case exists during SSR?
>> - write 4k to fileA;
>> - fsync fileA, 4k data is writebacked to lbaA;
>> - write 4k to fileA;
>> - kworker flushs 4k to lbaB; dnode contain lbaB didn't be persisted yet;
>> - write 4k to fileB;
>> - kworker flush 4k to lbaA due to SSR;
>> - SPOR  -> dnode with lbaA will be recovered, however lbaA contains fileB's data..
> 
> Yes, it seems that's possible. We may need to keep another bitmap to record
> all the block allocation?

Yup, let me think about the implementation. :)

Thanks,

> 
>>
>> Thanks,
>>
>>>
>>>>
>>>> 	/*
>>>> 	 * invalidate intermediate page cache borrowed from meta inode
>>>> 	 * which are used for migration of encrypted inode's blocks.
>>>> 	 */
>>>> 	if (f2fs_sb_has_encrypt(sbi))
>>>> 		invalidate_mapping_pages(META_MAPPING(sbi),
>>>> 				MAIN_BLKADDR(sbi), MAX_BLKADDR(sbi) - 1);
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Can we just relief to use DATA_GENERIC_ENHANCE_READ for this case...?
>>>>>
>>>>> We need to keep consistency for this api.
>>>>>
>>>>> Thanks,
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Jaegeuk Kim <jaegeuk@...nel.org>
>>>>>>
>>>>>> Except performance, I'm okay with this change.
>>>>>>
>>>>>> Reviewed-by: Chao Yu <yuchao0@...wei.com>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>> ---
>>>>>>> v2:
>>>>>>> I was fixing the comments. :)
>>>>>>>
>>>>>>>  fs/f2fs/gc.c | 70 +++++++++++++++++++++++++---------------------------
>>>>>>>  1 file changed, 34 insertions(+), 36 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>>>>>> index 6691f526fa40..8974672db78f 100644
>>>>>>> --- a/fs/f2fs/gc.c
>>>>>>> +++ b/fs/f2fs/gc.c
>>>>>>> @@ -796,6 +796,29 @@ static int move_data_block(struct inode *inode, block_t bidx,
>>>>>>>  	if (lfs_mode)
>>>>>>>  		down_write(&fio.sbi->io_order_lock);
>>>>>>>  
>>>>>>> +	mpage = f2fs_grab_cache_page(META_MAPPING(fio.sbi),
>>>>>>> +					fio.old_blkaddr, false);
>>>>>>> +	if (!mpage)
>>>>>>> +		goto up_out;
>>>>>>> +
>>>>>>> +	fio.encrypted_page = mpage;
>>>>>>> +
>>>>>>> +	/* read source block in mpage */
>>>>>>> +	if (!PageUptodate(mpage)) {
>>>>>>> +		err = f2fs_submit_page_bio(&fio);
>>>>>>> +		if (err) {
>>>>>>> +			f2fs_put_page(mpage, 1);
>>>>>>> +			goto up_out;
>>>>>>> +		}
>>>>>>> +		lock_page(mpage);
>>>>>>> +		if (unlikely(mpage->mapping != META_MAPPING(fio.sbi) ||
>>>>>>> +						!PageUptodate(mpage))) {
>>>>>>> +			err = -EIO;
>>>>>>> +			f2fs_put_page(mpage, 1);
>>>>>>> +			goto up_out;
>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +
>>>>>>>  	f2fs_allocate_data_block(fio.sbi, NULL, fio.old_blkaddr, &newaddr,
>>>>>>>  					&sum, CURSEG_COLD_DATA, NULL, false);
>>>>>>>  
>>>>>>> @@ -803,44 +826,18 @@ static int move_data_block(struct inode *inode, block_t bidx,
>>>>>>>  				newaddr, FGP_LOCK | FGP_CREAT, GFP_NOFS);
>>>>>>>  	if (!fio.encrypted_page) {
>>>>>>>  		err = -ENOMEM;
>>>>>>> -		goto recover_block;
>>>>>>> -	}
>>>>>>> -
>>>>>>> -	mpage = f2fs_pagecache_get_page(META_MAPPING(fio.sbi),
>>>>>>> -					fio.old_blkaddr, FGP_LOCK, GFP_NOFS);
>>>>>>> -	if (mpage) {
>>>>>>> -		bool updated = false;
>>>>>>> -
>>>>>>> -		if (PageUptodate(mpage)) {
>>>>>>> -			memcpy(page_address(fio.encrypted_page),
>>>>>>> -					page_address(mpage), PAGE_SIZE);
>>>>>>> -			updated = true;
>>>>>>> -		}
>>>>>>>  		f2fs_put_page(mpage, 1);
>>>>>>> -		invalidate_mapping_pages(META_MAPPING(fio.sbi),
>>>>>>> -					fio.old_blkaddr, fio.old_blkaddr);
>>>>>>> -		if (updated)
>>>>>>> -			goto write_page;
>>>>>>> -	}
>>>>>>> -
>>>>>>> -	err = f2fs_submit_page_bio(&fio);
>>>>>>> -	if (err)
>>>>>>> -		goto put_page_out;
>>>>>>> -
>>>>>>> -	/* write page */
>>>>>>> -	lock_page(fio.encrypted_page);
>>>>>>> -
>>>>>>> -	if (unlikely(fio.encrypted_page->mapping != META_MAPPING(fio.sbi))) {
>>>>>>> -		err = -EIO;
>>>>>>> -		goto put_page_out;
>>>>>>> -	}
>>>>>>> -	if (unlikely(!PageUptodate(fio.encrypted_page))) {
>>>>>>> -		err = -EIO;
>>>>>>> -		goto put_page_out;
>>>>>>> +		goto recover_block;
>>>>>>>  	}
>>>>>>>  
>>>>>>> -write_page:
>>>>>>> +	/* write target block */
>>>>>>>  	f2fs_wait_on_page_writeback(fio.encrypted_page, DATA, true, true);
>>>>>>> +	memcpy(page_address(fio.encrypted_page),
>>>>>>> +				page_address(mpage), PAGE_SIZE);
>>>>>>> +	f2fs_put_page(mpage, 1);
>>>>>>> +	invalidate_mapping_pages(META_MAPPING(fio.sbi),
>>>>>>> +				fio.old_blkaddr, fio.old_blkaddr);
>>>>>>> +
>>>>>>>  	set_page_dirty(fio.encrypted_page);
>>>>>>>  	if (clear_page_dirty_for_io(fio.encrypted_page))
>>>>>>>  		dec_page_count(fio.sbi, F2FS_DIRTY_META);
>>>>>>> @@ -871,11 +868,12 @@ static int move_data_block(struct inode *inode, block_t bidx,
>>>>>>>  put_page_out:
>>>>>>>  	f2fs_put_page(fio.encrypted_page, 1);
>>>>>>>  recover_block:
>>>>>>> -	if (lfs_mode)
>>>>>>> -		up_write(&fio.sbi->io_order_lock);
>>>>>>>  	if (err)
>>>>>>>  		f2fs_do_replace_block(fio.sbi, &sum, newaddr, fio.old_blkaddr,
>>>>>>>  								true, true);
>>>>>>> +up_out:
>>>>>>> +	if (lfs_mode)
>>>>>>> +		up_write(&fio.sbi->io_order_lock);
>>>>>>>  put_out:
>>>>>>>  	f2fs_put_dnode(&dn);
>>>>>>>  out:
>>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Linux-f2fs-devel mailing list
>>>>>> Linux-f2fs-devel@...ts.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>>>>> .
>>>>>
>>> .
>>>
> .
>