linux-kernel - Re: [PATCH] f2fs: avoid race between zero

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <d6e6a1c0-c56a-aac8-a210-945cce17f5b3@kernel.org>
Date:   Sun, 29 Jul 2018 11:03:06 +0800
From:   Chao Yu <chao@...nel.org>
To:     Jaegeuk Kim <jaegeuk@...nel.org>
Cc:     Chao Yu <yuchao0@...wei.com>,
        linux-f2fs-devel@...ts.sourceforge.net,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] f2fs: avoid race between zero_range and background GC

On 2018/7/29 10:59, Jaegeuk Kim wrote:
> On 07/29, Chao Yu wrote:
>> On 2018/7/29 10:02, Jaegeuk Kim wrote:
>>> On 07/27, Chao Yu wrote:
>>>> On 2018/7/27 18:29, Jaegeuk Kim wrote:
>>>>> On 07/26, Chao Yu wrote:
>>>>>> Thread A				Background GC
>>>>>> - f2fs_zero_range
>>>>>>  - truncate_pagecache_range
>>>>>> 					- gc_data_segment
>>>>>> 					 - get_read_data_page
>>>>>> 					  - move_data_page
>>>>>> 					   - set_page_dirty
>>>>>> 					   - set_cold_data
>>>>>>  - f2fs_do_zero_range
>>>>>>   - dn->data_blkaddr = NEW_ADDR;
>>>>>>   - f2fs_set_data_blkaddr
>>>>>>
>>>>>> Actually, we don't need to set dirty & checked flag on the page, since
>>>>>> all valid data in the page should be zeroed by zero_range().
>>>>>
>>>>> But, it doesn't matter too much, right?
>>>>
>>>> No, if the dirtied page is writebacked after f2fs_do_zero_range(), result of
>>>> zero_range() should be wrong, as zeroed page contains valid user data.
>>>
>>> How about truncating page caches after block address change or doing it twice
>>> before and after?
>>
>> Thread A				Background GC
>> - f2fs_zero_range
>>  - truncate_pagecache_range
>> 					- gc_data_segment
>> 					 - get_read_data_page
>> 					  - move_data_page
>> 					   - set_page_dirty
>> 					   - set_cold_data
>>  - f2fs_do_zero_range
>>   - dn->data_blkaddr = NEW_ADDR;
>>   - f2fs_set_data_blkaddr
>> 					bdi-flusher
>> 					- __write_data_page
>> 					 - f2fs_update_data_blkaddr
>> 					 : data_blkaddr has been updated here.
>>  - truncate_pagecache_range
>>  : data & dnode has been writebacked before page cache truncation?
>>
>> How about this case?
> 
> So, truncating pages under dnode lock can address it?

Normally, our lock dependency is

->writepage()
lock data page -> lock dnode page

here
lock dnode page -> truncate_pagecache_range::lock data page

Will easily cause deadlock?

Thanks,

> 
>>
>> Thanks,
>>
>>>
>>>>
>>>>>
>>>>>> Use i_gc_rwsem[WRITE] to avoid such race condition.
>>>>>
>>>>> Hope to avoid abusing i_gc_rwsem[] tho.
>>>>
>>>> Agreed, let's try avoiding until we have to use it.
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Chao Yu <yuchao0@...wei.com>
>>>>>> ---
>>>>>>  fs/f2fs/file.c | 2 ++
>>>>>>  1 file changed, 2 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>>>> index 267ec3794e1e..7bd2412a8c37 100644
>>>>>> --- a/fs/f2fs/file.c
>>>>>> +++ b/fs/f2fs/file.c
>>>>>> @@ -1309,6 +1309,7 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len,
>>>>>>  	if (ret)
>>>>>>  		return ret;
>>>>>>  
>>>>>> +	down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>>>>>>  	down_write(&F2FS_I(inode)->i_mmap_sem);
>>>>>>  	ret = filemap_write_and_wait_range(mapping, offset, offset + len - 1);
>>>>>>  	if (ret)
>>>>>> @@ -1389,6 +1390,7 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len,
>>>>>>  	}
>>>>>>  out_sem:
>>>>>>  	up_write(&F2FS_I(inode)->i_mmap_sem);
>>>>>> +	up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>>>>>>  
>>>>>>  	return ret;
>>>>>>  }
>>>>>> -- 
>>>>>> 2.18.0.rc1