Message-ID: <d991d28c-986a-0efc-e10e-62fe516dd7c6@gmail.com>
Date: Sat, 9 May 2020 19:20:58 +0800
From: Jia-Ju Bai <baijiaju1990@...il.com>
To: Nikolay Borisov <nborisov@...e.com>, clm@...com,
josef@...icpanda.com, dsterba@...e.com
Cc: linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/4] fs: btrfs: fix a data race in
btrfs_block_group_done()
On 2020/5/9 18:53, Nikolay Borisov wrote:
>
> On 9.05.20 at 8:20, Jia-Ju Bai wrote:
>> The functions btrfs_block_group_done() and caching_thread() are
>> concurrently executed at runtime in the following call contexts:
>>
>> Thread 1:
>> btrfs_sync_file()
>> start_ordered_ops()
>> btrfs_fdatawrite_range()
>> btrfs_writepages() [via function pointer]
>> extent_writepages()
>> extent_write_cache_pages()
>> __extent_writepage()
>> writepage_delalloc()
>> btrfs_run_delalloc_range()
>> cow_file_range()
>> btrfs_reserve_extent()
>> find_free_extent()
>> btrfs_block_group_done()
>>
>> Thread 2:
>> caching_thread()
>>
>> In btrfs_block_group_done():
>>         smp_mb();
>>         return cache->cached == BTRFS_CACHE_FINISHED ||
>>                cache->cached == BTRFS_CACHE_ERROR;
>>
>> In caching_thread():
>>         spin_lock(&block_group->lock);
>>         block_group->caching_ctl = NULL;
>>         block_group->cached = ret ? BTRFS_CACHE_ERROR : BTRFS_CACHE_FINISHED;
>>         spin_unlock(&block_group->lock);
>>
>> Here, cache->cached and block_group->cached refer to the same memory
>> location, so the lockless read in btrfs_block_group_done() can race
>> with the store in caching_thread().
>> This data race was found and actually reproduced by our concurrency
>> fuzzer.
>>
>> To fix this race, the spinlock cache->lock is used to protect the
>> access to cache->cached in btrfs_block_group_done().
>>
>> Signed-off-by: Jia-Ju Bai <baijiaju1990@...il.com>
>> ---
>> fs/btrfs/block-group.h | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
>> index 107bb557ca8d..fb5f12acea40 100644
>> --- a/fs/btrfs/block-group.h
>> +++ b/fs/btrfs/block-group.h
>> @@ -278,9 +278,13 @@ static inline u64 btrfs_system_alloc_profile(struct btrfs_fs_info *fs_info)
>>
>>  static inline int btrfs_block_group_done(struct btrfs_block_group *cache)
>>  {
>> +        int flag;
>>          smp_mb();
>> -        return cache->cached == BTRFS_CACHE_FINISHED ||
>> -                cache->cached == BTRFS_CACHE_ERROR;
>> +        spin_lock(&cache->lock);
>> +        flag = (cache->cached == BTRFS_CACHE_FINISHED ||
>> +                cache->cached == BTRFS_CACHE_ERROR);
>> +        spin_unlock(&cache->lock);
>> +        return flag;
> Using the lock also obviates the need for the memory barrier.
> Furthermore, this race is benign: even if it occurs and we call into
> btrfs_cache_block_group(), we re-check cache->cached under
> btrfs_block_group::lock and do the right thing, though in that case we
> would have done a bit more unnecessary work. So have you actually
> measured the effect of adding the spinlock? The existing code is likely
> written this way to keep the fast path lock-free. Perhaps a better
> approach would be to annotate the accesses to cached with
> READ_ONCE()/WRITE_ONCE() so that the value is fetched from memory and
> not optimized out by the compiler, and to leave the heavier work in the
> unlikely path.
>
> Please exercise some critical thinking when looking into such issues:
> there may be a good reason why something has been coded in a particular
> way, and code that looks wrong at first glance may in fact be correct.
Okay, thanks a lot for your explanation and advice :)
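
Just to make sure I understand the READ_ONCE()/WRITE_ONCE() suggestion,
here is a rough, untested sketch of what that annotation might look like
(based only on the snippets quoted above, not a verified patch against
the current tree):

static inline int btrfs_block_group_done(struct btrfs_block_group *cache)
{
        int cached;

        smp_mb();
        /* Read ->cached from memory exactly once for the lockless check. */
        cached = READ_ONCE(cache->cached);
        return cached == BTRFS_CACHE_FINISHED ||
               cached == BTRFS_CACHE_ERROR;
}

/*
 * In caching_thread() the store still happens under block_group->lock;
 * the annotation only marks it as pairing with the lockless reader.
 */
spin_lock(&block_group->lock);
block_group->caching_ctl = NULL;
WRITE_ONCE(block_group->cached,
           ret ? BTRFS_CACHE_ERROR : BTRFS_CACHE_FINISHED);
spin_unlock(&block_group->lock);

If that matches what you had in mind, the fast path stays lock-free, and
any stale value read there is still re-checked under the lock in the slow
path, as you explained.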
Best wishes,
Jia-Ju Bai