Message-ID: <06d64482-a263-668d-7ab1-9f411eb1c794@gmail.com>
Date: Wed, 13 May 2020 10:17:10 +0800
From: Jia-Ju Bai <baijiaju1990@...il.com>
To: dsterba@...e.cz, clm@...com, josef@...icpanda.com,
dsterba@...e.com, linux-btrfs@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/4] fs: btrfs: fix data races in
extent_write_cache_pages()
On 2020/5/13 5:56, David Sterba wrote:
> On Sat, May 09, 2020 at 01:27:01PM +0800, Jia-Ju Bai wrote:
>> The function extent_write_cache_pages is concurrently executed with
>> itself at runtime in the following call contexts:
>>
>> Thread 1:
>> btrfs_sync_file()
>> start_ordered_ops()
>> btrfs_fdatawrite_range()
>> btrfs_writepages() [via function pointer]
>> extent_writepages()
>> extent_write_cache_pages()
>>
>> Thread 2:
>> btrfs_writepages()
>> extent_writepages()
>> extent_write_cache_pages()
>>
>> In extent_write_cache_pages():
>> index = mapping->writeback_index;
>> ...
>> mapping->writeback_index = done_index;
>>
>> The accesses to mapping->writeback_index are not synchronized, and thus
>> data races for this value can occur.
>> These data races were found and actually reproduced by our concurrency
>> fuzzer.
>>
>> To fix these races, the spinlock mapping->private_lock is used to
>> protect the accesses to mapping->writeback_index.
>>
>> Signed-off-by: Jia-Ju Bai <baijiaju1990@...il.com>
>> ---
>> fs/btrfs/extent_io.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 39e45b8a5031..8c33a60bde1d 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -4160,7 +4160,9 @@ static int extent_write_cache_pages(struct address_space *mapping,
>>
>> pagevec_init(&pvec);
>> if (wbc->range_cyclic) {
>> + spin_lock(&mapping->private_lock);
>> index = mapping->writeback_index; /* Start from prev offset */
>> + spin_unlock(&mapping->private_lock);
>> end = -1;
>> /*
>> * Start from the beginning does not need to cycle over the
>> @@ -4271,8 +4273,11 @@ static int extent_write_cache_pages(struct address_space *mapping,
>> goto retry;
>> }
>>
>> - if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole))
>> + if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole)) {
>> + spin_lock(&mapping->private_lock);
>> mapping->writeback_index = done_index;
>> + spin_unlock(&mapping->private_lock);
> I'm more and more curious what exactly your fuzzer tool is actually
> reporting, because adding the locks around the writeback index does not
> make any sense.
>
> The variable is of type unsigned long, which is written atomically, so
> the only theoretical problem is on an architecture that is not capable
> of storing it in one go, and that would cause a lot more problems, e.g.
> because pointers are assumed to be the same width as unsigned long.
>
> So a torn write is not possible, and the lock leads to the same result
> as if it weren't there: the read and the write would still happen, just
> not serialized by the spinlock but somewhere on the way from the CPU
> caches to memory.
>
> CPU1 CPU2
>
> lock
> index = mapping->writeback_index
> unlock
> lock
> m->writeback_index = index;
> unlock
>
> Is the same as
>
> CPU1 CPU2
>
>
> index = mapping->writeback_index
> m->writeback_index = index;
>
> So maybe this makes your tool happy but there's no change from the
> correctness point of view, only added overhead from the lock/unlock
> calls.
>
> Lockless synchronization is a thing, using memory barriers etc., this
> was the case of some other patch, I think your tool needs to take that
> into account to give sensible results.
Thanks for the reply and explanation :)
I agree that only adding locks here makes no sense, because "index =
mapping->writeback_index" can still be executed either before or after
"m->writeback_index = index".
So what is the expected order of the two statements? Read after write or
write after read?
Best wishes,
Jia-Ju Bai