[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <021970ad-942a-4fe8-ac95-c8089527f7d2@alu.unizg.hr>
Date: Sat, 14 Oct 2023 04:21:50 +0200
From: Mirsad Goran Todorovac <mirsad.todorovac@....unizg.hr>
To: Yury Norov <yury.norov@...il.com>, Jan Kara <jack@...e.cz>
Cc: Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Rasmus Villemoes <linux@...musvillemoes.dk>,
Matthew Wilcox <willy@...radead.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] lib/find: Make functions safe on changing bitmaps
On 10/14/2023 2:15 AM, Yury Norov wrote:
> Restore LKML
>
> On Thu, Oct 12, 2023 at 02:21:10PM +0200, Jan Kara wrote:
>> On Wed 11-10-23 11:26:29, Yury Norov wrote:
>>> Long story short: KCSAN found some potential issues related to how
>>> people use bitmap API. And instead of working through that issues,
>>> the following code shuts down KCSAN by applying READ_ONCE() here
>>> and there.
>>
>> I'm sorry but this is not what the patch does. I'm not sure how to get the
>> message across so maybe let me start from a different angle:
>>
>> Bitmaps are perfectly fine to be used without any external locking if
>> only atomic bit ops (set_bit, clear_bit, test_and_{set/clear}_bit) are
>> used. This is a significant performance gain compared to using a spinlock
>> or other locking and people do this for a long time. I hope we agree on
>> that.
>>
>> Now it is also common that you need to find a set / clear bit in a bitmap.
>> To maintain lockless protocol and deal with races people employ schemes
>> like (the dumbest form):
>>
>> do {
>> bit = find_first_bit(bitmap, n);
>> if (bit >= n)
>> abort...
>> } while (!test_and_clear_bit(bit, bitmap));
>>
>> So the code loops until it finds a set bit that is successfully cleared by
>> it. This is perfectly fine and safe lockless code and such use should be
>> supported. Agreed?
>
> Great example. When you're running non-atomic functions concurrently,
> the result may easily become incorrect, and this is what you're
> demonstrating here.
>
> Regarding find_first_bit() it means that:
> - it may erroneously return unset bit;
> - it may erroneously return non-first set bit;
> - it may erroneously return no bits for non-empty bitmap.
>
> Effectively it means that find_first bit may just return a random number.
>
> Let's take another example:
>
> do {
> bit = get_random_number();
> if (bit >= n)
> abort...
> } while (!test_and_clear_bit(bit, bitmap));
>
> When running concurrently, the difference between this and your code
> is only in probability of getting set bit somewhere from around the
> beginning of bitmap.
>
> The key point is that find_bit() may return undef even if READ_ONCE() is
> used. If bitmap gets changed anytime in the process, the result becomes
> invalid. It may happen even after returning from find_first_bit().
>
> And if my understanding correct, your code is designed in the
> assumption that find_first_bit() may return garbage, so handles it
> correctly.
>
>> *Except* that the above actually is not safe due to find_first_bit()
>> implementation and KCSAN warns about that. The problem is that:
>>
>> Assume *addr == 1
>> CPU1 CPU2
>> find_first_bit(addr, 64)
>> val = *addr;
>> if (val) -> true
>> clear_bit(0, addr)
>> val = *addr -> compiler decided to refetch addr contents for whatever
>> reason in the generated assembly
>> __ffs(val) -> now executed for value 0 which has undefined results.
>
> Yes, __ffs(0) is undef. But the whole function is undef when accessing
> bitmap concurrently.
>
>> And the READ_ONCE() this patch adds prevents the compiler from adding the
>> refetching of addr into the assembly.
>
> That's true. But it doesn't improve on the situation. It was an undef
> before, and it's undef after, but a 2% slower undef.
>
> Now on that KCSAN warning. If I understand things correctly, for the
> example above, KCSAN warning is false-positive, because you're
> intentionally running lockless.
>
> But for some other people it may be a true error, and now they'll have
> no chance to catch it if KCSAN is forced to ignore find_bit() entirely.
>
> We've got the whole class of lockless algorithms that allow safe concurrent
> access to the memory. And now that there's a tool that searches for them
> (concurrent accesses), we need to have an option to somehow teach it
> to suppress irrelevant warnings. Maybe something like this?
>
> lockless_algorithm_begin(bitmap, bitmap_size(nbits));
> do {
> bit = find_first_bit(bitmap, nbits);
> if (bit >= nbits)
> break;
> } while (!test_and_clear_bit(bit, bitmap));
> lockless_algorithm_end(bitmap, bitmap_size(nbits));
>
> And, of course, as I suggested a couple iterations ago, you can invent
> a thread-safe version of find_bit(), that would be perfectly correct
> for lockless use:
>
> unsigned long _find_and_clear_bit(volatile unsigned long *addr, unsigned long size)
> {
> unsigned long bit = 0;
>
> while (!test_and_clear_bit(bit, bitmap) {
> bit = FIND_FIRST_BIT(addr[idx], /* nop */, size);
> if (bit >= size)
> return size;
> }
>
> return bit;
> }
Hi, Yuri,
But the code above effectively does the same as the READ_ONCE() macro
as defined in rwonce.h:
#ifndef __READ_ONCE
#define __READ_ONCE(x) (*(const volatile __unqual_scalar_typeof(x) *)&(x))
#endif
#define READ_ONCE(x) \
({ \
compiletime_assert_rwonce_type(x); \
__READ_ONCE(x); \
})
Both uses only prevent the funny stuff the compiler might have done to the
read of the addr[idx], there's no black magic in READ_ONCE().
Both examples would probably result in the same assembly and produce the
same 2% slowdown ...
Only you declare volatile in one place, and READ_ONCE() in each read, but
this will only compile a bit slower and generate the same machine code.
Best regards,
Mirsad Todorovac
> Didn't test that, but I hope 'volatile' specifier should be enough
> for compiler to realize that it shouldn't optimize memory access, and
> for KCSAN that everything's OK here.
>
> By the way, thank you for respectful and professional communication.
>
> Thanks,
> Yury
--
Mirsad Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
tel. +385 (0)1 3711 451
mob. +385 91 57 88 355
Powered by blists - more mailing lists