lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2c7f2acd-ef37-4504-a0e3-f74b66c455ec@alu.unizg.hr>
Date:   Fri, 6 Oct 2023 16:39:54 +0200
From:   Mirsad Todorovac <mirsad.todorovac@....hr>
To:     Matthew Wilcox <willy@...radead.org>,
        Yury Norov <yury.norov@...il.com>
Cc:     Mirsad Todorovac <mirsad.todorovac@....unizg.hr>,
        Jan Kara <jack@...e.cz>, Philipp Stanner <pstanner@...hat.com>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Chris Mason <clm@...com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Josef Bacik <josef@...icpanda.com>,
        David Sterba <dsterba@...e.com>, linux-btrfs@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [PATCH v1 1/1] xarray: fix the data-race in xas_find_chunk() by
 using READ_ONCE()

On 9/19/2023 6:20 AM, Matthew Wilcox wrote:
> On Mon, Sep 18, 2023 at 11:56:36AM -0700, Yury Norov wrote:
>> Guys, I lost the track of the conversation. In the other email Mirsad
>> said:
>>          Which was the basic reason in the first place for all this, because something changed
>>          data from underneath our fingers ..
>>
>> It sounds clearly to me that this is a bug in xarray, *revealed* by
>> find_next_bit() function. But later in discussion you're trying to 'fix'
>> find_*_bit(), like if find_bit() corrupted the bitmap, but it's not.
> 
> No, you're really confused.  That happens.
> 
> KCSAN is looking for concurrency bugs.  That is, does another thread
> mutate the data "while" we're reading it.  It does that by reading
> the data, delaying for a few instructions and reading it again.  If it
> changed, clearly there's a race.  That does not mean there's a bug!
> 
> Some races are innocuous.  Many races are innocuous!  The problem is
> that compilers sometimes get overly clever and don't do the obvious
> thing you ask them to do.  READ_ONCE() serves two functions here;
> one is that it tells the compiler not to try anything fancy, and
> the other is that it tells KCSAN to not bother instrumenting this
> load; no load-delay-reload.
> 
>> In previous email Jan said:
>>          for any sane compiler the generated assembly with & without READ_ONCE()
>>          will be exactly the same.
>>
>> If the code generated with and without READ_ONCE() is the same, the
>> behavior would be the same, right? If you see the difference, the code
>> should differ.
> 
> Hopefully now you understand why this argument is wrong ...
> 
>> You say that READ_ONCE() in find_bit() 'fixes' 200 KCSAN BUG warnings. To
>> me it sounds like hiding the problems instead of fixing. If there's a race
>> between writing and reading bitmaps, it should be fixed properly by
>> adding an appropriate serialization mechanism. Shutting Kcsan up with
>> READ_ONCE() here and there is exactly the opposite path to the right direction.
> 
> Counterpoint: generally bitmaps are modified with set_bit() which
> actually is atomic.  We define so many bitmap things as being atomic
> already, it doesn't feel like making find_bit() "must be protected"
> as a useful use of time.
> 
> But hey, maybe I'm wrong.  Mirsad, can you send Yury the bug reports
> for find_bit and friends, and Yury can take the time to dig through them
> and see if there are any real races in that mess?
> 
>> Every READ_ONCE must be paired with WRITE_ONCE, just like atomic
>> reads/writes or spin locks/unlocks. Having that in mind, adding
>> READ_ONCE() in find_bit() requires adding it to every bitmap function
>> out there. And this is, as I said before, would be an overhead for
>> most users.
> 
> I don't believe you.  Telling the compiler to stop trying to be clever
> rarely results in a performance loss.

Hi Mr. Wilcox,

Do you think we should submit a formal patch for this data-race?

Thank you.

Best regards,
Mirsad Todorovac

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ