lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aCUlXbEg9wuyPEB6@dread.disaster.area>
Date: Thu, 15 May 2025 09:21:01 +1000
From: Dave Chinner <david@...morbit.com>
To: Christoph Hellwig <hch@....de>
Cc: Carlos Maiolino <cem@...nel.org>, linux-xfs@...r.kernel.org,
	cen zhang <zzzccc427@...il.com>, lkmm@...ts.linux.dev,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] xfs: mark the i_delayed_blks access in xfs_file_release
 as racy

On Wed, May 14, 2025 at 03:04:17PM +0200, Christoph Hellwig wrote:
> On Wed, May 14, 2025 at 10:00:28AM +0200, Carlos Maiolino wrote:
> > I agree with you here, and we could slowly start marking those shared accesses
> > as racy, but bots spitting false-positivies all the time doesn't help much,
> > other than taking somebody's else time to look into the report.
> > 
> > Taking as example one case in the previous report, where the report complained
> > about concurrent bp->b_addr access during the buffer instantiation.
> 
> I'd like to understand that one a bit more.  It might be because the
> validator doesn't understand a semaphore used as lock is a lock, but
> I'll follow up there.

Even so, it's not a race because at that point in time the object
cannot be seen by anything other than the process that is
initialising it.

We initialise objects without holding the object locked all over the
place - if this is something that the sanitiser cannot determine
automatically, then we're also going to need annotations to indicate
that the initialisation function(s) contain valid data races.


> 
> > So, I think Dave has a point too. Like what happens with syzkaller
> > and random people reporting random syzkaller warnings.
> > 
> > While I appreciate the reports too, I think it would be fair for the reporters
> > to spend some time to at least craft a RFC patch fixing the warning.
> 
> Well, it was polite mails about their finding, which I find useful.
> If we got a huge amount of spam that might be different.

That's kinda of my point about it being the "thin edge of the
wedge".

Once we indicate that this is something we want reported so we can
address, then the floodgates open.

I'm wary of this, because at this point I suspect that there aren't
a lot of people with sufficient time and knowledge to adequately
address these issues.

Asking the reporter to "send a patch to data_race() that instance"
isn't a good solution to the problem because it doesn't address the
wider issue indicated by the specific report. It just kicks the can
down the road and introduces technical debt that we will eventually
have to address.

We should have learnt this lesson from lockdep - false positive
whack-a-mole shut up individual reports but introduced technical
debt that had to be addressed later because whack-a-mole didn't
address the underlying issues. When the stack of cards fell, someone
(i.e. me) had to work out how to do lockdep annotations properly
(e.g. via the complex inode locking subclass stuff) to make the
issues go away and require minimal long term maintenance.

I don't want to see this pattern repeated.

We need a sane policy for addressing the entire classes of issuesi
underlying individual reports, not just apply a band-aid that
silences the indivual report.

So, let's look at what kcsan provides us with.

We need functions like xfs_vn_getattr(), the allocation AG selection
alogrithms and object initialisation functions to be marked as
inherently racy, because they intentionally don't hold locks for any
of the accesses they make. kcsan provides:

/* comment on why __no_kcsan is needed */
__no_kcsan void
xfs_foo(
	struct xfs_bar	*bar)
{
...

the __no_kcsan attribute for function declarations to mark the
entire function as inherently racy and so should be ignored.

For variables like ip->i_delayed_blks, where we intentionally
serialise modifications but do not serialise readers, we have:

-	uint64_t                i_delayed_blks; /* count of delay alloc blks */
+	uint64_t __data_racy    i_delayed_blks; /* count of delay alloc blks */

This means all accesses to the variable are considered to be racy
and kcsan will ignore it and not report it. We can do the same for
lip->li_lsn and other variables.

IOWs, we do not need to spew data_race() wrappers or random
READ_ONCE/WRITE_ONCE macros all over the code to silence known false
positives.  If we mark the variables and functions we know have racy
accesses, these one-line changes will avoid -all- false positives on
those variables/from those functions.

This, IMO, is a much better solution than "send a patch marking that
access as data_race()". We fix the issue for all accesses and entire
algorithms with simple changes to variable and/or function
declarations.

To reiterate my point: if we are goign to make XFS KCSAN friendly,
we need to work out how to do it quickly and efficiently before we
start changing code. Engaging in random whack-a-mole shootdown games
will not lead to a viable long term solution, so let's not waste
time on playing whack-a-mole.

-Dave.
-- 
Dave Chinner
david@...morbit.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ