linux-kernel - Re: [PATCH] xfs: mark the i_delayed_blks access in xfs_file

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ymjsjb7ich2s5f7tmhslhlnymjmso5o2lsvdoudy3dtbr7vjwk@moxzvvjdh6zl>
Date: Wed, 14 May 2025 10:00:28 +0200
From: Carlos Maiolino <cem@...nel.org>
To: Christoph Hellwig <hch@....de>
Cc: Dave Chinner <david@...morbit.com>, linux-xfs@...r.kernel.org, 
	cen zhang <zzzccc427@...il.com>, lkmm@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] xfs: mark the i_delayed_blks access in xfs_file_release
 as racy

Hi.

On Wed, May 14, 2025 at 06:29:46AM +0200, Christoph Hellwig wrote:
> On Wed, May 14, 2025 at 07:37:14AM +1000, Dave Chinner wrote:
> > On Tue, May 13, 2025 at 07:26:14AM +0200, Christoph Hellwig wrote:
> > > We don't bother with the ILOCK as this is best-effort and thus a racy
> > > access is ok.  Add a data_race() annotation to make that clear to
> > > memory model verifiers.
> >
> > IMO, that's the thin edge of a wedge. There are dozens of places in
> > XFS where we check variable values without holding the lock needed
> > to serialise the read against modification.
> 
> Yes. And the linux kernel memory consistency model ask us to mark them,
> see tools/memory-model/Documentation/access-marking.txt.
> 
> This fails painful at first, but I'd actually wish we'd have tools
> enforcing this as strongly as possible as developers (well me at least)
> seem to think a racy access is just fine more often than they should, and
> needing an annotation and a comment is a pretty good way to sure that.
> 
> > Hence my question - are we now going to make it policy that every
> > possible racy access must be marked with data_race() because there
> > is some new bot that someone is running that will complain if we
> > don't? Are you committing to playing whack-a-mole with the memory
> > model verifiers to silence all the false positives from these
> > known-to-be-safe access patterns?
> 
> It's not really a "new bot".  It has been official memory consistency
> policy for a while, but it just hasn't been well enforced.  For new code
> asking if the review is racy and needs a marking or use READ_ONCE() and
> WRITE_ONCE() has been part of the usual review protocol.  Reviewing old
> code and fixing things we got wrong will take a while, but I'm actually
> glad about more bots for that.
> 

I agree with you here, and we could slowly start marking those shared accesses
as racy, but bots spitting false-positivies all the time doesn't help much,
other than taking somebody's else time to look into the report.

Taking as example one case in the previous report, where the report complained
about concurrent bp->b_addr access during the buffer instantiation.

This seems to be covered by the access-marking.txt doc:

"
...
Churning the code base with ill-considered additions of data_race(),
READ_ONCE(), and WRITE_ONCE() is unhelpful.
...
"

Then:

"
...
Use of Plain C-Language Accesses

2.	Initialization-time and cleanup-time accesses.	This covers
	a  wide variety of situations, including the uniprocessor
	phase of system boot, variables to be used by not-yet-spawned
	kthreads, structures not yet published to reference-counted
	or RCU-protected data structures, and the cleanup side of any
	of these situations.
...
"

So, I think Dave has a point too. Like what happens with syzkaller
and random people reporting random syzkaller warnings.

While I appreciate the reports too, I think it would be fair for the reporters
to spend some time to at least craft a RFC patch fixing the warning.

In this case for example, the reporter explicitly mentioned data_race()
annotations, so, why not just send a patch attempting fix the warning instead of
asking somebody else to do that? IMHO, somebody fidgeting with KCSAN has enough
skills to craft such patch. Of course I might be wrong here, but tweaking debug
knobs over the kernel and asking somebody else to look at the results doesn't
seem scalable.

Cheers.