[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aCPAKC7OeCIGtVMM@dread.disaster.area>
Date: Wed, 14 May 2025 07:56:56 +1000
From: Dave Chinner <david@...morbit.com>
To: cen zhang <zzzccc427@...il.com>
Cc: cem@...nel.org, linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
baijiaju1990@...il.com, zhenghaoran154@...il.com
Subject: Re: Subject: [BUG] Five data races in in XFS Filesystem,one
potentially harmful
On Tue, May 13, 2025 at 08:25:49PM +0800, cen zhang wrote:
> Hello maintainers,
>
> I would like to report five data race bugs we discovered in the XFS
> filesystem on Linux kernel v6.14-rc4. These issues were identified
> using our in-kernel data race detector.
>
> Among the five races, we believe that four may be benign and could be
> annotated using `data_race()` to suppress false positives from
> analysis tools. However, one races involve shared global state or
> critical memory, and their effects are unclear.
> We would appreciate your evaluation on whether those should be fixed
> or annotated.
>
> Below is a summary of the findings:
>
> ---
>
> Benign Races
> ============
>
> 1. Race in `xfs_bmapi_reserve_delalloc()` and `xfs_vn_getattr()`
> ----------------------------------------------------------------
>
> A data race on `ip->i_delayed_blks`.
Not a bug. xfs_vn_getattr() runs unlocked as per the Linux VFS
design. -Everything- that is accessed in xfs_vn_getattr() is a data
race.
> 2. Race on `xfs_trans_ail_update_bulk` in `xfs_inode_item_format`
> -------------------------------------.
>
> We observed unsynchronized access to `lip->li_lsn`, which may exhibit
> store/load tearing. However, we did not observe any symptoms
> indicating harmful behavior.
Not a bug. The lsn in the log_dinode is garbage and not used
during recovery - it's mainly there as potential debug information.
> 3. Race on `pag->pagf_freeblks`
> -------------------------------
>
> Although concurrent, this race is unlikely to affect correctness.
It's an optimisitic check done knowing that we don't hold locks and
it can race. The code is explicitly designed this way. Every other
pagf variable used in these algorithms is also racy.
> 4. Race on `pag->pagf_longest`
> ------------------------------
>
> Similar to the previous race, this field appears to be safely used
> under current access patterns.
Like this one.
> Possibly Harmful Race
> ======================
>
> 1. Race on `bp->b_addr` between `xfs_buf_map_pages()` and `xfs_buf_offset()`
> ----------------------------------------------------------------------------
>
> Concurrent access to bp->b_addr happens during buffer preparation and
> usage. Since this is pointer manipulation of page mappings, store/load
> tearing or unexpected reuse might lead to memory corruption or invalid
> log item formats. We are not confident in classifying this race as
> benign or harmful and would appreciate your guidance on whether it
> should be fixed or annotated.
Impossible. It should be obvious that an object undergoing
instantiation can't have a locked, referenced access from some other
object in some other code....
As I said in my reply to Christoph's patch - there are going to be
-hundreds- of these sorts false positives in XFS. Unless there is a
commitment to annotate every single false positive and address the
"data race by design" methods within the VFS architecture, there's
little point in playing an ongoing game of whack-a-mole with these.
Have the policy discussion and obtain a commit to fixing the many,
many false positives that this verification bot will report before
dumping many, many false positive reports on the list.
-Dave.
--
Dave Chinner
david@...morbit.com
Powered by blists - more mailing lists