Message-ID: <aO7khoBHdfPlEBAE@dread.disaster.area>
Date: Wed, 15 Oct 2025 11:02:14 +1100
From: Dave Chinner <david@...morbit.com>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: Jan Kara <jack@...e.cz>, brauner@...nel.org, viro@...iv.linux.org.uk,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
josef@...icpanda.com, kernel-team@...com, amir73il@...il.com,
linux-btrfs@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-xfs@...r.kernel.org, ceph-devel@...r.kernel.org,
linux-unionfs@...r.kernel.org
Subject: Re: [PATCH v7 13/14] xfs: use the new ->i_state accessors

On Fri, Oct 10, 2025 at 05:40:49PM +0200, Mateusz Guzik wrote:
> On Fri, Oct 10, 2025 at 4:41 PM Jan Kara <jack@...e.cz> wrote:
> >
> > On Thu 09-10-25 09:59:27, Mateusz Guzik wrote:
> > > Change generated with coccinelle and fixed up by hand as appropriate.
> > >
> > > Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
> >
> > ...
> >
> > > @@ -2111,7 +2111,7 @@ xfs_rename_alloc_whiteout(
> > > */
> > > xfs_setup_iops(tmpfile);
> > > xfs_finish_inode_setup(tmpfile);
> > > - VFS_I(tmpfile)->i_state |= I_LINKABLE;
> > > + inode_state_set_raw(VFS_I(tmpfile), I_LINKABLE);
> > >
> > > *wip = tmpfile;
> > > return 0;
> > > @@ -2330,7 +2330,7 @@ xfs_rename(
> > > * flag from the inode so it doesn't accidentally get misused in
> > > * future.
> > > */
> > > - VFS_I(du_wip.ip)->i_state &= ~I_LINKABLE;
> > > + inode_state_clear_raw(VFS_I(du_wip.ip), I_LINKABLE);
> > > }
> > >
> > > out_commit:
> >
> > These two accesses look fishy (not your fault but when we are doing this
> > i_state exercise better make sure all the places are correct before
> > papering over bugs with _raw function variant). How come they cannot race
> > with other i_state modifications and thus corrupt i_state?
> >
>
> I asked about this here:
> https://lore.kernel.org/linux-xfs/CAGudoHEi05JGkTQ9PbM20D98S9fv0hTqpWRd5fWjEwkExSiVSw@mail.gmail.com/

Yes, as I said, we can add locking here if necessary, but locking
isn't necessary at this point in time because nothing else can
change the state of the newly allocated whiteout inode until we
unlock it.

Keep in mind the reason why we need I_LINKABLE here: it's not
needed for correctness, it's needed to avoid a warning embedded
in inc_nlink() because filesystems aren't trusted to implement
link counts correctly anymore.
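
For readers unfamiliar with that warning, here is a minimal userspace sketch of the behaviour described above. It is a model, not the kernel code: the struct, the flag value and every `model_` name are assumptions made up for illustration. It only shows that incrementing a zero link count trips the warning unless the linkable flag is set first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag value for this model only; not the kernel's I_LINKABLE. */
#define MODEL_I_LINKABLE (1u << 0)

/* Hypothetical stand-in for the two inode fields the check looks at. */
struct model_inode {
	unsigned int i_state;
	unsigned int i_nlink;
};

/*
 * Model of the check: raising a link count from zero is only treated as
 * legitimate when the inode is flagged linkable. Returns true if the
 * modelled warning fired. The count is incremented either way.
 */
static bool model_inc_nlink(struct model_inode *inode)
{
	bool warned = false;

	if (inode->i_nlink == 0 && !(inode->i_state & MODEL_I_LINKABLE)) {
		fprintf(stderr, "WARNING: link count raised from zero\n");
		warned = true;
	}
	inode->i_nlink++;
	return warned;
}

/* Walks through both cases: with and without the flag set beforehand. */
void model_whiteout_demo(void)
{
	struct model_inode plain = { .i_state = 0, .i_nlink = 0 };
	struct model_inode linkable = { .i_state = MODEL_I_LINKABLE, .i_nlink = 0 };

	assert(model_inc_nlink(&plain));      /* warning fires */
	assert(!model_inc_nlink(&linkable));  /* flag suppresses it */
}
```

In this model the flag changes nothing about the resulting link count; it only decides whether the warning is emitted, which is the sense in which the flag is "not needed for correctness".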

Now we're being told that "it is too dangerous to let filesystems
manage inode state themselves" and so we have to add extra overhead
to code that we were forced to add to avoid VFS warnings added
because the VFS doesn't trust filesystems to maintain some other
important inode state....

So, if you want to get rid of XFS using I_LINKABLE here, please fix
the VFS nlink API to allow us to call inc_nlink_<something>() on a
zero link inode without I_LINKABLE needing to be set. We do actually
know what we are doing here, and as such needing I_LINKABLE here is
nothing but a hacky workaround for inflexible, trustless VFS APIs...

> > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > index caff0125faea..ad94fbf55014 100644
> > > --- a/fs/xfs/xfs_iops.c
> > > +++ b/fs/xfs/xfs_iops.c
> > > @@ -1420,7 +1420,7 @@ xfs_setup_inode(
> > > bool is_meta = xfs_is_internal_inode(ip);
> > >
> > > inode->i_ino = ip->i_ino;
> > > - inode->i_state |= I_NEW;
> > > + inode_state_set_raw(inode, I_NEW);

"set" is wrong and will introduce a regression. This must be an
"add" operation as inode->i_state may have already been modified
by the time we get here. From 2021:

commit f38a032b165d812b0ba8378a5cd237c0888ff65f
Author: Dave Chinner <dchinner@...hat.com>
Date:   Tue Aug 24 19:13:04 2021 -0700

    xfs: fix I_DONTCACHE

    Yup, the VFS hoist broke it, and nobody noticed. Bulkstat workloads
    make it clear that it doesn't work as it should.

    Fixes: dae2f8ed7992 ("fs: Lift XFS_IDONTCACHE to the VFS layer")
    Signed-off-by: Dave Chinner <dchinner@...hat.com>
    Reviewed-by: Darrick J. Wong <djwong@...nel.org>
    Signed-off-by: Darrick J. Wong <djwong@...nel.org>

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index a3fe4c5307d3..f2210d927481 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -84,8 +84,9 @@ xfs_inode_alloc(
return NULL;
}
- /* VFS doesn't initialise i_mode! */
+ /* VFS doesn't initialise i_mode or i_state! */
VFS_I(ip)->i_mode = 0;
+ VFS_I(ip)->i_state = 0;
XFS_STATS_INC(mp, vn_active);
ASSERT(atomic_read(&ip->i_pincount) == 0);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 0ff0cca94092..a607d6aca5c4 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1344,7 +1344,7 @@ xfs_setup_inode(
gfp_t gfp_mask;
inode->i_ino = ip->i_ino;
- inode->i_state = I_NEW;
+ inode->i_state |= I_NEW;
inode_sb_list_add(inode);
/* make the inode look hashed for the writeback code */
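
The difference between the two semantics in the commit above can be sketched in userspace as follows. This is a model under stated assumptions: the struct, the flag values and all `st_`/`ST_` names are hypothetical, and which semantics the accessors in the patch under review actually implement is not shown in this message. The sketch only illustrates why the distinction matters when a flag has already been placed in the state word before setup runs.

```c
#include <assert.h>

/* Hypothetical flag values for this model only; not the kernel's values. */
#define ST_I_NEW       (1u << 0)
#define ST_I_DONTCACHE (1u << 1)

/* Hypothetical stand-in for the state word of an inode. */
struct st_inode {
	unsigned int i_state;
};

/* Assignment semantics: discards whatever was already in i_state. */
static void st_assign(struct st_inode *inode, unsigned int flags)
{
	inode->i_state = flags;
}

/* Add semantics: ORs the new flags in and keeps existing bits. */
static void st_add(struct st_inode *inode, unsigned int flags)
{
	inode->i_state |= flags;
}

/*
 * Replays the ordering from the commit: state is zeroed at allocation,
 * a don't-cache flag is applied, and only then does setup mark the inode
 * new using the supplied operation. Returns 1 if the earlier flag is
 * still present afterwards, 0 if setup wiped it.
 */
int st_dontcache_survives(void (*op)(struct st_inode *, unsigned int))
{
	struct st_inode inode = { .i_state = 0 };

	inode.i_state |= ST_I_DONTCACHE;
	op(&inode, ST_I_NEW);
	return (inode.i_state & ST_I_DONTCACHE) != 0;
}
```

Calling `st_dontcache_survives(st_add)` reports that the earlier flag is kept, while `st_dontcache_survives(st_assign)` reports that it is lost, which is the failure mode the 2021 fix addressed.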

-Dave.
--
Dave Chinner
david@...morbit.com