Message-ID: <CAGudoHHY2ZpSjYda94FZos8jRsaqZ_XcR7ZDDuY0AgvbnvehyQ@mail.gmail.com>
Date: Wed, 15 Oct 2025 04:10:10 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Dave Chinner <david@...morbit.com>
Cc: Jan Kara <jack@...e.cz>, brauner@...nel.org, viro@...iv.linux.org.uk,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
josef@...icpanda.com, kernel-team@...com, amir73il@...il.com,
linux-btrfs@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-xfs@...r.kernel.org, ceph-devel@...r.kernel.org,
linux-unionfs@...r.kernel.org
Subject: Re: [PATCH v7 13/14] xfs: use the new ->i_state accessors
On Wed, Oct 15, 2025 at 2:02 AM Dave Chinner <david@...morbit.com> wrote:
>
> On Fri, Oct 10, 2025 at 05:40:49PM +0200, Mateusz Guzik wrote:
> > On Fri, Oct 10, 2025 at 4:41 PM Jan Kara <jack@...e.cz> wrote:
> > >
> > > On Thu 09-10-25 09:59:27, Mateusz Guzik wrote:
> > > > Change generated with coccinelle and fixed up by hand as appropriate.
> > > >
> > > > Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
> > >
> > > ...
> > >
> > > > @@ -2111,7 +2111,7 @@ xfs_rename_alloc_whiteout(
> > > > */
> > > > xfs_setup_iops(tmpfile);
> > > > xfs_finish_inode_setup(tmpfile);
> > > > - VFS_I(tmpfile)->i_state |= I_LINKABLE;
> > > > + inode_state_set_raw(VFS_I(tmpfile), I_LINKABLE);
> > > >
> > > > *wip = tmpfile;
> > > > return 0;
> > > > @@ -2330,7 +2330,7 @@ xfs_rename(
> > > > * flag from the inode so it doesn't accidentally get misused in
> > > > * future.
> > > > */
> > > > - VFS_I(du_wip.ip)->i_state &= ~I_LINKABLE;
> > > > + inode_state_clear_raw(VFS_I(du_wip.ip), I_LINKABLE);
> > > > }
> > > >
> > > > out_commit:
> > >
> > > These two accesses look fishy (not your fault but when we are doing this
> > > i_state exercise better make sure all the places are correct before
> > > papering over bugs with _raw function variant). How come they cannot race
> > > with other i_state modifications and thus corrupt i_state?
> > >
> >
> > I asked about this here:
> > https://lore.kernel.org/linux-xfs/CAGudoHEi05JGkTQ9PbM20D98S9fv0hTqpWRd5fWjEwkExSiVSw@mail.gmail.com/
>
> Yes, as I said, we can add locking here if necessary, but locking
> isn't necessary at this point in time because nothing else can
> change the state of the newly allocated whiteout inode until we
> unlock it.
>
I don't have much of an opinion about this bit. Note that, as per my
response, I added routines to facilitate not taking the lock (for the
time being anyway).

> Keep in mind the reason why we need I_LINKABLE here - it's not
> needed for correctness - it's needed to avoid a warning embedded
> in inc_nlink() because filesystems aren't trusted to implement
> link counts correctly anymore.
Ok, I did not know that. Maybe I'll take a stab at sorting this out.

xfs aside, for unrelated reasons I was looking at the placement of the
indicator to begin with. Seems like for basic correctness this in fact
wants the inode lock (not the spin lock) and the spin lock is only
taken to synchronize against other spots which modify i_state. Perhaps
it should move, which would also obsolete the above woes.

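For reference, here is a rough user-space model of the warning discussed above. It assumes the check fires when the link count of a zero-link inode is bumped without I_LINKABLE set. The struct, the flag value and the model_* names are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value, for illustration only. */
#define MODEL_I_LINKABLE (1u << 0)

/* Toy stand-in for the inode fields relevant to the discussion. */
struct model_inode {
	unsigned int state;	/* bit flags, playing the role of i_state */
	unsigned int nlink;	/* link count */
};

/*
 * Increment the link count. Returns true if the model "warned", i.e.
 * the count was zero on entry and the linkable flag was not set.
 */
static bool model_inc_nlink(struct model_inode *inode)
{
	bool warned = false;

	assert(inode != NULL);
	if (inode->nlink == 0)
		warned = !(inode->state & MODEL_I_LINKABLE);
	inode->nlink++;
	return warned;
}
```

In this model, a whiteout-style inode with nlink == 0 only avoids the warning if the flag is set before the increment, which is the only reason the flag gets set in the xfs hunk quoted earlier.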
> Now we're being told that "it is too dangerous to let filesystems
> manage inode state themselves" and so we have to add extra overhead
> to code that we were forced to add to avoid VFS warnings added
> because the VFS doesn't trust filesystems to maintain some other
> important inode state....
>
Given that this is how XFS has behaved for a long time now, and that
the I_LINKABLE handling can perhaps be redone in the first place,
perhaps Jan will be willing to un-NAK this bit.

> So, if you want to get rid of XFS using I_LINKABLE here, please fix
> the nlink VFS api to allow us to call inc_nlink_<something>() on a
> zero link inode without I_LINKABLE needing to be set. We do actually
> know what we are doing here, and as such needing I_LINKABLE here is
> nothing but a hacky workaround for inflexible, trustless VFS APIs...
>
> > > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > > index caff0125faea..ad94fbf55014 100644
> > > > --- a/fs/xfs/xfs_iops.c
> > > > +++ b/fs/xfs/xfs_iops.c
> > > > @@ -1420,7 +1420,7 @@ xfs_setup_inode(
> > > > bool is_meta = xfs_is_internal_inode(ip);
> > > >
> > > > inode->i_ino = ip->i_ino;
> > > > - inode->i_state |= I_NEW;
> > > > + inode_state_set_raw(inode, I_NEW);
>
> "set" is wrong and will introduce a regression. This must be an
> "add" operation as inode->i_state may have already been modified
> by the time we get here.
There were complaints about the original naming, and _add/_del/_set got
whacked. The naming has now settled on _set/_clear/_assign, per the
cheat sheet in the patch, so this does what it was supposed to.
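
To make the naming concrete, here is a minimal user-space sketch of the three operations. The _set and _clear behaviour mirrors the `|=` and `&= ~` lines replaced in the quoted hunks. Treating _assign as a plain overwrite is an assumption based on the name. The demo_* identifiers and flag values are hypothetical and are not the kernel helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
#define DEMO_I_NEW	(1u << 0)
#define DEMO_I_LINKABLE	(1u << 1)

/* Toy stand-in holding just the state word. */
struct demo_inode {
	unsigned int state;
};

/* "set": OR the given bits in, keeping whatever is already there. */
static void demo_state_set(struct demo_inode *inode, unsigned int flags)
{
	assert(inode != NULL);
	inode->state |= flags;
}

/* "clear": mask the given bits out, leaving the other bits alone. */
static void demo_state_clear(struct demo_inode *inode, unsigned int flags)
{
	assert(inode != NULL);
	inode->state &= ~flags;
}

/* "assign" (assumed semantics): overwrite the whole state word. */
static void demo_state_assign(struct demo_inode *inode, unsigned int flags)
{
	assert(inode != NULL);
	inode->state = flags;
}
```

Under this reading, "set" on a state word that already has bits in it keeps them, which is the behaviour asked for under the old "add" name.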