linux-kernel - Re: [PATCH v7 03/14] fs: provide accessors for ->i

lists.openwall.net - Open Source and information security mailing list archives

Message-ID: <CAGudoHFpoo0Qm=b4Z85tbJJmhh+vmSHuNnm3pVaLaQsmX9mURg@mail.gmail.com>
Date: Wed, 15 Oct 2025 07:46:39 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Dave Chinner <david@...morbit.com>
Cc: Jan Kara <jack@...e.cz>, brauner@...nel.org, viro@...iv.linux.org.uk, 
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	josef@...icpanda.com, kernel-team@...com, amir73il@...il.com, 
	linux-btrfs@...r.kernel.org, linux-ext4@...r.kernel.org, 
	linux-xfs@...r.kernel.org, ceph-devel@...r.kernel.org, 
	linux-unionfs@...r.kernel.org
Subject: Re: [PATCH v7 03/14] fs: provide accessors for ->i_state

On Wed, Oct 15, 2025 at 12:24 AM Dave Chinner <david@...morbit.com> wrote:
>
> On Fri, Oct 10, 2025 at 05:51:06PM +0200, Mateusz Guzik wrote:
> > On Fri, Oct 10, 2025 at 4:44 PM Jan Kara <jack@...e.cz> wrote:
> > >
> > > On Thu 09-10-25 09:59:17, Mateusz Guzik wrote:
> > > > +static inline void inode_state_set_raw(struct inode *inode,
> > > > +                                    enum inode_state_flags_enum flags)
> > > > +{
> > > > +     WRITE_ONCE(inode->i_state, inode->i_state | flags);
> > > > +}
> > >
> > > I think this shouldn't really exist as it is dangerous to use and if we
> > > deal with XFS, nobody will actually need this function.
> > >
> >
> > That's not strictly true, unless you mean code outside of fs/inode.c
> >
> > First, something is still needed to clear out the state in
> > inode_init_always_gfp().
> >
> > Afterwards there are a few spots which further modify it without the
> > spinlock held (for example see insert_inode_locked4()).
> >
> > My take on the situation is that the current I_NEW et al handling is
> > crap and the inode hash api is also crap.
>
> The inode hash implementation is crap, too. The historically poor
> scalability characteristics of the VFS inode cache are the primary
> reason we've never considered trying to port XFS to use it,
> even if we ignore all the inode lifecycle issues that would have to
> be solved first...
>

I don't know of anyone defending the inode hash, though. The performance
of the thing has already been bashed a few times, but I did not see
anyone dunking on the API ;)

> > For starters freshly allocated inodes should not be starting with 0,
> > but with I_NEW.
>
> Not all inodes are cached filesystem inodes. e.g. anonymous inodes
> are initialised to inode->i_state = I_DIRTY.  pipe inodes also start
> at I_DIRTY. socket inodes don't touch i_state at init, so they
> essentially init i_state = 0....
>
> IOWs, the initial inode state depends on what the inode is being
> used for, and I_NEW is only relevant to inodes that are cached and
> can be found before the filesystem has fully initialised the VFS
> inode.
>

Well, it is true that currently the I_NEW flag is there to help out
entities like the inode hash.

I'm looking to change it into a generic indicator of an uninitialized
inode. This is completely harmless for the consumers that currently
operate on inodes which never had the flag.

Here is one use: I'd like to introduce a mandatory routine to call
when the filesystem at hand claims the inode is ready to use.

Said routine would have 2 main purposes:
- validate the state of the inode (for example that a valid mode is
set; this would have caught some of the syzkaller bugs from the get-go)
- precompute a bunch of stuff, for example see this crapper:

  static inline int do_inode_permission(struct mnt_idmap *idmap,
                                        struct inode *inode, int mask)
  {
          if (unlikely(!(inode->i_opflags & IOP_FASTPERM))) {
                  if (likely(inode->i_op->permission))
                          return inode->i_op->permission(idmap, inode, mask);

                  /* This gets set once for the inode lifetime */
                  spin_lock(&inode->i_lock);
                  inode->i_opflags |= IOP_FASTPERM;
                  spin_unlock(&inode->i_lock);
          }
          return generic_permission(idmap, inode, mask);
  }

IOP_FASTPERM could then be computed once by the new routine, so this
would simplify to:

  static inline int do_inode_permission(struct mnt_idmap *idmap,
                                        struct inode *inode, int mask)
  {
          if (unlikely(!(inode->i_opflags & IOP_FASTPERM)))
                  return inode->i_op->permission(idmap, inode, mask);
          return generic_permission(idmap, inode, mask);
  }

The routine would assert the inode is I_NEW and would clear the flag,
replacing it with something else indicating the inode is indeed ready
to use.

While technically the I_NEW change is not necessary to get there, I
do think it makes things cleaner.

Note that unlock_new_inode() and similar are not mandatory to call.