linux-kernel - Re: [PATCH for 6.19-rc1] fs: preserve file type in make_bad

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAGudoHEF6O-xR7B2F=6-Gy7t78FvH_f_0YcbmQM9gQAwthFpAA@mail.gmail.com>
Date: Thu, 11 Dec 2025 00:27:05 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Al Viro <viro@...iv.linux.org.uk>
Cc: Jan Kara <jack@...e.cz>, Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>, 
	syzbot <syzbot+d222f4b7129379c3d5bc@...kaller.appspotmail.com>, brauner@...nel.org, 
	jlbec@...lplan.org, joseph.qi@...ux.alibaba.com, linkinjeon@...nel.org, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, mark@...heh.com, 
	ocfs2-devel@...ts.linux.dev, sj1557.seo@...sung.com, 
	syzkaller-bugs@...glegroups.com, Chuck Lever <chuck.lever@...cle.com>
Subject: Re: [PATCH for 6.19-rc1] fs: preserve file type in make_bad_inode()
 unless invalid

On Wed, Dec 10, 2025 at 10:13 PM Al Viro <viro@...iv.linux.org.uk> wrote:
>
> On Wed, Dec 10, 2025 at 11:24:40AM +0100, Mateusz Guzik wrote:
>
> > I'm delighted to see the call is considered bogus.
> >
> > As for being able to assert on it, I noted the current flag handling
> > for lifecycle tracking is unhelpful.
> >
> > Per your response, i_state == 0 is overloaded to mean the inode is
> > fully sorted out *and* that it is brand new.
> >
> > Instead clear-cut indicators are needed to track where the inode is in
> > its lifecycle.
> >
> > I proposed 2 ways: a dedicated enum or fucking around with flags.
> >
> > Indeed the easiest stepping stone for the time being would be to push
> > up I_NEW to alloc_inode and assert on it in places which set the flag.
> > I'm going to cook it up.
>
> You are misinterpreting what I_NEW is about - it is badly named, no
> arguments here, but it's _not_ "inode is new".
>
> It's "it's in inode hash, but if you find it on lookup, you'll need to wait -
> it's not entirely set up".
>

Comments in the hash code make it pretty clear. The above is a part of
a bigger picture, which I already talked about in the 'light refcount'
patchset or whatever the name was.

The general problem statement is that the VFS layer suffers from a
chronic lack of assertions, which in turn helps people add latent bugs
(case in point: make_bad_inode() armed with asserts on state would
have blown up immediately and this entire thread would have been
avoided).

One of the things missing to create good coverage is reliable inode
lifecycle tracking. Currently *some of it* can be determined with some
of the flags in ->i_state, but even there are important states which
are straight up missing, notably whether the filesystem claims the
inode is ready to use *or* not ready at all (not even in the hash) is
indistinguishable by ->i_state. Trying to figure it out by other means
is avoidable tech debt. Bare minimum denoted states have to
distinguish between just allocated, creation aborted, ready to use, in
teardown and finally torn down.

An important part is validating whether inode at hand adheres to the
API contract when the filesystem claims it is ready. For example
->i_mode has to have a valid type set, but syzkaller convinced ntfs to
let an inode with invalid mode get out, which later resulted in a warn
in execve code because may_open() did not apply any of the checks to
it. This is the kind of a problem which can and should be checked for
before the inode is allowed to be used.

Another benefit is that some of the state can be pre-computed. For example this:
 static inline int do_inode_permission(struct mnt_idmap *idmap,
                                        struct inode *inode, int mask)
  {
          if (unlikely(!(inode->i_opflags & IOP_FASTPERM))) {
                  if (likely(inode->i_op->permission))
                          return inode->i_op->permission(idmap, inode,
mask);

                  /* This gets set once for the inode lifetime */
                  spin_lock(&inode->i_lock);
                  inode->i_opflags |= IOP_FASTPERM;
                  spin_unlock(&inode->i_lock);
          }
          return generic_permission(idmap, inode, mask);
  }

... will stop setting the flag as this aspect will be already sorted
out. There is another crapper of the sort in vfs_readlink and one can
suspect more will show up over time.

Back to lifecycle tracking, I_NEW could change semantics to mean "this
inode *is not* ready for use yet". unlock_new_inode() already serves
as a "this inode *is* ready for use" indicator, it just happens to not
be mandatory to call. Another routine could be added for filesystems
which don't use the hash to cover that gap.

Then for filesystems which *do* use the hash the entire thing is transparent.

> A plenty of inodes never enter that state at all.  Hell, consider pipes.
> Or sockets.  Or anything on procfs.  Or sysfs, or...
>

So whether I change the meaning of I_NEW or add another flag, I will
still need to patch these suckers to do *something* on the inodes they
create.

That's not the whole story, but should be enough to convey what I'm gunning for.

This will also have a side effect of giving I_NEW a more fitting use.