lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAGudoHE9DbPVtn+v8mUJvQ69FNp1ieCUH2QWwTW9kRd43o_wXw@mail.gmail.com>
Date: Tue, 25 Nov 2025 12:33:46 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Jan Kara <jack@...e.cz>
Cc: brauner@...nel.org, viro@...iv.linux.org.uk, linux-kernel@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v2] fs: scale opening of character devices

On Tue, Nov 25, 2025 at 12:26 PM Jan Kara <jack@...e.cz> wrote:
>
> On Fri 21-11-25 08:28:18, Mateusz Guzik wrote:
> > chrdev_open() always takes cdev_lock, which is only needed to synchronize
> > against cd_forget(). But the latter is only ever called by inode evict(),
> > meaning these two can never legally race. Solidify this with asserts.
>
> Are you sure? Because from a quick glance it doesn't seem that inodes hold
> a refcount of the cdev. Thus inode->i_cdev can freed if you don't hold
> cdev_lock - it is only the cdev_lock that makes cdev_get() safe against UAF
> issues in chrdev_open() AFAICS because that blocks cdev_purge() from
> completing when last ref of the kobject is dropped...
>

Oh huh, I somehow missed cdev_purge().

The assumption was if ->i_cdev is legal to store in the first place,
it has to be legally accessible. That goes down the drain with the
above.

I'll cook something up for the next merge window.

Note devices like /dev/null are there for the duration, so the
->i_cdev clearing problem is not important for *that* one. It's a
matter of figuring out how to plug that information through.

>                                                                 Honza
>
> >
> > More cleanups are needed here but this is enough to get the thing out of
> > the way.
> >
> > Rationale is funny-sounding at first: opening of /dev/zero happens to be
> > a contention point in large-scale package building (think 100+ packages
> > at the same with a thread count to support it). Such a workload is not
> > only very fork+exec heavy, but frequently involves scripts which use the
> > idiom of silencing output by redirecting it to /dev/null.
> >
> > A non-large-scale microbenchmark of opening /dev/null in a loop in 16
> > processes:
> > before:       2865472
> > after:        4011960 (+40%)
> >
> > Code goes from being bottlenecked on the spinlock to being bottlenecked
> > on lockref.
> >
> > Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
> > ---
> >
> > v2:
> > - add back new = NULL lost in refactoring
> >
> > I'll note for interested my experience with the workload at hand comes
> > from FreeBSD and was surprised to find /dev/null on the profile. Given
> > that Linux is globally serializing on it, it has to be a factor as well
> > in this case.
> >
> >  fs/char_dev.c        | 20 +++++++++++---------
> >  fs/inode.c           |  2 +-
> >  include/linux/cdev.h |  2 +-
> >  3 files changed, 13 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/char_dev.c b/fs/char_dev.c
> > index c2ddb998f3c9..9a6dfab084d1 100644
> > --- a/fs/char_dev.c
> > +++ b/fs/char_dev.c
> > @@ -374,15 +374,15 @@ static int chrdev_open(struct inode *inode, struct file *filp)
> >  {
> >       const struct file_operations *fops;
> >       struct cdev *p;
> > -     struct cdev *new = NULL;
> >       int ret = 0;
> >
> > -     spin_lock(&cdev_lock);
> > -     p = inode->i_cdev;
> > +     VFS_BUG_ON_INODE(icount_read(inode) < 1, inode);
> > +
> > +     p = READ_ONCE(inode->i_cdev);
> >       if (!p) {
> >               struct kobject *kobj;
> > +             struct cdev *new = NULL;
> >               int idx;
> > -             spin_unlock(&cdev_lock);
> >               kobj = kobj_lookup(cdev_map, inode->i_rdev, &idx);
> >               if (!kobj)
> >                       return -ENXIO;
> > @@ -392,19 +392,19 @@ static int chrdev_open(struct inode *inode, struct file *filp)
> >                  we dropped the lock. */
> >               p = inode->i_cdev;
> >               if (!p) {
> > -                     inode->i_cdev = p = new;
> > +                     p = new;
> > +                     WRITE_ONCE(inode->i_cdev, p);
> >                       list_add(&inode->i_devices, &p->list);
> >                       new = NULL;
> >               } else if (!cdev_get(p))
> >                       ret = -ENXIO;
> > +             spin_unlock(&cdev_lock);
> > +             cdev_put(new);
> >       } else if (!cdev_get(p))
> >               ret = -ENXIO;
> > -     spin_unlock(&cdev_lock);
> > -     cdev_put(new);
> >       if (ret)
> >               return ret;
> >
> > -     ret = -ENXIO;
> >       fops = fops_get(p->ops);
> >       if (!fops)
> >               goto out_cdev_put;
> > @@ -423,8 +423,10 @@ static int chrdev_open(struct inode *inode, struct file *filp)
> >       return ret;
> >  }
> >
> > -void cd_forget(struct inode *inode)
> > +void inode_cdev_forget(struct inode *inode)
> >  {
> > +     VFS_BUG_ON_INODE(!(inode_state_read_once(inode) & I_FREEING), inode);
> > +
> >       spin_lock(&cdev_lock);
> >       list_del_init(&inode->i_devices);
> >       inode->i_cdev = NULL;
> > diff --git a/fs/inode.c b/fs/inode.c
> > index a62032864ddf..88be1f20782d 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -840,7 +840,7 @@ static void evict(struct inode *inode)
> >               clear_inode(inode);
> >       }
> >       if (S_ISCHR(inode->i_mode) && inode->i_cdev)
> > -             cd_forget(inode);
> > +             inode_cdev_forget(inode);
> >
> >       remove_inode_hash(inode);
> >
> > diff --git a/include/linux/cdev.h b/include/linux/cdev.h
> > index 0e8cd6293deb..bed99967ad90 100644
> > --- a/include/linux/cdev.h
> > +++ b/include/linux/cdev.h
> > @@ -34,6 +34,6 @@ void cdev_device_del(struct cdev *cdev, struct device *dev);
> >
> >  void cdev_del(struct cdev *);
> >
> > -void cd_forget(struct inode *);
> > +void inode_cdev_forget(struct inode *);
> >
> >  #endif
> > --
> > 2.48.1
> >
> --
> Jan Kara <jack@...e.com>
> SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ