lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAPcyv4hytO3OnhpzKa2nKcqRhP-Gmei2LNhOTujx7Gn60wqrvQ@mail.gmail.com>
Date:   Thu, 21 Jan 2021 15:14:25 -0800
From:   Dan Williams <dan.j.williams@...el.com>
To:     Greg KH <gregkh@...uxfoundation.org>
Cc:     "Chen, Mike Ximing" <mike.ximing.chen@...el.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "arnd@...db.de" <arnd@...db.de>,
        "pierre-louis.bossart@...ux.intel.com" 
        <pierre-louis.bossart@...ux.intel.com>
Subject: Re: [PATCH v8 04/20] dlb: add device ioctl layer and first three ioctls

On Wed, Jan 13, 2021 at 1:56 AM Greg KH <gregkh@...uxfoundation.org> wrote:
[..]
> > That's not my concern though. The open race that cdev_del() does not
> > address is ioctl() called after device-unbind. The open fd is never
> > revoked and can live past device_unregister() in which case the ioctl
> > needs to revalidate the device, or (not recommended) block unbind /
> > driver-remove while open file descriptors are outstanding.
>
> A cdev is to track the character device, so the open/close/ioctl issue
> should not be relevant here.
>
> Device unbind is totally different and has nothing to do with the
> character device node, except where you are trying to tie the two
> together.  And yes, you do have to be aware of that, but usually is it
> quite simple.  Complex examples are the v4l layer where the distance
> between the two devices is great, so the middle layer has to handle
> things carefully.
>
> For a "simple" driver like this one, there shouldn't be any issues and
> it should be hard to get this wrong :)

It's trivial to trigger these problems in a bind/unbind test.

>
> > > > It strikes me that a helper like this might address many of the common patterns:
> > > >
> > > > +/**
> > > > + * get_live_device() - increment reference count for device iff !dead
> > > > + * @dev: device.
> > > > + *
> > > > + * Forward the call to get_device() if the device is still alive. If
> > > > + * this is called with the device_lock() held then the device is
> > > > + * guaranteed to not die until the device_lock() is dropped.
> > > > + */
> > > > +struct device *get_live_device(struct device *dev)
> > > > +{
> > > > +       return dev && !dev->p->dead ? get_device(dev) : NULL;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(get_live_device);
> > >
> > > Ick, no, that's still racy :(
> >
> > Hence the comment about needing to synchronize with the driver doing
> > device_unregister().
>
> If you save off your device pointer properly on probe (i.e. you grab a
> reference count as you "know" it is live), then you don't have any of
> these races or need to synchronize.  So again, this should be hard to
> get wrong, unless you have a "heavy" middle layer between the char
> device node and the device structure.

Hold an fd open and continue to issue file_operations while the driver
is torn down.

> > > And a cdev is not a "real" struct device, despite looking like one if
> > > you squint at it.  The patches from Christoph should be merged soon that
> > > remove the last remants of the logic that kind of looked like a struct
> > > device operation (with a kobject), and after that, I will clean it out
> > > to keep it from looking like it ties into the driver model entirely, as
> > > many people get this wrong, because it does not.
> >
> > Agree, but many drivers still tie cdev lifetime to a struct device.
>
> True, and that's normally fine.  Do you have examples of where this is
> wrong in the tree?
>
> > > > Alternatively, if device_lock() is too awkward for a driver it could
> > > > use its own lock and kill_device().
> > > >
> > > > ...am I off base thinking that cdev_del vs fops liveness is a
> > > > widespread problem worth documenting new design patterns?
> > >
> > > It shouldn't be a problem, again, because who would be able to close a
> > > char device node and still be able to call ioctl on it?  The VFS layer
> > > should prevent that from happening, right?
> >
> > Per above, unbind vs and revoking new ioctl() invocations is the concern.
>
> Luckily unbind is a very rare occurance (with the exception of the
> virtio drivers, who seem to love it and use it as their only user/kernel
> api), so this shouldn't be an issue in normal operations of a system.

I agree it's a small window and an infrequently stressed window under
normal conditions. However, it's the same kind of problem that lead to
the need to teach /dev/mem to revoke existing mappings. It's the
synchronization that is needed in the device-dax layer when it is
switching modes from device-mmap-only to online-memory-hotplug the
synchronization with in-flight mmap is needed to avoid collisions. The
libnvdimm subsystem is a heavy user of bind/unbind for applying
different personalities in front of pmem.

I'll advise Mike to not worry about it, I don't want the resolution of
this debate to slow him down if you don't see an issue here.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ