Message-ID: <Zp5wOjhgK7HdPqsS@kbusch-mbp.dhcp.thefacebook.com>
Date: Mon, 22 Jul 2024 08:44:10 -0600
From: Keith Busch <kbusch@...nel.org>
To: Greg KH <gregkh@...uxfoundation.org>
Cc: Keith Busch <kbusch@...a.com>, rafael@...nel.org,
linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
bhelgaas@...gle.com, lukas@...ner.de
Subject: Re: [PATCH] driver core: get kobject ref when accessing dev_attrs
On Sat, Jul 20, 2024 at 07:17:55AM +0200, Greg KH wrote:
> On Fri, Jul 19, 2024 at 11:55:13AM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch@...nel.org>
> >
> > Get a reference to the device's kobject while storing and showing device
> > attributes so that the device is valid for the lifetime of the sysfs access.
> > Without this, the device may be released and use-after-free will occur.
> >
> > This is an easy problem to recreate with pci switches. Basic topology on my
> > qemu test machine:
> >
> > -[0000:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
> >            +-01.0-[01-04]----00.0-[02-04]--+-00.0-[03]--
> >                                            \-01.0-[04]----00.0  Red Hat, Inc. Virtio block device
> >
> > Simultaneously remove devices 04:00.0 and 01:00.0 and you'll hit it:
> >
> > # echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/remove &
> > # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove
>
> So you remove the parent before the child and also want to remove the
> child at the same time? You are going to have bad problems here :)
The example I provided is surely a user error, but it just demonstrates
the issue. The parent device can be removed at any time without user
action: hotplug and error handling take devices down automatically. And
it's not just a problem when concurrently requesting removal of the
child device; it's still a use-after-free from just accessing its
attributes.
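
For readers unfamiliar with the pattern, here is a minimal userspace sketch
of a "get unless zero" guard around an attribute access. It is not the
kernel implementation: struct obj, obj_get_unless_zero(), obj_put() and
attr_access() are hypothetical names for illustration, and C11 atomics
stand in for the kernel's reference counting. The sketch never frees the
object, so it only shows the refcount check and not the use-after-free
itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object such as a kobject. */
struct obj {
	atomic_int refcount;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0);
	return old == 1;
}

/* Assumed shape of a guarded show/store path, for illustration only. */
static int attr_access(struct obj *o)
{
	if (!obj_get_unless_zero(o))
		return -ENXIO;	/* object is already on its way out */

	/* ... the attribute's show/store callback would run here ... */

	obj_put(o);
	return 0;
}
```

With a live reference held elsewhere, attr_access() returns 0 and leaves
the count unchanged; once the last reference has been dropped it returns
-ENXIO. The sketch says nothing about attributes that remove their own
device when written.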
> > @@ -2433,12 +2433,15 @@ static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
> >  	struct device *dev = kobj_to_dev(kobj);
> >  	ssize_t ret = -EIO;
> >
> > +	if (!kobject_get_unless_zero(kobj))
> > +		return -ENXIO;
>
> We've been down this path before, and it doesn't end well from what I
> recall. Attributes that when written to remove themselves need to call
> the correct function to do so (look at how scsi does it). I think this
> change will now break that functionality. Look in the email archives a
> long time ago for more details, I can't recall them at the moment,
> sorry.
Thanks for the suggestion. I'll try to figure out what scsi does and see
if that strategy can work here.