[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251017162116.GA316284@nvidia.com>
Date: Fri, 17 Oct 2025 13:21:16 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Tzung-Bi Shih <tzungbi@...nel.org>
Cc: Benson Leung <bleung@...omium.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"Rafael J . Wysocki" <rafael@...nel.org>,
Danilo Krummrich <dakr@...nel.org>,
Jonathan Corbet <corbet@....net>, Shuah Khan <shuah@...nel.org>,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
chrome-platform@...ts.linux.dev, linux-kselftest@...r.kernel.org,
Laurent Pinchart <laurent.pinchart@...asonboard.com>,
Bartosz Golaszewski <brgl@...ev.pl>,
Wolfram Sang <wsa+renesas@...g-engineering.com>,
Simona Vetter <simona.vetter@...ll.ch>,
Dan Williams <dan.j.williams@...el.com>
Subject: Re: [PATCH v5 5/7] revocable: Add fops replacement
On Sat, Oct 18, 2025 at 12:07:58AM +0800, Tzung-Bi Shih wrote:
> > From this perspective your example is malformed. Resources should not
> > become revoked concurrently *while a driver is bound*. The driver
> > should be unbound, call misc_deregister_sync()/etc, and return from
> > remove() guaranteeing it no longer touches any resources.
>
> Imagining:
> - Driver X provides the res1.
> - Driver Y provides the res2.
> - Driver Z provides the chardev /dev/zzz. The file operations use res1
> and res2.
> - A userspace program opened /dev/zzz.
Yes, then you have a mess and it is pretty hard to deal with.
We don't usually build things like that, and I'm not aware of any
cases where killing the whole char dev is the right answer. Usually
it's something like 'res1' is critical and 'res2' is discovered
optionally.
Making up elaborate fictional use cases is not a way to justify an
over complex design.
> If it ends up call misc_deregister_sync(), should the synchronous function
> wait for the userspace program to close the FD?
Some subsystems do this, it isn't great.
> The design behind revocable: the driver X waits via synchronize_srcu(), and
> then it is free to go. The subsequent accesses to res1 will get NULL, and
> should fail gracefully.
Yes, I understand what it is for, I am saying it is not required here.
> > For this specific cros_ec driver it's "res" is this:
> >
> > struct cros_ec_dev *ec = dev_get_drvdata(pdev->dev.parent);
> > struct cros_ec_platform *ec_platform = dev_get_platdata(ec->dev);
>
> In fact, no, the "res" we are concerning is struct cros_ec_device, e.g. [1].
> (I knew the naming cros_ec_device vs. cros_ec_dev is somehowg easy to confuse.)
struct cros_ec_dev *ec = dev_get_drvdata(mdev->parent);
struct cros_ec_device *ec_dev = ec->ec_dev;
It's all the same stuff. The parent needs to ensure it remains bound
only while it's ec->ec_dev is valid. That is its job.
> > This is already properly lifetime controlled!
> >
> > It *HAS* to be, and even your patches are assuming it by blindly
> > reaching into the parent's memory!
> >
> > + misc->rps[0] = ec->ec_dev->revocable_provider;
> >
> > If the parent driver has been racily unbound at this point the
> > ec->ec_dev is already a UAF!
>
> Not really, it uses the fact that the caller is from probe(). I think the
> driver can't be unbound when it is still in probe().
Right, but that's my point you are already relying on driver binding
lifetime rules to make your access valid. You should continue to rely
on that and fix the lack of synchronous remove to fix the bug.
> To be clear, I'm using misc as an example which is also the real crash we
> observed. If the file operations use other resources provided by a
> hot-pluggable device, it'd need to use revocable APIs to prevent the
> UAFs.
I understand, but it is a very poor use of this new construct. Come
with a driver that actually has two resources and needs something so
complicated first.
Improve miscdev as Laurent suggested to fix this specific driver bug.
Do not mess up char dev by trying to link it to lists of recovable and
build some wild auto-revoking thing nobody needs.
Jason
Powered by blists - more mailing lists