Message-ID: <20251017134916.GK3901471@nvidia.com>
Date: Fri, 17 Oct 2025 10:49:16 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Tzung-Bi Shih <tzungbi@...nel.org>
Cc: Benson Leung <bleung@...omium.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"Rafael J . Wysocki" <rafael@...nel.org>,
Danilo Krummrich <dakr@...nel.org>,
Jonathan Corbet <corbet@....net>, Shuah Khan <shuah@...nel.org>,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
chrome-platform@...ts.linux.dev, linux-kselftest@...r.kernel.org,
Laurent Pinchart <laurent.pinchart@...asonboard.com>,
Bartosz Golaszewski <brgl@...ev.pl>,
Wolfram Sang <wsa+renesas@...g-engineering.com>,
Simona Vetter <simona.vetter@...ll.ch>,
Dan Williams <dan.j.williams@...el.com>
Subject: Re: [PATCH v5 5/7] revocable: Add fops replacement
On Fri, Oct 17, 2025 at 02:36:58AM +0000, Tzung-Bi Shih wrote:
> Imagining the following example:
>
> /* res1 and res2 are provided by hot-pluggable devices. */
> struct filp_priv {
>         void *res1;
>         void *res2;
> };
>
> /* In .open() fops */
> priv = kzalloc(sizeof(struct filp_priv), ...);
> priv->res1 = ...;
> priv->res2 = ...;
> filp->private_data = priv;
>
> /* In .read() fops */
> priv = filp->private_data;
> priv->res1 // could result in a UAF if the device is gone
> priv->res2 // could result in a UAF if the device is gone
>
>
> How does the bool * work for the example?
You are thinking about it completely wrong: you are trying to keep the
driver running concurrently after its remove() returns, but that
isn't how Linux drivers are designed.
We have a whole family of synchronous fencing APIs that drivers call
in their remove() callback to shut down their concurrency. Think of
things like free_irq(), cancel_work_sync(), timer_shutdown_sync(),
sysfs_remove_files(). All of these guarantee the concurrent callbacks
are fenced before returning.
The only issue with cros_ec is this:
    static void cros_ec_chardev_remove(struct platform_device *pdev)
    {
            struct miscdevice *misc = dev_get_drvdata(&pdev->dev);

            misc_deregister(misc);
    }
It doesn't fence the cdevs! Misc is a hard API to use because it
doesn't have a misc_deregister_sync() variation!
Dan/Laurent's point and proposal was that misc_deregister() does not
work like this! It is an anomaly that driver authors typically
overlook.
So the proposal was to add some way to get a:
    misc_deregister_sync()
That gives the fence. Under your proposal it would lock the SRCU and
change the bool. After it returns no cdev related threads are running
in fops touching res1/res2. I think your proposal to replace the fops
and that related machinery is smart and has a chance to succeed.
From this perspective your example is malformed. Resources should not
become revoked concurrently *while a driver is bound*. The driver
should be unbound, call misc_deregister_sync()/etc, and return from
remove() guaranteeing it no longer touches any resources.
For this specific cros_ec driver its "res" is this:
    struct cros_ec_dev *ec = dev_get_drvdata(pdev->dev.parent);
    struct cros_ec_platform *ec_platform = dev_get_platdata(ec->dev);
This is already properly lifetime controlled!
It *HAS* to be, and even your patches are assuming it by blindly
reaching into the parent's memory!
+ misc->rps[0] = ec->ec_dev->revocable_provider;
If the parent driver has been racily unbound at this point the
ec->ec_dev is already a UAF!
For cros it is safe because the cros_ec driver is a child of an MFD and
the MFD logic ensures that the children are unbound as part of
destroying the parent. So 'ec' is guaranteed valid from probe() to
remove() return.
IMHO auto-revoke is a terrible idea. If you go down that path then why
is misc special? You need to auto-revoke irqs, timers, work queues,
etc too? That's a mess.
I think your previous idea for revoke was properly formed, the issue
and objection was that the bug you are fixing is a miscdev complexity
caused by the lack of misc_deregister_sync(). If you fix that directly
then you don't need revocable at all, and it is a much more useful fix
that is an easy and natural API for drivers to use.
Jason