Message-ID: <20240731125232.00005aad@Huawei.com>
Date: Wed, 31 Jul 2024 12:52:32 +0100
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: Jason Gunthorpe <jgg@...dia.com>
CC: Jonathan Corbet <corbet@....net>, Itay Avraham <itayavr@...dia.com>, Jakub
Kicinski <kuba@...nel.org>, Leon Romanovsky <leon@...nel.org>,
<linux-doc@...r.kernel.org>, <linux-rdma@...r.kernel.org>,
<netdev@...r.kernel.org>, Paolo Abeni <pabeni@...hat.com>, Saeed Mahameed
<saeedm@...dia.com>, Tariq Toukan <tariqt@...dia.com>, Andy Gospodarek
<andrew.gospodarek@...adcom.com>, Aron Silverton <aron.silverton@...cle.com>,
Dan Williams <dan.j.williams@...el.com>, "David Ahern" <dsahern@...nel.org>,
Christoph Hellwig <hch@...radead.org>, "Jiri Pirko" <jiri@...dia.com>, Leonid
Bloch <lbloch@...dia.com>, "Leon Romanovsky" <leonro@...dia.com>,
<linux-cxl@...r.kernel.org>, <patches@...ts.linux.dev>
Subject: Re: [PATCH v2 7/8] fwctl/mlx5: Support for communicating with mlx5 fw
> > > +static void mlx5ctl_remove(struct auxiliary_device *adev)
> > > +{
> > > + struct mlx5ctl_dev *mcdev __free(mlx5ctl) = auxiliary_get_drvdata(adev);
> >
> > So this is calling fwctl_put(&mcdev->fwctl) on scope exit.
> >
> > Why do you need to drop a reference beyond the one fwctl_unregister() is dropping
> > in cdev_device_del()? Where am I missing a reference get?
>
> fwctl_register() / fwctl_unregister() are pairs. Internally they pair
> cdev_device_add() / cdev_device_del() which decrease some internal
> cdev refcounts.
>
> _alloc_device() / __free(mlx5ctl) above are the other pair.
> device_initialize() holds a reference from probe to remove.
>
> It has to work this way because if cdev_device_del() would put back
> all the references we would immediately UAF, eg:
>
> cdev_device_del(&fwctl->cdev, &fwctl->dev);
>
> /* Disable and free the driver's resources for any still open FDs. */
> guard(rwsem_write)(&fwctl->registration_lock);
> guard(mutex)(&fwctl->uctx_list_lock);
> ^^^^^^^
> Must still be allocated
>
> And more broadly, though mlx5 does not use this, it would be safe for
> a driver to do:
>
> fwctl_unregister();
> kfree(mcdev->mymemory);
> ^^^^^^ Must still be allocated!
> fwctl_put(&mcdev->fwctl);
>
> So we have the two steps where unregister makes it safe for the driver
> to begin teardown but keeps memory around, and the final put which
> releases the memory after driver teardown is done.
>
> This is also captured in the cleanup.h notation:
>
> struct mlx5ctl_dev *mcdev __free(mlx5ctl) = fwctl_alloc_device(
> &mdev->pdev->dev, &mlx5ctl_ops, struct mlx5ctl_dev,
> fwctl);
> ^^^^^^^^^^^^
> Here we indicate we have a ref on the stack from
> fwctl_alloc_device
>
> auxiliary_set_drvdata(adev, no_free_ptr(mcdev));
> ^^^^^^^^^^^^^^^^^ Move the ref
> into drvdata
>
> struct mlx5ctl_dev *mcdev __free(mlx5ctl) = auxiliary_get_drvdata(adev);
> ^^^^^^^^^^^ Move the ref out of
> drvdata onto the stack
>
Thanks for the explanation. I clearly needed more coffee that day :)
Personally I find this to be a confusing use of scoped cleanup, as we
aren't associating a constructor / destructor pair with a scope, but
rather adopting ownership of an existing reference so that its
destructor runs on scope exit.
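For readers unfamiliar with the pattern under discussion, here is a minimal userspace sketch of it, assuming GCC or Clang's cleanup attribute. Every name in it (toy_dev, toy_alloc, toy_put, TOY_FREE, toy_no_free, drvdata) is a hypothetical stand-in and not part of the fwctl or cleanup.h API. It shows the reference being moved off the stack in probe and adopted back onto the stack in remove, where scope exit performs the final put.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted fwctl device. */
struct toy_dev {
	int refs;       /* reference count */
	int registered; /* 1 between "register" and "unregister" */
};

static int toy_frees; /* how many objects were actually freed */

static struct toy_dev *toy_alloc(void)
{
	struct toy_dev *d = calloc(1, sizeof(*d));

	if (d)
		d->refs = 1; /* the reference held from probe to remove */
	return d;
}

static void toy_put(struct toy_dev *d)
{
	if (d && --d->refs == 0) {
		free(d);
		toy_frees++;
	}
}

/* The cleanup handler receives a pointer to the annotated variable. */
static void toy_put_cleanup(struct toy_dev **p)
{
	toy_put(*p);
}

#define TOY_FREE __attribute__((cleanup(toy_put_cleanup)))

/* Plays the role of no_free_ptr(): pass the pointer on, disarm cleanup. */
static struct toy_dev *toy_no_free(struct toy_dev **p)
{
	struct toy_dev *d = *p;

	*p = NULL;
	return d;
}

static struct toy_dev *drvdata; /* stands in for auxiliary drvdata */

static int toy_probe(void)
{
	struct toy_dev *d TOY_FREE = toy_alloc();

	if (!d)
		return -1; /* cleanup runs on NULL and does nothing */
	d->registered = 1;
	drvdata = toy_no_free(&d); /* move the ref into drvdata */
	return 0; /* d is NULL here, so nothing is put */
}

static void toy_remove(void)
{
	/* Adopt the ref from drvdata; no constructor runs in this scope. */
	struct toy_dev *d TOY_FREE = drvdata;

	drvdata = NULL;
	d->registered = 0; /* "unregister": memory must still be valid */
	/* scope exit drops the last reference and frees the object */
}
```

Calling toy_probe() and then toy_remove() from a main() frees the object exactly once, at the end of toy_remove(). The declaration in toy_remove() is the "adopting ownership" step: the variable takes over a reference that was created in a different function.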
Assuming my caffeine level is better today, maybe device-managed
cleanup is more appropriate?
Use devm_add_action_or_reset() to associate each destructor with its
setup step by placing the call immediately after that step, for both
the put that pairs with the allocation and the unregister.
Teardown should then run in very nearly the same order as what you
have here.
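To illustrate the ordering argument, here is a small userspace model of a device-managed action list. It is only a sketch under stated assumptions: demo_devres, demo_add_action_or_reset, demo_release_all and the two callbacks are hypothetical names that loosely imitate devm_add_action_or_reset() and devres release. They are not the kernel implementation. Actions are run in reverse order of registration, and a failed registration runs the action immediately.

```c
#include <stdlib.h>

typedef void (*demo_action_fn)(void *data);

struct demo_action {
	demo_action_fn fn;
	void *data;
	struct demo_action *next;
};

/* Hypothetical per-device action list; newest entry is at the head. */
struct demo_devres {
	struct demo_action *head;
};

/* On allocation failure, run the action right away and report an error. */
static int demo_add_action_or_reset(struct demo_devres *res,
				    demo_action_fn fn, void *data)
{
	struct demo_action *a = malloc(sizeof(*a));

	if (!a) {
		fn(data);
		return -1;
	}
	a->fn = fn;
	a->data = data;
	a->next = res->head;
	res->head = a;
	return 0;
}

/* Run and free every action, most recently added first. */
static void demo_release_all(struct demo_devres *res)
{
	while (res->head) {
		struct demo_action *a = res->head;

		res->head = a->next;
		a->fn(a->data);
		free(a);
	}
}

static char demo_log[8]; /* records the order the destructors ran in */
static int demo_log_len;

static void demo_unregister(void *data)
{
	(void)data;
	demo_log[demo_log_len++] = 'U'; /* stands in for the unregister step */
}

static void demo_put(void *data)
{
	(void)data;
	demo_log[demo_log_len++] = 'P'; /* stands in for the final put */
}

static int demo_probe_remove(void)
{
	struct demo_devres res = { 0 };
	int dev = 0; /* placeholder for the device object */

	/* "alloc" succeeded: queue its destructor immediately afterwards */
	if (demo_add_action_or_reset(&res, demo_put, &dev))
		return -1;
	/* "register" succeeded: queue its destructor immediately afterwards */
	if (demo_add_action_or_reset(&res, demo_unregister, &dev)) {
		demo_release_all(&res);
		return -1;
	}
	/* device removal: actions run in reverse order of registration */
	demo_release_all(&res);
	return 0;
}
```

After demo_probe_remove() returns 0, demo_log holds 'U' followed by 'P': the unregister action runs before the put. That is the same order as the explicit unregister call followed by the put on scope exit in the remove path quoted above.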
Alternatively, this is just a new pattern I should get used to.
Jonathan