Message-ID: <2023112938-unhook-defiance-75ed@gregkh>
Date: Wed, 29 Nov 2023 09:20:32 +0000
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Saeed Mahameed <saeed@...nel.org>
Cc: Arnd Bergmann <arnd@...db.de>, Jason Gunthorpe <jgg@...dia.com>,
    Leon Romanovsky <leonro@...dia.com>,
    Jiri Pirko <jiri@...dia.com>, Leonid Bloch <lbloch@...dia.com>,
    Itay Avraham <itayavr@...dia.com>,
    Jakub Kicinski <kuba@...nel.org>, linux-kernel@...r.kernel.org,
    Saeed Mahameed <saeedm@...dia.com>
Subject: Re: [PATCH V3 2/5] misc: mlx5ctl: Add mlx5ctl misc driver

On Wed, Nov 29, 2023 at 01:08:39AM -0800, Saeed Mahameed wrote:
> On 27 Nov 18:59, Greg Kroah-Hartman wrote:
> > On Mon, Nov 20, 2023 at 11:06:16PM -0800, Saeed Mahameed wrote:
> > > +struct mlx5ctl_dev {
> > > + struct mlx5_core_dev *mdev;
> > > + struct miscdevice miscdev;
> > > + struct auxiliary_device *adev;
> > > + struct list_head fd_list;
> > > + spinlock_t fd_list_lock; /* protect list add/del */
> > > + struct rw_semaphore rw_lock;
> > > + struct kref refcount;
> >
> > You now have 2 different things that control the lifespan of this
> > structure. We really need some way to automatically check this so that
> > people don't keep making this same mistake, it happens all the time :(
> >
> > Please pick one structure (miscdevice) or the other (kref) to control
> > the lifespan, having 2 will just not work.
> >
>
> miscdevice doesn't handle the lifespan, open files will remain open even
> after the miscdevice was unregistered, hence we use the kref to defer the
> kfree until the last open file is closed.

miscdevice has a reference counter and a lifecycle; you cannot have two
reference-counted objects in the same structure and expect things to
work well.
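
For illustration only, here is a minimal userspace sketch of the "one
counter per structure" idea. It is not kernel code and not taken from the
patch: struct ctl_dev, ctl_dev_alloc(), ctl_dev_get() and ctl_dev_put() are
hypothetical names, and a C11 atomic stands in for whichever single
reference-counted object the driver ends up relying on. The only point is
that exactly one counter decides when the memory goes away:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Demo-only bookkeeping so the lifetime can be observed from outside. */
static atomic_int ctl_dev_live;

/* Hypothetical stand-in for a driver-private structure. */
struct ctl_dev {
	atomic_int refs;	/* the ONLY counter that controls the lifetime */
};

/* Allocate an object holding one initial reference (the "registration"). */
static struct ctl_dev *ctl_dev_alloc(void)
{
	struct ctl_dev *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	atomic_init(&d->refs, 1);
	atomic_fetch_add(&ctl_dev_live, 1);
	return d;
}

/* Take an extra reference, e.g. on behalf of an open file. */
static void ctl_dev_get(struct ctl_dev *d)
{
	assert(d != NULL);
	atomic_fetch_add(&d->refs, 1);
}

/* Drop a reference; whoever drops the last one frees the object. */
static void ctl_dev_put(struct ctl_dev *d)
{
	assert(d != NULL);
	if (atomic_fetch_sub(&d->refs, 1) == 1) {
		atomic_fetch_sub(&ctl_dev_live, 1);
		free(d);
	}
}
```

In this sketch the registration path would hold the initial reference, each
open file would take one with ctl_dev_get(), and whichever side calls
ctl_dev_put() last frees the structure. No second counter is involved.
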
> > Also, why a rw_semaphore? Only use those if you can prove with a
> > benchmark that it is actually faster, otherwise it's simpler to just use
> > a normal mutex (hint, you are changing the fields in the structure with
> > the read lock held, which feels very wrong, and so needs a LOT of
> > documentation, or just use a normal mutex...)
> >
>
> It is needed so we can protect against the underlying device unloading while
> miscdevice is active. We use a rw semaphore since we don't care about
> synchronization between any of the fops, but we want to protect currently
> active ioctls and fops from a sudden mlx5ctl_remove (auxiliary_driver.remove),
> which can happen dynamically due to underlying device removal.

Then use a normal mutex.  Only use a rw lock if you can prove the
performance needs it, as usually a rw lock is slower and more complex, and
you then have to document stuff like:

> So here is the locking scheme:
>
> write_lock(): taken only on mlx5_ctl remove, to mark the device as down
> by assigning NULL to mcdev->dev, so that all new readers abort, and to wait
> for current readers to finish their task.
>
> read_lock(): used in all fops and ioctls, to make sure the underlying
> mlx5_core device is still active, and to prevent open files from accessing
> the device when the miscdevice was already unregistered.
>
> I agree, this should've been documented in the code, I will add
> documentation.

Just make it simple and use a normal mutex, please.
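
As a rough userspace sketch of what the plain-mutex version of that scheme
could look like (an assumption for illustration, not the driver's code):
pthread_mutex_t stands in for a kernel mutex, and struct backend,
struct ctl, ctl_init(), ctl_op() and ctl_remove() are hypothetical names.
Every operation takes the same lock, checks whether the underlying device
is still there, and bails out with -ENODEV if it is not:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the underlying (removable) device. */
struct backend {
	int value;
};

/* Hypothetical stand-in for the control device seen by open files. */
struct ctl {
	pthread_mutex_t lock;	/* protects 'be' and serializes all operations */
	struct backend *be;	/* NULL once the underlying device is removed */
};

static void ctl_init(struct ctl *c, struct backend *be)
{
	assert(c != NULL);
	pthread_mutex_init(&c->lock, NULL);
	c->be = be;
}

/* Any fop/ioctl-like operation: runs only while the backend is alive. */
static int ctl_op(struct ctl *c, int *out)
{
	int ret = -ENODEV;

	assert(c != NULL && out != NULL);
	pthread_mutex_lock(&c->lock);
	if (c->be) {
		*out = c->be->value;
		ret = 0;
	}
	pthread_mutex_unlock(&c->lock);
	return ret;
}

/* Removal path: waits for an in-flight operation, then marks the device down. */
static void ctl_remove(struct ctl *c)
{
	assert(c != NULL);
	pthread_mutex_lock(&c->lock);
	c->be = NULL;
	pthread_mutex_unlock(&c->lock);
}
```

The trade-off in this sketch is that operations are serialized against each
other as well as against removal, in exchange for having a single rule to
document: hold the lock and check the pointer before touching the device.
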
And fix up the reference counting; it shouldn't be this complex, it's
just a "simple" misc device driver :)

But before you do that, please see my other email about why you are not
using devlink for all of this instead.

thanks,

greg k-h