netdev - Re: [RFC 0/5] devlink: add an explicit locking API

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YYF//p5mDQ2/reOD@unreal>
Date:   Tue, 2 Nov 2021 20:14:22 +0200
From:   Leon Romanovsky <leon@...nel.org>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     idosch@...sch.org, edwin.peer@...adcom.com, jiri@...nulli.us,
        netdev@...r.kernel.org
Subject: Re: [RFC 0/5] devlink: add an explicit locking API

On Tue, Nov 02, 2021 at 08:14:12AM -0700, Jakub Kicinski wrote:
> On Tue, 2 Nov 2021 10:08:39 +0200 Leon Romanovsky wrote:
> > On Mon, Nov 01, 2021 at 02:16:13PM -0700, Jakub Kicinski wrote:
> > > On Mon, 1 Nov 2021 20:36:15 +0200 Leon Romanovsky wrote:  
> > > > This is three step process, but mainly it is first step. We need to make
> > > > sure that functions that can re-entry will use nested locks.
> > > > 
> > > > Steps:
> > > > 1. Use proper locking API that supports nesting:
> > > > https://lore.kernel.org/netdev/YYABqfFy%2F%2Fg5Gdis@nanopsycho/T/#mf9dc5cac2013abe413545bbe4a09cc231ae209a4  
> > > 
> > > Whether we provide the an unlocked API or allow lock nesting 
> > > on the driver API is not that important to me.  
> > 
> > Thanks
> 
> Not sure what you're thanking for. I still prefer two explicit APIs.
> Allowing nesting is not really necessary here. Callers know whether
> they hold the lock or not.

I'm doubt about. It maybe easy to tell in reload flow, but it is much
harder inside eswitch mode change (as an example).

> 
> Please see netdevsim, it takes the devlink_lock() at entry points to
> the driver - which in turn means the API which takes a lock can be
> removed once all drivers are converted.

Netdevsim is a nice example, but it is very different from the real device.

> 
> > > > 2. Convert devlink->lock to be RW semaphore:
> > > > commit 4506dd3a90a82a0b6bde238f507907747ab88407
> > > > Author: Leon Romanovsky <leon@...nel.org>
> > > > Date:   Sun Oct 24 16:54:16 2021 +0300
> > > > 
> > > >     devlink: Convert devlink lock to be RW semaphore
> > > > 
> > > >     This is naive conversion of devlink->lock to RW semaphore, so we will be
> > > >     able to differentiate commands that require exclusive access vs. parallel
> > > >     ready-to-run ones.
> > > > 
> > > >     All "set" commands that used devlink->lock are converted to write lock,
> > > >     while all "get" commands are marked with read lock.
> > > > 
> > > > @@ -578,8 +584,12 @@ static int devlink_nl_pre_doit(const struct genl_ops *ops,
> > > >                 mutex_unlock(&devlink_mutex);
> > > >                 return PTR_ERR(devlink);
> > > >         }
> > > > -       if (~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
> > > > -               mutex_lock(&devlink->lock);
> > > > +
> > > > +       if (~ops->internal_flags & DEVLINK_NL_FLAG_SHARED_ACCESS)
> > > > +               down_write(&devlink->rwsem);
> > > > +       else
> > > > +               down_read(&devlink->rwsem);
> > > > +  
> > > 
> > > IIUC the RW sem thing is an optimization, it's besides the point.  
> > 
> > I need RW as a way to ensure "exclusive" access during _set_ operation.
> > It is not an optimization, but simple way to understand if parallel
> > access is possible at this specific point of time.
> 
> How is this not an optimization to allow parallel "reads"?

You need to stop everything when _set_ command is called. One way is to
require for all netlink devlink calls to have lock, another solution is
to use RW semaphore. This is why it is not optimization, but an implementation.
Parallel "reads" are nice bonus.

> 
> Anything that calls the driver has no idea whether the driver needs
> read or write locking (assuming drivers depend on devlink for locking
> as they should) so I presume you only plan this for dumps which don't
> call the driver?

I planned to separate based on command nature:
 * _set_ == write lock
 * _get_ == read lock

> 
> Anyway, consider this a nack, it should most definitely not be a part 
> of the initial conversions. We have enough fallout to deal with.

Any solution should take this into account. We can't design scheme that
will be needed to be rewritten immediately after merge.

> 
> > > > How? I have only one rule:
> > > >  * Driver author should be aware that between devlink_register() and
> > > >    devlink_unregister(), users can send netlink commands.
> > > > 
> > > > In your solution, driver authors will need to follow whole devlink
> > > > how-to-use book.  
> > > 
> > > Only if they need refs. If you don't the API is the same as yours.
> > > 
> > > IOW you don't provide an API for advanced use cases at all. You force
> > > the drivers to implement their own reference counting and locking.   
> > 
> > The complex drivers already do it anyway, because they need to reference
> > counting their own structures to make sure that the lifetime of these
> > structures meats their model.
> 
> Well, no comments on what mlx5 does or has to do but it's likely the
> most complex driver. Don't judge what others will benefit from based 
> on mlx5.
> 
> > > I want them to just rely on devlink as a framework.  
> > 
> > And I don't :). For me devlink is a way to configure device, not manage
> > lifetime of driver specific data structures.
> 
> My mind is set here. Obviously "community" can override, so gather
> support from significant devs or it's my way... sorry to be blunt.

No problem

> 
> > > How many driver locks do you have in netdevsim after conversion?
> > > All 3? How do you add a port to the instance from sysfs without a
> > > AB / BA deadlock between the port lock and the devlink lock if the
> > > driver can't take the devlink lock? Are you still going to need the 
> > > "in_reload" wart?  
> > 
> > I don't know yet, because as you wrote before netdevsim is for
> > prototyping and such ABBA deadlock doesn't exist in real devices.
> > 
> > My current focus is real devices for now.
> 
> I wrote it with nfp in mind as well. It has a delayed work which needs
> to take the port lock. Sadly I don't have any nfps handy and I didn't
> want to post an untested patch.

Do you remember why was port configuration implemented with delayed work?

> 
> > Maybe the solution will be to go away from sysfs completely. We will see.
> 
> Sigh, I feel like I just fixed netdevsim after it has been getting bit
> rotted for a couple of years.
> 
> > > > Strange, I see completely opposite picture in the git log with so much
> > > > changes that have Fixes line and add/remove/mention rtnl_lock. Maybe it
> > > > is useful, but almost all if not all drivers full of fixes of rtnl_lock
> > > > usage. I don't want same for the devlink.  
> > > 
> > > If you don't provide a locking API to the drivers you'll have to fix 2x
> > > the bugs, and each of them subtly different. At least maintainers know
> > > what rtnl_lock rules are.  
> > 
> > I clearly remember this patch and the sentence "...in
> > some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock()
> > again". The word "... some ..." hints to me that maintainers have different
> > opinion on how to use rtnl_lock.
> > 
> > https://lore.kernel.org/netdev/20210809032809.1224002-1-acelan.kao@canonical.com/
> 
> Yes, using rtnl_lock for PM is certainly a bad idea, and I'm not sure
> why Intel does it. There are 10s of drivers which take rtnl_lock
> correctly and it greatly simplifies their lives.

I would say that you are ignoring that most of such drivers don't add
new functionality.

Anyway, I got your point, please give me time to see what I can do.

In case, we will adopt your model, will you convert all drivers?

Thanks