Message-ID: <YYAzn+mtrGp/As74@unreal>
Date: Mon, 1 Nov 2021 20:36:15 +0200
From: Leon Romanovsky <leon@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: idosch@...sch.org, edwin.peer@...adcom.com, jiri@...nulli.us,
netdev@...r.kernel.org
Subject: Re: [RFC 0/5] devlink: add an explicit locking API
On Mon, Nov 01, 2021 at 07:32:59AM -0700, Jakub Kicinski wrote:
> On Sun, 31 Oct 2021 09:23:42 +0200 Leon Romanovsky wrote:
> > On Sat, Oct 30, 2021 at 04:12:49PM -0700, Jakub Kicinski wrote:
> > > This implements what I think is the right direction for devlink
> > > API overhaul. It's an early RFC/PoC because the body of code is
> > > rather large so I'd appreciate feedback early... The patches are
> > > very roughly split, the point of the posting is primarily to prove
> > > that from the driver side the conversion is a net improvement
> > > in complexity.
> > >
> > > IMO the problems with devlink locking are caused by two things:
> > >
> > > (1) the driver has no way to block devlink calls like it does in the case
> > > of netdev / rtnl_lock; note that for devlink each driver has
> > > its own lock so contention is not an issue;
> > >
> > > (2) sometimes devlink calls into the driver without holding its lock
> > > - for operations which may need the driver to call back into devlink
> > > (port splitting, switch config, reload etc.) there is a circular
> > > dependency which the driver cannot solve on its own.
> > >
> > > This set "fixes" the ordering by allowing the driver to participate
> > > in locking. The lock order remains:
> > >
> > > device_lock -> [devlink_mutex] -> devlink instance -> rtnl_lock
> > >
> > > but now the driver can take the devlink lock itself, and _ALL_ devlink ops
> > > are locked.
> > >
> > > The expectation is that the driver will take the devlink instance lock
> > > on its probe and remove paths, hence protecting all configuration
> > > paths with the devlink instance lock.
> > >
> > > This is clearly demonstrated by the netdevsim conversion. All entry
> > > points to driver config are protected by the devlink instance lock, so
> > > the driver switches to the "I'm already holding the devlink lock" API
> > > when calling devlink. All driver locks and trickery are removed.
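A minimal sketch of that probe pattern, assuming helpers along the lines of
devl_lock()/devl_unlock() and a devl_port_register() variant that expects the
instance lock to already be held (the RFC's exact names may differ; drv_priv,
drv_devlink_ops and drv_init_hw() are made up for illustration):

	static const struct devlink_ops drv_devlink_ops = {};

	struct drv_priv {
		struct devlink_port port;
	};

	static int drv_probe(struct device *dev)
	{
		struct devlink *devlink;
		struct drv_priv *priv;
		int err;

		devlink = devlink_alloc(&drv_devlink_ops, sizeof(*priv), dev);
		if (!devlink)
			return -ENOMEM;
		priv = devlink_priv(devlink);

		/* Hold the instance lock across the whole config path so
		 * every devlink op is serialized against probe/remove.
		 */
		devl_lock(devlink);
		err = drv_init_hw(priv);	/* hypothetical HW init */
		if (!err)
			err = devl_port_register(devlink, &priv->port, 0);
		devl_unlock(devlink);

		if (err) {
			devlink_free(devlink);
			return err;
		}

		devlink_register(devlink);	/* expose to user space only when ready */
		return 0;
	}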
> > >
> > > The part which is slightly more challenging is quiescing async entry
> > > points which need to be closed on the devlink reload path (workqueue,
> > > debugfs etc.) and which also take devlink instance lock. For that we
> > > need to be able to take refs on the devlink instance and let them
> > > clean up after themselves rather than waiting synchronously.
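For instance, a workqueue handler that needs the instance lock could take a
reference first, so reload does not have to wait for it synchronously
(sketch; devlink_try_get()/devlink_put() as in the existing refcounting,
while devl_lock(), drv_do_recovery() and the recovery_work member are
illustrative):

	static void drv_recovery_work(struct work_struct *work)
	{
		struct drv_priv *priv = container_of(work, struct drv_priv,
						     recovery_work);
		struct devlink *devlink = priv_to_devlink(priv);

		/* Bail out if the instance is already being torn down; the
		 * reference keeps the devlink memory alive until devlink_put().
		 */
		if (!devlink_try_get(devlink))
			return;

		devl_lock(devlink);
		drv_do_recovery(priv);	/* hypothetical recovery handler */
		devl_unlock(devlink);

		devlink_put(devlink);
	}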
> > >
> > > That last part is not 100% finished in this patch set - it works but
> > > we need the driver to take devlink_mutex (the global lock) from its
> > > probe/remove path. I hope this is good enough for an RFC, the problem
> > > is easily solved by protecting the devlink XArray with something other
> > > than devlink_mutex and/or not holding devlink_mutex around each op
> > > (so that there is no ordering between the global and instance locks).
> > > Speaking of not holding devlink_mutex around each op, this patch set
> > > also opens the path for parallel ops on different devlink instances,
> > > which is currently impossible because all user space calls take
> > > devlink_mutex...
> >
> > No, please no.
<...>
> How is RW semaphore going to solve the problem that ops are unlocked
> and have to take the instance lock from within to add/remove ports?
This is a three-step process, but mainly it is the first step. We need to make
sure that functions that can re-enter use nested locks.
Steps:
1. Use proper locking API that supports nesting:
https://lore.kernel.org/netdev/YYABqfFy%2F%2Fg5Gdis@nanopsycho/T/#mf9dc5cac2013abe413545bbe4a09cc231ae209a4
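Roughly speaking, the idea is the standard lockdep nesting annotation
(illustration only, not the patch behind the link above; parent_devlink and
nested_devlink are placeholders): when a second instance of the same lock
class has to be taken from within a locked section, it is taken with a
subclass so lockdep does not report a false deadlock:

	mutex_lock(&parent_devlink->lock);
	/* second instance of the same lock class, taken from within */
	mutex_lock_nested(&nested_devlink->lock, SINGLE_DEPTH_NESTING);

	/* work touching both instances */

	mutex_unlock(&nested_devlink->lock);
	mutex_unlock(&parent_devlink->lock);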
2. Convert devlink->lock to be RW semaphore:
commit 4506dd3a90a82a0b6bde238f507907747ab88407
Author: Leon Romanovsky <leon@...nel.org>
Date: Sun Oct 24 16:54:16 2021 +0300
devlink: Convert devlink lock to be RW semaphore
This is a naive conversion of devlink->lock to an RW semaphore, so we will be
able to differentiate commands that require exclusive access from parallel
ready-to-run ones.
All "set" commands that used devlink->lock are converted to write lock,
while all "get" commands are marked with read lock.
@@ -578,8 +584,12 @@ static int devlink_nl_pre_doit(const struct genl_ops *ops,
 		mutex_unlock(&devlink_mutex);
 		return PTR_ERR(devlink);
 	}
-	if (~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
-		mutex_lock(&devlink->lock);
+
+	if (~ops->internal_flags & DEVLINK_NL_FLAG_SHARED_ACCESS)
+		down_write(&devlink->rwsem);
+	else
+		down_read(&devlink->rwsem);
+
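For completeness, the matching unlock in devlink_nl_post_doit() would follow
the same flag; a sketch of how it could look (not part of the hunk above):

	static void devlink_nl_post_doit(const struct genl_ops *ops,
					 struct sk_buff *skb, struct genl_info *info)
	{
		struct devlink *devlink = info->user_ptr[0];

		if (~ops->internal_flags & DEVLINK_NL_FLAG_SHARED_ACCESS)
			up_write(&devlink->rwsem);
		else
			up_read(&devlink->rwsem);
		devlink_put(devlink);
		mutex_unlock(&devlink_mutex);
	}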
3. Drop devlink_mutex:
commit 3177af9971c4cd95f9633aeb9b0434687da62fd0
Author: Leon Romanovsky <leon@...nel.org>
Date: Sun Oct 31 16:05:40 2021 +0200
devlink: Use xarray locking mechanism instead big devlink lock
The conversion to XArray together with devlink reference counting
allows us to reuse the following locking pattern:
	xa_lock()
	xa_for_each() {
		devlink_try_get()
		xa_unlock()
		....
		xa_lock()
	}
This pattern gives us a way to run any command between xa_unlock() and
xa_lock() without the big devlink mutex, while making sure that the devlink
instance won't be released.
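Spelled out a bit more concretely, the iteration would look roughly like this
(sketch; devlinks is the global XArray of instances and do_cmd() stands in
for whatever the doit/dumpit runs per instance):

	struct devlink *devlink;
	unsigned long index;
	int err = 0;

	xa_lock(&devlinks);
	xa_for_each(&devlinks, index, devlink) {
		if (!devlink_try_get(devlink))
			continue;
		xa_unlock(&devlinks);

		err = do_cmd(devlink);	/* may sleep, no global lock held */

		devlink_put(devlink);
		xa_lock(&devlinks);
		if (err)
			break;
	}
	xa_unlock(&devlinks);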
Steps 2 and 3 were not posted due to the merge window and my desire to get
some mileage in our regression testing first.
>
> > Please, let's not give up on a standalone devlink implementation that doesn't
> > require drivers to know internal devlink details. It is hard to do but possible.
>
> We may just disagree on this one. Please answer my question above -
> so far IDK how you're going to fix the problem of re-reg'ing subobjects
> from the reload path.
>
> My experience writing drivers is that it was painfully unclear what
> the devlink locking rules are. Sounds like you'd make them even more
> complicated.
How? I have only one rule:
* The driver author should be aware that between devlink_register() and
  devlink_unregister(), users can send netlink commands.
In your solution, driver authors will need to follow the whole devlink
how-to-use book.
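In ordering terms that one rule boils down to something like this (sketch;
the drv_* helpers are placeholders):

	/* probe: make the instance fully ready before exposing it */
	drv_setup_all_subobjects(priv);	/* ports, params, regions, ... */
	devlink_register(devlink);	/* user space can talk to us from here on */

	/* remove: hide the instance first, then tear it down */
	devlink_unregister(devlink);	/* no new commands once this returns */
	drv_teardown_all_subobjects(priv);
	devlink_free(devlink);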
>
> This RFC makes the rules simple - all devlink ops are locked.
>
> For the convenience of the drivers they can also take the instance lock
> whenever they want to prevent ops from being called. Experience with
> rtnl_lock teaches us that this is very useful for drivers.
Strange, I see the completely opposite picture in the git log, with many
changes that have a Fixes line and add/remove/mention rtnl_lock. Maybe it
is useful, but almost all drivers, if not all, are full of fixes for
rtnl_lock usage. I don't want the same for devlink.
Thanks