lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YYJQZIJPdy3WnQ1S@nanopsycho>
Date:   Wed, 3 Nov 2021 10:03:32 +0100
From:   Jiri Pirko <jiri@...nulli.us>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     leon@...nel.org, idosch@...sch.org, edwin.peer@...adcom.com,
        netdev@...r.kernel.org
Subject: Re: [RFC 0/5] devlink: add an explicit locking API

Sun, Oct 31, 2021 at 01:12:49AM CEST, kuba@...nel.org wrote:
>This implements what I think is the right direction for devlink
>API overhaul. It's an early RFC/PoC because the body of code is
>rather large so I'd appreciate feedback early... The patches are
>very roughly split, the point of the posting is primarily to prove
>that from the driver side the conversion is an net improvement
>in complexity.
>
>IMO the problems with devlink locking are caused by two things:
>
> (1) driver has no way to block devlink calls like it does in case
>     of netedev / rtnl_lock, note that for devlink each driver has
>     it's own lock so contention is not an issue;
>     
> (2) sometimes devlink calls into the driver without holding its lock
>     - for operations which may need the driver to call devlink (port
>     splitting, switch config, reload etc.), the circular dependency
>     is there, the driver can't solve this problem.
>
>This set "fixes" the ordering by allowing the driver to participate
>in locking. The lock order remains:
>
>  device_lock -> [devlink_mutex] -> devlink instance -> rtnl_lock
>
>but now driver can take devlink lock itself, and _ALL_ devlink ops
>are locked.
>
>The expectation is that driver will take the devlink instance lock
>on its probe and remove paths, hence protecting all configuration
>paths with the devlink instance lock.
>
>This is clearly demonstrated by the netdevsim conversion. All entry
>points to driver config are protected by devlink instance lock, so
>the driver switches to the "I'm already holding the devlink lock" API
>when calling devlink. All driver locks and trickery is removed.
>
>The part which is slightly more challanging is quiescing async entry
>points which need to be closed on the devlink reload path (workqueue,
>debugfs etc.) and which also take devlink instance lock. For that we
>need to be able to take refs on the devlink instance and let them
>clean up after themselves rather than waiting synchronously.
>
>That last part is not 100% finished in this patch set - it works but
>we need the driver to take devlink_mutex (the global lock) from its
>probe/remove path. I hope this is good enough for an RFC, the problem
>is easily solved by protecting the devlink XArray with something else
>than devlink_mutex and/or not holding devlink_mutex around each op
>(so that there is no ordering between the global and instance locks).
>Speaking of not holding devlink_mutex around each op this patch set
>also opens the path for parallel ops to different devlink instances
>which is currently impossible because all user space calls take
>devlink_mutex...
>
>The patches are on top of the cleanups I posted earlier.

Hi Jakub.

I took my time to read this thread and talked with Leon as well.
My original intention of locking in devlink was to maintain the locking
inside devlink.c to avoid the rtnl_lock-scenario.

However, I always feared that eventually we'll get to the point,
when it won't be possible to maintain any longer. I think may be it.

In general, I like your approach. It is very clean and explicit. The
driver knows what to do, in which context it is and it can behave
accordingly. In theory or course, but the reality of drivers code tells
us often something different :)

One small thing I don't fully undestand is the "opt-out" scenario which
makes things a bit tangled. But perhaps you can explain it a bit more.

Leon claims that he thinks that he would be able to solve the locking
scheme leaving all locking internal to devlink.c. I suggest to give
him a week or 2 to present the solution. If he is not successful, lets
continue on your approach.

What do you think?

Thanks!

>
>Jakub Kicinski (5):
>  devlink: add unlocked APIs
>  devlink: add API for explicit locking
>  devlink: allow locking of all ops
>  netdevsim: minor code move
>  netdevsim: use devlink locking
>
> drivers/net/netdevsim/bus.c       |  19 -
> drivers/net/netdevsim/dev.c       | 450 ++++++++-------
> drivers/net/netdevsim/fib.c       |  62 +--
> drivers/net/netdevsim/netdevsim.h |   5 -
> include/net/devlink.h             |  88 +++
> net/core/devlink.c                | 875 +++++++++++++++++++++---------
> 6 files changed, 996 insertions(+), 503 deletions(-)
>
>-- 
>2.31.1
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ