lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211109153335.GH1740502@nvidia.com>
Date:   Tue, 9 Nov 2021 11:33:35 -0400
From:   Jason Gunthorpe <jgg@...dia.com>
To:     Jakub Kicinski <kuba@...nel.org>, Jiri Pirko <jiri@...dia.com>
Cc:     Leon Romanovsky <leon@...nel.org>,
        Ido Schimmel <idosch@...sch.org>,
        Jiri Pirko <jiri@...nulli.us>,
        "David S . Miller" <davem@...emloft.net>,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        edwin.peer@...adcom.com
Subject: Re: [PATCH net-next] devlink: Require devlink lock during device
 reload

On Tue, Nov 09, 2021 at 07:07:02AM -0800, Jakub Kicinski wrote:
> On Tue, 9 Nov 2021 10:43:58 -0400 Jason Gunthorpe wrote:
> > This becomes all entangled in the aux device stuff we did before.
> 
> So entangled in fact that neither of you is willing to elucidate 
> the exact need ;)

I haven't been paying attention to this thread, Leon just asked me to
elaborate on why roce is in these traces.

What I know is mlx5 testing has shown an alarming number of crashers
and bugs connected to devlink and Leon has been grinding them
away. mlx5 is quite heavily invested in devlink and mlx5 really needs
it to work robustly.

> The main use case for reload for netns is placing a VF in a namespace,
> for a container to use. Is that right? I've not seen use cases
> requiring the PF to be moved, are there any?

Sure, it can be useful. I wouldn't call it essential, but it is there.
 
> > I once sketched out fixing this by removing the need to hold the
> > per_net_rwsem just for list iteration, which in turn avoids holding it
> > over the devlink reload paths. It seemed like a reasonable step toward
> > finer grained locking.
> 
> Seems to me the locking is just a symptom.

My fear is this reload during net ns destruction is devlink uAPI now
and, yes it may be only a symptom, but the root cause may be unfixable
uAPI constraints. Jiri, what do you think?

Speaking generally, I greatly prefer devlink move toward a design where
the aux bus operations that must be connected with devlink reload are
not invoked under any global locks at all. Not per_net_rwsem, rtnl, or
devlink BKLs.

We know holding global locks in open-ended contexts, like driver core
device_register/unregister, is an dangerous pattern.

Concretely, now that we are pushing virtualization use cases into
using devlink as a control plane mlx5 has pod/VM launch issuing
devlink steps. Some of this stuff is necessarily slow, like attaching
a roce aux driver takes a while. Holding a BKL while doing that means
all VM launches are serialized.

IMHO these needs demand a fine grained locking approach, not a BKL
design.

Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ