netdev - Re: mlxsw and rtnl lock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170826170418.GA22324@shredder>
Date:   Sat, 26 Aug 2017 20:04:18 +0300
From:   Ido Schimmel <idosch@...sch.org>
To:     David Ahern <dsahern@...il.com>
Cc:     Jiri Pirko <jiri@...nulli.us>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        mlxsw@...lanox.com
Subject: Re: mlxsw and rtnl lock

On Fri, Aug 25, 2017 at 01:26:07PM -0700, David Ahern wrote:
> Jiri / Ido:
> 
> I was looking at the mlxsw driver and the places it holds the rtnl lock.
> There are a lot of them and from an admittedly short review it seems
> like the rtnl is protecting changes to mlxsw data structures as opposed
> to calling into the core networking stack. This is going to have huge
> impacts on scalability when both the kernel programming (user changes)
> and the hardware programming require the rtnl.

I'm aware of the problem and I intend to look into it. When we started
working on mlxsw about two years ago all the operations were serialized
by rtnl_lock, so when we had to add processing in a different context,
we ended up taking the same lock to protect against changes. But it can
impact scalability as you mentioned.

> With regards to the FIB notifier, why add the fib events to a work queue
> that is processed asynchronously if processing the work queue requires
> the rtnl lock? What is gained by deferring the work since a major side
> effect of the work queue is the loss of error propagation back to the
> user on the a failure. That is, if the FIB add/replace/append fails in
> the h/w for any reason, offload is silently aborted (an entry in the
> kernel log is still a silent abort).

FIB events are received in an atomic context and therefore must be
deferred to a workqueue. The chain was initially blocking, but this had
to change in commit d3f706f68e2f ("ipv4: fib: Convert FIB notification
chain to be atomic") to support dumping of IPv4 routes under RCU. IPv6
events are always sent in an atomic context.

Regarding the silent abort, that's intentional. You can look at the same
code in v4.9 - when the chain was still blocking - and you'll see that
we didn't propagate the error even then. This was discussed in the past
and the conclusion was that user doesn't expect to operation to fail. If
hardware resources are exceeded, we let the kernel take care of the
forwarding instead.