netdev - Re: [RFC PATCH v2 net-next 6/7] net: ethtool: add a mutex protecting RSS contexts

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <61041c56-f7d2-49f8-9fc3-57852a96105a@lunn.ch>
Date:   Wed, 12 Apr 2023 19:15:41 +0200
From:   Andrew Lunn <andrew@...n.ch>
To:     Edward Cree <ecree.xilinx@...il.com>
Cc:     edward.cree@....com, linux-net-drivers@....com,
        davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
        edumazet@...gle.com, netdev@...r.kernel.org,
        habetsm.xilinx@...il.com, sudheer.mogilappagari@...el.com
Subject: Re: [RFC PATCH v2 net-next 6/7] net: ethtool: add a mutex protecting
 RSS contexts

On Wed, Apr 12, 2023 at 05:16:11PM +0100, Edward Cree wrote:
> On 11/04/2023 21:40, Andrew Lunn wrote:
> > On Tue, Apr 11, 2023 at 07:26:14PM +0100, edward.cree@....com wrote:
> >> While this is not needed to serialise the ethtool entry points (which
> >>  are all under RTNL), drivers may have cause to asynchronously access
> >>  dev->ethtool->rss_ctx; taking dev->ethtool->rss_lock allows them to
> >>  do this safely without needing to take the RTNL.
> > 
> > What is actually wrong with taking RTNL? KISS is often best,
> > especially for locks.
> 
> The examples I have of driver code that needs to access rss_ctx (in the
>  sfc driver) are deep inside call chains where RTNL may or may not
>  already be held.
> 1) filter insertion.  E.g. ethtool -U is already holding RTNL, but other
>  sources of filters (e.g. aRFS) aren't, and thus taking it if necessary
>  might mean passing a 'bool rtnl_locked' all the way down the chain.
> 2) device reset handling (we have to restore the RSS contexts in the
>  hardware after a reset).  Again resets don't always happen under RTNL,
>  and I don't fully understand the details (EFX_ASSERT_RESET_SERIALISED()).
> So it makes life much simpler if we just have a finer-grained lock we can
>  just take when we need to.
> Also, RTNL is a very big hammer that serialises all kinds of operations
>  across all the netdevs in the system, holding it for any length of time
>  can cause annoying user-visible latency (e.g. iirc sshd accepting a new
>  connection has to wait for it) so I prefer to avoid it if possible.  If
>  anything we want to be breaking up this BKL[1], not making it bigger.

Hi Ed

I have to wonder if your locking model is wrong. When i look at the
next patch, i see the driver is also using this lock. And i generally
find that is wrong.

As a rule of thumb, driver writes don't understand locking. Yes, there
are some that do, but most don't. As core code developers, i find it
good practice to have the locks in the core, and only in the
core. Drivers writers should not need to worry about locking. The API
into the driver will take the locks needed before entering the driver,
and release them on exit.

So i don't really agree with 'it makes life much simpler if we just
have a finer-grained lock'. It makes life more complex having to help
driver writers find the corruption and deadlock bugs in their code
because they got the locking wrong.

Please try to work on the abstraction so that drivers don't need this
lock, just the core.

	Andrew