[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201204100021.0b026725@kicinski-fedora-pc1c0hjn.DHCP.thefacebook.com>
Date: Fri, 4 Dec 2020 10:00:21 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: "David S . Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
UNGLinuxDriver@...rochip.com,
Alexandre Belloni <alexandre.belloni@...tlin.com>,
Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>,
Vivien Didelot <vivien.didelot@...il.com>,
Horatiu Vultur <horatiu.vultur@...rochip.com>,
"Allan W . Nielsen" <allan.nielsen@...rochip.com>,
Claudiu Manoil <claudiu.manoil@....com>,
Steen Hegelund <steen.hegelund@...rochip.com>
Subject: Re: [PATCH v2 net] net: mscc: ocelot: install MAC addresses in
.ndo_set_rx_mode from process context
On Fri, 4 Dec 2020 19:51:25 +0200 Vladimir Oltean wrote:
> Currently ocelot_set_rx_mode calls ocelot_mact_learn directly, which has
> a very nice ocelot_mact_wait_for_completion at the end. Introduced in
> commit 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be
> wall time not attempts"), this function uses readx_poll_timeout which
> triggers a lot of lockdep warnings and is also dangerous to use from
> atomic context, leading to lockups and panics.
>
> Steen Hegelund added a poll timeout of 100 ms for checking the MAC
> table, a duration which is clearly absurd to poll in atomic context.
> So we need to defer the MAC table access to process context, which we do
> via a dynamically allocated workqueue which contains all there is to
> know about the MAC table operation it has to do.
>
> Fixes: 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be wall time not attempts")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> Reviewed-by: Florian Fainelli <f.fainelli@...il.com>
> ---
> Changes in v2:
> - Added Fixes tag (it won't backport that far, but anyway)
> - Using get_device and put_device to avoid racing with unbind
Does get_device really protect you from unbind? I thought it only
protects you from .release being called, IOW freeing struct device
memory..
More usual way of handling this would be allocating your own workqueue
and flushing that wq at the right point.
> drivers/net/ethernet/mscc/ocelot_net.c | 83 +++++++++++++++++++++++++-
> 1 file changed, 80 insertions(+), 3 deletions(-)
This is a little large for a rc7 fix :S
What's the expected poll time? maybe it's not that bad to busy wait?
Clearly nobody noticed the issue in 2 years (you mention lockdep so
not a "real" deadlock) which makes me think the wait can't be that long?
Also for a reference - there are drivers out there with busy poll
timeout of seconds :/
Powered by blists - more mailing lists