[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <932ae382-a1fe-c7ad-a9b3-e728ed6c1176@redhat.com>
Date: Wed, 28 Nov 2018 11:12:22 +0100
From: Ivan Vecera <ivecera@...hat.com>
To: David Miller <davem@...emloft.net>, poros@...hat.com
Cc: netdev@...r.kernel.org
Subject: Re: [PATCH net] be2net: Fix NULL pointer dereference in
be_tx_timeout()
On 27. 11. 18 23:51, David Miller wrote:
> From: Petr Oros <poros@...hat.com>
> Date: Thu, 22 Nov 2018 15:24:07 +0100
>
>> @@ -4700,8 +4700,11 @@ int be_update_queues(struct be_adapter *adapter)
>> struct net_device *netdev = adapter->netdev;
>> int status;
>>
>> - if (netif_running(netdev))
>> + if (netif_running(netdev)) {
>> + /* prevent netdev watchdog during tx queue destroy */
>> + netif_carrier_off(netdev);
>> be_close(netdev);
>> + }
>
> This will make userspace networking daemons will think that the link
> went down.
>
> This absolutely should not be a side effect of making a simple
> TX queue configuration change via ethtool.
>
Yes, you're right Dave. But the same thing (netif_carrier_off()) is done by
number of other drivers (igb, tg3, ixgbe...) during .set_channels() callback.
The patch that Petr sent does the practically the same thing like this one:
commit c0dfb90e5b2d41c907de9b624657a6688541837e
Author: John Fastabend <john.r.fastabend@...el.com>
Date: Tue Apr 27 02:13:39 2010 +0000
ixgbe: ixgbe_down needs to stop dev_watchdog
There is a small race between when the tx queues are stopped
and when netif_carrier_off() is called in ixgbe_down. If the
dev_watchdog() timer fires during this time it is possible for
a false tx timeout to occur.
This patch moves the netif_carrier_off() so that it is called before
the tx queues are stopped preventing the dev_watchdog timer from
detecting false tx timeouts. The race is seen occosionally when
FCoE or DCB settings are being configured or changed.
Testing note, running ifconfig up/down will not reproduce this
issue because dev_open/dev_close call dev_deactivate() and then
dev_activate().
where netif_carrier_off() is called prior netif_tx_disable() from ixgbe_down()
to avoid false Tx timeout. And ixgbe_down() is called from ixgbe_set_channels()
this way:
ixgbe_set_channels()->ixgbe_setup_tc()->ixgbe_close()->ixgbe_close_suspend()->ixgbe_down()
As I said the similar approach is used by other drivers as well.
The mlx4 driver resolves this situation differently. It calls
mlx4_en_stop_port() from mlx4_en_set_channels() with parameter 'detach'==1 that
causes that netif_device_detach() is called prior stopping of Tx queues. This
also effectively prevents dev_watchdog() to call .ndo_tx_timeout() callback. An
advantage is netif_device_detach() does not fire linkwatch events.
So... what ways is the _right_ one ?
Thanks,
Ivan
Powered by blists - more mailing lists