netdev - Re: [PATCH net] be2net: Fix NULL pointer dereference in be_tx

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <932ae382-a1fe-c7ad-a9b3-e728ed6c1176@redhat.com>
Date:   Wed, 28 Nov 2018 11:12:22 +0100
From:   Ivan Vecera <ivecera@...hat.com>
To:     David Miller <davem@...emloft.net>, poros@...hat.com
Cc:     netdev@...r.kernel.org
Subject: Re: [PATCH net] be2net: Fix NULL pointer dereference in
 be_tx_timeout()

On 27. 11. 18 23:51, David Miller wrote:
> From: Petr Oros <poros@...hat.com>
> Date: Thu, 22 Nov 2018 15:24:07 +0100
> 
>> @@ -4700,8 +4700,11 @@ int be_update_queues(struct be_adapter *adapter)
>>   	struct net_device *netdev = adapter->netdev;
>>   	int status;
>>   
>> -	if (netif_running(netdev))
>> +	if (netif_running(netdev)) {
>> +		/* prevent netdev watchdog during tx queue destroy */
>> +		netif_carrier_off(netdev);
>>   		be_close(netdev);
>> +	}
> 
> This will make userspace networking daemons will think that the link
> went down.
> 
> This absolutely should not be a side effect of making a simple
> TX queue configuration change via ethtool.
> 

Yes, you're right Dave. But the same thing (netif_carrier_off()) is done by 
number of other drivers (igb, tg3, ixgbe...) during .set_channels() callback. 
The patch that Petr sent does the practically the same thing like this one:

commit c0dfb90e5b2d41c907de9b624657a6688541837e
Author: John Fastabend <john.r.fastabend@...el.com>
Date:   Tue Apr 27 02:13:39 2010 +0000

     ixgbe: ixgbe_down needs to stop dev_watchdog

     There is a small race between when the tx queues are stopped
     and when netif_carrier_off() is called in ixgbe_down.  If the
     dev_watchdog() timer fires during this time it is possible for
     a false tx timeout to occur.

     This patch moves the netif_carrier_off() so that it is called before
     the tx queues are stopped preventing the dev_watchdog timer from
     detecting false tx timeouts.  The race is seen occosionally when
     FCoE or DCB settings are being configured or changed.

     Testing note, running ifconfig up/down will not reproduce this
     issue because dev_open/dev_close call dev_deactivate() and then
     dev_activate().

where netif_carrier_off() is called prior netif_tx_disable() from ixgbe_down() 
to avoid false Tx timeout. And ixgbe_down() is called from ixgbe_set_channels() 
this way:
ixgbe_set_channels()->ixgbe_setup_tc()->ixgbe_close()->ixgbe_close_suspend()->ixgbe_down()

As I said the similar approach is used by other drivers as well.

The mlx4 driver resolves this situation differently. It calls 
mlx4_en_stop_port() from mlx4_en_set_channels() with parameter 'detach'==1 that 
causes that netif_device_detach() is called prior stopping of Tx queues. This 
also effectively prevents dev_watchdog() to call .ndo_tx_timeout() callback. An 
advantage is netif_device_detach() does not fire linkwatch events.

So... what ways is the _right_ one ?

Thanks,
Ivan