Date:	Thu, 31 Dec 2009 20:06:18 -0700
From:	"Berck E. Nash" <flyboy@...il.com>
To:	Mike McCormack <mikem@...g3k.org>
CC:	Jarek Poplawski <jarkao2@...il.com>,
	Stephen Hemminger <shemminger@...tta.com>,
	netdev@...r.kernel.org, dhazelton@...er.net, mbreuer@...jas.com
Subject: Re: [PATCH] sky2: Lock transmit queue while disabling device

Well, that didn't fix it.  Oops attached, looks pretty much the same to me.

Mike McCormack wrote:
> Hi Jarek,
> 
> This is based on my analysis of the oops at:
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=14925
> 
> Specifically:
> 
>>>> [ 8673.345873] sky2 eth0: receiver hang detected
>>>> [ 8673.350368] sky2 eth0: disabling interface
>>>> [ 8673.354749] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
>>>> [ 8673.359748] IP: [<ffffffffa00373d3>] sky2_xmit_frame+0x321/0x5d8 [sky2]
> 
> netif_device_detach() does not guarantee that all transmits have completed 
> after it returns.
> 
> CPU 1 stack will look like:
> 
>   dev_queue_xmit()
>      HARD_TX_LOCK() -> __netif_tx_lock()
>      ...
>      dev_hard_start_xmit()
>         ops->ndo_start_xmit()  -> sky2_xmit_frame()
>         sky2_xmit_frame() pushing skb to hardware
>           use NULL tx_ring here
> 
> 
> CPU 2 stack will look like:
>            
>   sky2_restart()
>      rtnl_lock()
>      sky2_detach()
>         netif_device_detach()
>         sky2_down()
>           printk("sky2 eth0: disabling interface")
>           ...
>           sky2_free_buffers(sky2);
>             sky2->tx_ring = NULL;
>           ...
> 
> Another way to solve the problem would be to take the transmit lock in 
> netif_device_detach() to make sure that any in progress transmits have
> completed before returning.
> 
> Note that most of these backtraces come from systems using the nvidia
> binary-only module.  It may change the timings and make the sky2 race more
> likely, or be involved in the "tx timeout" condition that triggers a
> sky2_restart().
> 
> Will test with netif_tx_lock_bh and resubmit.
> 
> thanks,
> 
> Mike
> 
> Jarek Poplawski wrote:
>> Mike McCormack wrote, On 12/31/2009 11:55 AM:
>>
>>> netif_device_detach() does not take the tx_lock, so it's
>>>  possible that a call to sky2_xmit_frame() is still in
>>>  progress after netif_device_detach() is complete.
>>>
>>> Take netif_tx_lock() to make sure all transmits have
>>>  stopped while we're disabling the device and that
>>>  no other CPU is still transmitting a frame after
>>>  we've disabled the device.
>>>
>>> Proposed fix for "sky2 panic under load" reported by Berck E. Nash.
>> Could you give some scenario of the oops/fix?
>> Btw, even if it worked, you should use netif_tx_lock_bh
>> version considering sky2_detach use contexts, I guess.
>>
>> Jarek P.
>>
>>> Signed-off-by: Mike McCormack <mikem@...g3k.org>
>>> ---
>>>  drivers/net/sky2.c |    2 ++
>>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
>>> index faa4841..8ae8520 100644
>>> --- a/drivers/net/sky2.c
>>> +++ b/drivers/net/sky2.c
>>> @@ -3176,7 +3176,9 @@ static void sky2_reset(struct sky2_hw *hw)
>>>  static void sky2_detach(struct net_device *dev)
>>>  {
>>>  	if (netif_running(dev)) {
>>> +		netif_tx_lock(dev);
>>>  		netif_device_detach(dev);	/* stop txq */
>>> +		netif_tx_unlock(dev);
>>>  		sky2_down(dev);
>>>  	}
>>>  }
>>
> 
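
The locking idea discussed in the quoted thread can be sketched as a small
userspace model. This is not kernel code and not the sky2 driver: a pthread
mutex stands in for the transmit lock, an "attached" flag stands in for the
state that netif_device_detach() clears, and every name (model_dev,
model_xmit, model_detach_and_down, xmit_worker) is hypothetical. The model
assumes the transmit path re-checks the flag while holding the lock, which is
what makes taking the lock in the detach path sufficient here. As reported at
the top of this message, the posted patch did not stop the oops, so this only
illustrates the intent of the change, not a verified fix.

```c
/* Userspace model of the detach/transmit race. NOT kernel code.
 * All names are hypothetical and exist only for illustration. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define RING_SIZE 16

struct model_dev {
	pthread_mutex_t tx_lock;	/* stands in for the transmit lock */
	bool attached;			/* stands in for "device present" */
	int *tx_ring;			/* stands in for sky2->tx_ring */
	long sent;			/* transmits that touched the ring */
	long rejected;			/* transmits refused after detach */
};

static void model_dev_init(struct model_dev *d)
{
	int rc = pthread_mutex_init(&d->tx_lock, NULL);

	assert(rc == 0);
	d->attached = true;
	d->tx_ring = calloc(RING_SIZE, sizeof(*d->tx_ring));
	assert(d->tx_ring != NULL);
	d->sent = 0;
	d->rejected = 0;
}

/* Transmit path: the ring is only touched while holding tx_lock and
 * only if the device is still attached. */
static bool model_xmit(struct model_dev *d, int value)
{
	bool ok = false;

	pthread_mutex_lock(&d->tx_lock);
	if (d->attached) {
		d->tx_ring[d->sent % RING_SIZE] = value;
		d->sent++;
		ok = true;
	} else {
		d->rejected++;
	}
	pthread_mutex_unlock(&d->tx_lock);
	return ok;
}

/* Teardown path: clearing "attached" under tx_lock waits for any
 * transmit that is already inside its critical section, and every
 * later transmit sees attached == false. Only then is the ring freed. */
static void model_detach_and_down(struct model_dev *d)
{
	pthread_mutex_lock(&d->tx_lock);
	d->attached = false;
	pthread_mutex_unlock(&d->tx_lock);

	free(d->tx_ring);
	d->tx_ring = NULL;
}

struct worker_arg {
	struct model_dev *dev;
	long attempts;
};

/* Second "CPU": keeps transmitting while the device is torn down. */
static void *xmit_worker(void *p)
{
	struct worker_arg *a = p;

	for (long i = 0; i < a->attempts; i++)
		model_xmit(a->dev, (int)i);
	return NULL;
}
```

To exercise the model, start xmit_worker() in one thread and call
model_detach_and_down() from another. If the lock/unlock pair is removed from
model_detach_and_down(), a transmit that has already passed the "attached"
check can still write to the ring after it has been freed, which is the
interleaving the CPU 1 / CPU 2 stacks in the quoted analysis describe.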


View attachment "sky2crash2.txt" of type "text/plain" (5380 bytes)
