Message-Id: <20190520.200922.2277656639346033061.davem@davemloft.net>
Date: Mon, 20 May 2019 20:09:22 -0400 (EDT)
From: David Miller <davem@...emloft.net>
To: Jan.Kloetzke@...h.de
Cc: oneukum@...e.com, jan@...etzke.net, netdev@...r.kernel.org,
linux-usb@...r.kernel.org
Subject: Re: [PATCH v2] usbnet: fix kernel crash after disconnect
From: Kloetzke Jan <Jan.Kloetzke@...h.de>
Date: Thu, 16 May 2019 07:10:30 +0000
> Am Montag, den 06.05.2019, 10:17 +0200 schrieb Oliver Neukum:
>> On So, 2019-05-05 at 00:45 -0700, David Miller wrote:
>> > From: Kloetzke Jan <Jan.Kloetzke@...h.de>
>> > Date: Tue, 30 Apr 2019 14:15:07 +0000
>> >
>> > > @@ -1431,6 +1432,11 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
>> > > spin_unlock_irqrestore(&dev->txq.lock, flags);
>> > > goto drop;
>> > > }
>> > > + if (WARN_ON(netif_queue_stopped(net))) {
>> > > + usb_autopm_put_interface_async(dev->intf);
>> > > + spin_unlock_irqrestore(&dev->txq.lock, flags);
>> > > + goto drop;
>> > > + }
>> >
>> > If this is known to happen and is expected, then we should not warn.
>> >
>>
>> yes this is the point. Can ndo_start_xmit() and ndo_stop() race?
>> If not, why does the patch fix the observed issue and what
>> prevents the race? Something is not clear here.
>
> Dave, could you shed some light on Oliver's question? If the race can
> happen then we can stick to v1 because the WARN_ON is indeed pointless.
> Otherwise it's not clear why it made the problem go away for us and v2
> may be the better option...

Yes, I think they can race.  ->ndo_stop() executes and stops the queue,
then we get an RCU grace period so that all parallel executions of
->ndo_start_xmit() complete.

But I wonder, this can probably cause problems because some drivers have
"stop queue and re-check" logic, e.g. in drivers/net/tg3.c we have:

	if (unlikely(tg3_tx_avail(tnapi) <= (MAX_SKB_FRAGS + 1))) {
		netif_tx_stop_queue(txq);

		/* netif_tx_stop_queue() must be done before checking
		 * tx index in tg3_tx_avail() below, because in
		 * tg3_tx(), we update tx index before checking for
		 * netif_tx_queue_stopped().
		 */
		smp_mb();
		if (tg3_tx_avail(tnapi) > TG3_TX_WAKEUP_THRESH(tnapi))
			netif_tx_wake_queue(txq);
	}

which in the racy scenario would undo ->ndo_stop()'s work, which is
completely unexpected.

Hmmm...
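
To make the interleaving above concrete, here is a rough userspace sketch
that replays it sequentially.  It is not kernel code: struct toy_txq, the
toy_* helpers and both thresholds are made-up stand-ins for the queue
state, ->ndo_stop(), TX completion and the driver's "stop queue and
re-check" logic, and the single-threaded replay only models one possible
ordering rather than real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a TX queue; not a kernel structure. */
struct toy_txq {
	bool stopped;	/* models the "queue stopped" state */
	int avail;	/* free TX descriptors */
};

#define TOY_STOP_THRESH 4	/* assumed value, illustration only */
#define TOY_WAKE_THRESH 8	/* assumed value, illustration only */

/* Models ->ndo_stop(): stop the queue. */
void toy_ndo_stop(struct toy_txq *q)
{
	q->stopped = true;
}

/* Models TX completion handing descriptors back to the ring. */
void toy_tx_complete(struct toy_txq *q, int freed)
{
	assert(freed >= 0);
	q->avail += freed;
}

/*
 * First half of "stop queue and re-check" in the xmit path: stop the
 * queue if the ring is nearly full.  Returns true if the re-check
 * step still has to run.
 */
bool toy_xmit_stop_if_full(struct toy_txq *q)
{
	if (q->avail > TOY_STOP_THRESH)
		return false;
	q->stopped = true;
	return true;
}

/* Second half: re-check the ring and wake the queue if there is room. */
void toy_xmit_recheck(struct toy_txq *q)
{
	if (q->avail > TOY_WAKE_THRESH)
		q->stopped = false;
}

/* Replays one bad ordering; returns the final "stopped" state. */
bool toy_replay_race(void)
{
	struct toy_txq q = { .stopped = false, .avail = 3 };
	bool recheck;

	recheck = toy_xmit_stop_if_full(&q);	/* xmit: ring nearly full */
	toy_ndo_stop(&q);			/* close path stops the queue */
	toy_tx_complete(&q, 10);		/* descriptors are freed */
	if (recheck)
		toy_xmit_recheck(&q);		/* xmit resumes and wakes it */

	return q.stopped;
}
```

In this replay toy_replay_race() returns false, i.e. the queue ends up
awake even though the toy ->ndo_stop() has already run, which is the
"undo" effect described above.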