Date:   Tue, 6 Jun 2017 12:45:41 +0200
From:   Niklas Cassel <niklas.cassel@...s.com>
To:     Pavel Machek <pavel@....cz>
CC:     Alexandre Torgue <alexandre.torgue@...com>,
        Giuseppe Cavallaro <peppe.cavallaro@...com>,
        Joao Pinto <Joao.Pinto@...opsys.com>,
        Lars Persson <larper@...s.com>, netdev <netdev@...r.kernel.org>
Subject: Re: stmmac tx timeout

On 05/23/2017 03:39 PM, Pavel Machek wrote:
> Hi!
> 
>> I'm debugging a transmit queue 0 timeout on stmmac with DWMAC4 (4.10a).
>> I'm using kernel v4.9.23, which is before multi queue support was added.
>> I've cherry-picked
>> 98a29944774a ("net: ethernet: stmmac: remove private tx queue lock")
>> 84c53b4baef8 ("stmmac: fix memory barriers")
>> but I still get tx timeouts with these patches.
>>
>> I've managed to reproduce the problem several times,
>> mainly by transmitting the syslog over HTTP.
> 
> How long does it take till timeout? Umm. And if you go through the
> list... I believe we understood what was wrong with the timeout
> handling and how to fix it...
> 
> You may want to tweak tx coalescing parameters. If you set them
> "right" you should get timeouts every 5 minutes or so. That makes it
> easier to debug. This should do the trick:
> 
> +++ b/drivers/net/ethernet/stmicro/stmmac/common.h
> -#define STMMAC_COAL_TX_TIMER   40000
> +#define STMMAC_COAL_TX_TIMER   1000
> 
> Now that you have a driver that crashes early, you might want to do some
> voodoo to stop the crashing. This worked for me:
> 
> @@ -2043,7 +2063,11 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>         } else
>                 priv->tx_count_frames = 0;
> 
> +       dma_rmb();
> +       dma_wmb();
>         /* To avoid raise condition */
> +       BUG_ON(first->des01.etx.own); /* This BUG_ON seems to be enough.
> +                                          Replacing it with barriers is _not_enough. */
>         priv->hw->desc->set_tx_owner(first);
>         wmb();
> 
> No, the BUG_ON() does not trigger. Yes, it still fixes the driver for
> me. You may want to verify it has the same effect for you.

Hello Pavel,

I am sincerely grateful for your help.

I forward-ported your patch to 4.9;
however, I could still reproduce the tx timeouts.

Thankfully, I have finally found the root cause of
my tx timeouts; see the patch I've submitted
here:

http://marc.info/?l=linux-kernel&m=149673393525236


Best regards,
Niklas
