Date:   Mon, 31 Jul 2017 16:28:14 +0100
From:   Måns Rullgård <mans@...sr.com>
To:     Mason <slash.tmp@...e.fr>
Cc:     Florian Fainelli <f.fainelli@...il.com>,
        Marc Gonzalez <marc_gonzalez@...madesigns.com>,
        netdev <netdev@...r.kernel.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [RFC PATCH v1] net: ethernet: nb8800: Reset HW block in ndo_open

Mason <slash.tmp@...e.fr> writes:

> On 31/07/2017 16:08, Mason wrote:
>
>> Other things make no sense to me, for example in nb8800_dma_stop()
>> there is a polling loop:
>> 
>> 	do {
>> 		mdelay(100);
>> 		nb8800_writel(priv, NB8800_TX_DESC_ADDR, txb->dma_desc);
>> 		wmb();
>> 		mdelay(100);
>> 		nb8800_writel(priv, NB8800_TXC_CR, txcr | TCR_EN);
>> 
>> 		mdelay(5500);
>> 
>> 		err = readl_poll_timeout_atomic(priv->base + NB8800_RXC_CR,
>> 						rxcr, !(rxcr & RCR_EN),
>> 						1000, 100000);
>> 		printk("err=%d retry=%d\n", err, retry);
>> 	} while (err && --retry);
>> 
>> 
>> (It was me who added the delays.)
>> 
>> *Whatever* delays I insert, it always goes 3 times through the loop.
>> 
>> [   29.654492] ++ETH++ gw32 reg=f002610c val=9ecc8000
>> [   29.759320] ++ETH++ gw32 reg=f0026100 val=005c0aff
>> [   35.364705] err=-110 retry=5
>> [   35.467609] ++ETH++ gw32 reg=f002610c val=9ecc8000
>> [   35.572436] ++ETH++ gw32 reg=f0026100 val=005c0aff
>> [   41.177822] err=-110 retry=4
>> [   41.280726] ++ETH++ gw32 reg=f002610c val=9ecc8000
>> [   41.385553] ++ETH++ gw32 reg=f0026100 val=005c0aff
>> [   46.890907] err=0 retry=3
>> 
>> How is that possible?
>
> First time through the loop, it doesn't matter how long we poll,
> it *always* times out. Second time as well (only on BOARD B).
>
> Third time, it succeeds quickly (first or second poll).
> (This explains why various delays had no impact.)
>
> In fact, requesting the transfer 3 times *before* polling
> makes the polling succeed quickly:
>
> 	nb8800_writel(priv, NB8800_TX_DESC_ADDR, txb->dma_desc);
> 	wmb();
> 	nb8800_writel(priv, NB8800_TXC_CR, txcr | TCR_EN);
>
> [   16.464596] ++ETH++ gw32 reg=f002610c val=9ef28000
> [   16.469414] ++ETH++ gw32 reg=f0026100 val=005c0aff
> [   16.474231] ++ETH++ gw32 reg=f002610c val=9ef28000
> [   16.479048] ++ETH++ gw32 reg=f0026100 val=005c0aff
> [   16.483865] ++ETH++ gw32 reg=f002610c val=9ef28000
> [   16.488682] ++ETH++ gw32 reg=f0026100 val=005c0aff
> [   16.493500] ++ETH++ POLL reg=f0026200 val=06100a8f
> [   16.499317] ++ETH++ POLL reg=f0026200 val=06100a8e
> [   16.504134] err=0 retry=5

That strengthens my theory that the hardware has an internal queue of
three descriptors that are pre-loaded from memory.  Your hardware people
should be able to confirm this.
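
To illustrate the theory, here is a toy model in plain C. It is only a
sketch under assumptions: the queue depth of three, the rule that RX
disables itself once its pre-loaded descriptors are used up, and every
toy_* name are hypothetical and do not come from the driver or from any
hardware documentation.

```c
#include <stdbool.h>

/* Assumed depth of the internal descriptor queue (a guess from the logs). */
#define TOY_PREFETCH_DEPTH 3

struct toy_rx_dma {
	int prefetched;	/* descriptors loaded before the chain was terminated */
	bool enabled;	/* stand-in for the RCR_EN bit */
};

static void toy_rx_init(struct toy_rx_dma *dma)
{
	dma->prefetched = TOY_PREFETCH_DEPTH;
	dma->enabled = true;
}

/* One looped-back dummy frame arrives and uses up one queued descriptor. */
static void toy_rx_frame(struct toy_rx_dma *dma)
{
	if (!dma->enabled)
		return;
	if (dma->prefetched > 0)
		dma->prefetched--;
	/*
	 * Assumption: a refill from memory now only finds the end of the
	 * chain, so nothing new is queued and RX stops once it runs dry.
	 */
	if (dma->prefetched == 0)
		dma->enabled = false;
}

/*
 * Same shape as the retry loop: send one frame per attempt.
 * Returns the number of frames needed to stop RX, or -1 if RX is
 * still enabled after max_retry attempts.
 */
static int toy_dma_stop(struct toy_rx_dma *dma, int max_retry)
{
	int sent = 0;

	while (dma->enabled && sent < max_retry) {
		toy_rx_frame(dma);
		sent++;
	}
	return dma->enabled ? -1 : sent;
}
```

In this model, polling after the first or second frame can never
succeed, however long the wait, because the queue is not yet empty.
Only the number of frames sent matters, which would match delays
having no effect.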

> With my changes, I get *exactly* the same logs on BOARD A
> and BOARD B (modulo the descriptor addresses).
>
> Yet BOARD A stays functional, but BOARD B is hosed...

What's the difference between board A and board B?

> Depressing. I've run out of ideas.

Get your hardware people involved.  Perhaps they can run some tests in a
simulator.

-- 
Måns Rullgård