netdev - Re: [RFC PATCH v1] net: ethernet: nb8800: Reset HW block in ndo

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <31792688-7720-1d1e-4fd4-a0f96ac8af59@free.fr>
Date:   Mon, 31 Jul 2017 16:08:31 +0200
From:   Mason <slash.tmp@...e.fr>
To:     Mans Rullgard <mans@...sr.com>
Cc:     Florian Fainelli <f.fainelli@...il.com>,
        Marc Gonzalez <marc_gonzalez@...madesigns.com>,
        netdev <netdev@...r.kernel.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [RFC PATCH v1] net: ethernet: nb8800: Reset HW block in ndo_open

On 31/07/2017 13:59, Måns Rullgård wrote:

> Mason writes:
> 
>> On 29/07/2017 17:18, Florian Fainelli wrote:
>>
>>> On 07/29/2017 05:02 AM, Mason wrote:
>>>
>>>> I have identified a 100% reproducible flaw.
>>>> I have proposed a work-around that brings this down to 0
>>>> (tested 1000 cycles of link up / ping / link down).
>>>
>>> Can you also try to get help from your HW resources to eventually help
>>> you find out what is going on here?
>>
>> The patch I proposed /is/ based on the feedback from the HW team :-(
>> "Just reset the HW block, and everything will work as expected."
> 
> Nobody is saying a reset won't recover the lockup.  The problem is that
> we don't know what caused it to lock up in the first place.  How do we
> know it can't happen during normal operation?  If we knew the cause, it
> might also be possible to avoid the situation entirely.

How does one prove that something "can't happen during normal operation"?

The "put adapter in loop-back mode so we can send ourselves fake packets"
shenanigans seems completely insane, if you ask me.

Other things make no sense to me, for example in nb8800_dma_stop()
there is a polling loop:

	do {
		mdelay(100);
		nb8800_writel(priv, NB8800_TX_DESC_ADDR, txb->dma_desc);
		wmb();
		mdelay(100);
		nb8800_writel(priv, NB8800_TXC_CR, txcr | TCR_EN);

		mdelay(5500);

		err = readl_poll_timeout_atomic(priv->base + NB8800_RXC_CR,
						rxcr, !(rxcr & RCR_EN),
						1000, 100000);
		printk("err=%d retry=%d\n", err, retry);
	} while (err && --retry);


(It was me who added the delays.)

*Whatever* delays I insert, it always goes 3 times through the loop.

[   29.654492] ++ETH++ gw32 reg=f002610c val=9ecc8000
[   29.759320] ++ETH++ gw32 reg=f0026100 val=005c0aff
[   35.364705] err=-110 retry=5
[   35.467609] ++ETH++ gw32 reg=f002610c val=9ecc8000
[   35.572436] ++ETH++ gw32 reg=f0026100 val=005c0aff
[   41.177822] err=-110 retry=4
[   41.280726] ++ETH++ gw32 reg=f002610c val=9ecc8000
[   41.385553] ++ETH++ gw32 reg=f0026100 val=005c0aff
[   46.890907] err=0 retry=3

How is that possible?

I've tried using spinlocks and delays to get parallel execution
down to a minimum, and have the same logs on both boards.

Regards.