netdev - RE: [PATCH] net: ftgmac100: Fix missing TX-poll issue


Message-ID: <PS1PR0601MB18498469F0263306A6E5183F9C1A0@PS1PR0601MB1849.apcprd06.prod.outlook.com>
Date:   Fri, 23 Oct 2020 13:08:30 +0000
From:   Dylan Hung <dylan_hung@...eedtech.com>
To:     Andrew Jeffery <andrew@...id.au>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>
CC:     BMC-SW <BMC-SW@...eedtech.com>,
        linux-aspeed <linux-aspeed@...ts.ozlabs.org>,
        Po-Yu Chuang <ratbert@...aday-tech.com>,
        netdev <netdev@...r.kernel.org>,
        OpenBMC Maillist <openbmc@...ts.ozlabs.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Jakub Kicinski <kuba@...nel.org>,
        David Miller <davem@...emloft.net>
Subject: RE: [PATCH] net: ftgmac100: Fix missing TX-poll issue

> -----Original Message-----
> From: Andrew Jeffery [mailto:andrew@...id.au]
> Sent: Wednesday, October 21, 2020 6:26 AM
> To: Benjamin Herrenschmidt <benh@...nel.crashing.org>; Arnd Bergmann
> <arnd@...db.de>; Dylan Hung <dylan_hung@...eedtech.com>
> Cc: BMC-SW <BMC-SW@...eedtech.com>; linux-aspeed
> <linux-aspeed@...ts.ozlabs.org>; Po-Yu Chuang <ratbert@...aday-tech.com>;
> netdev <netdev@...r.kernel.org>; OpenBMC Maillist
> <openbmc@...ts.ozlabs.org>; Linux Kernel Mailing List
> <linux-kernel@...r.kernel.org>; Jakub Kicinski <kuba@...nel.org>; David
> Miller <davem@...emloft.net>
> Subject: Re: [PATCH] net: ftgmac100: Fix missing TX-poll issue
> 
> 
> 
> On Wed, 21 Oct 2020, at 08:40, Benjamin Herrenschmidt wrote:
> > On Tue, 2020-10-20 at 21:49 +0200, Arnd Bergmann wrote:
> > > > On Tue, Oct 20, 2020 at 11:37 AM Dylan Hung <dylan_hung@...eedtech.com> wrote:
> > > > > +1 @first is system memory from dma_alloc_coherent(), right?
> > > > >
> > > > > You shouldn't have to do this. Is coherent DMA memory broken on
> > > > > your platform?
> > > >
> > > > > It is about the arbitration on the DRAM controller.  There are two
> > > > > queues in the dram controller, one is for the CPU access and the
> > > > > other is for the HW engines.
> > > > > When CPU issues a store command, the dram controller just
> > > > > acknowledges cpu's request and pushes the request into the queue.
> > > > > Then CPU triggers the HW MAC engine, the HW engine starts to fetch
> > > > > the DMA memory.
> > > > > But since the cpu's request may still stay in the queue, the HW
> > > > > engine may fetch the wrong data.
> >
> > Actually, I take back what I said earlier, the above seems to imply
> > this is more generic.
> >
> > Dylan, please confirm, does this affect *all* DMA capable devices ? If
> > yes, then it's a really really bad design bug in your chips
> > unfortunately and the proper fix is indeed to make dma_wmb() do a
> > dummy read of some sort (what address though ? would any dummy
> > non-cachable page do ?) to force the data out as *all* drivers will
> > potentially be affected.
> >

The issue was found on our test chip (ast2600 version A0), which is for testing only and won't be mass-produced.  This HW bug has been fixed on ast2600 A1 and later versions.

To verify the HW fix, I ran overnight iperf and kvm tests on ast2600 A1 without this patch and got stable results with no hangs.
So I think we can discard this patch.

> > I was under the impression that it was a specific timing issue in the
> > vhub and ethernet parts, but if it's more generic then it needs to be
> > fixed globally.
> >
> 
> We see a similar issue in the XDMA engine where it can transfer stale data to
> the host. I think the driver ended up using memcpy_toio() to work around that
> despite using a DMA reserved memory region.
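
For reference, the race described earlier in the thread (a CPU store still sitting in the DRAM controller's CPU queue when the HW engine fetches through the other queue) can be illustrated with a toy userspace model. This is only a sketch: the struct and function names are hypothetical, nothing here is taken from the ftgmac100 driver, and the model assumes that a CPU read of the same location forces the queued store to be committed, which is the assumption behind the dummy-read idea discussed above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: one DRAM word plus a single-entry CPU store queue.
 * All names are hypothetical and not taken from the ftgmac100 driver. */
struct dram_model {
    uint32_t cell;        /* value actually committed to DRAM */
    uint32_t queued;      /* CPU store acknowledged but not yet committed */
    int      has_queued;  /* non-zero while a store is still in the CPU queue */
};

/* The controller acknowledges the store immediately and only queues it. */
static void cpu_store(struct dram_model *m, uint32_t value)
{
    m->queued = value;
    m->has_queued = 1;
}

/* Assumption: a CPU load goes through the same queue, so the earlier
 * store has to be committed before the load can return. */
static uint32_t cpu_load(struct dram_model *m)
{
    if (m->has_queued) {
        m->cell = m->queued;
        m->has_queued = 0;
    }
    return m->cell;
}

/* The HW engine uses the other queue and sees only committed data. */
static uint32_t hw_fetch(const struct dram_model *m)
{
    return m->cell;
}

/* Write a descriptor word, optionally read it back, then let the
 * engine fetch it. Returns what the engine observed. */
static uint32_t write_then_kick(uint32_t old_value, uint32_t new_value,
                                int read_back)
{
    struct dram_model m = { .cell = old_value, .queued = 0, .has_queued = 0 };

    cpu_store(&m, new_value);
    if (read_back)
        (void)cpu_load(&m);   /* dummy read to push the store out */
    return hw_fetch(&m);
}

/* Checks both orderings of the model. */
static void model_self_check(void)
{
    /* No read-back: the engine sees the old (stale) word. */
    assert(write_then_kick(0x0, 0x80000000u, 0) == 0x0);
    /* With read-back: the engine sees the new word. */
    assert(write_then_kick(0x0, 0x80000000u, 1) == 0x80000000u);
}
```

Calling model_self_check() from a small main() exercises both cases. On real hardware the behaviour depends on the DRAM controller; per the reply above, ast2600 A1 and later do not need such a read-back.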