Date: Wed, 11 Nov 2020 23:10:58 +0100
From: Kegl Rohit <keglrohit@...il.com>
To: David Laight <David.Laight@...lab.com>
Cc: Eric Dumazet <eric.dumazet@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Andy Duan <fugang.duan@....com>
Subject: Re: Fwd: net: fec: rx descriptor ring out of order
On Wed, Nov 11, 2020 at 6:52 PM David Laight <David.Laight@...lab.com> wrote:
>
> > On 11/11/20 3:27 PM, Kegl Rohit wrote:
> > > Hello!
> > >
> > > We are using an i.MX6Q platform.
> > > The fec interface is used to receive a continuous stream of custom /
> > > raw ethernet packets. The packet size is fixed at ~132 bytes and a
> > > packet is sent every 250µs.
> > >
> > > While testing I observed spontaneous packet delays from time to time.
> > > After digging deeper, I think the fec peripheral does not update the
> > > rx descriptor status correctly.
> > > I modified the queue_rx function, which is called by the NAPI poll
> > > function: "no packet N" is printed when queue_rx does not process any
> > > descriptor, so N counts consecutive calls without ready descriptors.
> > > N is cleared again once the current descriptor is ready, processed,
> > > and the ring moves on to the next entry.
> > > Additionally, an error is printed if the current descriptor is empty
> > > but the next one is already ready. In that case the current
> > > descriptor and the next 11 are dumped.
> > > "C" ... current
> > > "E" ... empty
> > >
> > > [ 57.436478 < 0.020005>] no packet 1!
> > > [ 57.460850 < 0.024372>] no packet 1!
> > > [ 57.461107 < 0.000257>] ring error, current empty but next is not empty
> > > [ 57.461118 < 0.000011>] RX ahead
> > > [ 57.461135 < 0.000017>] 129 C E 0x8840 0x2c743a40 132
> > > [ 57.461146 < 0.000011>] 130 0x0840 0x2c744180 132
> > > [ 57.461158 < 0.000012>] 131 E 0x8840 0x2c7448c0 132
>
> What are the addresses of the ring entries?
> I bet there is something wrong with the cache coherency and/or
> flushing.
The ring descriptors are allocated via dma_alloc_coherent(). I will
extend the dump with their addresses.
The current output shows the dma_map_single() skb data buffer.
I tried calling flush_cache_all() before reading the descriptor
status => no change.
Are there any flush options to try?
> So the MAC hardware has done the write but (somewhere) it
> isn't visible to the cpu for ages.
It looks like that. After an error occurs I will also read the skb
data (dma_sync_single() first) to check whether the new data is
already there.
If the data is already there, that would prove the status word is
stale.
> I've seen a 'fec' ethernet block in a freescale DSP.
> IIRC it is a fairly simple block - won't be doing out-of-order writes.
>
> The imx6q seems to be arm based.
> I'm guessing that means it doesn't do cache coherency for ethernet dma
> accesses.
> That (more or less) means the rings need to be mapped uncached.
> Any attempt to just flush/invalidate the cache lines is doomed.
The descriptors are allocated using dma_alloc_coherent(), so flushes
should not be needed? Synchronization is done via barriers, e.g. wmb()
before resetting the descriptor status.
The skb data itself is mapped using the DMA API.
> ...
> > > I am suspecting the errata:
> > >
> > > ERR005783 ENET: ENET Status FIFO may overflow due to consecutive short frames
> > > Description:
> > > When the MAC receives shorter frames (size 64 bytes) at a rate exceeding
> > > the average line-rate burst traffic of 400 Mbps the DMA is able to absorb,
> > > the receiver might drop incoming frames before a Pause frame is issued.
> > > Projected Impact:
> > > No malfunction will result aside from the frame drops.
> > > Workarounds:
> > > The application might want to implement some flow control to ensure the
> > > line-rate burst traffic is below 400 Mbps if it only uses consecutive
> > > small frames with minimal (96 bit times) or short Inter-frame gap (IFG)
> > > time following large frames at such a high rate. The limit does not
> > > exist for frames of size larger than 800 bytes.
> > > Proposed Solution:
> > > No fix scheduled
> > > Linux BSP Status:
> > > Workaround possible but not implemented in the BSP, impacting
> > > functionality as described above.
> > >
> > > Is the "ENET Status FIFO" some internal hardware FIFO, or is it the
> > > descriptor ring?
> > > What would the workaround be, given that a "workaround is possible"?
>
> I don't think that is applicable.
> It looks like it just drops frames under high load.
Hm ok.
> I've no idea what a 'Linux BSP' might be.
> That term is usually used for the (often broken) board support
> for things like Vx(no-longer)Works.
Hm ok.
> > > I could only think of skipping/dropping the descriptor when the
> > > current one is still busy but the next one is ready.
> > > But that is not easily possible, because the "stuck" descriptor
> > > becomes ready after a huge delay.
>
> I bet the descriptor is at the end of a cache line which finally
> gets re-read.
Would flush_cache_all() have solved this problem?