Date:   Thu, 12 Nov 2020 08:51:01 +0100
From:   Kegl Rohit <keglrohit@...il.com>
To:     Andy Duan <fugang.duan@....com>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [EXT] Fwd: net: fec: rx descriptor ring out of order

On Thu, Nov 12, 2020 at 2:29 AM Andy Duan <fugang.duan@....com> wrote:
>
> From: Kegl Rohit <keglrohit@...il.com> Sent: Wednesday, November 11, 2020 10:27 PM
> > Hello!
> >
> > We are using an i.MX6Q platform.
> > The fec interface is used to receive a continuous stream of custom / raw
> > ethernet packets. The packets have a fixed size of ~132 bytes and are sent
> > every 250 µs.
> >
> > While testing I observed spontaneous packet delays from time to time.
> > After digging down deeper I think that the fec peripheral does not update the rx
> > descriptor status correctly.
> > I modified the queue_rx function, which is called by the NAPI poll function.
> > "no packet N" is printed when the queue_rx function doesn't process any
> > descriptor, so the variable N counts consecutive calls without ready
> > descriptors. When the current descriptor is ready, gets processed, and the
> > driver moves on to the next entry, N is cleared again.
> > Additionally an error is printed if the current descriptor is empty but the next
> > one is already ready. In case this error happens the current descriptor and the
> > next 11 ones are dumped.
> > "C"  ... current
> > "E"  ... empty
> >
> > [   57.436478 <    0.020005>] no packet 1!
> > [   57.460850 <    0.024372>] no packet 1!
> > [   57.461107 <    0.000257>] ring error, current empty but next is not empty
> > [   57.461118 <    0.000011>] RX ahead
> > [   57.461135 <    0.000017>] 129 C E 0x8840 0x2c743a40  132
> > [   57.461146 <    0.000011>] 130     0x0840 0x2c744180  132
> > [   57.461158 <    0.000012>] 131   E 0x8840 0x2c7448c0  132
> > [   57.461170 <    0.000012>] 132   E 0x8840 0x2c745000  132
> > [   57.461181 <    0.000011>] 133   E 0x8840 0x2c745740  132
> > [   57.461192 <    0.000011>] 134   E 0x8840 0x2c745e80  132
> > [   57.461204 <    0.000012>] 135   E 0x8880 0x2c7465c0  114
> > [   57.461215 <    0.000011>] 136   E 0x8840 0x2c746d00  132
> > [   57.461227 <    0.000012>] 137   E 0x8840 0x2c747440  132
> > [   57.461239 <    0.000012>] 138   E 0x8840 0x2c748040  132
> > [   57.461250 <    0.000011>] 139   E 0x8840 0x2c748780  132
> > [   57.461262 <    0.000012>] 140   E 0x8840 0x2c748ec0  132
> > [   57.461477 <    0.000008>] no packet 2!
> > [   57.461506 <    0.000029>] ring error, current empty but next is not empty
> > [   57.461537 <    0.000031>] RX ahead
> > [   57.461550 <    0.000013>] 129 C E 0x8840 0x2c743a40  132
> > [   57.461563 <    0.000013>] 130     0x0840 0x2c744180  132
> > [   57.461577 <    0.000014>] 131     0x0840 0x2c7448c0  132
> > [   57.461589 <    0.000012>] 132     0x0840 0x2c745000  132
> > [   57.461601 <    0.000012>] 133   E 0x8840 0x2c745740  132
> > [   57.461613 <    0.000012>] 134   E 0x8840 0x2c745e80  132
> > [   57.461624 <    0.000011>] 135   E 0x8880 0x2c7465c0  114
> > [   57.461635 <    0.000011>] 136   E 0x8840 0x2c746d00  132
> > [   57.461645 <    0.000010>] 137   E 0x8840 0x2c747440  132
> > [   57.461657 <    0.000012>] 138   E 0x8840 0x2c748040  132
> > [   57.461668 <    0.000011>] 139   E 0x8840 0x2c748780  132
> > [   57.461680 <    0.000012>] 140   E 0x8840 0x2c748ec0  132
> > [   57.461894 <    0.000009>] no packet 3!
> > [   57.461926 <    0.000032>] ring error, current empty but next is not empty
> > [   57.461935 <    0.000009>] RX ahead
> > [   57.461947 <    0.000012>] 129 C E 0x8840 0x2c743a40  132
> > [   57.461959 <    0.000012>] 130     0x0840 0x2c744180  132
> > [   57.461970 <    0.000011>] 131     0x0840 0x2c7448c0  132
> > [   57.461982 <    0.000012>] 132     0x0840 0x2c745000  132
> > [   57.461993 <    0.000011>] 133     0x0840 0x2c745740  132
> > [   57.462005 <    0.000012>] 134   E 0x8840 0x2c745e80  132
> > [   57.462017 <    0.000012>] 135   E 0x8880 0x2c7465c0  114
> > [   57.462028 <    0.000011>] 136   E 0x8840 0x2c746d00  132
> > [   57.462039 <    0.000011>] 137   E 0x8840 0x2c747440  132
> > [   57.462051 <    0.000012>] 138   E 0x8840 0x2c748040  132
> > [   57.462062 <    0.000011>] 139   E 0x8840 0x2c748780  132
> > [   57.462075 <    0.000013>] 140   E 0x8840 0x2c748ec0  132
> > [   57.462289 <    0.000009>] no packet 4!
> > [   57.462316 <    0.000027>] ring error, current empty but next is not empty
> > [   57.462326 <    0.000010>] RX ahead
> > [   57.462339 <    0.000013>] 129 C E 0x8840 0x2c743a40  132
> > [   57.462351 <    0.000012>] 130     0x0840 0x2c744180  132
> > [   57.462362 <    0.000011>] 131     0x0840 0x2c7448c0  132
> > [   57.462373 <    0.000011>] 132     0x0840 0x2c745000  132
> > [   57.462384 <    0.000011>] 133     0x0840 0x2c745740  132
> > [   57.462397 <    0.000013>] 134     0x0840 0x2c745e80  132
> > [   57.462408 <    0.000011>] 135     0x0840 0x2c7465c0  132
> > [   57.462421 <    0.000013>] 136   E 0x8840 0x2c746d00  132
> > [   57.462431 <    0.000010>] 137   E 0x8840 0x2c747440  132
> > [   57.462443 <    0.000012>] 138   E 0x8840 0x2c748040  132
> > [   57.462454 <    0.000011>] 139   E 0x8840 0x2c748780  132
> > [   57.462467 <    0.000013>] 140   E 0x8840 0x2c748ec0  132
> > [   57.462697 <    0.000009>] no packet 5!
> > [   57.462730 <    0.000033>] ring error, current empty but next is not empty
> > [   57.462739 <    0.000009>] RX ahead
> > [   57.462752 <    0.000013>] 129 C E 0x8840 0x2c743a40  132
> > [   57.462763 <    0.000011>] 130     0x0840 0x2c744180  132
> > [   57.462775 <    0.000012>] 131     0x0840 0x2c7448c0  132
> > [   57.462787 <    0.000012>] 132     0x0840 0x2c745000  132
> > [   57.462799 <    0.000012>] 133     0x0840 0x2c745740  132
> > [   57.462809 <    0.000010>] 134     0x0840 0x2c745e80  132
> > [   57.462820 <    0.000011>] 135     0x0840 0x2c7465c0  132
> > [   57.462830 <    0.000010>] 136     0x0840 0x2c746d00  132
> > [   57.462842 <    0.000012>] 137     0x0840 0x2c747440  132
> > [   57.462853 <    0.000011>] 138   E 0x8840 0x2c748040  132
> > [   57.462864 <    0.000011>] 139   E 0x8840 0x2c748780  132
> > [   57.462877 <    0.000013>] 140   E 0x8840 0x2c748ec0  132
> > [   57.463093 <    0.000009>] no packet 6!
> > [   57.463120 <    0.000027>] RX ahead
> > [   57.463133 <    0.000013>] 129 C   0x0840 0x2c743a40  132
> > [   57.463144 <    0.000011>] 130     0x0840 0x2c744180  132
> > [   57.463155 <    0.000011>] 131     0x0840 0x2c7448c0  132
> > [   57.463166 <    0.000011>] 132     0x0840 0x2c745000  132
> > [   57.463179 <    0.000013>] 133     0x0840 0x2c745740  132
> > [   57.463190 <    0.000011>] 134     0x0840 0x2c745e80  132
> > [   57.463201 <    0.000011>] 135     0x0840 0x2c7465c0  132
> > [   57.463213 <    0.000012>] 136     0x0840 0x2c746d00  132
> > [   57.463224 <    0.000011>] 137     0x0840 0x2c747440  132
> > [   57.463235 <    0.000011>] 138     0x0840 0x2c748040  132
> > [   57.463245 <    0.000010>] 139   E 0x8840 0x2c748780  132
> > [   57.463256 <    0.000011>] 140   E 0x8840 0x2c748ec0  132
> > [   57.463695 <    0.000244>] rx 12
> >
> > As you can see, the described error is caught and the ring is dumped.
> > Nine descriptors became ready before the current descriptor did.
> > After that the current descriptor became ready and 12 packets were
> > processed at once.
> > I could also observe cases where the ring (512 entries) got full before the
> > current descriptor was cleared.
> > There were also cases where neither the current nor the next descriptor was
> > ready, but a later one was:
> > [   57.462752 <    0.000013>] 129 C E 0x8840 0x2c743a40  132
> > [   57.462763 <    0.000011>] 130    E 0x0840 0x2c744180  132
> > [   57.462775 <    0.000012>] 131     0x0840 0x2c7448c0  132
> >
> > I suspect the following erratum:
> >
> > ERR005783 ENET: ENET Status FIFO may overflow due to consecutive short
> > frames
> > Description:
> > When the MAC receives shorter frames (size 64 bytes) at a rate exceeding the
> > average line-rate burst traffic of 400 Mbps the DMA is able to absorb, the
> > receiver might drop incoming frames before a Pause frame is issued.
> > Projected Impact:
> > No malfunction will result aside from the frame drops.
> > Workarounds:
> > The application might want to implement some flow control to ensure the
> > line-rate burst traffic is below 400 Mbps if it only uses consecutive small
> > frames with minimal (96 bit times) or short Inter-frame gap (IFG) time
> > following large frames at such a high rate. The limit does not exist for
> > frames of size larger than 800 bytes.
> > Proposed Solution:
> > No fix scheduled
> > Linux BSP Status:
> > Workaround possible but not implemented in the BSP, impacting functionality as
> > described above.
> >
> > Is the "ENET Status FIFO" some internal hardware FIFO, or is it the
> > descriptor ring?
> > What would the workaround be, given that the erratum says "Workaround
> > possible"?
> >
> > I could only think of skipping/dropping the descriptor when the current is still
> > busy but the next one is ready.
> > But it is not easily possible because the "stuck" descriptor gets ready after a
> > huge delay.
> >
> > Is this issue known already? Any suggestions?
> >
>
> We don't see the issue.
>
> > Yes, the IP has the erratum on i.MX6Q, so the workaround is to enable HW flow control.
> > Keep HW flow control enabled on your network connection to avoid FIFO overruns.
>
> Regards,
> Andy
> >
> > Thanks in advance

Ok, after rereading the errata I don't think that they are the problem.

ERR004512 ENET: 1 Gb Ethernet MAC (ENET) system limitation.

Here flow control should be the solution.
We are using a 100 Mbit/s full duplex link and the generated test
stream is only about 4 Mbit/s, so this issue should not apply.
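
As a quick sanity check of that figure, the rate follows directly from the numbers in the original report (132-byte frames, one every 250 µs). The sketch below is only an illustration: the function names are hypothetical, and preamble and inter-frame gap are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Bit rate of a periodic stream in bit/s (frame data only, no preamble/IFG). */
uint64_t stream_bps(uint32_t frame_bytes, uint32_t period_us)
{
    assert(period_us > 0);
    return (uint64_t)frame_bytes * 8u * 1000000u / period_us;
}

/* Frames per second of the same stream. */
uint32_t stream_fps(uint32_t period_us)
{
    assert(period_us > 0);
    return 1000000u / period_us;
}
```

stream_bps(132, 250) evaluates to 4224000 bit/s at 4000 frames per second, i.e. roughly 4% of the 100 Mbit/s link and about 1% of the 400 Mbps mentioned in ERR005783.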


ERR005783 ENET: ENET Status FIFO may overflow due to consecutive short frames.
When the MAC receives shorter frames (size 64 bytes) at a rate exceeding
the average line-rate burst traffic of 400 Mbps the DMA is able to absorb,
the receiver might drop incoming frames before a Pause frame is issued.

In this case the hardware flow control workaround will not help; flow
control would have to be implemented at the software protocol level.
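
For anyone who wants to reproduce the ring dumps quoted above, the check amounts to roughly the following. This is a simplified user-space model, not the actual driver patch: struct rx_desc, check_ring() and dump_ring() are hypothetical names, and the only hardware detail assumed is that bit 15 (0x8000) of the status word is the "empty" flag, which matches the "E" marks in the dumps.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: bit 15 set means the descriptor is still owned by the hardware. */
#define RX_EMPTY 0x8000u

/* Hypothetical, simplified descriptor; the real driver layout differs. */
struct rx_desc {
    uint16_t status;
    uint16_t len;
    uint32_t addr;
};

enum ring_state {
    RING_READY,        /* current descriptor holds a received frame      */
    RING_IDLE,         /* current and next descriptor are both empty     */
    RING_OUT_OF_ORDER  /* current is empty but the next one is filled    */
};

/* Classify the ring as seen from the current descriptor index. */
enum ring_state check_ring(const struct rx_desc *ring, unsigned size,
                           unsigned cur)
{
    assert(size > 1 && cur < size);

    if (!(ring[cur].status & RX_EMPTY))
        return RING_READY;

    unsigned next = (cur + 1) % size;   /* the ring wraps around */
    if (!(ring[next].status & RX_EMPTY))
        return RING_OUT_OF_ORDER;

    return RING_IDLE;
}

/* Print `count` descriptors starting at `cur`, in the style of the dumps. */
void dump_ring(const struct rx_desc *ring, unsigned size, unsigned cur,
               unsigned count)
{
    for (unsigned i = 0; i < count && i < size; i++) {
        unsigned idx = (cur + i) % size;
        printf("%3u %c %c 0x%04x 0x%08x %4u\n", idx,
               i == 0 ? 'C' : ' ',
               (ring[idx].status & RX_EMPTY) ? 'E' : ' ',
               (unsigned)ring[idx].status, (unsigned)ring[idx].addr,
               (unsigned)ring[idx].len);
    }
}
```

Note that this only looks one entry ahead, so the case where both the current and the next descriptor are empty while a later one is already filled is reported as RING_IDLE.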
