Message-ID: <CAMeyCbhG7-dCr4bVWP=kNuwLa6CNB9h=SwN_kK7VbJ7YFCY2Ow@mail.gmail.com>
Date: Thu, 12 Nov 2020 12:56:20 +0100
From: Kegl Rohit <keglrohit@...il.com>
To: David Laight <David.Laight@...lab.com>
Cc: Eric Dumazet <eric.dumazet@...il.com>,
Fabio Estevam <festevam@...il.com>,
netdev <netdev@...r.kernel.org>
Subject: Re: net: fec: rx descriptor ring out of order
On Thu, Nov 12, 2020 at 12:10 PM David Laight <David.Laight@...lab.com> wrote:
>
> From: Eric Dumazet
> > Sent: 12 November 2020 10:42
> >
> > On 11/12/20 7:52 AM, Kegl Rohit wrote:
> > > On Wed, Nov 11, 2020 at 11:18 PM Fabio Estevam <festevam@...il.com> wrote:
> > >>
> > >> On Wed, Nov 11, 2020 at 11:27 AM Kegl Rohit <keglrohit@...il.com> wrote:
> > >>>
> > >>> Hello!
> > >>>
> > >>> We are using an imx6q platform.
> > >>> The fec interface is used to receive a continuous stream of custom /
> > >>> raw ethernet packets. The packet size is fixed at ~132 bytes and they
> > >>> are sent every 250µs.
> > >>>
> > >>> While testing I observed sporadic packet delays.
> > >>> After digging deeper I think that the fec peripheral does not
> > >>> update the rx descriptor status correctly.
> > >>
> > >> What is the kernel version that you are using?
> > >
> > > Sadly, we are stuck at 3.10.108.
>
> If you build a newer kernel it should work with your
> existing userspace.
That is not easily possible, because there are custom drivers and some
kernel modifications in the mix.
I have a quick-and-dirty port of the system to a 5.4 kernel ready and will try it there too.
But I am afraid the error will simply not show up there while still existing.
> > > https://github.com/gregkh/linux/blob/v3.10.108/drivers/net/ethernet/freescale/fec_main.c
> > > The rx queue status handling did not change much compared to 5.x. Only
> > > the NAPI handling / clearing IRQs was changed more than once.
> > > I also backported the newer NAPI handling style (clearing IRQs in
> > > napi_poll() instead of in the IRQ handler) => same issue.
> > > The issue is pretty rare => to reproduce it I reboot the system
> > > every 3 min. Sometimes it happens after 1~2 min on the first boot,
> > > sometimes only on the ~10th reboot.
> > >
> >
> > It seems some rmb() & wmb() are missing.
>
> They are unlikely to make any difference since the 'bad'
> rx status persists between calls to the receive function.
Our kernel already has some patches applied, such as the wmb() in the rx
path and the rmb() in the tx path.
I also tried adding an rmb() in the rx path, because master does not have one there:
https://github.com/gregkh/linux/blob/master/drivers/net/ethernet/freescale/fec_main.c#L1434
=> Still the same issue, no change.
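A userspace model of the ordering being discussed (not FEC driver code): the "device" writes the length and then publishes the status, and the "driver" must observe the status before it trusts the length. In the driver that second step is the rmb() between reading cbd_sc and cbd_datlen; here C11 acquire/release stands in for it, and the struct and function names are hypothetical. As David notes, a missing barrier would not explain a status that stays stale across several polls, so this only shows where the barrier belongs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical status bits, assumed to mirror BD_ENET_RX_EMPTY/LAST in fec.h. */
#define MODEL_RX_EMPTY 0x8000u
#define MODEL_RX_LAST  0x0800u

/* Userspace model of one rx descriptor; the status word is the publication flag. */
struct model_bd {
    uint16_t datlen;          /* plain field, written before the status */
    _Atomic uint16_t sc;      /* status/control word */
};

/* "Device" side: fill in the length first, then clear EMPTY with release
 * semantics so the length is visible to anyone who observes the new status. */
static void device_complete(struct model_bd *bd, uint16_t len)
{
    bd->datlen = len;
    atomic_store_explicit(&bd->sc, MODEL_RX_LAST, memory_order_release);
}

/* "Driver" side: acquire-load the status (the role of rmb() after reading
 * cbd_sc) and only then read the length. Returns -1 while still EMPTY. */
static int driver_poll(struct model_bd *bd)
{
    uint16_t sc = atomic_load_explicit(&bd->sc, memory_order_acquire);

    if (sc & MODEL_RX_EMPTY)
        return -1;
    return bd->datlen;
}
```

Note that C11 atomics order CPU accesses only; on real hardware the DMA side is covered by the kernel's rmb()/dma_rmb().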
I extended the debugging output. Columns:
descriptor index, descriptor address, C (current), E (empty), desc.status,
desc.buffer (mapped skb->data), desc.length
[ 137.758009 < 0.000015>] 409 0xa09d5320 C E 0x8840 0x2c6f0780 0
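To read the dump lines: assuming the fec.h values BD_ENET_RX_EMPTY = 0x8000 and BD_ENET_RX_LAST = 0x0800, status 0x8840 is an empty descriptor and 0x0840 a completed one with the last-in-frame bit set. The descriptor addresses advance by 0x20, i.e. 32 bytes per descriptor. A hypothetical formatter that reproduces the column order (it is not the debug code actually used):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RX_EMPTY 0x8000u  /* assumed value of BD_ENET_RX_EMPTY */
#define RX_LAST  0x0800u  /* assumed value of BD_ENET_RX_LAST */

/* Hypothetical snapshot of one descriptor as printed in the dump. */
struct bd_snapshot {
    unsigned index;     /* position in the rx ring */
    uint32_t bd_addr;   /* address of the descriptor itself */
    uint16_t status;    /* cbd_sc */
    uint32_t buf_addr;  /* cbd_bufaddr (mapped skb->data) */
    uint16_t len;       /* cbd_datlen */
};

/* Format one line: index, address, "C" if it is the driver's current
 * descriptor, "E" if the EMPTY bit is set, status, buffer, length. */
static int format_bd(char *out, size_t n, const struct bd_snapshot *bd,
                     int is_current)
{
    return snprintf(out, n, "%u 0x%08x %s%s0x%04x 0x%08x %u",
                    bd->index, (unsigned)bd->bd_addr,
                    is_current ? "C " : "",
                    (bd->status & RX_EMPTY) ? "E " : "",
                    (unsigned)bd->status, (unsigned)bd->buf_addr,
                    (unsigned)bd->len);
}
```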
I also reset the desc.length field to 0 after the packet was received
and before the descriptor was set to empty again.
This showed that, like the status, the length is not updated either.
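The re-arm order described above, as a single-threaded model with hypothetical names: the length is cleared first, then the error bits are cleared and EMPTY is set again. The assumed BD_ENET_RX_STATS mask (0x013f) leaves bit 0x0040 untouched, so a re-armed 0x0840 descriptor reads 0x8840, which matches the empty entries in the dump. The C11 fence only marks where the driver's wmb() goes; it does not order DMA on real hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RX_EMPTY 0x8000u   /* assumed value of BD_ENET_RX_EMPTY */
#define RX_STATS 0x013fu   /* assumed value of BD_ENET_RX_STATS (error bits) */

/* Hypothetical model of a descriptor's length and status words. */
struct bd_model {
    uint16_t datlen;   /* models cbd_datlen */
    uint16_t sc;       /* models cbd_sc */
};

/* Hand a processed descriptor back to the "hardware":
 * 1. clear the length so a stale value cannot pass for a new one,
 * 2. fence (placeholder for the driver's wmb()),
 * 3. clear the error bits and set EMPTY. */
static void rearm_bd(struct bd_model *bd)
{
    bd->datlen = 0;
    atomic_thread_fence(memory_order_release);
    bd->sc = (uint16_t)((bd->sc & ~RX_STATS) | RX_EMPTY);
}
```

With this in place, a descriptor that later reads EMPTY with length 0 while its successor is complete has had neither field written back.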
Because I know the content and size of my rx packets, I used
dma_sync_single(mapped skb->data) to read the data even while the status
is still empty.
Each packet contains a counter, so I could verify that the data is already
there and not lost.
Only the descriptor status and length are not updated.
[ 137.757966 < 0.000021>] cnt: 2341 .... counter of current ("empty") packet; index 409 in example
[ 137.757984 < 0.000018>] nxcnt: 2342 .... counter of next not empty packet; index 410 in example
=> The content is there, but the status is not. As a next step I will also
check whether all bytes are correct, not only the two counter bytes.
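A sketch of the counter and full-payload check. The packet layout is unknown here, so the counter position (big-endian, byte offset 14, right after the Ethernet header) and the template comparison are hypothetical placeholders. With the values from the log, the stuck descriptor's counter 2341 and the next one's 2342 are consecutive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_LEN  132u  /* fixed packet size from the report */
#define CNT_OFFSET 14u   /* hypothetical: counter right after the Ethernet header */

/* Read the hypothetical big-endian 16-bit sequence counter from a frame. */
static uint16_t frame_counter(const uint8_t *frame)
{
    return (uint16_t)((frame[CNT_OFFSET] << 8) | frame[CNT_OFFSET + 1]);
}

/* Compare a received frame with a known template, skipping the counter.
 * Returns the index of the first mismatching byte, or -1 if all match. */
static int frame_mismatch(const uint8_t *frame, const uint8_t *tmpl)
{
    for (size_t i = 0; i < FRAME_LEN; i++) {
        if (i == CNT_OFFSET || i == CNT_OFFSET + 1)
            continue;
        if (frame[i] != tmpl[i])
            return (int)i;
    }
    return -1;
}

/* True if `next` directly follows `cur`; the uint16_t cast handles the
 * wrap from 65535 to 0. */
static int counters_consecutive(uint16_t cur, uint16_t next)
{
    return (uint16_t)(cur + 1u) == next;
}
```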
[ 40.888181 < 0.000344>] --- start test application ---
[ 137.757945 < 96.869764>] ring error, next is ready
[ 137.757966 < 0.000021>] cnt: 2341
[ 137.757984 < 0.000018>] nxcnt: 2342
[ 137.757994 < 0.000010>] RX ahead
[ 137.758009 < 0.000015>] 409 0xa09d5320 C E 0x8840 0x2c6f0780 0
[ 137.758024 < 0.000015>] 410 0xa09d5340 0x0840 0x2c6f0ec0 132
[ 137.758038 < 0.000014>] 411 0xa09d5360 E 0x8840 0x2c6f1600 0
[ 137.758051 < 0.000013>] 412 0xa09d5380 E 0x8840 0x2c6f1d40 0
[ 137.758064 < 0.000013>] 413 0xa09d53a0 E 0x8840 0x2c6f2480 0
[ 137.758076 < 0.000012>] 414 0xa09d53c0 E 0x8840 0x2c6f2bc0 0
[ 137.758089 < 0.000013>] 415 0xa09d53e0 E 0x8840 0x2c6f3300 0
[ 137.758102 < 0.000013>] 416 0xa09d5400 E 0x8840 0x2c6f3a40 0
[ 137.758115 < 0.000013>] 417 0xa09d5420 E 0x8840 0x2c6f4180 0
[ 137.758127 < 0.000012>] 418 0xa09d5440 E 0x8840 0x2c6f48c0 0
[ 137.758140 < 0.000013>] 419 0xa09d5460 E 0x8840 0x2c6f5000 0
[ 137.758152 < 0.000012>] 420 0xa09d5480 E 0x8840 0x2c6f5740 0
[ 137.758165 < 0.000013>] 421 0xa09d54a0 E 0x8840 0x2c6f5e80 0
[ 137.758414 < 0.000025>] ring error, next is ready
[ 137.758426 < 0.000012>] cnt: 2341
[ 137.758439 < 0.000013>] nxcnt: 2342
[ 137.758448 < 0.000009>] RX ahead
[ 137.758485 < 0.000037>] 409 0xa09d5320 C E 0x8840 0x2c6f0780 0
[ 137.758500 < 0.000015>] 410 0xa09d5340 0x0840 0x2c6f0ec0 132
[ 137.758515 < 0.000015>] 411 0xa09d5360 0x0840 0x2c6f1600 132
[ 137.758529 < 0.000014>] 412 0xa09d5380 0x0840 0x2c6f1d40 132
[ 137.758542 < 0.000013>] 413 0xa09d53a0 E 0x8840 0x2c6f2480 0
[ 137.758556 < 0.000014>] 414 0xa09d53c0 E 0x8840 0x2c6f2bc0 0
[ 137.758569 < 0.000013>] 415 0xa09d53e0 E 0x8840 0x2c6f3300 0
[ 137.758582 < 0.000013>] 416 0xa09d5400 E 0x8840 0x2c6f3a40 0
[ 137.758596 < 0.000014>] 417 0xa09d5420 E 0x8840 0x2c6f4180 0
[ 137.758609 < 0.000013>] 418 0xa09d5440 E 0x8840 0x2c6f48c0 0
[ 137.758622 < 0.000013>] 419 0xa09d5460 E 0x8840 0x2c6f5000 0
[ 137.758905 < 0.000031>] ring error, next is ready
[ 137.758917 < 0.000012>] cnt: 2341
[ 137.758930 < 0.000013>] nxcnt: 2342
[ 137.758938 < 0.000008>] RX ahead
[ 137.758951 < 0.000013>] 409 0xa09d5320 C E 0x8840 0x2c6f0780 0
[ 137.758965 < 0.000014>] 410 0xa09d5340 0x0840 0x2c6f0ec0 132
[ 137.758978 < 0.000013>] 411 0xa09d5360 0x0840 0x2c6f1600 132
[ 137.758991 < 0.000013>] 412 0xa09d5380 0x0840 0x2c6f1d40 132
[ 137.759005 < 0.000014>] 413 0xa09d53a0 0x0840 0x2c6f2480 132
[ 137.759018 < 0.000013>] 414 0xa09d53c0 0x0840 0x2c6f2bc0 132
[ 137.759031 < 0.000013>] 415 0xa09d53e0 E 0x8840 0x2c6f3300 0
[ 137.759044 < 0.000013>] 416 0xa09d5400 E 0x8840 0x2c6f3a40 0
[ 137.759057 < 0.000013>] 417 0xa09d5420 E 0x8840 0x2c6f4180 0
[ 137.759071 < 0.000014>] 418 0xa09d5440 E 0x8840 0x2c6f48c0 0
[ 137.759084 < 0.000013>] 419 0xa09d5460 E 0x8840 0x2c6f5000 0
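The "ring error, next is ready" / "RX ahead" condition amounts to: the descriptor at the driver's current index still reads EMPTY while the following ones are already complete. A minimal sketch over an array of status words (hypothetical helper, not the code that produced the log). Applied to the three dumps above it gives 1, 3 and 5 completed descriptors ahead of index 409, at checks about 0.5 ms apart: roughly two new frames per check, consistent with the 250 µs packet interval, so the hardware keeps filling the ring while index 409 stays stuck.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_EMPTY 0x8000u  /* assumed value of BD_ENET_RX_EMPTY */

/* Count the consecutive descriptors after `cur` (with wrap-around) that the
 * hardware has completed while `cur` itself still reads EMPTY.
 * A non-zero result corresponds to the "ring error, next is ready" case. */
static size_t completed_ahead(const uint16_t *status, size_t ring_size,
                              size_t cur)
{
    size_t n = 0;

    if (!(status[cur] & RX_EMPTY))
        return 0;   /* current descriptor is ready: no anomaly */

    for (size_t i = 1; i < ring_size; i++) {
        size_t idx = (cur + i) % ring_size;
        if (status[idx] & RX_EMPTY)
            break;  /* reached descriptors the hardware has not filled yet */
        n++;
    }
    return n;
}
```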