netdev - Re: FEC on i.MX 7 transmit queue timeout

Message-ID: <46a27329-36df-1eaf-1321-24db037842fe@nxp.com>
Date:   Fri, 5 May 2017 02:44:26 +0000
From:   Andy Duan <fugang.duan@....com>
To:     Stefan Agner <stefan@...er.ch>
CC:     "festevam@...il.com" <festevam@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "netdev-owner@...r.kernel.org" <netdev-owner@...r.kernel.org>
Subject: Re: FEC on i.MX 7 transmit queue timeout

On 2017-05-05 10:09, Stefan Agner wrote:
> On 2017-05-04 19:03, Andy Duan wrote:
>> On 2017-05-05 05:36, Stefan Agner wrote:
>>> On 2017-05-03 20:08, Andy Duan wrote:
>>>> From: Stefan Agner <stefan@...er.ch> Sent: Thursday, May 04, 2017 9:22 AM
>>>>> To: Andy Duan <fugang.duan@....com>
>>>>> Cc: fugang.duan@...escale.com; festevam@...il.com;
>>>>> netdev@...r.kernel.org; netdev-owner@...r.kernel.org
>>>>> Subject: Re: FEC on i.MX 7 transmit queue timeout
>>>>>
>>>>> Hi Andy,
>>>>>
>>>>> On 2017-04-20 19:48, Andy Duan wrote:
>>>>>> On 2017-04-20 07:15, Stefan Agner wrote:
>>>>>>> I tested again with the imx6sx-fec compatible string. I could
>>>>>>> reproduce it on a Colibri with i.MX 7Dual, but not always: it really
>>>>>>> depends on whether queue 2 is counting up or not. Just after boot, I
>>>>>>> check /proc/interrupts twice; if queue 2 is counting, it will happen!
>>>>>>>
>>>>>>> But if mostly only queue 0 is in use, then it seems to work just fine.
>>>>>> If your case is only running best-effort traffic like TCP/UDP, you can
>>>>>> set "fsl,num-tx-queues" and "fsl,num-rx-queues" to 1 in the board dts
>>>>>> file. The other two queues are for AVB audio/video traffic and have
>>>>>> higher priority than queue 0. If you run an iperf TCP test on all three
>>>>>> queues, the TCP segments may arrive out of order, which causes the net
>>>>>> watchdog timeout.
>>>>>>> I also tried an i.MX 7Dual SabreSD here, with the same result. I had
>>>>>>> to reboot 3 times before queue 2 was counting:
>>>>>>>     57:          8     GIC-0 150 Level     30be0000.ethernet
>>>>>>>     58:      20137     GIC-0 151 Level     30be0000.ethernet
>>>>>>>     59:       9269     GIC-0 152 Level     30be0000.ethernet
>>>>>>>
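The check described in this thread (reading /proc/interrupts twice and seeing whether the queue 2 interrupt is counting) can be scripted. Below is a minimal C sketch, assuming the usual /proc/interrupts layout shown in the listing above: an IRQ number and colon, one count column per CPU, then the controller and device name. The helper names `parse_irq_line` and `dump_dev_irqs` are hypothetical, not part of any existing tool.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse one /proc/interrupts line such as
 *   "    59:       9269     GIC-0 152 Level     30be0000.ethernet"
 * and sum all per-CPU count columns that follow "NN:".
 * Returns 0 on success, -1 if the line does not start with a numeric IRQ. */
int parse_irq_line(const char *line, int *irq, unsigned long long *total)
{
    char *end;
    long n;
    const char *p;

    assert(line != NULL && irq != NULL && total != NULL);

    n = strtol(line, &end, 10);          /* strtol skips leading blanks */
    if (end == line || *end != ':')
        return -1;                       /* CPU header, IPI or ERR line */

    *irq = (int)n;
    *total = 0;
    p = end + 1;
    for (;;) {
        unsigned long long v;

        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;                       /* reached the controller name */
        v = strtoull(p, &end, 10);
        if (*end != ' ' && *end != '\t' && *end != '\n' && *end != '\0')
            break;                       /* not a pure count column */
        *total += v;
        p = end;
    }
    return 0;
}

/* Print IRQ number and summed count for every line mentioning `dev`. */
int dump_dev_irqs(const char *dev)
{
    char line[4096];
    FILE *f = fopen("/proc/interrupts", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        int irq;
        unsigned long long total;

        if (strstr(line, dev) != NULL &&
            parse_irq_line(line, &irq, &total) == 0)
            printf("IRQ %d: %llu\n", irq, total);
    }
    fclose(f);
    return 0;
}
```

Calling `dump_dev_irqs("30be0000.ethernet")` twice a few seconds apart and comparing the totals shows which of the FEC interrupt lines is actively counting.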
>>>>>>> It took about 40 minutes on the Sabre until it happened, and I had to
>>>>>>> force it using iperf, but then I got the ring dumps:
>>>>>> My board has run for more than 47 hours with an NFS rootfs on
>>>>>> 4.11.0-rc6, but without running iperf.
>>>>>> I am now testing with iperf.
>>>>> Any update on this issue?
>>>>>
>>>>> When using iperf (server) on the board with Linux 4.11, the issue appears
>>>>> within a few iperf iterations on a Sabre (TO 1.2, Board Rev C, if that matters)...
>>>>>
>>>> I don't know whether you received my last mail (maybe it failed, since I
>>>> received some rejection mails).
>>> I think I did not... The last email I received was Fri, 21 Apr 2017
>>> 02:48:23 UTC.
>>>
>>>
>>>> If your case is only running best-effort traffic like TCP/UDP, you can
>>>> set "fsl,num-tx-queues" and "fsl,num-rx-queues" to 1 in the board dts
>>>> file.
>>> I did test that, and it seems to work fine with those properties set to
>>> 1.
>> So it fixes your problem even after a long-running test?
> Yes, it seems to work fine after more than 2 hours.
>
>>>> The other two queues are for AVB audio/video traffic and have higher
>>>> priority than queue 0. If you run an iperf TCP test on all three queues,
>>>> the TCP segments may arrive out of order, which causes the net watchdog
>>>> timeout.
>>> Okay. A single event would be understandable, but it seems to enter some
>>> kind of loop after that (continuously printing "fec 30be0000.ethernet
>>> eth0: TX ring dump ...").
>>>
>>> In a quick test I commented out the fec_dump call; with that, it seems to
>>> print only once and continues working afterwards (although the speed
>>> starts to decrease, so something is not right at that point).
>> Was that test based on the change above? Does a single queue still trigger the watchdog timeout?
> No, sorry for the confusion: this was without the fix above, i.e. with
> multiple queues and fec_dump disabled... I was just wondering, because
> disabling the multiple queues seems to me somewhat of a workaround for
> now... :-)
>
No, it is not a workaround. As I said, queue1 and queue2 are for the AVB
paths and have higher priority in transmission. That is what causes the
trouble in your case. I will submit a patch to fix it so that best-effort
traffic goes to queue0 and AVB streaming goes to queue1 and queue2.
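The fix proposed here amounts to a TX queue selection policy: ordinary traffic on queue 0, AVB streams on queues 1 and 2. Below is a minimal C sketch of such a policy, assuming AVB frames are recognized by the priority (PCP) bits of their 802.1Q VLAN tag. The table values and the name `select_tx_queue` are illustrative assumptions, not the patch mentioned in this thread; in the kernel, logic like this would sit in the driver's `ndo_select_queue` callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mapping from the 3-bit 802.1Q priority (PCP) to an FEC TX
 * queue: PCP 0-1 -> queue 0 (best effort), 2-4 -> queue 1, 5-7 -> queue 2. */
static const uint16_t pcp_to_queue[] = { 0, 0, 1, 1, 1, 2, 2, 2 };

static_assert(sizeof pcp_to_queue / sizeof pcp_to_queue[0] == 8,
              "one entry per PCP value");

/* Choose a TX queue for one frame.
 * tagged: whether the frame carries an 802.1Q VLAN tag.
 * tci:    the tag's Tag Control Information; PCP is in bits 15..13.
 * Untagged frames (plain TCP/UDP over IP) always use queue 0, so
 * best-effort traffic stays off the higher-priority AVB queues. */
uint16_t select_tx_queue(bool tagged, uint16_t tci)
{
    if (!tagged)
        return 0;
    return pcp_to_queue[tci >> 13];
}
```

Classifying by VLAN priority keeps the decision stateless and per-frame; the trade-off is that AVB traffic sent without a VLAN tag would also land on queue 0.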

>
>>>> In the fsl kernel tree, there is a patch that selects only queue0
>>>> for best-effort traffic like TCP/UDP. Please test it again on your
>>>> board; if there is no problem, I will upstream the patch.
>>> That sounds like a reasonable fix.
>>>
>>> IP, no matter whether TCP/UDP, is the most common use case, so IMHO this
>>> should "just work" by default.
>>>
>>> --
>>> Stefan