lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 25 Mar 2022 11:33:15 +0200
From:   Tomas Melin <tomas.melin@...sala.com>
To:     Claudiu.Beznea@...rochip.com, robert.hancock@...ian.com,
        kuba@...nel.org
Cc:     Nicolas.Ferre@...rochip.com, davem@...emloft.net,
        netdev@...r.kernel.org
Subject: Re: [PATCH v3] net: macb: restart tx after tx used bit read

Hi,

On 25/03/2022 10:13, Claudiu.Beznea@...rochip.com wrote:
> Hi,
> 

>>>>
>>>> Any chance you remember more details about in which situation restarting TX
>>>> helped for
>>>> your use case? was tx_qbar at the end of frame or stopped in middle of
>>>> frame?
> 
> I look though my emails for this particular issue, didn't find all that I
> need with regards to the issue that leads to this fix, but what can I tell
> from my mind and some emails still in my inbox is that this issue had been
> reproduced at that time only with a particular we server running on SAMA5D4
> and at some point a packet stopped being transmitted although TX_START had
> been issued for it. In that case the controller fired TX Used bit read
> interrupt.
> 
> The GEM datasheet specifies this "Transmit is halted when a buffer
> descriptor with its used bit set is read, a transmit error occurs, or by
> writing to the transmit halt bit of the network control register"
> 
> Also, at that point had a support case open on Cadence and they confirm
> that having TX restarted is the good way.
> 
> At the time of investigating the issue I only found it reproducible only on
> one SoC (SAMA5D4) out of 4 (SAMA5D2, SAMA5D3 and one ARM926 based SoC). All
> these are probably less faster than ZynqMP.
> 
> Though this IP is today present also on SAMA7G5 who's CPU can run @1GHz and
> MAC IP being clocked @200MHz. Even in this last setup I haven't saw the
> behavior with used bit read being fired too often.
> 
> By any chance on your setup do you have small packets inserted in MACB
> queues at high rate?

Thanks for elaborating on your case.

Frames are two descriptors long (header + one) and packets ~1450 bytes.
with iperf it's relatively easy to trigger this situation, but this is 
also seen with "normal" traffic. It just take longer to trigger.


thanks,
Tomas


> 
>>>
>>> Which kernel version are you using? Robert has been working on macb +
>>> Zynq recently, adding him to CC.
>>
>> We have been working with ZynqMP and haven't seen such isses in the past, but
>> I'm not sure we've tried the same type of stress test on those interfaces. If
>> by Zynq, Tomas means the Zynq-7000 series, that might be a different
>> version/revision of the IP core than we have as well.
>>
>> I haven't looked at the TX ring descriptor and register setup on this core in
>> that much detail, but the fact the controller gets into this "TX used bit read"
>> state in the first place seems unusual. I'm wondering if something is being
>> done in the wrong order or if we are missing a memory barrier etc?
> 
> That might possible especially on descriptors update path.
> 
>>
>> --
>> Robert Hancock
>> Senior Hardware Designer, Calian Advanced Technologies
>> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.calian.com%2F&amp;data=04%7C01%7Ctomas.melin%40vaisala.com%7C1aaf00e981f14742b92108da0e3754c1%7C6d7393e041f54c2e9b124c2be5da5c57%7C0%7C0%7C637837928068621708%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=230PlqUdZGct3lBAd39A0Zm8hxIgh5jovRRrQJ%2BXVyc%3D&amp;reserved=0
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ