Date: Fri, 25 Mar 2022 08:13:08 +0000
From: <Claudiu.Beznea@...rochip.com>
To: <robert.hancock@...ian.com>, <kuba@...nel.org>, <tomas.melin@...sala.com>
CC: <Nicolas.Ferre@...rochip.com>, <davem@...emloft.net>, <netdev@...r.kernel.org>
Subject: Re: [PATCH v3] net: macb: restart tx after tx used bit read

Hi,

On 23.03.2022 18:42, Robert Hancock wrote:
> On Wed, 2022-03-23 at 08:43 -0700, Jakub Kicinski wrote:
>> On Wed, 23 Mar 2022 10:08:20 +0200 Tomas Melin wrote:
>>>> From: <Claudiu.Beznea@...rochip.com>
>>>> To: <Nicolas.Ferre@...rochip.com>, <davem@...emloft.net>
>>>> Cc: <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
>>>>     <Claudiu.Beznea@...rochip.com>
>>>> Subject: [PATCH v3] net: macb: restart tx after tx used bit read
>>>> Date: Mon, 17 Dec 2018 10:02:42 +0000
>>>> Message-ID: <1545040937-6583-1-git-send-email-claudiu.beznea@...rochip.com>
>>>>
>>>> From: Claudiu Beznea <claudiu.beznea@...rochip.com>
>>>>
>>>> On some platforms (currently detected only on SAMA5D4) TX might stuck
>>>> even the packets are still present in DMA memories and TX start was
>>>> issued for them. This happens due to race condition between MACB driver
>>>> updating next TX buffer descriptor to be used and IP reading the same
>>>> descriptor. In such a case, the "TX USED BIT READ" interrupt is asserted.
>>>> GEM/MACB user guide specifies that if a "TX USED BIT READ" interrupt
>>>> is asserted TX must be restarted. Restart TX if used bit is read and
>>>> packets are present in software TX queue. Packets are removed from
>>>> software TX queue if TX was successful for them (see
>>>> macb_tx_interrupt()).
>>>>
>>>> Signed-off-by: Claudiu Beznea <claudiu.beznea@...rochip.com>
>>>
>>> On Xilinx Zynq the above change can cause infinite interrupt loop
>>> leading to CPU stall. Seems timing/load needs to be appropriate for
>>> this to happen, and currently with 1G ethernet this can be triggered
>>> normally within minutes when running stress tests on the network
>>> interface.
>>>
>>> The events leading up to the interrupt looping are similar as the
>>> issue described in the commit message. However in our case,
>>> restarting TX does not help at all. Instead the controller is stuck
>>> on the queue end descriptor generating endless TX_USED interrupts,
>>> never breaking out of interrupt routine.
>>>
>>> Any chance you remember more details about in which situation
>>> restarting TX helped for your use case? was tx_qbar at the end of
>>> frame or stopped in middle of frame?

I looked through my emails for this particular issue and didn't find
everything I need regarding the problem that led to this fix. From what I
remember, and from some emails still in my inbox, the issue was reproduced
at that time only with a particular web server running on SAMA5D4: at some
point a packet stopped being transmitted although TX_START had been issued
for it. In that case the controller fired the TX used bit read interrupt.
The GEM datasheet specifies this:

"Transmit is halted when a buffer descriptor with its used bit set is
read, a transmit error occurs, or by writing to the transmit halt bit of
the network control register"

Also, at that point I had a support case open with Cadence and they
confirmed that restarting TX is the right approach.

At the time of investigating the issue I found it reproducible on only one
SoC (SAMA5D4) out of 4 (SAMA5D2, SAMA5D3 and one ARM926 based SoC). All of
these are probably slower than ZynqMP. This IP is today also present on
SAMA7G5, whose CPU can run @1GHz with the MAC IP clocked @200MHz. Even in
this last setup I haven't seen the used bit read interrupt being fired too
often.

By any chance, on your setup do you have small packets inserted in MACB
queues at a high rate?
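
For readers following the thread, the restart rule from the commit message (restart TX when the "TX used bit read" interrupt fires and packets are still in the software TX queue) can be sketched in user-space C. This is a minimal model under stated assumptions, not the macb driver code: the struct, the function name and the bit positions are hypothetical, and the two registers are plain variables rather than memory-mapped I/O.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen for this sketch only. */
#define SKETCH_ISR_TXUBR  (1u << 0)  /* "TX used bit read" status bit */
#define SKETCH_NCR_TSTART (1u << 1)  /* "start transmission" control bit */

/* Hypothetical model of one TX queue: ring indices plus two registers. */
struct sketch_txq {
    unsigned int tx_head;  /* next slot the driver will fill */
    unsigned int tx_tail;  /* oldest slot not yet completed */
    uint32_t isr;          /* simulated interrupt status register */
    uint32_t ncr;          /* simulated network control register */
};

/* Handle a "TX used bit read" event; returns true if TX was restarted. */
static bool sketch_handle_txubr(struct sketch_txq *q)
{
    if (!(q->isr & SKETCH_ISR_TXUBR))
        return false;             /* some other interrupt: nothing to do */

    q->isr &= ~SKETCH_ISR_TXUBR;  /* acknowledge the status bit */

    if (q->tx_head == q->tx_tail)
        return false;             /* software queue empty: no restart */

    q->ncr |= SKETCH_NCR_TSTART;  /* packets still pending: restart TX */
    return true;
}
```

The sketch only captures the decision to restart. It does not model the case reported for Zynq, where the interrupt keeps firing after the restart has been issued.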
>>
>> Which kernel version are you using? Robert has been working on macb +
>> Zynq recently, adding him to CC.
>
> We have been working with ZynqMP and haven't seen such issues in the past, but
> I'm not sure we've tried the same type of stress test on those interfaces. If
> by Zynq, Tomas means the Zynq-7000 series, that might be a different
> version/revision of the IP core than we have as well.
>
> I haven't looked at the TX ring descriptor and register setup on this core in
> that much detail, but the fact the controller gets into this "TX used bit read"
> state in the first place seems unusual. I'm wondering if something is being
> done in the wrong order or if we are missing a memory barrier etc?

That might be possible, especially on the descriptor update path.

>
> --
> Robert Hancock
> Senior Hardware Designer, Calian Advanced Technologies
> www.calian.com
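
To illustrate the ordering concern raised about the descriptor update path, here is a minimal user-space sketch in C11. It shows the general idea only: the descriptor fields are written first, and the store that clears the used bit comes last with release ordering, so a reader that observes the cleared bit also observes the completed fields. The struct layout, the names and the bit position are assumptions made for this sketch. C11 atomics order accesses between CPU threads; a real driver talking to a DMA engine would need the kernel's own barriers, which are not modelled here.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical position of the "used" bit in the control word. */
#define SKETCH_TX_USED (1u << 31)

/* Hypothetical two-word TX descriptor. */
struct sketch_desc {
    uint32_t addr;          /* buffer address word */
    _Atomic uint32_t ctrl;  /* control word: length plus the used bit */
};

/* Hand one buffer to the consumer of the descriptor ring. */
static void sketch_queue_desc(struct sketch_desc *d, uint32_t buf, uint32_t len)
{
    d->addr = buf;  /* fill in the payload fields first */

    /*
     * Release store: the addr write above becomes visible before the
     * control word with the used bit cleared.
     */
    atomic_store_explicit(&d->ctrl, len & ~SKETCH_TX_USED,
                          memory_order_release);
}
```

If the two stores were reordered, the consumer could see a descriptor marked as ready while its address word is still stale, which is the kind of wrong-order update the question above asks about.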