lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Fri, 25 Mar 2022 13:41:50 +0000 From: <Claudiu.Beznea@...rochip.com> To: <tomas.melin@...sala.com>, <netdev@...r.kernel.org> CC: <Nicolas.Ferre@...rochip.com>, <davem@...emloft.net>, <kuba@...nel.org>, <pabeni@...hat.com> Subject: Re: [PATCH] net: macb: Restart tx only if queue pointer is lagging On 25.03.2022 11:35, Tomas Melin wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the > content is safe > > Hi, > > On 25/03/2022 10:57, Claudiu.Beznea@...rochip.com wrote: >> On 25.03.2022 08:50, Tomas Melin wrote: >>> [Some people who received this message don't often get email from >>> tomas.melin@...sala.com. Learn why this is important at >>> http://aka.ms/LearnAboutSenderIdentification.] >>> >>> EXTERNAL EMAIL: Do not click links or open attachments unless you know >>> the content is safe >>> >>> commit 5ea9c08a8692 ("net: macb: restart tx after tx used bit read") >>> added support for restarting transmission. Restarting tx does not work >>> in case controller asserts TXUBR interrupt and TQBP is already at the end >>> of the tx queue. In that situation, restarting tx will immediately cause >>> assertion of another TXUBR interrupt. The driver will end up in an infinite >>> interrupt loop which it cannot break out of. >>> >>> For cases where TQBP is at the end of the tx queue, instead >>> only clear TXUBR interrupt. As more data gets pushed to the queue, >>> transmission will resume. >>> >>> This issue was observed on a Xilinx Zynq based board. During stress test of >>> the network interface, driver would get stuck on interrupt loop >>> within seconds or minutes causing CPU to stall. >>> >>> Signed-off-by: Tomas Melin <tomas.melin@...sala.com> >>> --- >>> drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++ >>> 1 file changed, 8 insertions(+) >>> >>> diff --git a/drivers/net/ethernet/cadence/macb_main.c >>> b/drivers/net/ethernet/cadence/macb_main.c >>> index 800d5ced5800..e475be29845c 100644 >>> --- a/drivers/net/ethernet/cadence/macb_main.c >>> +++ b/drivers/net/ethernet/cadence/macb_main.c >>> @@ -1658,6 +1658,7 @@ static void macb_tx_restart(struct macb_queue *queue) >>> unsigned int head = queue->tx_head; >>> unsigned int tail = queue->tx_tail; >>> struct macb *bp = queue->bp; >>> + unsigned int head_idx, tbqp; >>> >>> if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE) >>> queue_writel(queue, ISR, MACB_BIT(TXUBR)); >>> @@ -1665,6 +1666,13 @@ static void macb_tx_restart(struct macb_queue >>> *queue) >>> if (head == tail) >>> return; >>> >>> + tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp); >>> + tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp)); >>> + head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head)); >>> + >>> + if (tbqp == head_idx) >>> + return; >>> + >> >> This looks like TBQP is not advancing though there are packets in the >> software queues (head != tail). Packets are added in the software queues on >> TX path and removed when TX was done for them. > > TBQP is at the end of the queue, and that matches with tx_head > maintained by driver. So seems controller is happily at end marker, > and when restarted immediately sees that end marker used tag and > triggers an interrupt again. > > Also when looking at the buffer descriptor memory it shows that all > frames between tx_tail and tx_head have been marked as used. I see. Controller sets TX_USED on the 1st descriptor of the transmitted frame. If there were packets with one descriptor enqueued that should mean controller did its job. head != tail on software data structures when receiving TXUBR interrupt and all descriptors in queue have TX_USED bit set might signal that a descriptor is not updated to CPU on TCOMP interrupt when CPU uses it and thus driver doesn't treat a TCOMP interrupt. See the above code on macb_tx_interrupt(): desc = macb_tx_desc(queue, tail); /* Make hw descriptor updates visible to CPU */ rmb(); ctrl = desc->ctrl; /* TX_USED bit is only set by hardware on the very first buffer * descriptor of the transmitted frame. */ if (!(ctrl & MACB_BIT(TX_USED))) break; > > GEM documentation says "transmission is restarted from > the first buffer descriptor of the frame being transmitted when the > transmit start bit is rewritten" but since all frames are already marked > as transmitted, restarting wont help. Adding this additional check will > help for the issue we have. > I see but according to your description (all descriptors treated by controller) if no packets are enqueued for TX after: + if (tbqp == head_idx) + return; + there are some SKBs that were correctly treated by controller but not freed by software (they are freed on macb_tx_unmap() called from macb_tx_interrupt()). They will be freed on next TCOMP interrupt for other packets being transmitted. > >> >> Maybe TX_WRAP is missing on one TX descriptor? Few months ago while >> investigating some other issues on this I found that this might be missed >> on one descriptor [1] but haven't managed to make it break at that point >> anyhow. >> >> Could you check on your side if this is solving your issue? > > I have seen that we can get stuck at any location in the ring buffer, so > this does not seem to be the case here. I can try though if it would > have any effect. I was thinking that having small packets there is high chance that TBQP to not reach a descriptor with wrap bit set due to the code pointed in my previous email. Thank you, Claudiu Beznea > > thanks, > Tomas > > >> >> /* Set 'TX_USED' bit in buffer descriptor at tx_head position >> * to set the end of TX queue >> */ >> i = tx_head; >> entry = macb_tx_ring_wrap(bp, i); >> ctrl = MACB_BIT(TX_USED); >> + if (entry == bp->tx_ring_size - 1) >> + ctrl |= MACB_BIT(TX_WRAP); >> desc = macb_tx_desc(queue, entry); >> desc->ctrl = ctrl; >> >> [1] >> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Fdrivers%2Fnet%2Fethernet%2Fcadence%2Fmacb_main.c%23n1958&data=04%7C01%7Ctomas.melin%40vaisala.com%7C2fe72e2a6a874b5279a708da0e3d7852%7C6d7393e041f54c2e9b124c2be5da5c57%7C0%7C0%7C637837954434714462%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=hijcfj3TnOxj12dhG0Q8d0AJNFNBJSxtEjOTkCoZThI%3D&reserved=0 >> >> >>> macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART)); >>> } >>> >>> -- >>> 2.35.1 >>> >>
Powered by blists - more mailing lists