lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Fri, 25 Mar 2022 16:41:20 +0200 From: Tomas Melin <tomas.melin@...sala.com> To: Claudiu.Beznea@...rochip.com, netdev@...r.kernel.org Cc: Nicolas.Ferre@...rochip.com, davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com Subject: Re: [PATCH] net: macb: Restart tx only if queue pointer is lagging Hi, On 25/03/2022 15:41, Claudiu.Beznea@...rochip.com wrote: > On 25.03.2022 11:35, Tomas Melin wrote: >> EXTERNAL EMAIL: Do not click links or open attachments unless you know the >> content is safe >> >> Hi, >> >> On 25/03/2022 10:57, Claudiu.Beznea@...rochip.com wrote: >>> On 25.03.2022 08:50, Tomas Melin wrote: >>>> [Some people who received this message don't often get email from >>>> tomas.melin@...sala.com. Learn why this is important at >>>> http://aka.ms/LearnAboutSenderIdentification.] >>>> >>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know >>>> the content is safe >>>> >>>> commit 5ea9c08a8692 ("net: macb: restart tx after tx used bit read") >>>> added support for restarting transmission. Restarting tx does not work >>>> in case controller asserts TXUBR interrupt and TQBP is already at the end >>>> of the tx queue. In that situation, restarting tx will immediately cause >>>> assertion of another TXUBR interrupt. The driver will end up in an infinite >>>> interrupt loop which it cannot break out of. >>>> >>>> For cases where TQBP is at the end of the tx queue, instead >>>> only clear TXUBR interrupt. As more data gets pushed to the queue, >>>> transmission will resume. >>>> >>>> This issue was observed on a Xilinx Zynq based board. During stress test of >>>> the network interface, driver would get stuck on interrupt loop >>>> within seconds or minutes causing CPU to stall. >>>> >>>> Signed-off-by: Tomas Melin <tomas.melin@...sala.com> >>>> --- >>>> drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++ >>>> 1 file changed, 8 insertions(+) >>>> >>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c >>>> b/drivers/net/ethernet/cadence/macb_main.c >>>> index 800d5ced5800..e475be29845c 100644 >>>> --- a/drivers/net/ethernet/cadence/macb_main.c >>>> +++ b/drivers/net/ethernet/cadence/macb_main.c >>>> @@ -1658,6 +1658,7 @@ static void macb_tx_restart(struct macb_queue *queue) >>>> unsigned int head = queue->tx_head; >>>> unsigned int tail = queue->tx_tail; >>>> struct macb *bp = queue->bp; >>>> + unsigned int head_idx, tbqp; >>>> >>>> if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE) >>>> queue_writel(queue, ISR, MACB_BIT(TXUBR)); >>>> @@ -1665,6 +1666,13 @@ static void macb_tx_restart(struct macb_queue >>>> *queue) >>>> if (head == tail) >>>> return; >>>> >>>> + tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp); >>>> + tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp)); >>>> + head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head)); >>>> + >>>> + if (tbqp == head_idx) >>>> + return; >>>> + >>> >>> This looks like TBQP is not advancing though there are packets in the >>> software queues (head != tail). Packets are added in the software queues on >>> TX path and removed when TX was done for them. >> >> TBQP is at the end of the queue, and that matches with tx_head >> maintained by driver. So seems controller is happily at end marker, >> and when restarted immediately sees that end marker used tag and >> triggers an interrupt again. >> >> Also when looking at the buffer descriptor memory it shows that all >> frames between tx_tail and tx_head have been marked as used. > > I see. Controller sets TX_USED on the 1st descriptor of the transmitted > frame. If there were packets with one descriptor enqueued that should mean > controller did its job > > head != tail on software data structures when receiving TXUBR interrupt and > all descriptors in queue have TX_USED bit set might signal that a > descriptor is not updated to CPU on TCOMP interrupt when CPU uses it and > thus driver doesn't treat a TCOMP interrupt. See the above code on Both TX_USED and last buffer (bit 15) indicator looks ok from memory, so controller seems to be up to date. If we were to get a TCOMP interrupt things would be rolling again. Ofcourse this is speculation, but perhaps there could also be some corner cases where the controller fails to generate TCOMP as expected? > macb_tx_interrupt(): > > desc = macb_tx_desc(queue, tail); > > /* Make hw descriptor updates visible to CPU */ > rmb(); > > ctrl = desc->ctrl; > > /* TX_USED bit is only set by hardware on the very first buffer > * descriptor of the transmitted frame. > */ > > if (!(ctrl & MACB_BIT(TX_USED))) > break; > > >> >> GEM documentation says "transmission is restarted from >> the first buffer descriptor of the frame being transmitted when the >> transmit start bit is rewritten" but since all frames are already marked >> as transmitted, restarting wont help. Adding this additional check will >> help for the issue we have. >> > > I see but according to your description (all descriptors treated by > controller) if no packets are enqueued for TX after: > > + if (tbqp == head_idx) > + return; > + > > there are some SKBs that were correctly treated by controller but not freed > by software (they are freed on macb_tx_unmap() called from > macb_tx_interrupt()). They will be freed on next TCOMP interrupt for other > packets being transmitted. Yes, that is idea. We cannot restart since it triggers new irq, but instead need to break out. When more data arrives, the controller continues operation again. > >> >>> >>> Maybe TX_WRAP is missing on one TX descriptor? Few months ago while >>> investigating some other issues on this I found that this might be missed >>> on one descriptor [1] but haven't managed to make it break at that point >>> anyhow. >>> >>> Could you check on your side if this is solving your issue? >> >> I have seen that we can get stuck at any location in the ring buffer, so >> this does not seem to be the case here. I can try though if it would >> have any effect. > > I was thinking that having small packets there is high chance that TBQP to > not reach a descriptor with wrap bit set due to the code pointed in my > previous email. I tested with the additions suggested below, but with no change. Thanks, Tomas > > Thank you, > Claudiu Beznea > >> >> thanks, >> Tomas >> >> >>> >>> /* Set 'TX_USED' bit in buffer descriptor at tx_head position >>> * to set the end of TX queue >>> */ >>> i = tx_head; >>> entry = macb_tx_ring_wrap(bp, i); >>> ctrl = MACB_BIT(TX_USED); >>> + if (entry == bp->tx_ring_size - 1) >>> + ctrl |= MACB_BIT(TX_WRAP); >>> desc = macb_tx_desc(queue, entry); >>> desc->ctrl = ctrl; >>> >>> [1] >>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Fdrivers%2Fnet%2Fethernet%2Fcadence%2Fmacb_main.c%23n1958&data=04%7C01%7Ctomas.melin%40vaisala.com%7C4d5ef5a2a27247697b0b08da0e6539ff%7C6d7393e041f54c2e9b124c2be5da5c57%7C0%7C0%7C637838125183624670%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=v%2F%2FdE1Y8BxHWsmtn3nX70OFN5oCb%2Bzlb881UXuYtoMs%3D&reserved=0 >>> >>> >>>> macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART)); >>>> } >>>> >>>> -- >>>> 2.35.1 >>>> >>> >
Powered by blists - more mailing lists