lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7310fea8-8f55-a198-5317-a7ad95980beb@vaisala.com>
Date:   Fri, 25 Mar 2022 11:35:44 +0200
From:   Tomas Melin <tomas.melin@...sala.com>
To:     Claudiu.Beznea@...rochip.com, netdev@...r.kernel.org
Cc:     Nicolas.Ferre@...rochip.com, davem@...emloft.net, kuba@...nel.org,
        pabeni@...hat.com
Subject: Re: [PATCH] net: macb: Restart tx only if queue pointer is lagging

Hi,

On 25/03/2022 10:57, Claudiu.Beznea@...rochip.com wrote:
> On 25.03.2022 08:50, Tomas Melin wrote:
>> [Some people who received this message don't often get email from tomas.melin@...sala.com. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.]
>>
>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>
>> commit 5ea9c08a8692 ("net: macb: restart tx after tx used bit read")
>> added support for restarting transmission. Restarting tx does not work
>> in case controller asserts TXUBR interrupt and TQBP is already at the end
>> of the tx queue. In that situation, restarting tx will immediately cause
>> assertion of another TXUBR interrupt. The driver will end up in an infinite
>> interrupt loop which it cannot break out of.
>>
>> For cases where TQBP is at the end of the tx queue, instead
>> only clear TXUBR interrupt. As more data gets pushed to the queue,
>> transmission will resume.
>>
>> This issue was observed on a Xilinx Zynq based board. During stress test of
>> the network interface, driver would get stuck on interrupt loop
>> within seconds or minutes causing CPU to stall.
>>
>> Signed-off-by: Tomas Melin <tomas.melin@...sala.com>
>> ---
>>   drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
>> index 800d5ced5800..e475be29845c 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -1658,6 +1658,7 @@ static void macb_tx_restart(struct macb_queue *queue)
>>          unsigned int head = queue->tx_head;
>>          unsigned int tail = queue->tx_tail;
>>          struct macb *bp = queue->bp;
>> +       unsigned int head_idx, tbqp;
>>
>>          if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>>                  queue_writel(queue, ISR, MACB_BIT(TXUBR));
>> @@ -1665,6 +1666,13 @@ static void macb_tx_restart(struct macb_queue *queue)
>>          if (head == tail)
>>                  return;
>>
>> +       tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
>> +       tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
>> +       head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
>> +
>> +       if (tbqp == head_idx)
>> +               return;
>> +
> 
> This looks like TBQP is not advancing though there are packets in the
> software queues (head != tail). Packets are added in the software queues on
> TX path and removed when TX was done for them.

TBQP is at the end of the queue, and that matches with tx_head 
maintained by driver. So seems controller is happily at end marker,
and when restarted immediately sees that end marker used tag and 
triggers an interrupt again.

Also when looking at the buffer descriptor memory it shows that all 
frames between tx_tail and tx_head have been marked as used.

GEM documentation says "transmission is restarted from
the first buffer descriptor of the frame being transmitted when the 
transmit start bit is rewritten" but since all frames are already marked
as transmitted, restarting wont help. Adding this additional check will 
help for the issue we have.


> 
> Maybe TX_WRAP is missing on one TX descriptor? Few months ago while
> investigating some other issues on this I found that this might be missed
> on one descriptor [1] but haven't managed to make it break at that point
> anyhow.
> 
> Could you check on your side if this is solving your issue?

I have seen that we can get stuck at any location in the ring buffer, so 
this does not seem to be the case here. I can try though if it would 
have any effect.

thanks,
Tomas


> 
> 	/* Set 'TX_USED' bit in buffer descriptor at tx_head position
> 	 * to set the end of TX queue
> 	 */
> 	i = tx_head;
> 	entry = macb_tx_ring_wrap(bp, i);
> 	ctrl = MACB_BIT(TX_USED);
> +	if (entry == bp->tx_ring_size - 1)
> +		ctrl |= MACB_BIT(TX_WRAP);
> 	desc = macb_tx_desc(queue, entry);
> 	desc->ctrl = ctrl;
> 
> [1]
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Fdrivers%2Fnet%2Fethernet%2Fcadence%2Fmacb_main.c%23n1958&amp;data=04%7C01%7Ctomas.melin%40vaisala.com%7C2fe72e2a6a874b5279a708da0e3d7852%7C6d7393e041f54c2e9b124c2be5da5c57%7C0%7C0%7C637837954434714462%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=hijcfj3TnOxj12dhG0Q8d0AJNFNBJSxtEjOTkCoZThI%3D&amp;reserved=0
> 
>>          macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>>   }
>>
>> --
>> 2.35.1
>>
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ