lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b643b825-e68a-875e-f4ac-edddae613705@microchip.com>
Date:   Fri, 25 Mar 2022 13:41:50 +0000
From:   <Claudiu.Beznea@...rochip.com>
To:     <tomas.melin@...sala.com>, <netdev@...r.kernel.org>
CC:     <Nicolas.Ferre@...rochip.com>, <davem@...emloft.net>,
        <kuba@...nel.org>, <pabeni@...hat.com>
Subject: Re: [PATCH] net: macb: Restart tx only if queue pointer is lagging

On 25.03.2022 11:35, Tomas Melin wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the
> content is safe
> 
> Hi,
> 
> On 25/03/2022 10:57, Claudiu.Beznea@...rochip.com wrote:
>> On 25.03.2022 08:50, Tomas Melin wrote:
>>> [Some people who received this message don't often get email from
>>> tomas.melin@...sala.com. Learn why this is important at
>>> http://aka.ms/LearnAboutSenderIdentification.]
>>>
>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know
>>> the content is safe
>>>
>>> commit 5ea9c08a8692 ("net: macb: restart tx after tx used bit read")
>>> added support for restarting transmission. Restarting tx does not work
>>> in case controller asserts TXUBR interrupt and TQBP is already at the end
>>> of the tx queue. In that situation, restarting tx will immediately cause
>>> assertion of another TXUBR interrupt. The driver will end up in an infinite
>>> interrupt loop which it cannot break out of.
>>>
>>> For cases where TQBP is at the end of the tx queue, instead
>>> only clear TXUBR interrupt. As more data gets pushed to the queue,
>>> transmission will resume.
>>>
>>> This issue was observed on a Xilinx Zynq based board. During stress test of
>>> the network interface, driver would get stuck on interrupt loop
>>> within seconds or minutes causing CPU to stall.
>>>
>>> Signed-off-by: Tomas Melin <tomas.melin@...sala.com>
>>> ---
>>>   drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
>>>   1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c
>>> b/drivers/net/ethernet/cadence/macb_main.c
>>> index 800d5ced5800..e475be29845c 100644
>>> --- a/drivers/net/ethernet/cadence/macb_main.c
>>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>>> @@ -1658,6 +1658,7 @@ static void macb_tx_restart(struct macb_queue *queue)
>>>          unsigned int head = queue->tx_head;
>>>          unsigned int tail = queue->tx_tail;
>>>          struct macb *bp = queue->bp;
>>> +       unsigned int head_idx, tbqp;
>>>
>>>          if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>>>                  queue_writel(queue, ISR, MACB_BIT(TXUBR));
>>> @@ -1665,6 +1666,13 @@ static void macb_tx_restart(struct macb_queue
>>> *queue)
>>>          if (head == tail)
>>>                  return;
>>>
>>> +       tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
>>> +       tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
>>> +       head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
>>> +
>>> +       if (tbqp == head_idx)
>>> +               return;
>>> +
>>
>> This looks like TBQP is not advancing though there are packets in the
>> software queues (head != tail). Packets are added in the software queues on
>> TX path and removed when TX was done for them.
> 
> TBQP is at the end of the queue, and that matches with tx_head
> maintained by driver. So seems controller is happily at end marker,
> and when restarted immediately sees that end marker used tag and
> triggers an interrupt again.
> 
> Also when looking at the buffer descriptor memory it shows that all
> frames between tx_tail and tx_head have been marked as used.

I see. Controller sets TX_USED on the 1st descriptor of the transmitted
frame. If there were packets with one descriptor enqueued that should mean
controller did its job.

head != tail on software data structures when receiving TXUBR interrupt and
all descriptors in queue have TX_USED bit set might signal that  a
descriptor is not updated to CPU on TCOMP interrupt when CPU uses it and
thus driver doesn't treat a TCOMP interrupt. See the above code on
macb_tx_interrupt():

desc = macb_tx_desc(queue, tail);

/* Make hw descriptor updates visible to CPU */
rmb();

ctrl = desc->ctrl;

/* TX_USED bit is only set by hardware on the very first buffer
* descriptor of the transmitted frame.
*/

if (!(ctrl & MACB_BIT(TX_USED)))
	break;


> 
> GEM documentation says "transmission is restarted from
> the first buffer descriptor of the frame being transmitted when the
> transmit start bit is rewritten" but since all frames are already marked
> as transmitted, restarting wont help. Adding this additional check will
> help for the issue we have.
> 

I see but according to your description (all descriptors treated by
controller) if no packets are enqueued for TX after:

+       if (tbqp == head_idx)
+               return;
+

there are some SKBs that were correctly treated by controller but not freed
by software (they are freed on macb_tx_unmap() called from
macb_tx_interrupt()). They will be freed on next TCOMP interrupt for other
packets being transmitted.

> 
>>
>> Maybe TX_WRAP is missing on one TX descriptor? Few months ago while
>> investigating some other issues on this I found that this might be missed
>> on one descriptor [1] but haven't managed to make it break at that point
>> anyhow.
>>
>> Could you check on your side if this is solving your issue?
> 
> I have seen that we can get stuck at any location in the ring buffer, so
> this does not seem to be the case here. I can try though if it would
> have any effect.

I was thinking that having small packets there is high chance that TBQP to
not reach a descriptor with wrap bit set due to the code pointed in my
previous email.

Thank you,
Claudiu Beznea

> 
> thanks,
> Tomas
> 
> 
>>
>>       /* Set 'TX_USED' bit in buffer descriptor at tx_head position
>>        * to set the end of TX queue
>>        */
>>       i = tx_head;
>>       entry = macb_tx_ring_wrap(bp, i);
>>       ctrl = MACB_BIT(TX_USED);
>> +     if (entry == bp->tx_ring_size - 1)
>> +             ctrl |= MACB_BIT(TX_WRAP);
>>       desc = macb_tx_desc(queue, entry);
>>       desc->ctrl = ctrl;
>>
>> [1]
>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Fdrivers%2Fnet%2Fethernet%2Fcadence%2Fmacb_main.c%23n1958&amp;data=04%7C01%7Ctomas.melin%40vaisala.com%7C2fe72e2a6a874b5279a708da0e3d7852%7C6d7393e041f54c2e9b124c2be5da5c57%7C0%7C0%7C637837954434714462%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=hijcfj3TnOxj12dhG0Q8d0AJNFNBJSxtEjOTkCoZThI%3D&amp;reserved=0
>>
>>
>>>          macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>>>   }
>>>
>>> -- 
>>> 2.35.1
>>>
>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ