[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
<PUZP153MB07880E6D692FD5D13C508694CC29A@PUZP153MB0788.APCP153.PROD.OUTLOOK.COM>
Date: Mon, 3 Jul 2023 19:55:06 +0000
From: Souradeep Chakrabarti <schakrabarti@...rosoft.com>
To: Alexander Lobakin <aleksander.lobakin@...el.com>, souradeep chakrabarti
<schakrabarti@...ux.microsoft.com>
CC: KY Srinivasan <kys@...rosoft.com>, Haiyang Zhang <haiyangz@...rosoft.com>,
"wei.liu@...nel.org" <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
"davem@...emloft.net" <davem@...emloft.net>, "edumazet@...gle.com"
<edumazet@...gle.com>, "kuba@...nel.org" <kuba@...nel.org>,
"pabeni@...hat.com" <pabeni@...hat.com>, Long Li <longli@...rosoft.com>, Ajay
Sharma <sharmaajay@...rosoft.com>, "leon@...nel.org" <leon@...nel.org>,
"cai.huoqing@...ux.dev" <cai.huoqing@...ux.dev>,
"ssengar@...ux.microsoft.com" <ssengar@...ux.microsoft.com>,
"vkuznets@...hat.com" <vkuznets@...hat.com>, "tglx@...utronix.de"
<tglx@...utronix.de>, "linux-hyperv@...r.kernel.org"
<linux-hyperv@...r.kernel.org>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-rdma@...r.kernel.org"
<linux-rdma@...r.kernel.org>, "stable@...r.kernel.org"
<stable@...r.kernel.org>
Subject: RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when
host is unresponsive
>-----Original Message-----
>From: Alexander Lobakin <aleksander.lobakin@...el.com>
>Sent: Monday, July 3, 2023 10:18 PM
>To: souradeep chakrabarti <schakrabarti@...ux.microsoft.com>
>Cc: KY Srinivasan <kys@...rosoft.com>; Haiyang Zhang
><haiyangz@...rosoft.com>; wei.liu@...nel.org; Dexuan Cui
><decui@...rosoft.com>; davem@...emloft.net; edumazet@...gle.com;
>kuba@...nel.org; pabeni@...hat.com; Long Li <longli@...rosoft.com>; Ajay
>Sharma <sharmaajay@...rosoft.com>; leon@...nel.org;
>cai.huoqing@...ux.dev; ssengar@...ux.microsoft.com; vkuznets@...hat.com;
>tglx@...utronix.de; linux-hyperv@...r.kernel.org; netdev@...r.kernel.org;
>linux-kernel@...r.kernel.org; linux-rdma@...r.kernel.org;
>stable@...r.kernel.org; Souradeep Chakrabarti <schakrabarti@...rosoft.com>
>Subject: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when
>host is unresponsive
>
>From: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
>Date: Mon, 3 Jul 2023 01:49:31 -0700
>
>> From: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
>
>Please sync your Git name and Git mail account settings, so that your own
>patches won't have "From:" when sending. From what I see, you need to
>correct first letters of name and surname to capital in the Git email settings
>block.
Thank you for pointing, I will fix it.
>
>>
>> When unloading the MANA driver, mana_dealloc_queues() waits for the
>> MANA hardware to complete any inflight packets and set the pending
>> send count to zero. But if the hardware has failed,
>> mana_dealloc_queues() could wait forever.
>>
>> Fix this by adding a timeout to the wait. Set the timeout to 120
>> seconds, which is a somewhat arbitrary value that is more than long
>> enough for functional hardware to complete any sends.
>>
>> Signed-off-by: Souradeep Chakrabarti
>> <schakrabarti@...ux.microsoft.com>
>
>Where's "Fixes:" tagging the blamed commit?
This is present from the day zero of the mana driver code.
It has not been introduced in the code by any commit.
>
>> ---
>> V3 -> V4:
>> * Fixed the commit message to describe the context.
>> * Removed the vf_unload_timeout, as it is not required.
>> ---
>> drivers/net/ethernet/microsoft/mana/mana_en.c | 26
>> ++++++++++++++++---
>> 1 file changed, 23 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
>> b/drivers/net/ethernet/microsoft/mana/mana_en.c
>> index a499e460594b..d26f1da70411 100644
>> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
>> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
>> @@ -2346,7 +2346,10 @@ static int mana_dealloc_queues(struct
>> net_device *ndev) {
>> struct mana_port_context *apc = netdev_priv(ndev);
>> struct gdma_dev *gd = apc->ac->gdma_dev;
>> + unsigned long timeout;
>> struct mana_txq *txq;
>> + struct sk_buff *skb;
>> + struct mana_cq *cq;
>> int i, err;
>>
>> if (apc->port_is_up)
>> @@ -2363,15 +2366,32 @@ static int mana_dealloc_queues(struct
>net_device *ndev)
>> * to false, but it doesn't matter since mana_start_xmit() drops any
>> * new packets due to apc->port_is_up being false.
>> *
>> - * Drain all the in-flight TX packets
>> + * Drain all the in-flight TX packets.
>> + * A timeout of 120 seconds for all the queues is used.
>> + * This will break the while loop when h/w is not responding.
>> + * This value of 120 has been decided here considering max
>> + * number of queues.
>> */
>> +
>> + timeout = jiffies + 120 * HZ;
>
>Why not initialize it right when declaring?
I will fix it in next version.
>
>> for (i = 0; i < apc->num_queues; i++) {
>> txq = &apc->tx_qp[i].txq;
>> -
>> - while (atomic_read(&txq->pending_sends) > 0)
>> + while (atomic_read(&txq->pending_sends) > 0 &&
>> + time_before(jiffies, timeout)) {
>> usleep_range(1000, 2000);> + }
>> }
>
>120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
>iterations. I know usleep_range() often is much less precise, but still.
>Do you really need that much time? Has this been measured during the tests
>that it can take up to 120 seconds or is it just some random value that "should
>be enough"?
>If you really need 120 seconds, I'd suggest using a timer / delayed work instead
>of wasting resources.
Here the intent is not waiting for 120 seconds, rather than avoid continue checking the
pending_sends of each tx queues for an indefinite time, before freeing sk_buffs.
The pending_sends can only get decreased for a tx queue, if mana_poll_tx_cq()
gets called for a completion notification and increased by xmit.
In this particular bug, apc->port_is_up is not set to false, causing
xmit to keep increasing the pending_sends for the queue and mana_poll_tx_cq()
not getting called for the queue.
If we see the comment in the function mana_dealloc_queues(), it mentions it :
2346 /* No packet can be transmitted now since apc->port_is_up is false.
2347 * There is still a tiny chance that mana_poll_tx_cq() can re-enable
2348 * a txq because it may not timely see apc->port_is_up being cleared
2349 * to false, but it doesn't matter since mana_start_xmit() drops any
2350 * new packets due to apc->port_is_up being false.
The value 120 seconds has been decided here based on maximum number of queues
are allowed in this specific hardware, it is a safe assumption.
>
>>
>> + for (i = 0; i < apc->num_queues; i++) {
>> + txq = &apc->tx_qp[i].txq;
>> + cq = &apc->tx_qp[i].tx_cq;
>
>cq can be just &txq->tx_cq.
mana_txq structure does not have a pointer to mana_cq.
>
>> + while (atomic_read(&txq->pending_sends)) {
>> + skb = skb_dequeue(&txq->pending_skbs);
>> + mana_unmap_skb(skb, apc);
>> + napi_consume_skb(skb, cq->budget);
>
>(you already have comment about this one)
>
>> + atomic_sub(1, &txq->pending_sends);
>> + }
>> + }
>> /* We're 100% sure the queues can no longer be woken up, because
>> * we're sure now mana_poll_tx_cq() can't be running.
>> */
>
>Thanks,
>Olek
Powered by blists - more mailing lists