netdev - RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive

Message-ID:
 <PUZP153MB07880E6D692FD5D13C508694CC29A@PUZP153MB0788.APCP153.PROD.OUTLOOK.COM>
Date: Mon, 3 Jul 2023 19:55:06 +0000
From: Souradeep Chakrabarti <schakrabarti@...rosoft.com>
To: Alexander Lobakin <aleksander.lobakin@...el.com>, souradeep chakrabarti
	<schakrabarti@...ux.microsoft.com>
CC: KY Srinivasan <kys@...rosoft.com>, Haiyang Zhang <haiyangz@...rosoft.com>,
	"wei.liu@...nel.org" <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
	"davem@...emloft.net" <davem@...emloft.net>, "edumazet@...gle.com"
	<edumazet@...gle.com>, "kuba@...nel.org" <kuba@...nel.org>,
	"pabeni@...hat.com" <pabeni@...hat.com>, Long Li <longli@...rosoft.com>, Ajay
 Sharma <sharmaajay@...rosoft.com>, "leon@...nel.org" <leon@...nel.org>,
	"cai.huoqing@...ux.dev" <cai.huoqing@...ux.dev>,
	"ssengar@...ux.microsoft.com" <ssengar@...ux.microsoft.com>,
	"vkuznets@...hat.com" <vkuznets@...hat.com>, "tglx@...utronix.de"
	<tglx@...utronix.de>, "linux-hyperv@...r.kernel.org"
	<linux-hyperv@...r.kernel.org>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-rdma@...r.kernel.org"
	<linux-rdma@...r.kernel.org>, "stable@...r.kernel.org"
	<stable@...r.kernel.org>
Subject: RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when
 host is unresponsive



>-----Original Message-----
>From: Alexander Lobakin <aleksander.lobakin@...el.com>
>Sent: Monday, July 3, 2023 10:18 PM
>To: souradeep chakrabarti <schakrabarti@...ux.microsoft.com>
>Cc: KY Srinivasan <kys@...rosoft.com>; Haiyang Zhang
><haiyangz@...rosoft.com>; wei.liu@...nel.org; Dexuan Cui
><decui@...rosoft.com>; davem@...emloft.net; edumazet@...gle.com;
>kuba@...nel.org; pabeni@...hat.com; Long Li <longli@...rosoft.com>; Ajay
>Sharma <sharmaajay@...rosoft.com>; leon@...nel.org;
>cai.huoqing@...ux.dev; ssengar@...ux.microsoft.com; vkuznets@...hat.com;
>tglx@...utronix.de; linux-hyperv@...r.kernel.org; netdev@...r.kernel.org;
>linux-kernel@...r.kernel.org; linux-rdma@...r.kernel.org;
>stable@...r.kernel.org; Souradeep Chakrabarti <schakrabarti@...rosoft.com>
>Subject: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when
>host is unresponsive
>
>From: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
>Date: Mon,  3 Jul 2023 01:49:31 -0700
>
>> From: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
>
>Please sync your Git name and Git mail account settings, so that your own
>patches won't have "From:" when sending. From what I see, you need to
>correct first letters of name and surname to capital in the Git email settings
>block.
Thank you for pointing this out. I will fix it.
>
>>
>> When unloading the MANA driver, mana_dealloc_queues() waits for the
>> MANA hardware to complete any inflight packets and set the pending
>> send count to zero. But if the hardware has failed,
>> mana_dealloc_queues() could wait forever.
>>
>> Fix this by adding a timeout to the wait. Set the timeout to 120
>> seconds, which is a somewhat arbitrary value that is more than long
>> enough for functional hardware to complete any sends.
>>
>> Signed-off-by: Souradeep Chakrabarti
>> <schakrabarti@...ux.microsoft.com>
>
>Where's "Fixes:" tagging the blamed commit?
This issue has been present since day zero of the mana driver code.
It was not introduced by any later commit.
>
>> ---
>> V3 -> V4:
>> * Fixed the commit message to describe the context.
>> * Removed the vf_unload_timeout, as it is not required.
>> ---
>>  drivers/net/ethernet/microsoft/mana/mana_en.c | 26
>> ++++++++++++++++---
>>  1 file changed, 23 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
>> b/drivers/net/ethernet/microsoft/mana/mana_en.c
>> index a499e460594b..d26f1da70411 100644
>> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
>> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
>> @@ -2346,7 +2346,10 @@ static int mana_dealloc_queues(struct
>> net_device *ndev)  {
>>  	struct mana_port_context *apc = netdev_priv(ndev);
>>  	struct gdma_dev *gd = apc->ac->gdma_dev;
>> +	unsigned long timeout;
>>  	struct mana_txq *txq;
>> +	struct sk_buff *skb;
>> +	struct mana_cq *cq;
>>  	int i, err;
>>
>>  	if (apc->port_is_up)
>> @@ -2363,15 +2366,32 @@ static int mana_dealloc_queues(struct
>net_device *ndev)
>>  	 * to false, but it doesn't matter since mana_start_xmit() drops any
>>  	 * new packets due to apc->port_is_up being false.
>>  	 *
>> -	 * Drain all the in-flight TX packets
>> +	 * Drain all the in-flight TX packets.
>> +	 * A timeout of 120 seconds for all the queues is used.
>> +	 * This will break the while loop when h/w is not responding.
>> +	 * This value of 120 has been decided here considering max
>> +	 * number of queues.
>>  	 */
>> +
>> +	timeout = jiffies + 120 * HZ;
>
>Why not initialize it right when declaring?
I will fix it in the next version.
>
>>  	for (i = 0; i < apc->num_queues; i++) {
>>  		txq = &apc->tx_qp[i].txq;
>> -
>> -		while (atomic_read(&txq->pending_sends) > 0)
>> +		while (atomic_read(&txq->pending_sends) > 0 &&
>> +		       time_before(jiffies, timeout)) {
>>  			usleep_range(1000, 2000);
>> +		}
>>  	}
>
>120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
>iterations. I know usleep_range() often is much less precise, but still.
>Do you really need that much time? Has this been measured during the tests
>that it can take up to 120 seconds or is it just some random value that "should
>be enough"?
>If you really need 120 seconds, I'd suggest using a timer / delayed work instead
>of wasting resources.
The intent here is not to wait for 120 seconds, but to avoid checking the
pending_sends of each TX queue indefinitely before freeing the sk_buffs.
The pending_sends count of a TX queue is decremented only when
mana_poll_tx_cq() is called for a completion notification, and it is
incremented by xmit.

In this particular bug, apc->port_is_up is not set to false, so xmit keeps
increasing pending_sends for the queue while mana_poll_tx_cq() is not
called for that queue.

The comment in mana_dealloc_queues() mentions this:

	/* No packet can be transmitted now since apc->port_is_up is false.
	 * There is still a tiny chance that mana_poll_tx_cq() can re-enable
	 * a txq because it may not timely see apc->port_is_up being cleared
	 * to false, but it doesn't matter since mana_start_xmit() drops any
	 * new packets due to apc->port_is_up being false.

The value of 120 seconds was chosen based on the maximum number of queues
allowed on this specific hardware; it is a safe assumption.
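
To illustrate the bounded wait described above, here is a minimal user-space
sketch. It is not the driver code: struct fake_txq, now_ms(), sleep_ms() and
drain_queues() are hypothetical names made up for this example, and
clock_gettime()/nanosleep() stand in for jiffies and usleep_range(). As in
the patch, one deadline is shared by all queues, so the total wait is bounded
no matter how many queues are stuck.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for struct mana_txq; only the counter is modelled. */
struct fake_txq {
	atomic_int pending_sends;
};

/* Monotonic clock in milliseconds (user-space analogue of jiffies). */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Sleep for roughly ms milliseconds (analogue of usleep_range()). */
static void sleep_ms(long ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Wait for every queue to drain, but never past one shared deadline.
 * Returns the number of queues that still have pending sends, which
 * the caller would then have to clean up by force.
 */
static int drain_queues(struct fake_txq *q, int nq, long long timeout_ms)
{
	long long deadline = now_ms() + timeout_ms;
	int stuck = 0;

	assert(q != NULL);
	for (int i = 0; i < nq; i++) {
		while (atomic_load(&q[i].pending_sends) > 0 &&
		       now_ms() < deadline)
			sleep_ms(1);
		if (atomic_load(&q[i].pending_sends) > 0)
			stuck++;
	}
	return stuck;
}
```

With a healthy queue the inner loop exits as soon as the counter reaches
zero, so the timeout only matters when completions never arrive.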
>
>>
>> +	for (i = 0; i < apc->num_queues; i++) {
>> +		txq = &apc->tx_qp[i].txq;
>> +		cq = &apc->tx_qp[i].tx_cq;
>
>cq can be just &txq->tx_cq.
The mana_txq structure does not have a pointer to mana_cq.
>
>> +		while (atomic_read(&txq->pending_sends)) {
>> +			skb = skb_dequeue(&txq->pending_skbs);
>> +			mana_unmap_skb(skb, apc);
>> +			napi_consume_skb(skb, cq->budget);
>
>(you already have comment about this one)
>
>> +			atomic_sub(1, &txq->pending_sends);
>> +		}
>> +	}
>>  	/* We're 100% sure the queues can no longer be woken up, because
>>  	 * we're sure now mana_poll_tx_cq() can't be running.
>>  	 */
>
>Thanks,
>Olek