netdev - RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive

Message-ID:
 <PH7PR21MB31166C1FB6FEB3886EDA8103CA2EA@PH7PR21MB3116.namprd21.prod.outlook.com>
Date: Tue, 4 Jul 2023 13:42:24 +0000
From: Haiyang Zhang <haiyangz@...rosoft.com>
To: Souradeep Chakrabarti <schakrabarti@...rosoft.com>, Alexander Lobakin
	<aleksander.lobakin@...el.com>, souradeep chakrabarti
	<schakrabarti@...ux.microsoft.com>
CC: KY Srinivasan <kys@...rosoft.com>, "wei.liu@...nel.org"
	<wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>, "davem@...emloft.net"
	<davem@...emloft.net>, "edumazet@...gle.com" <edumazet@...gle.com>,
	"kuba@...nel.org" <kuba@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>,
	Long Li <longli@...rosoft.com>, Ajay Sharma <sharmaajay@...rosoft.com>,
	"leon@...nel.org" <leon@...nel.org>, "cai.huoqing@...ux.dev"
	<cai.huoqing@...ux.dev>, "ssengar@...ux.microsoft.com"
	<ssengar@...ux.microsoft.com>, "vkuznets@...hat.com" <vkuznets@...hat.com>,
	"tglx@...utronix.de" <tglx@...utronix.de>, "linux-hyperv@...r.kernel.org"
	<linux-hyperv@...r.kernel.org>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-rdma@...r.kernel.org"
	<linux-rdma@...r.kernel.org>, "stable@...r.kernel.org"
	<stable@...r.kernel.org>
Subject: RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when
 host is unresponsive



> -----Original Message-----
> From: Souradeep Chakrabarti <schakrabarti@...rosoft.com>
> Sent: Monday, July 3, 2023 3:55 PM
> To: Alexander Lobakin <aleksander.lobakin@...el.com>; souradeep chakrabarti
> <schakrabarti@...ux.microsoft.com>
> Cc: KY Srinivasan <kys@...rosoft.com>; Haiyang Zhang
> <haiyangz@...rosoft.com>; wei.liu@...nel.org; Dexuan Cui
> <decui@...rosoft.com>; davem@...emloft.net; edumazet@...gle.com;
> kuba@...nel.org; pabeni@...hat.com; Long Li <longli@...rosoft.com>; Ajay
> Sharma <sharmaajay@...rosoft.com>; leon@...nel.org;
> cai.huoqing@...ux.dev; ssengar@...ux.microsoft.com; vkuznets@...hat.com;
> tglx@...utronix.de; linux-hyperv@...r.kernel.org; netdev@...r.kernel.org;
> linux-kernel@...r.kernel.org; linux-rdma@...r.kernel.org;
> stable@...r.kernel.org
> Subject: RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload
> when host is unresponsive
> 
> 
> 
> >-----Original Message-----
> >From: Alexander Lobakin <aleksander.lobakin@...el.com>
> >Sent: Monday, July 3, 2023 10:18 PM
> >To: souradeep chakrabarti <schakrabarti@...ux.microsoft.com>
> >Cc: KY Srinivasan <kys@...rosoft.com>; Haiyang Zhang
> ><haiyangz@...rosoft.com>; wei.liu@...nel.org; Dexuan Cui
> ><decui@...rosoft.com>; davem@...emloft.net; edumazet@...gle.com;
> >kuba@...nel.org; pabeni@...hat.com; Long Li <longli@...rosoft.com>; Ajay
> >Sharma <sharmaajay@...rosoft.com>; leon@...nel.org;
> >cai.huoqing@...ux.dev; ssengar@...ux.microsoft.com; vkuznets@...hat.com;
> >tglx@...utronix.de; linux-hyperv@...r.kernel.org; netdev@...r.kernel.org;
> >linux-kernel@...r.kernel.org; linux-rdma@...r.kernel.org;
> >stable@...r.kernel.org; Souradeep Chakrabarti
> <schakrabarti@...rosoft.com>
> >Subject: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when
> >host is unresponsive
> >
> >From: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
> >Date: Mon,  3 Jul 2023 01:49:31 -0700
> >
> >> From: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
> >
> >Please sync your Git name and Git mail account settings, so that your own
> >patches won't have "From:" when sending. From what I see, you need to
> >correct first letters of name and surname to capital in the Git email settings
> >block.
> Thank you for pointing this out, I will fix it.
> >
> >>
> >> When unloading the MANA driver, mana_dealloc_queues() waits for the
> >> MANA hardware to complete any inflight packets and set the pending
> >> send count to zero. But if the hardware has failed,
> >> mana_dealloc_queues() could wait forever.
> >>
> >> Fix this by adding a timeout to the wait. Set the timeout to 120
> >> seconds, which is a somewhat arbitrary value that is more than long
> >> enough for functional hardware to complete any sends.
> >>
> >> Signed-off-by: Souradeep Chakrabarti
> >> <schakrabarti@...ux.microsoft.com>
> >
> >Where's "Fixes:" tagging the blamed commit?
> This has been present since day zero of the mana driver code.
> It was not introduced by any later commit.
> >
> >> ---
> >> V3 -> V4:
> >> * Fixed the commit message to describe the context.
> >> * Removed the vf_unload_timeout, as it is not required.
> >> ---
> >>  drivers/net/ethernet/microsoft/mana/mana_en.c | 26
> >> ++++++++++++++++---
> >>  1 file changed, 23 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
> >> b/drivers/net/ethernet/microsoft/mana/mana_en.c
> >> index a499e460594b..d26f1da70411 100644
> >> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> >> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> >> @@ -2346,7 +2346,10 @@ static int mana_dealloc_queues(struct
> >> net_device *ndev)  {
> >>  	struct mana_port_context *apc = netdev_priv(ndev);
> >>  	struct gdma_dev *gd = apc->ac->gdma_dev;
> >> +	unsigned long timeout;
> >>  	struct mana_txq *txq;
> >> +	struct sk_buff *skb;
> >> +	struct mana_cq *cq;
> >>  	int i, err;
> >>
> >>  	if (apc->port_is_up)
> >> @@ -2363,15 +2366,32 @@ static int mana_dealloc_queues(struct
> >net_device *ndev)
> >>  	 * to false, but it doesn't matter since mana_start_xmit() drops any
> >>  	 * new packets due to apc->port_is_up being false.
> >>  	 *
> >> -	 * Drain all the in-flight TX packets
> >> +	 * Drain all the in-flight TX packets.
> >> +	 * A timeout of 120 seconds for all the queues is used.
> >> +	 * This will break the while loop when h/w is not responding.
> >> +	 * This value of 120 has been decided here considering max
> >> +	 * number of queues.
> >>  	 */
> >> +
> >> +	timeout = jiffies + 120 * HZ;
> >
> >Why not initialize it right when declaring?
> I will fix it in the next version.
> >
> >>  	for (i = 0; i < apc->num_queues; i++) {
> >>  		txq = &apc->tx_qp[i].txq;
> >> -
> >> -		while (atomic_read(&txq->pending_sends) > 0)
> >> +		while (atomic_read(&txq->pending_sends) > 0 &&
> >> +		       time_before(jiffies, timeout)) {
> >>  			usleep_range(1000, 2000);
> >> +		}
> >>  	}
> >
> >120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
> >iterations. I know usleep_range() often is much less precise, but still.
> >Do you really need that much time? Has this been measured during the tests
> >that it can take up to 120 seconds or is it just some random value that "should
> >be enough"?
> >If you really need 120 seconds, I'd suggest using a timer / delayed work
> instead
> >of wasting resources.
> The intent here is not to wait for 120 seconds, but to avoid checking the
> pending_sends of each tx queue indefinitely before freeing the sk_buffs.
> The pending_sends of a tx queue is only decreased when mana_poll_tx_cq()
> is called for a completion notification, and it is increased by xmit.
> 
> In this particular bug, apc->port_is_up is not set to false, causing
> xmit to keep increasing the pending_sends for the queue while
> mana_poll_tx_cq() is not called for the queue.
> 
> The comment in mana_dealloc_queues() describes the expected behavior:
> 
>     /* No packet can be transmitted now since apc->port_is_up is false.
>      * There is still a tiny chance that mana_poll_tx_cq() can re-enable
>      * a txq because it may not timely see apc->port_is_up being cleared
>      * to false, but it doesn't matter since mana_start_xmit() drops any
>      * new packets due to apc->port_is_up being false.
> 
> The value of 120 seconds was chosen based on the maximum number of queues
> allowed in this specific hardware; it is a safe assumption.

I agree. Also, this waiting time is usually much shorter than 120 sec. The long
wait only happens in rare and unexpected cases where the NIC HW is not
responding. To further reduce resource consumption, we can double the
usleep_range() time in every iteration, so the number of iterations is
greatly reduced before reaching 120 sec.
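
To give a rough idea of the effect, here is a userspace sketch of the
doubling wait. It is not the driver change itself: it advances a simulated
clock instead of calling usleep_range(), and wait_for_drain(),
sends_pending(), struct wait_result and the per-sleep cap are all
hypothetical names and values chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct wait_result {
	unsigned int iterations;	/* number of sleeps taken */
	uint64_t elapsed_us;		/* simulated time spent waiting */
	bool timed_out;			/* true if the deadline expired first */
};

/* Simulated hardware: sends stay pending until done_at_us.
 * Passing UINT64_MAX models a host that never responds.
 */
static bool sends_pending(uint64_t now_us, uint64_t done_at_us)
{
	return now_us < done_at_us;
}

/* Poll until the pending sends drain or timeout_us elapses.
 * The sleep starts at first_sleep_us and doubles after every
 * iteration, capped at max_sleep_us. Setting max_sleep_us equal
 * to first_sleep_us gives a fixed-step wait for comparison.
 */
static struct wait_result wait_for_drain(uint64_t done_at_us,
					 uint64_t timeout_us,
					 uint64_t first_sleep_us,
					 uint64_t max_sleep_us)
{
	struct wait_result r = { 0, 0, false };
	uint64_t sleep_us = first_sleep_us;

	assert(first_sleep_us > 0 && first_sleep_us <= max_sleep_us);

	while (sends_pending(r.elapsed_us, done_at_us)) {
		if (r.elapsed_us >= timeout_us) {
			r.timed_out = true;
			break;
		}

		/* Stands in for the sleep in the real loop. */
		r.elapsed_us += sleep_us;
		r.iterations++;

		/* Double the next sleep, but never exceed the cap. */
		sleep_us *= 2;
		if (sleep_us > max_sleep_us)
			sleep_us = max_sleep_us;
	}

	return r;
}
```

With a 120 sec timeout and a host that never responds, a fixed 1 msec step
takes 120000 iterations in this model, while doubling from 1 msec with an
assumed 1 sec cap takes 129. When the sends complete after 5 msec, the
doubling loop exits after 3 iterations. The cap bounds how long the loop
can overshoot the completion or the deadline, which uncapped doubling does
not.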

Thanks,
- Haiyang