[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
<PH7PR21MB31166C1FB6FEB3886EDA8103CA2EA@PH7PR21MB3116.namprd21.prod.outlook.com>
Date: Tue, 4 Jul 2023 13:42:24 +0000
From: Haiyang Zhang <haiyangz@...rosoft.com>
To: Souradeep Chakrabarti <schakrabarti@...rosoft.com>, Alexander Lobakin
<aleksander.lobakin@...el.com>, souradeep chakrabarti
<schakrabarti@...ux.microsoft.com>
CC: KY Srinivasan <kys@...rosoft.com>, "wei.liu@...nel.org"
<wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>, "davem@...emloft.net"
<davem@...emloft.net>, "edumazet@...gle.com" <edumazet@...gle.com>,
"kuba@...nel.org" <kuba@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>,
Long Li <longli@...rosoft.com>, Ajay Sharma <sharmaajay@...rosoft.com>,
"leon@...nel.org" <leon@...nel.org>, "cai.huoqing@...ux.dev"
<cai.huoqing@...ux.dev>, "ssengar@...ux.microsoft.com"
<ssengar@...ux.microsoft.com>, "vkuznets@...hat.com" <vkuznets@...hat.com>,
"tglx@...utronix.de" <tglx@...utronix.de>, "linux-hyperv@...r.kernel.org"
<linux-hyperv@...r.kernel.org>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-rdma@...r.kernel.org"
<linux-rdma@...r.kernel.org>, "stable@...r.kernel.org"
<stable@...r.kernel.org>
Subject: RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when
host is unresponsive
> -----Original Message-----
> From: Souradeep Chakrabarti <schakrabarti@...rosoft.com>
> Sent: Monday, July 3, 2023 3:55 PM
> To: Alexander Lobakin <aleksander.lobakin@...el.com>; souradeep chakrabarti
> <schakrabarti@...ux.microsoft.com>
> Cc: KY Srinivasan <kys@...rosoft.com>; Haiyang Zhang
> <haiyangz@...rosoft.com>; wei.liu@...nel.org; Dexuan Cui
> <decui@...rosoft.com>; davem@...emloft.net; edumazet@...gle.com;
> kuba@...nel.org; pabeni@...hat.com; Long Li <longli@...rosoft.com>; Ajay
> Sharma <sharmaajay@...rosoft.com>; leon@...nel.org;
> cai.huoqing@...ux.dev; ssengar@...ux.microsoft.com; vkuznets@...hat.com;
> tglx@...utronix.de; linux-hyperv@...r.kernel.org; netdev@...r.kernel.org;
> linux-kernel@...r.kernel.org; linux-rdma@...r.kernel.org;
> stable@...r.kernel.org
> Subject: RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload
> when host is unresponsive
>
>
>
> >-----Original Message-----
> >From: Alexander Lobakin <aleksander.lobakin@...el.com>
> >Sent: Monday, July 3, 2023 10:18 PM
> >To: souradeep chakrabarti <schakrabarti@...ux.microsoft.com>
> >Cc: KY Srinivasan <kys@...rosoft.com>; Haiyang Zhang
> ><haiyangz@...rosoft.com>; wei.liu@...nel.org; Dexuan Cui
> ><decui@...rosoft.com>; davem@...emloft.net; edumazet@...gle.com;
> >kuba@...nel.org; pabeni@...hat.com; Long Li <longli@...rosoft.com>; Ajay
> >Sharma <sharmaajay@...rosoft.com>; leon@...nel.org;
> >cai.huoqing@...ux.dev; ssengar@...ux.microsoft.com; vkuznets@...hat.com;
> >tglx@...utronix.de; linux-hyperv@...r.kernel.org; netdev@...r.kernel.org;
> >linux-kernel@...r.kernel.org; linux-rdma@...r.kernel.org;
> >stable@...r.kernel.org; Souradeep Chakrabarti
> <schakrabarti@...rosoft.com>
> >Subject: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when
> >host is unresponsive
> >
> >From: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
> >Date: Mon, 3 Jul 2023 01:49:31 -0700
> >
> >> From: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
> >
> >Please sync your Git name and Git mail account settings, so that your own
> >patches won't have "From:" when sending. From what I see, you need to
> >correct first letters of name and surname to capital in the Git email settings
> >block.
> Thank you for pointing, I will fix it.
> >
> >>
> >> When unloading the MANA driver, mana_dealloc_queues() waits for the
> >> MANA hardware to complete any inflight packets and set the pending
> >> send count to zero. But if the hardware has failed,
> >> mana_dealloc_queues() could wait forever.
> >>
> >> Fix this by adding a timeout to the wait. Set the timeout to 120
> >> seconds, which is a somewhat arbitrary value that is more than long
> >> enough for functional hardware to complete any sends.
> >>
> >> Signed-off-by: Souradeep Chakrabarti
> >> <schakrabarti@...ux.microsoft.com>
> >
> >Where's "Fixes:" tagging the blamed commit?
> This is present from the day zero of the mana driver code.
> It has not been introduced in the code by any commit.
> >
> >> ---
> >> V3 -> V4:
> >> * Fixed the commit message to describe the context.
> >> * Removed the vf_unload_timeout, as it is not required.
> >> ---
> >> drivers/net/ethernet/microsoft/mana/mana_en.c | 26
> >> ++++++++++++++++---
> >> 1 file changed, 23 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
> >> b/drivers/net/ethernet/microsoft/mana/mana_en.c
> >> index a499e460594b..d26f1da70411 100644
> >> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> >> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> >> @@ -2346,7 +2346,10 @@ static int mana_dealloc_queues(struct
> >> net_device *ndev) {
> >> struct mana_port_context *apc = netdev_priv(ndev);
> >> struct gdma_dev *gd = apc->ac->gdma_dev;
> >> + unsigned long timeout;
> >> struct mana_txq *txq;
> >> + struct sk_buff *skb;
> >> + struct mana_cq *cq;
> >> int i, err;
> >>
> >> if (apc->port_is_up)
> >> @@ -2363,15 +2366,32 @@ static int mana_dealloc_queues(struct
> >net_device *ndev)
> >> * to false, but it doesn't matter since mana_start_xmit() drops any
> >> * new packets due to apc->port_is_up being false.
> >> *
> >> - * Drain all the in-flight TX packets
> >> + * Drain all the in-flight TX packets.
> >> + * A timeout of 120 seconds for all the queues is used.
> >> + * This will break the while loop when h/w is not responding.
> >> + * This value of 120 has been decided here considering max
> >> + * number of queues.
> >> */
> >> +
> >> + timeout = jiffies + 120 * HZ;
> >
> >Why not initialize it right when declaring?
> I will fix it in next version.
> >
> >> for (i = 0; i < apc->num_queues; i++) {
> >> txq = &apc->tx_qp[i].txq;
> >> -
> >> - while (atomic_read(&txq->pending_sends) > 0)
> >> + while (atomic_read(&txq->pending_sends) > 0 &&
> >> + time_before(jiffies, timeout)) {
> >> usleep_range(1000, 2000);> + }
> >> }
> >
> >120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
> >iterations. I know usleep_range() often is much less precise, but still.
> >Do you really need that much time? Has this been measured during the tests
> >that it can take up to 120 seconds or is it just some random value that "should
> >be enough"?
> >If you really need 120 seconds, I'd suggest using a timer / delayed work
> instead
> >of wasting resources.
> Here the intent is not waiting for 120 seconds, rather than avoid continue
> checking the
> pending_sends of each tx queues for an indefinite time, before freeing
> sk_buffs.
> The pending_sends can only get decreased for a tx queue, if mana_poll_tx_cq()
> gets called for a completion notification and increased by xmit.
>
> In this particular bug, apc->port_is_up is not set to false, causing
> xmit to keep increasing the pending_sends for the queue and
> mana_poll_tx_cq()
> not getting called for the queue.
>
> If we see the comment in the function mana_dealloc_queues(), it mentions it :
>
> 2346 /* No packet can be transmitted now since apc->port_is_up is false.
> 2347 * There is still a tiny chance that mana_poll_tx_cq() can re-enable
> 2348 * a txq because it may not timely see apc->port_is_up being cleared
> 2349 * to false, but it doesn't matter since mana_start_xmit() drops any
> 2350 * new packets due to apc->port_is_up being false.
>
> The value 120 seconds has been decided here based on maximum number of
> queues
> are allowed in this specific hardware, it is a safe assumption.
I agree. Also, this waiting time is usually much shorter than 120 sec. The long
wait only happens in rare and unexpected NIC HW non-responding cases. To
further reduce the resource consumption, we can double the usleep_range()
time in every iteration. So, the number of iterations will be greatly reduced
before reaching 120 sec.
Thanks,
- Haiyang
Powered by blists - more mailing lists