Message-ID: <PH7PR21MB31163E61E2B3DABD2533F09BCA26A@PH7PR21MB3116.namprd21.prod.outlook.com>
Date: Mon, 26 Jun 2023 20:32:55 +0000
From: Haiyang Zhang <haiyangz@...rosoft.com>
To: Praveen Kumar <kumarpraveen@...ux.microsoft.com>,
souradeep chakrabarti <schakrabarti@...ux.microsoft.com>,
KY Srinivasan <kys@...rosoft.com>,
"wei.liu@...nel.org" <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"edumazet@...gle.com" <edumazet@...gle.com>,
"kuba@...nel.org" <kuba@...nel.org>,
"pabeni@...hat.com" <pabeni@...hat.com>,
Long Li <longli@...rosoft.com>,
Ajay Sharma <sharmaajay@...rosoft.com>,
"leon@...nel.org" <leon@...nel.org>,
"cai.huoqing@...ux.dev" <cai.huoqing@...ux.dev>,
"ssengar@...ux.microsoft.com" <ssengar@...ux.microsoft.com>,
"vkuznets@...hat.com" <vkuznets@...hat.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>
CC: "stable@...r.kernel.org" <stable@...r.kernel.org>,
Souradeep Chakrabarti <schakrabarti@...rosoft.com>
Subject: RE: [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is
unresponsive
> -----Original Message-----
> From: Praveen Kumar <kumarpraveen@...ux.microsoft.com>
> Sent: Monday, June 26, 2023 10:13 AM
> To: souradeep chakrabarti <schakrabarti@...ux.microsoft.com>; KY Srinivasan
> <kys@...rosoft.com>; Haiyang Zhang <haiyangz@...rosoft.com>;
> wei.liu@...nel.org; Dexuan Cui <decui@...rosoft.com>;
> davem@...emloft.net; edumazet@...gle.com; kuba@...nel.org;
> pabeni@...hat.com; Long Li <longli@...rosoft.com>; Ajay Sharma
> <sharmaajay@...rosoft.com>; leon@...nel.org; cai.huoqing@...ux.dev;
> ssengar@...ux.microsoft.com; vkuznets@...hat.com; tglx@...utronix.de;
> linux-hyperv@...r.kernel.org; netdev@...r.kernel.org; linux-kernel@...r.kernel.org;
> linux-rdma@...r.kernel.org
> Cc: stable@...r.kernel.org; Souradeep Chakrabarti
> <schakrabarti@...rosoft.com>
> Subject: Re: [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is
> unresponsive
>
> On 6/26/2023 2:50 PM, souradeep chakrabarti wrote:
> > From: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
> >
> > This is the second part of the fix.
> >
> > Also this patch adds a new attribute in mana_context, which gets set when
> > mana_hwc_send_request() hits a timeout because of host unresponsiveness.
> > This flag then helps to avoid the timeouts in successive calls.
> >
> > Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
> > Microsoft Azure Network Adapter)
> > Signed-off-by: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
> > ---
> > V2 -> V3:
> > * Removed the initialization of vf_unload_timeout
> > * Split the patch in two.
> > * Fixed extra space from the commit message.
> > ---
> > drivers/net/ethernet/microsoft/mana/gdma_main.c | 4 +++-
> > drivers/net/ethernet/microsoft/mana/hw_channel.c | 12 +++++++++++-
> > include/net/mana/mana.h | 2 ++
> > 3 files changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 8f3f78b68592..6411f01be0d9 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -946,10 +946,12 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
> > struct gdma_context *gc = gd->gdma_context;
> > struct gdma_general_resp resp = {};
> > struct gdma_general_req req = {};
> > + struct mana_context *ac;
> > int err;
> >
> > if (gd->pdid == INVALID_PDID)
> > return -EINVAL;
> > + ac = gd->driver_data;
> >
> > mana_gd_init_req_hdr(&req.hdr, GDMA_DEREGISTER_DEVICE, sizeof(req),
> > sizeof(resp));
> > @@ -957,7 +959,7 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
> > req.hdr.dev_id = gd->dev_id;
> >
> > err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
> > - if (err || resp.hdr.status) {
> > + if ((err || resp.hdr.status) && !ac->vf_unload_timeout) {
> > dev_err(gc->dev, "Failed to deregister device: %d, 0x%x\n",
> > err, resp.hdr.status);
>
> With the !ac->vf_unload_timeout condition, this message may not show err and
> status correctly. You probably want to add explicit information during
> timeouts so that it gives the right information, or have the err and status
> fields properly updated.
>
> > if (!err)
> > diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > index 9d1507eba5b9..492cb2c6e2cb 100644
> > --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > @@ -1,8 +1,10 @@
> > // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> > /* Copyright (c) 2021, Microsoft Corporation. */
> >
> > +#include "asm-generic/errno.h"
> > #include <net/mana/gdma.h>
> > #include <net/mana/hw_channel.h>
> > +#include <net/mana/mana.h>
> >
> > static int mana_hwc_get_msg_index(struct hw_channel_context *hwc, u16 *msg_id)
> > {
> > @@ -786,12 +788,19 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
> > struct hwc_wq *txq = hwc->txq;
> > struct gdma_req_hdr *req_msg;
> > struct hwc_caller_ctx *ctx;
> > + struct mana_context *ac;
> > u32 dest_vrcq = 0;
> > u32 dest_vrq = 0;
> > u16 msg_id;
> > int err;
> >
> > mana_hwc_get_msg_index(hwc, &msg_id);
> > + ac = hwc->gdma_dev->driver_data;
>
> Is there a case where gdma_dev can be invalid here? If so, let's check the
> state before proceeding.

hwc->gdma_dev is assigned shortly after hwc is allocated (see the code below), so
it's valid here.

But hwc->gdma_dev->driver_data is the hwc, not "mana_context *ac". There are two
gdma_dev in gdma_context: hwc & mana.
You can get ac from: hwc->gdma_dev->gdma_context->mana.driver_data
Or, to avoid so many pointer dereferences, I suggest putting vf_unload_timeout
into gdma_context.

int mana_hwc_create_channel(struct gdma_context *gc)
{
        hwc = kzalloc(sizeof(*hwc), GFP_KERNEL);
        ...
        gd->gdma_context = gc;
        gd->driver_data = hwc;
        hwc->gdma_dev = gd;
        hwc->dev = gc->dev;
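
To illustrate the pointer layout described above, here is a minimal user-space
sketch. The structs are heavily simplified stand-ins for the kernel ones, and
hwc_to_mana_context() and wire_up() are hypothetical helpers that exist only
for this illustration, not in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct gdma_context;

/* Simplified stand-ins; the real kernel structs have many more fields. */
struct gdma_dev {
	struct gdma_context *gdma_context;
	void *driver_data;
};

struct hw_channel_context {
	struct gdma_dev *gdma_dev;
};

struct mana_context {
	bool vf_unload_timeout;
};

struct gdma_context {
	struct gdma_dev hwc;	/* driver_data points to the hw_channel_context */
	struct gdma_dev mana;	/* driver_data points to the mana_context */
};

/* Hypothetical helper: reach the mana_context starting from the hwc. */
static struct mana_context *hwc_to_mana_context(struct hw_channel_context *hwc)
{
	assert(hwc != NULL && hwc->gdma_dev != NULL);
	return hwc->gdma_dev->gdma_context->mana.driver_data;
}

/* Hypothetical setup mirroring the assignments quoted from
 * mana_hwc_create_channel(), plus the mana device side.
 */
static void wire_up(struct gdma_context *gc, struct hw_channel_context *hwc,
		    struct mana_context *ac)
{
	gc->hwc.gdma_context = gc;
	gc->hwc.driver_data = hwc;
	hwc->gdma_dev = &gc->hwc;

	gc->mana.gdma_context = gc;
	gc->mana.driver_data = ac;
}
```

With this wiring, hwc->gdma_dev->driver_data is the hwc itself, so casting it
to "struct mana_context *" would read the wrong object. Going through
gdma_context->mana.driver_data gives the right one.
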

Also, mana_gd_send_request()/mana_hwc_send_request() is used in many places,
not just unloading.
Should you use the 5 sec timeout value and the vf_unload_timeout flag in the
unloading path only, and avoid touching other code paths? Please check with the
hostnet team for suggestions.

If we decide to let the vf_unload_timeout flag affect all code paths, not just
unloading, then it should be renamed to hwc_timeout, and the second patch should
be submitted separately.
If it is used just for unloading, since mana_gd_deregister_device() is used by
the PF too, name it something like: unload_hwc_timeout.
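
For reference, the "all code paths" variant could look roughly like the
user-space sketch below, assuming the flag lives in gdma_context and is named
hwc_timeout. The struct, fake_wait_for_host() and send_request_sketch() are
hypothetical stand-ins, not driver code: once one request times out, later
requests fail fast instead of waiting for the host again.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for gdma_context, holding only what the sketch needs. */
struct gdma_context_sketch {
	bool hwc_timeout;	/* set after the first request timeout */
	int host_waits;		/* how many times we actually waited on the host */
};

/* Stand-in for sending a request and waiting for the host's response. */
static int fake_wait_for_host(struct gdma_context_sketch *gc, bool host_alive)
{
	gc->host_waits++;
	return host_alive ? 0 : -ETIMEDOUT;
}

static int send_request_sketch(struct gdma_context_sketch *gc, bool host_alive)
{
	int err;

	assert(gc != NULL);

	/* A previous request already timed out: do not wait again. */
	if (gc->hwc_timeout)
		return -ETIMEDOUT;

	err = fake_wait_for_host(gc, host_alive);
	if (err == -ETIMEDOUT)
		gc->hwc_timeout = true;

	return err;
}
```

Restricting this to the unload path would instead mean checking the flag only
in the teardown callers and leaving the common send path untouched.
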
Thanks,
-Haiyang