Open Source and information security mailing list archives
 
Message-ID: <20230627084205.GB31802@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
Date:   Tue, 27 Jun 2023 01:42:05 -0700
From:   Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
To:     Praveen Kumar <kumarpraveen@...ux.microsoft.com>
Cc:     kys@...rosoft.com, haiyangz@...rosoft.com, wei.liu@...nel.org,
        decui@...rosoft.com, davem@...emloft.net, edumazet@...gle.com,
        kuba@...nel.org, pabeni@...hat.com, longli@...rosoft.com,
        sharmaajay@...rosoft.com, leon@...nel.org, cai.huoqing@...ux.dev,
        ssengar@...ux.microsoft.com, vkuznets@...hat.com,
        tglx@...utronix.de, linux-hyperv@...r.kernel.org,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-rdma@...r.kernel.org, stable@...r.kernel.org,
        schakrabarti@...rosoft.com
Subject: Re: [PATCH 2/2 V3 net] net: mana: Fix MANA VF unload when host is
 unresponsive

On Mon, Jun 26, 2023 at 07:43:07PM +0530, Praveen Kumar wrote:
> On 6/26/2023 2:50 PM, souradeep chakrabarti wrote:
> > From: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
> > 
> > This is the second part of the fix.
> > 
> > This patch also adds a new attribute to mana_context, which is set when
> > mana_hwc_send_request() hits a timeout because the host is unresponsive.
> > Successive calls then check this flag and fail immediately instead of
> > waiting for the timeout again.
> > 
> > Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
> > Microsoft Azure Network Adapter)
> > Signed-off-by: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
> > ---
> > V2 -> V3:
> > * Removed the initialization of vf_unload_timeout
> > * Split the patch in two.
> > * Removed an extra space from the commit message.
> > ---
> >  drivers/net/ethernet/microsoft/mana/gdma_main.c  |  4 +++-
> >  drivers/net/ethernet/microsoft/mana/hw_channel.c | 12 +++++++++++-
> >  include/net/mana/mana.h                          |  2 ++
> >  3 files changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 8f3f78b68592..6411f01be0d9 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -946,10 +946,12 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
> >  	struct gdma_context *gc = gd->gdma_context;
> >  	struct gdma_general_resp resp = {};
> >  	struct gdma_general_req req = {};
> > +	struct mana_context *ac;
> >  	int err;
> >  
> >  	if (gd->pdid == INVALID_PDID)
> >  		return -EINVAL;
> > +	ac = gd->driver_data;
> >  
> >  	mana_gd_init_req_hdr(&req.hdr, GDMA_DEREGISTER_DEVICE, sizeof(req),
> >  			     sizeof(resp));
> > @@ -957,7 +959,7 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
> >  	req.hdr.dev_id = gd->dev_id;
> >  
> >  	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
> > -	if (err || resp.hdr.status) {
> > +	if ((err || resp.hdr.status) && !ac->vf_unload_timeout) {
> >  		dev_err(gc->dev, "Failed to deregister device: %d, 0x%x\n",
> >  			err, resp.hdr.status);
> 
> With the !ac->vf_unload_timeout condition, this message may not correctly show err and status. Perhaps you want to log explicit information on timeouts so that it gives the right information? Or update the err and status fields properly.
The !ac->vf_unload_timeout check here means the error path is taken only if
ac->vf_unload_timeout is not yet set; otherwise the remaining operations continue.
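To make the condition concrete, here is a small userspace sketch of the
decision in mana_gd_deregister_device() after the patch. Only the
`(err || status) && !vf_unload_timeout` test is taken from the patch; the
enum, classify_dereg() and the separate timeout message are hypothetical,
the last one following Praveen's suggestion of logging the timeout case
explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcome codes for this sketch; not part of the MANA driver. */
enum dereg_action {
	DEREG_OK,		/* request succeeded */
	DEREG_FAIL,		/* real failure: log err/status, take the error path */
	DEREG_SKIP_TIMEOUT,	/* host already timed out: continue the unload */
};

/*
 * Models the patched condition
 *     if ((err || resp.hdr.status) && !ac->vf_unload_timeout)
 * The error path is taken only while the timeout flag is still clear.
 */
static enum dereg_action classify_dereg(int err, unsigned int status,
					bool vf_unload_timeout)
{
	if (!err && !status)
		return DEREG_OK;

	if (vf_unload_timeout) {
		/* Hypothetical explicit message for the timeout case. */
		fprintf(stderr,
			"deregister skipped: host timed out earlier (err=%d)\n",
			err);
		return DEREG_SKIP_TIMEOUT;
	}

	fprintf(stderr, "Failed to deregister device: %d, 0x%x\n", err, status);
	return DEREG_FAIL;
}
```

For example, classify_dereg(-ETIMEDOUT, 0, true) takes the skip branch,
while the same error with the flag clear is reported as a failure.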
> 
> >  		if (!err)
> > diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > index 9d1507eba5b9..492cb2c6e2cb 100644
> > --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > @@ -1,8 +1,10 @@
> >  // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> >  /* Copyright (c) 2021, Microsoft Corporation. */
> >  
> > +#include "asm-generic/errno.h"
> >  #include <net/mana/gdma.h>
> >  #include <net/mana/hw_channel.h>
> > +#include <net/mana/mana.h>
> >  
> >  static int mana_hwc_get_msg_index(struct hw_channel_context *hwc, u16 *msg_id)
> >  {
> > @@ -786,12 +788,19 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
> >  	struct hwc_wq *txq = hwc->txq;
> >  	struct gdma_req_hdr *req_msg;
> >  	struct hwc_caller_ctx *ctx;
> > +	struct mana_context *ac;
> >  	u32 dest_vrcq = 0;
> >  	u32 dest_vrq = 0;
> >  	u16 msg_id;
> >  	int err;
> >  
> >  	mana_hwc_get_msg_index(hwc, &msg_id);
> > +	ac = hwc->gdma_dev->driver_data;
> 
> Is there a case where gdma_dev could be invalid here? If so, let's check its state before proceeding.
Haiyang has already responded to this in his comment.
hwc->gdma_dev will be valid here, but as Haiyang pointed out, we need to use
hwc->gdma_dev->gdma_context->mana.driver_data, or, better, relocate the
attribute to gdma_context.
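A rough sketch of that relocation, assuming the flag moves into struct
gdma_context. The structs below are trimmed, hypothetical stand-ins that
keep only the pointer chain used in the patch (hwc->gdma_dev and
gd->gdma_context); hwc_request_allowed() and hwc_mark_timeout() are
illustrative helpers, not existing driver functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed stand-ins for the driver structs. */
struct gdma_context {
	bool vf_unload_timeout;	/* flag relocated here in this sketch */
};

struct gdma_dev {
	struct gdma_context *gdma_context;
};

struct hw_channel_context {
	struct gdma_dev *gdma_dev;
};

/* Both call sites reach the same flag through gdma_context. */
static struct gdma_context *hwc_to_gc(const struct hw_channel_context *hwc)
{
	assert(hwc != NULL && hwc->gdma_dev != NULL);
	return hwc->gdma_dev->gdma_context;
}

/* Would be checked at the top of the send path. */
static bool hwc_request_allowed(const struct hw_channel_context *hwc)
{
	return !hwc_to_gc(hwc)->vf_unload_timeout;
}

/* Would be called when waiting for the host's reply times out. */
static void hwc_mark_timeout(struct hw_channel_context *hwc)
{
	hwc_to_gc(hwc)->vf_unload_timeout = true;
}
```

With the flag in gdma_context, mana_gd_deregister_device() could test it
through the gc pointer it already has, without going through
gd->driver_data.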
> 
> > +	if (ac->vf_unload_timeout) {
> > +		dev_err(hwc->dev, "HWC: vport is already unloaded.\n");
> > +		err = -ETIMEDOUT;
> > +		goto out;
> > +	}
> >  
> >  	tx_wr = &txq->msg_buf->reqs[msg_id];
> >  
> > @@ -825,9 +834,10 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
> >  		goto out;
> >  	}
> >  
> > -	if (!wait_for_completion_timeout(&ctx->comp_event, 30 * HZ)) {
> > +	if (!wait_for_completion_timeout(&ctx->comp_event, 5 * HZ)) {
> 
> IMHO we should use macros instead of magic numbers (5, 30, and so on), but I would like others to comment here.
> 
> >  		dev_err(hwc->dev, "HWC: Request timed out!\n");
> >  		err = -ETIMEDOUT;
> > +		ac->vf_unload_timeout = true;
> >  		goto out;
> >  	}
> >  
> > diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> > index 9eef19972845..5f5affdca1eb 100644
> > --- a/include/net/mana/mana.h
> > +++ b/include/net/mana/mana.h
> > @@ -358,6 +358,8 @@ struct mana_context {
> >  
> >  	u16 num_ports;
> >  
> > +	bool vf_unload_timeout;
> > +
> >  	struct mana_eq *eqs;
> >  
> >  	struct net_device *ports[MAX_PORTS_IN_MANA_DEV];
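
On the magic-number point, and to illustrate the sticky flag described in
the commit message, below is a userspace simulation. HWC_REQ_TIMEOUT_SEC,
struct sim_hwc and sim_send_request() are hypothetical names; the real code
waits with wait_for_completion_timeout() on ctx->comp_event, which is only
modeled here by the host_responds argument.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical name for the per-request timeout; in the driver it would be
 * used as HWC_REQ_TIMEOUT_SEC * HZ in place of the bare "5 * HZ".
 */
#define HWC_REQ_TIMEOUT_SEC 5

struct sim_hwc {
	bool timed_out;		 /* sticky: set once, never cleared here */
	unsigned int waits;	 /* requests that actually waited on the host */
	unsigned int waited_sec; /* total simulated seconds spent waiting */
};

/*
 * Simulates the send path: fail fast if an earlier request timed out,
 * otherwise "wait" up to HWC_REQ_TIMEOUT_SEC for the host to reply.
 */
static int sim_send_request(struct sim_hwc *hwc, bool host_responds)
{
	if (hwc->timed_out)
		return -ETIMEDOUT;	/* skip the wait entirely */

	hwc->waits++;
	if (!host_responds) {
		hwc->waited_sec += HWC_REQ_TIMEOUT_SEC;
		hwc->timed_out = true;	/* remember for later calls */
		return -ETIMEDOUT;
	}
	return 0;
}
```

Once one request times out, later requests return -ETIMEDOUT without
waiting, so in this model an unload that issues several requests to an
unresponsive host pays the timeout once rather than once per request.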
