Message-ID: <3cf47bdde036515903a40a8e5577f1559e2ee988.camel@mellanox.com>
Date:   Fri, 20 Mar 2020 01:16:28 +0000
From:   Saeed Mahameed <saeedm@...lanox.com>
To:     "leon@...nel.org" <leon@...nel.org>
CC:     Jason Gunthorpe <jgg@...lanox.com>,
        Mark Zhang <markz@...lanox.com>,
        Maor Gottlieb <maorg@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        "dledford@...hat.com" <dledford@...hat.com>
Subject: Re: [PATCH mlx5-next 1/6] net/mlx5: Enable SW-defined RoCEv2 UDP
 source port

On Thu, 2020-03-19 at 08:05 +0200, Leon Romanovsky wrote:
> On Wed, Mar 18, 2020 at 11:33:46PM +0000, Saeed Mahameed wrote:
> > On Wed, 2020-03-18 at 11:52 +0200, Leon Romanovsky wrote:
> > > From: Mark Zhang <markz@...lanox.com>
> > > 
> > > > When this is enabled, UDP source port for RoCEv2 packets are defined
> > > > by software instead of firmware.
> > > 
> > > Signed-off-by: Mark Zhang <markz@...lanox.com>
> > > Reviewed-by: Maor Gottlieb <maorg@...lanox.com>
> > > Signed-off-by: Leon Romanovsky <leonro@...lanox.com>
> > > ---
> > > >  .../net/ethernet/mellanox/mlx5/core/main.c    | 39 +++++++++++++++++++
> > >  include/linux/mlx5/mlx5_ifc.h                 |  5 ++-
> > >  2 files changed, 43 insertions(+), 1 deletion(-)
> > > 
> > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > index 6b38ec72215a..bdc73370297b 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > @@ -585,6 +585,39 @@ static int handle_hca_cap(struct
> > > mlx5_core_dev
> > > *dev)
> > >  	return err;
> > >  }
> > > 
> > > +static int handle_hca_cap_roce(struct mlx5_core_dev *dev)
> > > +{
> > > +	int set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
> > > +	void *set_hca_cap;
> > > +	void *set_ctx;
> > > +	int err;
> > > +
> > > +	if (!MLX5_CAP_GEN(dev, roce))
> > > +		return 0;
> > > +
> > > +	err = mlx5_core_get_caps(dev, MLX5_CAP_ROCE);
> > > +	if (err)
> > > +		return err;
> > > +
> > > +	if (MLX5_CAP_ROCE(dev, sw_r_roce_src_udp_port) ||
> > > +	    !MLX5_CAP_ROCE_MAX(dev, sw_r_roce_src_udp_port))
> > > +		return 0;
> > > +
> > > +	set_ctx = kzalloc(set_sz, GFP_KERNEL);
> > > +	if (!set_ctx)
> > > +		return -ENOMEM;
> > > +
> > 
> > All the sibling functions allocate and free this buffer one after
> > another. Why not allocate it once outside, pass it to all the
> > handle_hca_cap_xyz functions, and have each one memset and reuse it?
> > See below.
> 
> Agree, I'll do it.
> 
> > > +	set_hca_cap = MLX5_ADDR_OF(set_hca_cap_in, set_ctx,
> > > capability);
> > > +	memcpy(set_hca_cap, dev->caps.hca_cur[MLX5_CAP_ROCE],
> > > +	       MLX5_ST_SZ_BYTES(roce_cap));
> > > +	MLX5_SET(roce_cap, set_hca_cap, sw_r_roce_src_udp_port, 1);
> > > +
> > > +	err = set_caps(dev, set_ctx, set_sz,
> > > MLX5_SET_HCA_CAP_OP_MOD_ROCE);
> > > +
> > 
> > Do we really need to fail the whole driver if we are just trying to
> > set a non-mandatory cap?
> 
> What caused the failure is less important than the fact that a basic
> mlx5_cmd_exec() failed during the initialization flow. I think that is
> bad enough to stop the driver, because its operation is going to be
> unreliable.
> 
> Please share your end-result decision on that and I'll align to it.
> 

Driver stability and reliability are not affected by this failing:
by design we don't count on setting the caps at this stage, since we
query them again anyway in the later stages of the driver load.

There are many reasons this could fail: old FW that doesn't handle
this new cap properly, or new FW that has a bug only in the new feature
flow. The driver should be resilient and provide basic functionality,
or in this case just drop this feature: the next cap query for this
feature will return 0, so the driver will not try to enable it anyway.

If something really fundamental caused the failure, then just let it
be: if a later, mandatory stage fails, we will fail at that stage; if
it doesn't, it is a win-win.
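
To make the argument concrete, here is a rough userspace sketch of the
"log and continue" behavior. It is only an illustration, not a proposed
patch: struct fake_dev and all fake_* functions are made-up stand-ins
for the mlx5 device, set_caps() and the later cap query, and the
simulated command result is an assumption for the sake of the example.

```c
/* Userspace sketch only: every fake_* name is hypothetical, not mlx5 API. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct fake_dev {
	bool roce_supported;   /* stand-in for MLX5_CAP_GEN(dev, roce) */
	bool sw_udp_port_max;  /* device is able to support the feature */
	bool sw_udp_port_cur;  /* feature is currently enabled */
	int set_caps_result;   /* simulated result of the set-caps command */
};

/* Simulated firmware command: the cap is enabled only on success. */
static int fake_set_caps(struct fake_dev *dev)
{
	if (dev->set_caps_result)
		return dev->set_caps_result;
	dev->sw_udp_port_cur = true;
	return 0;
}

/* Optional capability: a failure is reported but never fails the load. */
static int fake_handle_hca_cap_roce(struct fake_dev *dev)
{
	int err;

	assert(dev != NULL);

	if (!dev->roce_supported)
		return 0;

	/* Nothing to do if already enabled or not supported at all. */
	if (dev->sw_udp_port_cur || !dev->sw_udp_port_max)
		return 0;

	err = fake_set_caps(dev);
	if (err)
		fprintf(stderr, "optional RoCE cap not set (%d), continuing\n",
			err);
	return 0;
}

/* Later load stage: re-query the cap and use the feature only if set. */
static bool fake_feature_enabled(const struct fake_dev *dev)
{
	return dev->sw_udp_port_cur;
}
```

With a failing command the handler still returns 0 and the later query
simply reports the feature as disabled, so nothing downstream tries to
use it.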


> > > +	kfree(set_ctx);
> > > +	return err;
> > > +}
> > > +
> > >  static int set_hca_cap(struct mlx5_core_dev *dev)
> > >  {
> > >  	int err;
> > 
> > Let's allocate set_ctx in this parent function and pass it to all
> > hca cap handlers:
> > 
> > set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
> > set_ctx = kzalloc(set_sz, GFP_KERNEL);
> 
> I'm doing it now.
> 

Awesome, thanks!
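
For reference, the shared-buffer idea quoted above could look roughly
like the userspace sketch below. This is an assumption-laden
illustration, not the actual patch: FAKE_SET_SZ stands in for
MLX5_ST_SZ_BYTES(set_hca_cap_in), calloc()/free() stand in for
kzalloc()/kfree(), and the fake_* handlers and counters are
hypothetical names that exist only for this example.

```c
/* Userspace sketch only: every fake_* name is hypothetical, not mlx5 API. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define FAKE_SET_SZ 64 /* stand-in for MLX5_ST_SZ_BYTES(set_hca_cap_in) */

typedef int (*fake_cap_handler)(void *set_ctx, size_t set_sz);

static int fake_alloc_count;  /* how many times the buffer was allocated */
static int fake_handlers_run; /* how many handlers used the buffer */

/* Each handler clears the shared buffer before filling in its own data. */
static int fake_handle_general(void *set_ctx, size_t set_sz)
{
	assert(set_ctx != NULL && set_sz >= 2);
	memset(set_ctx, 0, set_sz);
	((unsigned char *)set_ctx)[0] = 0xAA;
	fake_handlers_run++;
	return 0;
}

static int fake_handle_roce(void *set_ctx, size_t set_sz)
{
	assert(set_ctx != NULL && set_sz >= 2);
	memset(set_ctx, 0, set_sz);
	((unsigned char *)set_ctx)[1] = 0xBB;
	fake_handlers_run++;
	return 0;
}

/* Parent: allocate once, hand the buffer to every handler, free once. */
static int fake_set_hca_cap(void)
{
	static const fake_cap_handler handlers[] = {
		fake_handle_general,
		fake_handle_roce,
	};
	void *set_ctx;
	int err = 0;
	size_t i;

	set_ctx = calloc(1, FAKE_SET_SZ);
	if (!set_ctx)
		return -ENOMEM;
	fake_alloc_count++;

	for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
		err = handlers[i](set_ctx, FAKE_SET_SZ);
		if (err)
			break;
	}

	free(set_ctx);
	return err;
}
```

The point of the layout is that there is a single allocation and a
single free in the parent, and each handler only needs a memset before
reusing the buffer.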
