lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240426183333.257ccae5@kernel.org>
Date: Fri, 26 Apr 2024 18:33:33 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Heng Qi <hengqi@...ux.alibaba.com>
Cc: netdev@...r.kernel.org, virtualization@...ts.linux.dev, "David S .
 Miller" <davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>, Eric
 Dumazet <edumazet@...gle.com>, Jason Wang <jasowang@...hat.com>, "Michael S
 . Tsirkin" <mst@...hat.com>, Brett Creeley <bcreeley@....com>, Ratheesh
 Kannoth <rkannoth@...vell.com>, Alexander Lobakin
 <aleksander.lobakin@...el.com>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, Tal
 Gilboa <talgi@...dia.com>, Jonathan Corbet <corbet@....net>,
 linux-doc@...r.kernel.org, Maxime Chevallier
 <maxime.chevallier@...tlin.com>, Jiri Pirko <jiri@...nulli.us>, Paul
 Greenwalt <paul.greenwalt@...el.com>, Ahmed Zaki <ahmed.zaki@...el.com>,
 Vladimir Oltean <vladimir.oltean@....com>, Kory Maincent
 <kory.maincent@...tlin.com>, Andrew Lunn <andrew@...n.ch>, "justinstitt @
 google . com" <justinstitt@...gle.com>
Subject: Re: [PATCH net-next v10 2/4] ethtool: provide customized dim
 profile management

On Fri, 26 Apr 2024 00:59:46 +0800 Heng Qi wrote:
> The NetDIM library, currently leveraged by an array of NICs, delivers
> excellent acceleration benefits. Nevertheless, NICs vary significantly
> in their dim profile list prerequisites.
> 
> Specifically, virtio-net backends may present diverse sw or hw device
> implementation, making a one-size-fits-all parameter list impractical.
> On Alibaba Cloud, the virtio DPU's performance under the default DIM
> profile falls short of expectations, partly due to a mismatch in
> parameter configuration.
> 
> I also noticed that ice/idpf/ena and other NICs have customized
> profilelist or placed some restrictions on dim capabilities.
> 
> Motivated by this, I tried adding new params for "ethtool -C" that provides
> a per-device control to modify and access a device's interrupt parameters.
> 
> Usage
> ========
> The target NIC is named ethx.
> 
> Assume that ethx only declares support for rx profile setting
> (with DIM_PROFILE_RX flag set in profile_flags) and supports modification
> of usec and pkt fields.
> 
> 1. Query the currently customized list of the device
> 
> $ ethtool -c ethx
> ...
> rx-profile:
> {.usec =   1, .pkts = 256, .comps = n/a,},
> {.usec =   8, .pkts = 256, .comps = n/a,},
> {.usec =  64, .pkts = 256, .comps = n/a,},
> {.usec = 128, .pkts = 256, .comps = n/a,},
> {.usec = 256, .pkts = 256, .comps = n/a,}
> tx-profile:   n/a
> 
> 2. Tune
> $ ethtool -C ethx rx-profile 1,1,n_2,n,n_3,3,n_4,4,n_n,5,n
> "n" means do not modify this field.
> $ ethtool -c ethx
> ...
> rx-profile:
> {.usec =   1, .pkts =   1, .comps = n/a,},
> {.usec =   2, .pkts = 256, .comps = n/a,},
> {.usec =   3, .pkts =   3, .comps = n/a,},
> {.usec =   4, .pkts =   4, .comps = n/a,},
> {.usec = 256, .pkts =   5, .comps = n/a,}
> tx-profile:   n/a
> 
> 3. Hint
> If the device does not support some type of customized dim profiles,
> the corresponding "n/a" will display.
> 
> If the "n/a" field is being modified, -EOPNOTSUPP will be reported.
> 
> Signed-off-by: Heng Qi <hengqi@...ux.alibaba.com>
> ---
>  Documentation/netlink/specs/ethtool.yaml     |  23 ++
>  Documentation/networking/ethtool-netlink.rst |   4 +
>  include/linux/dim.h                          |  60 +++++
>  include/linux/ethtool.h                      |   7 +-
>  include/linux/netdevice.h                    |   5 +
>  include/uapi/linux/ethtool_netlink.h         |  20 ++
>  lib/dim/net_dim.c                            |  73 +++++
>  net/ethtool/coalesce.c                       | 264 ++++++++++++++++++-
>  8 files changed, 454 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
> index 87ae7b397984..3c51a1a0b5d9 100644
> --- a/Documentation/netlink/specs/ethtool.yaml
> +++ b/Documentation/netlink/specs/ethtool.yaml
> @@ -413,6 +413,18 @@ attribute-sets:
>        -
>          name: combined-count
>          type: u32
> +  -
> +    name: irq-moderation
> +    attributes:
> +      -
> +        name: usec
> +        type: u32
> +      -
> +        name: pkts
> +        type: u32
> +      -
> +        name: comps
> +        type: u32
>  
>    -
>      name: coalesce
> @@ -502,6 +514,15 @@ attribute-sets:
>        -
>          name: tx-aggr-time-usecs
>          type: u32
> +      -
> +        name: rx-profile
> +        type: nest
> +        nested-attributes: irq-moderation
> +      -
> +        name: tx-profile
> +        type: nest
> +        nested-attributes: irq-moderation
> +
>    -
>      name: pause-stat
>      attributes:
> @@ -1313,6 +1334,8 @@ operations:
>              - tx-aggr-max-bytes
>              - tx-aggr-max-frames
>              - tx-aggr-time-usecs
> +            - rx-profile
> +            - tx-profile
>        dump: *coalesce-get-op
>      -
>        name: coalesce-set
> diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
> index 4e63d3708ed9..78ee25081498 100644
> --- a/Documentation/networking/ethtool-netlink.rst
> +++ b/Documentation/networking/ethtool-netlink.rst
> @@ -1040,6 +1040,8 @@ Kernel response contents:
>    ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES``     u32     max aggr size, Tx
>    ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES``    u32     max aggr packets, Tx
>    ``ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS``    u32     time (us), aggr, Tx
> +  ``ETHTOOL_A_COALESCE_RX_PROFILE``            nested  profile of DIM, Rx
> +  ``ETHTOOL_A_COALESCE_TX_PROFILE``            nested  profile of DIM, Tx
>    ===========================================  ======  =======================
>  
>  Attributes are only included in reply if their value is not zero or the
> @@ -1105,6 +1107,8 @@ Request contents:
>    ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES``     u32     max aggr size, Tx
>    ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES``    u32     max aggr packets, Tx
>    ``ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS``    u32     time (us), aggr, Tx
> +  ``ETHTOOL_A_COALESCE_RX_PROFILE``            nested  profile of DIM, Rx
> +  ``ETHTOOL_A_COALESCE_TX_PROFILE``            nested  profile of DIM, Tx
>    ===========================================  ======  =======================
>  
>  Request is rejected if it attributes declared as unsupported by driver (i.e.
> diff --git a/include/linux/dim.h b/include/linux/dim.h
> index 43398f5eade2..af01389fcf39 100644
> --- a/include/linux/dim.h
> +++ b/include/linux/dim.h
> @@ -9,6 +9,7 @@
>  #include <linux/module.h>
>  #include <linux/types.h>
>  #include <linux/workqueue.h>
> +#include <linux/netdevice.h>
>  
>  /* Number of DIM profiles and period mode. */
>  #define NET_DIM_PARAMS_NUM_PROFILES 5
> @@ -53,6 +54,39 @@ struct dim_cq_moder {
>  	u8 cq_period_mode;
>  };
>  
> +#define DIM_PROFILE_RX		BIT(0)	/* support rx dim profile modification */
> +#define DIM_PROFILE_TX		BIT(1)	/* support tx dim profile modification */
> +
> +#define DIM_COALESCE_USEC	BIT(0)	/* support usec field modification */
> +#define DIM_COALESCE_PKTS	BIT(1)	/* support pkts field modification */
> +#define DIM_COALESCE_COMPS	BIT(2)	/* support comps field modification */
> +
> +struct dim_irq_moder {
> +	/* See DIM_PROFILE_* */
> +	u8 profile_flags;
> +
> +	/* See DIM_COALESCE_* for Rx and Tx */
> +	u8 coal_flags;
> +
> +	/* Rx DIM period count mode: CQE or EQE */
> +	u8 dim_rx_mode;
> +
> +	/* Tx DIM period count mode: CQE or EQE */
> +	u8 dim_tx_mode;
> +
> +	/* DIM profile list for Rx */
> +	struct dim_cq_moder __rcu *rx_profile;
> +
> +	/* DIM profile list for Tx */
> +	struct dim_cq_moder __rcu *tx_profile;
> +
> +	/* Rx DIM worker function scheduled by net_dim() */
> +	void (*rx_dim_work)(struct work_struct *work);
> +
> +	/* Tx DIM worker function scheduled by net_dim() */
> +	void (*tx_dim_work)(struct work_struct *work);
> +};
> +
>  /**
>   * struct dim_sample - Structure for DIM sample data.
>   * Used for communications between DIM and its consumer.
> @@ -198,6 +232,32 @@ enum dim_step_result {
>  	DIM_ON_EDGE,
>  };
>  
> +/**
> + * net_dim_init_irq_moder - collect information to initialize irq moderation
> + * @dev: target network device
> + * @profile_flags: Rx or Tx profile modification capability
> + * @coal_flags: irq moderation params flags
> + * @rx_mode: CQ period mode for Rx
> + * @tx_mode: CQ period mode for Tx
> + * void (*rx_dim_work)(struct work_struct *work);
> + *	Rx worker called after dim decision.
> + *
> + * void (*tx_dim_work)(struct work_struct *work);
> + *	Tx worker called after dim decision.

this format is not going to make ./scripts/kernel-doc happy,
run it on the modified files it will show you the warnings:

./scripts/kernel-doc -Wall -none file

> diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
> index b4f0d233d048..4837e37e8b10 100644
> --- a/include/uapi/linux/ethtool_netlink.h
> +++ b/include/uapi/linux/ethtool_netlink.h
> @@ -416,12 +416,32 @@ enum {
>  	ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES,		/* u32 */
>  	ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES,		/* u32 */
>  	ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS,		/* u32 */
> +	ETHTOOL_A_COALESCE_RX_PROFILE,			/* nest - _A_PROFILE_IRQ_MODERATION */
> +	ETHTOOL_A_COALESCE_TX_PROFILE,			/* nest - _A_PROFILE_IRQ_MODERATION */
>  
>  	/* add new constants above here */
>  	__ETHTOOL_A_COALESCE_CNT,
>  	ETHTOOL_A_COALESCE_MAX = (__ETHTOOL_A_COALESCE_CNT - 1)
>  };
>  
> +enum {
> +	ETHTOOL_A_PROFILE_UNSPEC,
> +	ETHTOOL_A_PROFILE_IRQ_MODERATION,		/* nest, _A_IRQ_MODERATION_* */
> +
> +	__ETHTOOL_A_PROFILE_CNT,
> +	ETHTOOL_A_PROFILE_MAX = (__ETHTOOL_A_PROFILE_CNT - 1)
> +};

I think this doesn't match what you described in the YAML spec.
There is no "irq-moderation" layer there and no multi-attr: true...
Does tools/net/ynl/cli.py work with the new attributes?

> +enum {
> +	ETHTOOL_A_IRQ_MODERATION_UNSPEC,
> +	ETHTOOL_A_IRQ_MODERATION_USEC,			/* u32 */
> +	ETHTOOL_A_IRQ_MODERATION_PKTS,			/* u32 */
> +	ETHTOOL_A_IRQ_MODERATION_COMPS,			/* u32 */
> +
> +	__ETHTOOL_A_IRQ_MODERATION_CNT,
> +	ETHTOOL_A_IRQ_MODERATION_MAX = (__ETHTOOL_A_IRQ_MODERATION_CNT - 1)
> +};
> +
>  /* PAUSE */
>  
>  enum {
> diff --git a/lib/dim/net_dim.c b/lib/dim/net_dim.c
> index 4e32f7aaac86..ec0dc21793c0 100644
> --- a/lib/dim/net_dim.c
> +++ b/lib/dim/net_dim.c
> @@ -101,6 +101,79 @@ net_dim_get_def_tx_moderation(u8 cq_period_mode)
>  }
>  EXPORT_SYMBOL(net_dim_get_def_tx_moderation);
>  
> +int net_dim_init_irq_moder(struct net_device *dev, u8 profile_flags,
> +			   u8 coal_flags, u8 rx_mode, u8 tx_mode,
> +			   void (*rx_dim_work)(struct work_struct *work),
> +			   void (*tx_dim_work)(struct work_struct *work))
> +{
> +	struct dim_cq_moder *rxp, *txp;
> +	struct dim_irq_moder *moder;
> +	int len;
> +
> +	dev->irq_moder = kzalloc(sizeof(*dev->irq_moder), GFP_KERNEL);
> +	if (!dev->irq_moder)
> +		goto err_moder;
> +
> +	moder = dev->irq_moder;
> +	len = NET_DIM_PARAMS_NUM_PROFILES * sizeof(*moder->rx_profile);
> +
> +	moder->profile_flags = profile_flags;
> +	moder->coal_flags = coal_flags;
> +
> +	if (profile_flags & DIM_PROFILE_RX) {
> +		moder->dim_rx_mode = rx_mode;
> +		moder->rx_dim_work = rx_dim_work;
> +		rxp = kmemdup(rx_profile[rx_mode], len, GFP_KERNEL);
> +		if (!rxp)
> +			goto err_rx_profile;
> +
> +		rcu_assign_pointer(moder->rx_profile, rxp);
> +	}
> +
> +	if (profile_flags & DIM_PROFILE_TX) {
> +		moder->dim_tx_mode = tx_mode;
> +		moder->tx_dim_work = tx_dim_work;
> +		txp = kmemdup(tx_profile[tx_mode], len, GFP_KERNEL);
> +		if (!txp)
> +			goto err_tx_profile;

you need to init rxp to NULL, otherwise this goto may read
uninitialized value

> +		rcu_assign_pointer(moder->tx_profile, txp);
> +	}
> +
> +	return 0;
> +
> +err_tx_profile:
> +	kfree(rxp);
> +err_rx_profile:
> +	kfree(moder);
> +err_moder:
> +	return -ENOMEM;
> +}
> +EXPORT_SYMBOL(net_dim_init_irq_moder);
> +
> +void net_dim_free_irq_moder(struct net_device *dev)
> +{
> +	struct dim_cq_moder *rx_profile, *tx_profile;
> +
> +	if (!dev->irq_moder)
> +		return;
> +
> +	rcu_read_lock();
> +	rx_profile = rcu_dereference(dev->irq_moder->rx_profile);
> +	tx_profile = rcu_dereference(dev->irq_moder->tx_profile);
> +	rcu_read_unlock();

rtnl_dereference() ? use of the pointers you got outside of the read
critical section looks wrong

> +	rcu_assign_pointer(dev->irq_moder->tx_profile, NULL);
> +	rcu_assign_pointer(dev->irq_moder->rx_profile, NULL);
> +
> +	synchronize_rcu();

Better to use kfree_rcu(), synchronize_rcu() can be quite slow

> +	kfree(rx_profile);
> +	kfree(tx_profile);
> +	kfree(dev->irq_moder);
> +}
> +EXPORT_SYMBOL(net_dim_free_irq_moder);
> +
>  static int net_dim_step(struct dim *dim)
>  {
>  	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
> diff --git a/net/ethtool/coalesce.c b/net/ethtool/coalesce.c
> index 83112c1a71ae..7b852938cf01 100644
> --- a/net/ethtool/coalesce.c
> +++ b/net/ethtool/coalesce.c
> @@ -1,5 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>  
> +#include <linux/dim.h>
>  #include "netlink.h"
>  #include "common.h"
>  
> @@ -82,6 +83,14 @@ static int coalesce_prepare_data(const struct ethnl_req_info *req_base,
>  static int coalesce_reply_size(const struct ethnl_req_info *req_base,
>  			       const struct ethnl_reply_data *reply_base)
>  {
> +	int modersz = nla_total_size(0) + /* _PROFILE_IRQ_MODERATION, nest */
> +		      nla_total_size(sizeof(u32)) + /* _IRQ_MODERATION_USEC */
> +		      nla_total_size(sizeof(u32)) + /* _IRQ_MODERATION_PKTS */
> +		      nla_total_size(sizeof(u32));  /* _IRQ_MODERATION_COMPS */
> +
> +	int total_modersz = nla_total_size(0) +  /* _{R,T}X_PROFILE, nest */
> +			modersz * NET_DIM_PARAMS_NUM_PROFILES;
> +
>  	return nla_total_size(sizeof(u32)) +	/* _RX_USECS */
>  	       nla_total_size(sizeof(u32)) +	/* _RX_MAX_FRAMES */
>  	       nla_total_size(sizeof(u32)) +	/* _RX_USECS_IRQ */
> @@ -108,7 +117,8 @@ static int coalesce_reply_size(const struct ethnl_req_info *req_base,
>  	       nla_total_size(sizeof(u8)) +	/* _USE_CQE_MODE_RX */
>  	       nla_total_size(sizeof(u32)) +	/* _TX_AGGR_MAX_BYTES */
>  	       nla_total_size(sizeof(u32)) +	/* _TX_AGGR_MAX_FRAMES */
> -	       nla_total_size(sizeof(u32));	/* _TX_AGGR_TIME_USECS */
> +	       nla_total_size(sizeof(u32)) +	/* _TX_AGGR_TIME_USECS */
> +	       total_modersz * 2;		/* _{R,T}X_PROFILE */
>  }
>  
>  static bool coalesce_put_u32(struct sk_buff *skb, u16 attr_type, u32 val,
> @@ -127,6 +137,75 @@ static bool coalesce_put_bool(struct sk_buff *skb, u16 attr_type, u32 val,
>  	return nla_put_u8(skb, attr_type, !!val);
>  }
>  
> +#if IS_ENABLED(CONFIG_DIMLIB)

Can we decrease the use of IS_ENABLED() here, somehow?
Do we need to protect anything else than accesses to dev->irq_moder ?
Does coalesce_put_profile() need CONFIG_DIMLIB to build?

> +/**
> + * coalesce_put_profile - fill reply with a nla nest with four child nla nests.
> + * @skb: socket buffer the message is stored in
> + * @attr_type: nest attr type ETHTOOL_A_COALESCE_*X_PROFILE
> + * @profile: data passed to userspace
> + * @coal_flags: modifiable parameters supported by the driver
> + *
> + * Put a dim profile nest attribute. Refer to ETHTOOL_A_PROFILE_IRQ_MODERATION.
> + *
> + * Return: 0 on success or a negative error code.
> + */
> +static int coalesce_put_profile(struct sk_buff *skb, u16 attr_type,
> +				const struct dim_cq_moder *profile,
> +				u8 coal_flags)
> +{
> +	struct nlattr *profile_attr, *moder_attr;
> +	int i, ret;
> +
> +	if (!profile || !coal_flags)
> +		return 0;
> +
> +	profile_attr = nla_nest_start(skb, attr_type);
> +	if (!profile_attr)
> +		return -EMSGSIZE;
> +
> +	for (i = 0; i < NET_DIM_PARAMS_NUM_PROFILES; i++) {
> +		moder_attr = nla_nest_start(skb, ETHTOOL_A_PROFILE_IRQ_MODERATION);
> +		if (!moder_attr) {
> +			ret = -EMSGSIZE;
> +			goto cancel_profile;
> +		}
> +
> +		if (coal_flags & DIM_COALESCE_USEC) {
> +			ret = nla_put_u32(skb, ETHTOOL_A_IRQ_MODERATION_USEC,
> +					  profile[i].usec);
> +			if (ret)
> +				goto cancel_moder;
> +		}
> +
> +		if (coal_flags & DIM_COALESCE_PKTS) {
> +			ret = nla_put_u32(skb, ETHTOOL_A_IRQ_MODERATION_PKTS,
> +					  profile[i].pkts);
> +			if (ret)
> +				goto cancel_moder;
> +		}
> +
> +		if (coal_flags & DIM_COALESCE_COMPS) {
> +			ret = nla_put_u32(skb, ETHTOOL_A_IRQ_MODERATION_COMPS,
> +					  profile[i].comps);
> +			if (ret)
> +				goto cancel_moder;
> +		}
> +
> +		nla_nest_end(skb, moder_attr);
> +	}
> +
> +	nla_nest_end(skb, profile_attr);
> +
> +	return 0;
> +
> +cancel_moder:
> +	nla_nest_cancel(skb, moder_attr);
> +cancel_profile:
> +	nla_nest_cancel(skb, profile_attr);
> +	return ret;
> +}
> +#endif
> +
>  static int coalesce_fill_reply(struct sk_buff *skb,
>  			       const struct ethnl_req_info *req_base,
>  			       const struct ethnl_reply_data *reply_base)
> @@ -134,6 +213,12 @@ static int coalesce_fill_reply(struct sk_buff *skb,
>  	const struct coalesce_reply_data *data = COALESCE_REPDATA(reply_base);
>  	const struct kernel_ethtool_coalesce *kcoal = &data->kernel_coalesce;
>  	const struct ethtool_coalesce *coal = &data->coalesce;
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	struct net_device *dev = req_base->dev;
> +	struct dim_irq_moder *irq_moder = dev->irq_moder;
> +	u8 coal_flags;
> +	int ret;
> +#endif
>  	u32 supported = data->supported_params;
>  
>  	if (coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RX_USECS,
> @@ -192,11 +277,51 @@ static int coalesce_fill_reply(struct sk_buff *skb,
>  			     kcoal->tx_aggr_time_usecs, supported))
>  		return -EMSGSIZE;
>  
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	if (!irq_moder)
> +		return 0;
> +
> +	coal_flags = irq_moder->coal_flags;
> +	rcu_read_lock();
> +	if (irq_moder->profile_flags & DIM_PROFILE_RX) {
> +		ret = coalesce_put_profile(skb, ETHTOOL_A_COALESCE_RX_PROFILE,
> +					   rcu_dereference(irq_moder->rx_profile),

rtnl_deference can be used there, I assume updates are protected by
rtnl_lock

> +					   coal_flags);
> +		if (ret) {
> +			rcu_read_unlock();
> +			return ret;
> +		}
> +	}
> +
> +	if (irq_moder->profile_flags & DIM_PROFILE_TX) {
> +		ret = coalesce_put_profile(skb, ETHTOOL_A_COALESCE_TX_PROFILE,
> +					   rcu_dereference(irq_moder->tx_profile),
> +					   coal_flags);
> +		if (ret) {
> +			rcu_read_unlock();
> +			return ret;
> +		}
> +	}
> +	rcu_read_unlock();
> +#endif
>  	return 0;
>  }
>  
>  /* COALESCE_SET */
>  
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +static const struct nla_policy coalesce_irq_moderation_policy[] = {
> +	[ETHTOOL_A_IRQ_MODERATION_USEC]	= {.type = NLA_U32},
> +	[ETHTOOL_A_IRQ_MODERATION_PKTS]	= {.type = NLA_U32},
> +	[ETHTOOL_A_IRQ_MODERATION_COMPS] = {.type = NLA_U32},

nit: empty spaces around brackets, please ...

> +};
> +
> +static const struct nla_policy coalesce_profile_irq_policy[] = {
> +	[ETHTOOL_A_PROFILE_IRQ_MODERATION] =
> +		NLA_POLICY_NESTED(coalesce_irq_moderation_policy),
> +};
> +#endif
> +
>  const struct nla_policy ethnl_coalesce_set_policy[] = {
>  	[ETHTOOL_A_COALESCE_HEADER]		=
>  		NLA_POLICY_NESTED(ethnl_header_policy),
> @@ -227,6 +352,12 @@ const struct nla_policy ethnl_coalesce_set_policy[] = {
>  	[ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES] = { .type = NLA_U32 },
>  	[ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES] = { .type = NLA_U32 },
>  	[ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS] = { .type = NLA_U32 },

... like here ^

> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	[ETHTOOL_A_COALESCE_RX_PROFILE] =
> +		NLA_POLICY_NESTED(coalesce_profile_irq_policy),
> +	[ETHTOOL_A_COALESCE_TX_PROFILE] =
> +		NLA_POLICY_NESTED(coalesce_profile_irq_policy),
> +#endif
>  };
>  
>  static int
> @@ -234,6 +365,9 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info,
>  			    struct genl_info *info)
>  {
>  	const struct ethtool_ops *ops = req_info->dev->ethtool_ops;
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	struct net_device *dev = req_info->dev;
> +#endif
>  	struct nlattr **tb = info->attrs;
>  	u32 supported_params;
>  	u16 a;
> @@ -243,6 +377,15 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info,
>  
>  	/* make sure that only supported parameters are present */
>  	supported_params = ops->supported_coalesce_params;
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	if (dev->irq_moder) {

This may be NULL

> +		if (dev->irq_moder->profile_flags & DIM_PROFILE_RX)
> +			supported_params |= ETHTOOL_COALESCE_RX_PROFILE;
> +
> +		if (dev->irq_moder->profile_flags & DIM_PROFILE_TX)
> +			supported_params |= ETHTOOL_COALESCE_TX_PROFILE;
> +	}
> +#endif
>  	for (a = ETHTOOL_A_COALESCE_RX_USECS; a < __ETHTOOL_A_COALESCE_CNT; a++)
>  		if (tb[a] && !(supported_params & attr_to_mask(a))) {
>  			NL_SET_ERR_MSG_ATTR(info->extack, tb[a],
> @@ -253,12 +396,104 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info,
>  	return 1;
>  }
>  
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +/**
> + * ethnl_update_profile - get a profile nla nest with child nla nests from userspace.
> + * @dev: netdevice to update the profile
> + * @dst: profile get from the driver and modified by ethnl_update_profile.
> + * @nests: nest attr ETHTOOL_A_COALESCE_*X_PROFILE to set profile.
> + * @extack: Netlink extended ack
> + *
> + * Layout of nests:
> + *   Nested ETHTOOL_A_COALESCE_*X_PROFILE attr
> + *     Nested ETHTOOL_A_PROFILE_IRQ_MODERATION attr
> + *       ETHTOOL_A_IRQ_MODERATION_USEC attr
> + *       ETHTOOL_A_IRQ_MODERATION_PKTS attr
> + *       ETHTOOL_A_IRQ_MODERATION_COMPS attr
> + *     ...
> + *     Nested ETHTOOL_A_PROFILE_IRQ_MODERATION attr
> + *       ETHTOOL_A_IRQ_MODERATION_USEC attr
> + *       ETHTOOL_A_IRQ_MODERATION_PKTS attr
> + *       ETHTOOL_A_IRQ_MODERATION_COMPS attr
> + *
> + * Return: 0 on success or a negative error code.
> + */
> +static int ethnl_update_profile(struct net_device *dev,
> +				struct dim_cq_moder __rcu **dst,
> +				const struct nlattr *nests,
> +				struct netlink_ext_ack *extack)
> +{
> +	struct nlattr *moder[ARRAY_SIZE(coalesce_irq_moderation_policy)];
> +	struct dim_irq_moder *irq_moder = dev->irq_moder;
> +	struct dim_cq_moder *new_profile, *old_profile;
> +	int ret, rem, i = 0, len;
> +	struct nlattr *nest;
> +
> +	if (!nests)
> +		return 0;
> +
> +	if (!*dst)
> +		return -EINVAL;
> +
> +	old_profile = rtnl_dereference(*dst);
> +	len = NET_DIM_PARAMS_NUM_PROFILES * sizeof(*old_profile);
> +	new_profile = kmemdup(old_profile, len, GFP_KERNEL);
> +	if (!new_profile)
> +		return -ENOMEM;
> +
> +	nla_for_each_nested_type(nest, ETHTOOL_A_PROFILE_IRQ_MODERATION, nests, rem) {
> +		ret = nla_parse_nested(moder,
> +				       ARRAY_SIZE(coalesce_irq_moderation_policy) - 1,
> +				       nest, coalesce_irq_moderation_policy,
> +				       extack);
> +		if (ret)
> +			return ret;
> +
> +		if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_USEC)) {
> +			if (irq_moder->coal_flags & DIM_COALESCE_USEC)

There are 3 options here, not 2:

	if (irq_moder->coal_flags & flag) {
		if (NL_REQ_ATTR_CHECK())
			val = nla_get_u32(...);
		else
			return -EINVAL;
	} else {
		if (moder[attr_type)) {
			BAD_ATTR()
			return -EOPNOTSUPP;
		}
	}

you probably want to factor this out to a helper..

> +				new_profile[i].usec =
> +					nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_USEC]);
> +			else
> +				return -EOPNOTSUPP;
> +		}
> +
> +		if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_PKTS)) {
> +			if (irq_moder->coal_flags & DIM_COALESCE_PKTS)
> +				new_profile[i].pkts =
> +					nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_PKTS]);
> +			else
> +				return -EOPNOTSUPP;
> +		}
> +
> +		if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_COMPS)) {
> +			if (irq_moder->coal_flags & DIM_COALESCE_COMPS)
> +				new_profile[i].comps =
> +					nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_COMPS]);
> +			else
> +				return -EOPNOTSUPP;
> +		}
> +
> +		i++;
> +	}
> +
> +	rcu_assign_pointer(*dst, new_profile);
> +
> +	synchronize_rcu();
> +	kfree(old_profile);
> +
> +	return 0;
> +}
> +#endif
> +
>  static int
>  __ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info,
>  		     bool *dual_change)
>  {
>  	struct kernel_ethtool_coalesce kernel_coalesce = {};
>  	struct net_device *dev = req_info->dev;
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	struct dim_irq_moder *irq_moder = dev->irq_moder;
> +#endif
>  	struct ethtool_coalesce coalesce = {};
>  	bool mod_mode = false, mod = false;
>  	struct nlattr **tb = info->attrs;
> @@ -317,6 +552,33 @@ __ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info,
>  	ethnl_update_u32(&kernel_coalesce.tx_aggr_time_usecs,
>  			 tb[ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS], &mod);
>  
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	if (!irq_moder)
> +		goto skip_irq_moder;
> +
> +	if (irq_moder->profile_flags & DIM_PROFILE_RX) {
> +		ret = ethnl_update_profile(dev, &irq_moder->rx_profile,
> +					   tb[ETHTOOL_A_COALESCE_RX_PROFILE],
> +					   info->extack);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +	if (irq_moder->profile_flags & DIM_PROFILE_TX) {
> +		ret = ethnl_update_profile(dev, &irq_moder->tx_profile,
> +					   tb[ETHTOOL_A_COALESCE_TX_PROFILE],
> +					   info->extack);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +skip_irq_moder:
> +#else
> +	if (tb[ETHTOOL_A_COALESCE_RX_PROFILE] ||
> +	    tb[ETHTOOL_A_COALESCE_TX_PROFILE])
> +		return -EOPNOTSUPP;

ethnl_set_coalesce_validate() should already reject this, no?

> +#endif
>  	/* Update operation modes */
>  	ethnl_update_bool32(&coalesce.use_adaptive_rx_coalesce,
>  			    tb[ETHTOOL_A_COALESCE_USE_ADAPTIVE_RX], &mod_mode);


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ