netdev - Re: [RFC PATCH] net: introduce HW Rate Limiting Driver API

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240528101845.414cff22@kernel.org>
Date: Tue, 28 May 2024 10:18:45 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, Jiri Pirko <jiri@...nulli.us>, Madhu Chittim
 <madhu.chittim@...el.com>, Sridhar Samudrala <sridhar.samudrala@...el.com>,
 Simon Horman <horms@...nel.org>, John Fastabend <john.fastabend@...il.com>,
 Sunil Kovvuri Goutham <sgoutham@...vell.com>, Jamal Hadi Salim
 <jhs@...atatu.com>
Subject: Re: [RFC PATCH] net: introduce HW Rate Limiting Driver API

On Wed,  8 May 2024 22:20:51 +0200 Paolo Abeni wrote:
> +/**
> + * struct net_shaper_info - represents a shaping node on the NIC H/W
> + * @metric: Specify if the bw limits refers to PPS or BPS
> + * @bw_min: Minimum guaranteed rate for this shaper
> + * @bw_max: Maximum peak bw allowed for this shaper
> + * @burst: Maximum burst for the peek rate of this shaper
> + * @priority: Scheduling priority for this shaper
> + * @weight: Scheduling weight for this shaper
> + */
> +struct net_shaper_info {
> +	enum net_shaper_metric metric;
> +	u64 bw_min;	/* minimum guaranteed bandwidth, according to metric */
> +	u64 bw_max;	/* maximum allowed bandwidth */
> +	u32 burst;	/* maximum burst in bytes for bw_max */

Burst is burst, either we need two or we assume it's for both bw_min
and bw_max, but it most certainly can't be just for bw_max.

Also presumably not just bytes - if "metric" is pps, burst is pps?

> +	u32 priority;	/* scheduling strict priority */
> +	u32 weight;	/* scheduling WRR weight*/

I wonder if we should somehow more clearly specify what a node can do.
Like Andrew pointed out, if we have a WRR node, presumably the weights
go on the children, since there's only one weigh param. But then the
WRR node itself is either empty (no params) or has rate params... which
is odd.

Maybe shaping nodes and RR nodes should be separate node classes,
somehow?

> +};
> +
> +/**
> + * enum net_shaper_scope - the different scopes where a shaper could be attached
> + * @NET_SHAPER_SCOPE_PORT:   The root shaper for the whole H/W.
> + * @NET_SHAPER_SCOPE_NETDEV: The main shaper for the given network device.
> + * @NET_SHAPER_SCOPE_VF:     The shaper is attached to the given virtual
> + * function.
> + * @NET_SHAPER_SCOPE_QUEUE_GROUP: The shaper groups multiple queues under the
> + * same device.
> + * @NET_SHAPER_SCOPE_QUEUE:  The shaper is attached to the given device queue.

I wonder if we need traffic class? Some devices may have two schedulers,
one from the host interfaces (PCIe) into the device buffer. And then
from the device buffer independently into the wire.

> + * NET_SHAPER_SCOPE_PORT and NET_SHAPER_SCOPE_VF are only available on
> + * PF devices, usually inside the host/hypervisor.
> + * NET_SHAPER_SCOPE_NETDEV, NET_SHAPER_SCOPE_QUEUE_GROUP and
> + * NET_SHAPER_SCOPE_QUEUE are available on both PFs and VFs devices.
> + */
> +enum net_shaper_scope {
> +	NET_SHAPER_SCOPE_PORT,
> +	NET_SHAPER_SCOPE_NETDEV,
> +	NET_SHAPER_SCOPE_VF,

I realized now that we do indeed need this VF node (if we want to
internally express the legacy SRIOV NDOs in this API), as much 
as I hate it. Could you annotate somehow my nack on ever exposing
the ability to hook on the VF to user space?

> +	NET_SHAPER_SCOPE_QUEUE_GROUP,

We may need a definition for a queue group. Did I suggest this?
Isn't queue group just a bunch of queues feeding a trivial RR node?
Why does it need to be a "scope"?

> +	NET_SHAPER_SCOPE_QUEUE,
> +};
> +
> +/**
> + * struct net_shaper_ops - Operations on device H/W shapers
> + * @add: Creates a new shaper in the specified scope.

"in a scope"? Isn't the scope just defining the ingress and egress
points of the scheduling hierarchy?

Also your example moves schedulers from queue scope to queue group
scope.

> + * @set: Modify the existing shaper.
> + * @delete: Delete the specified shaper.
> + * @move: Move an existing shaper under a different parent.
> + *
> + * The initial shaping configuration ad device initialization is empty/

and

> + * a no-op/does not constraint the b/w in any way.
> + * The network core keeps track of the applied user-configuration in
> + * per device storage.

"keeps track .. per device" -- "storage" may make people think NVM.

> + * Each shaper is uniquely identified within the device with an 'handle',
> + * dependent on the shaper scope and other data, see @shaper_make_handle()
> + */
> +struct net_shaper_ops {
> +	/** add - Add a shaper inside the shaper hierarchy
> +	 * @dev: netdevice to operate on
> +	 * @handle: the shaper indetifier
> +	 * @shaper: configuration of shaper
> +	 * @extack: Netlink extended ACK for reporting errors.
> +	 *
> +	 * Return:
> +	 * * 0 on success
> +	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
> +	 *                  or core for any reason. @extack should be set to
> +	 *                  text describing the reason.
> +	 * * Other negative error values on failure.
> +	 *
> +	 * Examples or reasons this operation may fail include:
> +	 * * H/W resources limits.
> +	 * * Can’t respect the requested bw limits.
> +	 */
> +	int (*add)(struct net_device *dev, u32 handle,
> +		   const struct net_shaper_info *shaper,
> +		   struct netlink_ext_ack *extack);
> +
> +	/** set - Update the specified shaper, if it exists

Why "if it exists" ? Core should make sure it exists, no?

In addition to ops and state, the device will likely need to express
capabilities of some sort. So that the core can do some work for the
drivers and in due course we can expose them to user space for
discoverability.