lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <66100bee9f979_55e88208fd@john.notmuch>
Date: Fri, 05 Apr 2024 07:34:22 -0700
From: John Fastabend <john.fastabend@...il.com>
To: Simon Horman <horms@...nel.org>, 
 netdev@...r.kernel.org
Cc: Jakub Kicinski <kuba@...nel.org>, 
 Jiri Pirko <jiri@...nulli.us>, 
 Madhu Chittim <madhu.chittim@...el.com>, 
 Sridhar Samudrala <sridhar.samudrala@...el.com>, 
 Paolo Abeni <pabeni@...hat.com>
Subject: RE: [RFC] HW TX Rate Limiting Driver API

Simon Horman wrote:
> Hi,
> 
> This is follow-up to the ongoing discussion started by Intel to extend the
> support for TX shaping H/W offload [1].
> 
> The goal is allowing the user-space to configure TX shaping offload on a
> per-queue basis with min guaranteed B/W, max B/W limit and burst size on a
> VF device.
> 
> 
> In the past few months several different solutions were attempted and
> discussed, without finding a perfect fit:
> 
> - devlink_rate APIs are not appropriate for to control TX shaping on netdevs
> - No existing TC qdisc offload covers the required feature set
> - HTB does not allow direct queue configuration
> - MQPRIO imposes constraint on the maximum number of TX queues

iirc the TX queue limitation was somewhat artifical. Probably we could
extend it.

> - TBF does not support max B/W limit
> - ndo_set_tx_maxrate() only controls the max B/W limit
> 
> A new H/W offload API is needed, but offload API proliferation should be
> avoided.

Did you consider the dcbnl_rtnl_ops? These have the advantage of being
implemented in intel, mlx, amd, broadcom and a few more drivers. There
is an IEEE spec as well fwiw.

This has a getmaxrate, setmaxrate API. Adding a getminrate and setminrate
would be relatively straightforward. The spec defines how to do WRR and
NICs support it.

The main limitation is spec only defined 8 TCs so interface followed
spec. But I think we could make this as large as needed to map TCs
1:1 to queues if you want. By the way someone probably already thought
of it, but having queue sets rate limited proved very useful in the
past. For example having a 40Gbps limit on a 100Gbps NIC likely needs
multiple queues (a TC in DCB language).

Just an idea I don't have the full context, but looks like an easy fit
to me. Assuming the mqprio mapping can be easily extended.

> 
> The following proposal intends to cover the above specified requirement and
> provide a possible base to unify all the shaping offload APIs mentioned above.
> 
> The following only defines the in-kernel interface between the core and
> drivers. The intention is to expose the feature to user-space via Netlink.
> Hopefully the latter part should be straight-forward after agreement
> on the in-kernel interface.
> 
> All feedback and comment is more then welcome!
> 
> [1] https://lore.kernel.org/netdev/20230808015734.1060525-1-wenjun1.wu@intel.com/
> 
> Regards,
> Simon with much assistance from Paolo

Cool to see some hw offloads.

> 
> --- 
> /* SPDX-License-Identifier: GPL-2.0-or-later */
> 
> #ifndef _NET_SHAPER_H_
> #define _NET_SHAPER_H_
> 
> /**
>  * enum shaper_metric - the metric of the shaper
>  * @SHAPER_METRIC_PPS: Shaper operates on a packets per second basis
>  * @SHAPER_METRIC_BPS: Shaper operates on a bits per second basis
>  */
> enum shaper_metric {
> 	SHAPER_METRIC_PPS;
> 	SHAPER_METRIC_BPS;
> };

Interesting hw has a PPS limiter?

> 
> #define SHAPER_ROOT_ID 0
> #define SHAPER_NONE_ID UINT_MAX
> 
> /**
>  * struct shaper_info - represent a node of the shaper hierarchy
>  * @id: Unique identifier inside the shaper tree.
>  * @parent_id: ID of parent shaper, or SHAPER_NONE_ID if the shaper has
>  *             no parent. Only the root shaper has no parent.
>  * @metric: Specify if the bw limits refers to PPS or BPS
>  * @bw_min: Minimum guaranteed rate for this shaper
>  * @bw_max: Maximum peak bw allowed for this shaper
>  * @burst: Maximum burst for the peek rate of this shaper
>  * @priority: Scheduling priority for this shaper
>  * @weight: Scheduling weight for this shaper
>  *
>  * The full shaper hierarchy is maintained only by the
>  * NIC driver (or firmware), possibly in a NIC-specific format
>  * and/or in H/W tables.

Is the hw actually implementing hierarchy? For some reason I
imagined different scopes, PF, VF, QueueSet, Queue. And if it
has more hierarchy maybe FW is just translating this? IF that
is the case perhaps instead of hierarchy we expose a

  enum hw_rate_limit_scope scope

and

  enum hw_rate_limit_scope {
     SHAPER_LOOKUP_BY_PORT,
     SHAPER_LOOKUP_BY_NETDEV,
     SHAPER_LOOKUP_BY_VF,
     SHAPER_LOOKUP_BY_QUEUE_SET,
     SHAPER_LOOKUP_BY_QUEUE,
  }

My preference is almost always to push logic out of firmware
and into OS if possible.

>  * The kernel uses this representation and the shaper_ops to
>  * access, traverse, and update it.
>  */
> struct shaper_info {
> 	/* The following fields allow the full traversal of the whole
> 	 * hierarchy.
> 	 */
> 	u32 id;
> 	u32 parent_id;
> 
> 	/* The following fields define the behavior of the shaper. */
> 	enum shaper_metric metric;
> 	u64 bw_min;
> 	u64 bw_max;
> 	u32 burst;

Any details on how burst is implemented in HW?

> 	u32 priority;

What is priority?

> 	u32 weight;

Weight to me is a reference to a WRR algorithm? Is there some other
notion here?

> };
> 
> /**
>  * enum shaper_lookup_mode - Lookup method used to access a shaper
>  * @SHAPER_LOOKUP_BY_PORT: The root shaper for the whole H/W, @id is unused
>  * @SHAPER_LOOKUP_BY_NETDEV: The main shaper for the given network device,
>  *                           @id is unused
>  * @SHAPER_LOOKUP_BY_VF: @id is a virtual function number.
>  * @SHAPER_LOOKUP_BY_QUEUE: @id is a queue identifier.
>  * @SHAPER_LOOKUP_BY_TREE_ID: @id is the unique shaper identifier inside the
>  *                            shaper hierarchy as in shaper_info.id
>  *
>  * SHAPER_LOOKUP_BY_PORT and SHAPER_LOOKUP_BY_VF, SHAPER_LOOKUP_BY_TREE_ID are
>  * only available on PF devices, usually inside the host/hypervisor.
>  * SHAPER_LOOKUP_BY_NETDEV is available on both PFs and VFs devices, but
>  * only if the latter are privileged ones.
>  * The same shaper can be reached with different lookup mode/id pairs,
>  * mapping network visible objects (devices, VFs, queues) to the scheduler
>  * hierarchy and vice-versa.
>  */
> enum shaper_lookup_mode {
>     SHAPER_LOOKUP_BY_PORT,
>     SHAPER_LOOKUP_BY_NETDEV,
>     SHAPER_LOOKUP_BY_VF,
>     SHAPER_LOOKUP_BY_QUEUE,
>     SHAPER_LOOKUP_BY_TREE_ID,
> };
> 
> 
> /**
>  * struct shaper_ops - Operations on shaper hierarchy
>  * @get: Access the specified shaper.
>  * @set: Modify the specifier shaper.
>  * @move: Move the specifier shaper inside the hierarchy.
>  * @add: Add a shaper inside the shaper hierarchy.
>  * @delete: Delete the specified shaper .
>  *
>  * The netdevice exposes a pointer to these ops.
>  *
>  * It’s up to the driver or firmware to create the default shapers hierarchy,
>  * according to the H/W capabilities.
>  */
> struct shaper_ops {
> 	/* get - Fetch the specified shaper, if it exists
> 	 * @dev: Netdevice to operate on.
> 	 * @lookup_mode: How to perform the shaper lookup
> 	 * @id: ID of the specified shaper,
> 	 *      relative to the specified @lookup_mode.
> 	 * @shaper: Object to return shaper.
> 	 * @extack: Netlink extended ACK for reporting errors.
> 	 *
> 	 * Multiple placement domain/id pairs can refer to the same shaper.
> 	 * And multiple entities (e.g. VF and PF) can try to access the same
> 	 * shaper concurrently.
> 	 *
> 	 * Values of @id depend on the @access_type:
> 	 * * If @access_type is SHAPER_LOOKUP_BY_PORT or
> 	 *   SHAPER_LOOKUP_BY_NETDEV, then @placement_id is unused.
> 	 * * If @access_type is SHAPER_LOOKUP_BY_VF,
> 	 *   then @id is a virtual function number, relative to @dev
> 	 *   which should be phisical function
> 	 * * If @access_type is SHAPER_LOOKUP_BY_QUEUE,
> 	 *   Then @id represents the queue number, relative to @dev
> 	 * * If @access_type is SHAPER_LOOKUP_BY_TREE_ID,
> 	 *   then @id is a @shaper_info.id and any shaper inside the
> 	 *   hierarcy can be accessed directly.
> 	 *
> 	 * Return:
> 	 * * %0 - Success
> 	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
> 	 *		    or core for any reason. @extack should be set
> 	 *		    to text describing the reason.
> 	 * * Other negative error value on failure.
> 	 */
> 	int (*get)(struct net_device *dev,
> 		   enum shaper_lookup_mode lookup_mode, u32 id,
>                    struct shaper_info *shaper, struct netlink_ext_ack *extack);
> 
> 	/* set - Update the specified shaper, if it exists
> 	 * @dev: Netdevice to operate on.
> 	 * @lookup_mode: How to perform the shaper lookup
> 	 * @id: ID of the specified shaper,
> 	 *      relative to the specified @access_type.
> 	 * @shaper: Configuration of shaper.
> 	 * @extack: Netlink extended ACK for reporting errors.
> 	 *
> 	 * Configure the parameters of @shaper according to values supplied
> 	 * in the following fields:
> 	 * * @shaper.metric
> 	 * * @shaper.bw_min
> 	 * * @shaper.bw_max
> 	 * * @shaper.burst
> 	 * * @shaper.priority
> 	 * * @shaper.weight
> 	 * Values supplied in other fields of @shaper must be zero and,
> 	 * other than verifying that, are ignored.
> 	 *
> 	 * Return:
> 	 * * %0 - Success
> 	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
> 	 *		    or core for any reason. @extack should be set
> 	 *		    to text describing the reason.
> 	 * * Other negative error values on failure.
> 	 */
> 	int (*set)(struct net_device *dev,
> 		   enum shaper_lookup_mode lookup_mode, u32 id,
> 		   const struct shaper_info *shaper,
> 		   struct netlink_ext_ack *extack);
> 
> 	/* Move - change the parent id of the specified shaper
> 	 * @dev: netdevice to operate on.
> 	 * @lookup_mode: how to perform the shaper lookup
> 	 * @id: ID of the specified shaper,
> 	 *                      relative to the specified @access_mode.
> 	 * @new_parent_id: new ID of the parent shapers,
> 	 *                      always relative to the SHAPER_LOOKUP_BY_TREE_ID
> 	 *                      lookup mode
> 	 * @extack: Netlink extended ACK for reporting errors.
> 	 *
> 	 * Move the specified shaper in the hierarchy replacing its
> 	 * current parent shaper with @new_parent_id
> 	 *
> 	 * Return:
> 	 * * %0 - Success
> 	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
> 	 *		    or core for any reason. @extack should be set
> 	 *		    to text describing the reason.
> 	 * * Other negative error values on failure.
> 	 */

Some heavy firmware or onchip CPU managing this move operation? Is this
a reset operation as well? Is there a need for an atomic move like this
in initial API? Maybe start with just set and push down an entire config
in one hit. If hw can really move things around dynamically I think it
would make sense though.

> 	int (*move)(struct net_device *dev,
> 		    enum shaper_lookup_mode lookup_mode, u32 id,
> 		    u32 new_parent_id, struct netlink_ext_ack *extack);
> 
> 	/* add - Add a shaper inside the shaper hierarchy
> 	 * @dev: netdevice to operate on.
> 	 * @shaper: configuration of shaper.
> 	 * @extack: Netlink extended ACK for reporting errors.
> 	 *
> 	 * @shaper.id must be set to SHAPER_NONE_ID as
> 	 * the id for the shaper will be automatically allocated.
> 	 * @shaper.parent_id determines where inside the shaper's tree
> 	 * this node is inserted.
> 	 *
> 	 * Return:
> 	 * * non-negative shaper id on success
> 	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
> 	 *		    or core for any reason. @extack should be set
> 	 *		    to text describing the reason.
> 	 * * Other negative error values on failure.
> 	 *
> 	 * Examples or reasons this operation may fail include:
> 	 * * H/W resources limits.
> 	 * * The parent is a ‘leaf’ node - attached to a queue.
> 	 * * Can’t respect the requested bw limits.
> 	 */
> 	int (*add)(struct net_device *dev, const struct shaper_info *shaper,
> 		   struct netlink_ext_ack *extack);
> 
> 	/* delete - Add a shaper inside the shaper hierarchy
> 	 * @dev: netdevice to operate on.
> 	 * @lookup_mode: how to perform the shaper lookup
> 	 * @id: ID of the specified shaper,
> 	 *                      relative to the specified @access_type.
> 	 * @shaper: Object to return the deleted shaper configuration.
> 	 *              Ignored if NULL.
> 	 * @extack: Netlink extended ACK for reporting errors.
> 	 *
> 	 * Return:
> 	 * * %0 - Success
> 	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
> 	 *		    or core for any reason. @extack should be set
> 	 *		    to text describing the reason.
> 	 * * Other negative error values on failure.
> 	 */
> 	int (*delete)(struct net_device *dev,
> 		      enum shaper_lookup_mode lookup_mode,
> 		      u32 id, struct shaper_info *shaper,
> 		      struct netlink_ext_ack *extack);

One thought I have about exposing hierarchy like this is user will need 
to have nic user manual in hand to navigate hardware limitation I presume.
If hw is this flexible than lets do something. But, this is why I started
to think more about scopes (nic, pf, vf, queueSet, queue) than arbitrary
hierarchy. Perhaps this is going to target some DPU though with lots of
flexibility here.

> };
> 
> /*
>  * Examples:
>  * - set shaping on a given queue
>  *   struct shaper_info info = { // fill this };
>  *   dev->shaper_ops->set(dev, SHAPER_LOOKUP_BY_QUEUE, queue_id, &info, NULL);
>  *
>  * - create a queue group with a queue group shaping limits.
>  *   Assuming the following topology already exists:
>  *                    < netdev shaper >
>  *                      /           \
>  *         <queue 0 shaper> . . .  <queue N shaper>
>  *
>  *   struct shaper_info pinfo, ginfo;
>  *   dev->shaper_ops->get(dev, SHAPER_LOOKUP_BY_NETDEV, 0, &pinfo);
>  *
>  *   ginfo.parent_id = pinfo.id;
>  *   // fill-in other shaper params...
>  *   new_node_id = dev->shaper_ops->add(dev, &ginfo);
>  *
>  *   // now topology is:
>  *   //                  <    netdev shaper    >
>  *   //                  /            |        \
>  *   //                 /             |        <newly created shaper>
>  *   //                /              |
>  *   // <queue 0 shaper> . . . <queue N shaper>

I think we need a queue set shaper as part of this for larger bw limits.

>  *
>  *   // move a shapers for queues 3..n out of such queue group
>  *   for (i = 0; i <= 2; ++i)
>  *           dev->shaper_ops->move(dev, SHAPER_LOOKUP_BY_QUEUE, i, new_node_id);
>  *
>  *   // now topology is:
>  *   //                   < netdev shaper >
>  *   //                   /              \
>  *   //        <newly created shaper>   <queue 3 shaper> ... <queue n shaper>
>  *   //         /                   \
>  *   // <queue 0 shaper> ... <queue 2 shaper>
>  */
> #endif
> 
> 

Thanks,
John

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ