netdev - Re: [RFC] HW TX Rate Limiting Driver API

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <91451f2da3dcd70de3138975ad7d21f0548e19c9.camel@redhat.com>
Date: Wed, 10 Apr 2024 10:33:50 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Jakub Kicinski <kuba@...nel.org>, Simon Horman <horms@...nel.org>
Cc: netdev@...r.kernel.org, Jiri Pirko <jiri@...nulli.us>, Madhu Chittim
	 <madhu.chittim@...el.com>, Sridhar Samudrala <sridhar.samudrala@...el.com>
Subject: Re: [RFC] HW TX Rate Limiting Driver API

On Tue, 2024-04-09 at 15:32 -0700, Jakub Kicinski wrote:
> On Fri, 5 Apr 2024 11:23:13 +0100 Simon Horman wrote:
> > /**
> >  * enum shaper_lookup_mode - Lookup method used to access a shaper
> >  * @SHAPER_LOOKUP_BY_PORT: The root shaper for the whole H/W, @id is unused
> >  * @SHAPER_LOOKUP_BY_NETDEV: The main shaper for the given network device,
> >  *                           @id is unused
> >  * @SHAPER_LOOKUP_BY_VF: @id is a virtual function number.
> >  * @SHAPER_LOOKUP_BY_QUEUE: @id is a queue identifier.
> >  * @SHAPER_LOOKUP_BY_TREE_ID: @id is the unique shaper identifier inside the
> >  *                            shaper hierarchy as in shaper_info.id
> >  *
> >  * SHAPER_LOOKUP_BY_PORT and SHAPER_LOOKUP_BY_VF, SHAPER_LOOKUP_BY_TREE_ID are
> >  * only available on PF devices, usually inside the host/hypervisor.
> >  * SHAPER_LOOKUP_BY_NETDEV is available on both PFs and VFs devices, but
> >  * only if the latter are privileged ones.
> 
> The privileged part is an implementation limitation. 
> Limiting oneself
> or subqueues should not require elevated privileges, in principle,
> right?

The reference to privileged functions is here to try to ensure proper
isolation when required.

E.g. Let's suppose the admin in the the host wants to restricts/limits
the B/W for a given VF (the VF itself, not the representor! See below
WRT shaper_lookup_mode) to some rate, he/she likely wants intends
additionally preventing the guest from relaxing the setting configuring
the such rate on the guest device.

> >  * The same shaper can be reached with different lookup mode/id pairs,
> >  * mapping network visible objects (devices, VFs, queues) to the scheduler
> >  * hierarchy and vice-versa.
> 
> :o
> 
> > enum shaper_lookup_mode {
> >     SHAPER_LOOKUP_BY_PORT,
> >     SHAPER_LOOKUP_BY_NETDEV,
> >     SHAPER_LOOKUP_BY_VF,
> >     SHAPER_LOOKUP_BY_QUEUE,
> >     SHAPER_LOOKUP_BY_TREE_ID,
> 
> Two questions.
> 
> 1) are these looking up real nodes or some special kind of node which
> can't have settings assigned directly? 
> IOW if I want to rate limit 
> the port do I get + set the port object or create a node above it and
> link it up?

There is no concept of such special kind of nodes. Probably the above
enum needs a better/clearer definition of each element.
How to reach a specific configuration for the port shaper depends on
the NIC defaults - whatever hierarchy it provides/creates at
initialization time. 

The NIC/firmware can either provide a default shaper for the port level
- in such case the user/admin just need to set it. Otherwise user/admin
will have to create the shaper and link it.

I guess the first option should be more common/the expected one.

This proposal allows both cases.

> Or do those special nodes not exit implicitly (from the example it
> seems like they do?)

Could you please re-phrase the above?

> 2) the objects themselves seem rather different. I'm guessing we need
> port/netdev/vf because either some users don't want to use switchdev
> (vf = repr netdev) or drivers don't implement it "correctly" (port !=
> netdev?!)?

Yes, the nodes inside the hierarchy can be linked to very different
objects. The different lookup mode are there just to provide easy
access to relevant objects.

Likely a much better description is needed:  'port' here is really the
cable plug level, 'netdev' refers to the Linux network device. There
could be multiple netdev for the same port as 'netdev' could be either
referring to a PF or a VFs. Finally VF is really the virtual function
device, not the representor, so that the host can configure/limits the
guest tx rate. 

The same shaper can be reached/looked-up with different ids.

e.g. the device level shaper for a virtual function can be reached
with:

- SHAPER_LOOKUP_BY_TREE_ID + unique tree id (every node is reachable
this way) from any host device in the same hierarcy
- SHAPER_LOOKUP_BY_VF + virtual function id, from the host PF device
- SHAPER_LOOKUP_BY_NETDEV, from the guest VF device

> Also feels like some of these are root nodes, some are leaf nodes?

There is a single root node (the port's parent), possibly many internal
nodes (port, netdev, vf, possibly more intermediate levels depending on
the actual configuration [e.g. the firmware or the admin could create
'device groups' or 'queue groups']) and likely many leave nodes (queue
level).

My personal take is than from an API point of view differentiating
between leaves and internal nodes makes the API more complex with no
practical advantage for the API users.

> > };
> > 
> > 
> > /**
> >  * struct shaper_ops - Operations on shaper hierarchy
> >  * @get: Access the specified shaper.
> >  * @set: Modify the specifier shaper.
> >  * @move: Move the specifier shaper inside the hierarchy.
> >  * @add: Add a shaper inside the shaper hierarchy.
> >  * @delete: Delete the specified shaper .
> >  *
> >  * The netdevice exposes a pointer to these ops.
> >  *
> >  * It’s up to the driver or firmware to create the default shapers hierarchy,
> >  * according to the H/W capabilities.
> >  */
> > struct shaper_ops {
> > 	/* get - Fetch the specified shaper, if it exists
> > 	 * @dev: Netdevice to operate on.
> > 	 * @lookup_mode: How to perform the shaper lookup
> > 	 * @id: ID of the specified shaper,
> > 	 *      relative to the specified @lookup_mode.
> > 	 * @shaper: Object to return shaper.
> > 	 * @extack: Netlink extended ACK for reporting errors.
> > 	 *
> > 	 * Multiple placement domain/id pairs can refer to the same shaper.
> > 	 * And multiple entities (e.g. VF and PF) can try to access the same
> > 	 * shaper concurrently.
> > 	 *
> > 	 * Values of @id depend on the @access_type:
> > 	 * * If @access_type is SHAPER_LOOKUP_BY_PORT or
> > 	 *   SHAPER_LOOKUP_BY_NETDEV, then @placement_id is unused.
> > 	 * * If @access_type is SHAPER_LOOKUP_BY_VF,
> > 	 *   then @id is a virtual function number, relative to @dev
> > 	 *   which should be phisical function
> > 	 * * If @access_type is SHAPER_LOOKUP_BY_QUEUE,
> > 	 *   Then @id represents the queue number, relative to @dev
> > 	 * * If @access_type is SHAPER_LOOKUP_BY_TREE_ID,
> > 	 *   then @id is a @shaper_info.id and any shaper inside the
> > 	 *   hierarcy can be accessed directly.
> > 	 *
> > 	 * Return:
> > 	 * * %0 - Success
> > 	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
> > 	 *		    or core for any reason. @extack should be set
> > 	 *		    to text describing the reason.
> > 	 * * Other negative error value on failure.
> > 	 */
> > 	int (*get)(struct net_device *dev,
> > 		   enum shaper_lookup_mode lookup_mode, u32 id,
> >                    struct shaper_info *shaper, struct netlink_ext_ack *extack);
> 
> How about we store the hierarchy in the core?
> Assume core has the source of truth, no need to get?

One design choice was _not_ duplicating the hierarchy in the core: the
full hierarchy is maintained only by the NIC/firmware. The NIC/firmware
can perform some changes "automatically" e.g. when adding or deleting
VFs or queues it will likely create/delete the paired shaper. The
synchronization looks cumbersome and error prone.

The hierarchy could be exposed to both host and guests, I guess only
the host core could be the source of truth, right?


Thanks for the feedback,

Paolo