netdev - Re: [RFC PATCH] net: introduce HW Rate Limiting Driver API

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <16d7b761c3c1b4c4bd327d4486d958682a5f33dd.camel@redhat.com>
Date: Fri, 31 May 2024 11:22:46 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, Jiri Pirko <jiri@...nulli.us>, Madhu Chittim
 <madhu.chittim@...el.com>, Sridhar Samudrala <sridhar.samudrala@...el.com>,
  Simon Horman <horms@...nel.org>, John Fastabend
 <john.fastabend@...il.com>, Sunil Kovvuri Goutham <sgoutham@...vell.com>,
 Jamal Hadi Salim <jhs@...atatu.com>
Subject: Re: [RFC PATCH] net: introduce HW Rate Limiting Driver API

Hi,

Thank you for your time and reply. 

On Tue, 2024-05-28 at 10:18 -0700, Jakub Kicinski wrote:
> > +	u32 priority;	/* scheduling strict priority */
> > +	u32 weight;	/* scheduling WRR weight*/
> 
> I wonder if we should somehow more clearly specify what a node can do.
> Like Andrew pointed out, if we have a WRR node, presumably the weights
> go on the children, since there's only one weigh param. But then the
> WRR node itself is either empty (no params) or has rate params... which
> is odd.
> 
> Maybe shaping nodes and RR nodes should be separate node classes,
> somehow?

Possibly clarifying the meaning of 'weight' field would help?  
It means: this node is scheduled WRR according to the specified weight
among the sibling shapers with the same priority.

I think it's quite simpler than introducing different node classes with
separate handling. My understanding is that the latter would help only
to workaround some H/W limitation and will make the implementation more
difficult for more capable H/W.

What kind of problems do you foresee with the above definition?

> > +};
> > +
> > +/**
> > + * enum net_shaper_scope - the different scopes where a shaper could be attached
> > + * @NET_SHAPER_SCOPE_PORT:   The root shaper for the whole H/W.
> > + * @NET_SHAPER_SCOPE_NETDEV: The main shaper for the given network device.
> > + * @NET_SHAPER_SCOPE_VF:     The shaper is attached to the given virtual
> > + * function.
> > + * @NET_SHAPER_SCOPE_QUEUE_GROUP: The shaper groups multiple queues under the
> > + * same device.
> > + * @NET_SHAPER_SCOPE_QUEUE:  The shaper is attached to the given device queue.
> 
> I wonder if we need traffic class? Some devices may have two schedulers,
> one from the host interfaces (PCIe) into the device buffer. And then
> from the device buffer independently into the wire.

I feel like I'm really missing your point here. How would you use
traffic class? And how the 2 schedulers come into play here? Each of
them will be tied to one or more of the scopes above, why exposing H/W
details that will not expand user visible features?

> > + * NET_SHAPER_SCOPE_PORT and NET_SHAPER_SCOPE_VF are only available on
> > + * PF devices, usually inside the host/hypervisor.
> > + * NET_SHAPER_SCOPE_NETDEV, NET_SHAPER_SCOPE_QUEUE_GROUP and
> > + * NET_SHAPER_SCOPE_QUEUE are available on both PFs and VFs devices.
> > + */
> > +enum net_shaper_scope {
> > +	NET_SHAPER_SCOPE_PORT,
> > +	NET_SHAPER_SCOPE_NETDEV,
> > +	NET_SHAPER_SCOPE_VF,
> 
> I realized now that we do indeed need this VF node (if we want to
> internally express the legacy SRIOV NDOs in this API), as much 
> as I hate it. Could you annotate somehow my nack on ever exposing
> the ability to hook on the VF to user space?

This work sparked from the need to allow configuring a shaper on
specific queues of some VF from the host. I hope this is not what you
are nacking here? Could you please elaborate a bit what concern you
with 'hook on the VF to user space'? Would the ability of attaching a
shaper to the VF from the host hit your nack?

> 
> > +	NET_SHAPER_SCOPE_QUEUE_GROUP,
> 
> We may need a definition for a queue group. Did I suggest this?

I think this was mentioned separately by you, John Fastabend and Intel.

> Isn't queue group just a bunch of queues feeding a trivial RR node?
> Why does it need to be a "scope"?

The goal is allowing arbitrary manipulation at the queue group level.
e.g. you can have different queue groups with different priority, or
weigh or shaping, and below them arbitrary shaping at the queue level.

Note that a similar concept could be introduced for device (or VFs)
groups.

Why would you like to constraint the features avail at the queue
groups?

> 
> > +	NET_SHAPER_SCOPE_QUEUE,
> > +};
> > +
> > +/**
> > + * struct net_shaper_ops - Operations on device H/W shapers
> > + * @add: Creates a new shaper in the specified scope.
> 
> "in a scope"? Isn't the scope just defining the ingress and egress
> points of the scheduling hierarchy?

This is purely lexical matter, right? The scope, and more specifically
the full 'handle' comprising more scoped-related information, specifies
'where' the shaper is located. Do you have a suggested alternative
wording?

> Also your example moves schedulers from queue scope to queue group
> scope.

In the example, this part creates/enables a shaper at the queue level:

	u32 ghandle = shaper_make_handle(NET_SHAPER_SCOPE_QUEUE_GROUP, 0, 0);
	dev->shaper_ops->add(dev, ghandle, &ginfo);

and this:
	
	u32 handle = shaper_make_handle(NET_SHAPER_SCOPE_QUEUE, 0, queue_id);
	//...
	dev->netshaper_ops->move(dev, qhandle, ghandle, NULL);

changes the _parent_ of qhandle setting it to the previously creates
queue group shaper. qhandle initial/implicit/default parent was the
device scope shaper.

The scope of the queue shaper remains unchanged. An I misunderstanding
your point?


> 
> > + * @set: Modify the existing shaper.
> > + * @delete: Delete the specified shaper.
> > + * @move: Move an existing shaper under a different parent.
> > + *
> > + * The initial shaping configuration ad device initialization is empty/
> 
> and
> 
> > + * a no-op/does not constraint the b/w in any way.
> > + * The network core keeps track of the applied user-configuration in
> > + * per device storage.
> 
> "keeps track .. per device" -- "storage" may make people think NVM.
> 
> > + * Each shaper is uniquely identified within the device with an 'handle',
> > + * dependent on the shaper scope and other data, see @shaper_make_handle()
> > + */
> > +struct net_shaper_ops {
> > +	/** add - Add a shaper inside the shaper hierarchy
> > +	 * @dev: netdevice to operate on
> > +	 * @handle: the shaper indetifier
> > +	 * @shaper: configuration of shaper
> > +	 * @extack: Netlink extended ACK for reporting errors.
> > +	 *
> > +	 * Return:
> > +	 * * 0 on success
> > +	 * * %-EOPNOTSUPP - Operation is not supported by hardware, driver,
> > +	 *                  or core for any reason. @extack should be set to
> > +	 *                  text describing the reason.
> > +	 * * Other negative error values on failure.
> > +	 *
> > +	 * Examples or reasons this operation may fail include:
> > +	 * * H/W resources limits.
> > +	 * * Can’t respect the requested bw limits.
> > +	 */
> > +	int (*add)(struct net_device *dev, u32 handle,
> > +		   const struct net_shaper_info *shaper,
> > +		   struct netlink_ext_ack *extack);
> > +
> > +	/** set - Update the specified shaper, if it exists
> 
> Why "if it exists" ? Core should make sure it exists, no?
> 
> In addition to ops and state, the device will likely need to express
> capabilities of some sort. So that the core can do some work for the
> drivers and in due course we can expose them to user space for
> discoverability.

What kind of capabilities are you thinking about? supported scopes?
supported metrics? What else? I feel like there is a lot of
mixed/partial kind of support which is hard to express in a formal way
but would fit nicely an extended ack for a failing op - as the SP/WRR
constrains Andrew reported.

Do we need to introduce this introspection support from the start? I
think that having a few H/W implementations around would help (at least
me) understanding which properties could relevant here.

Thanks,

Paolo