netdev - Re: [RFC PATCH] net: introduce HW Rate Limiting Driver API

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240531090057.02fb8616@kernel.org>
Date: Fri, 31 May 2024 09:00:57 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, Jiri Pirko <jiri@...nulli.us>, Madhu Chittim
 <madhu.chittim@...el.com>, Sridhar Samudrala <sridhar.samudrala@...el.com>,
 Simon Horman <horms@...nel.org>, John Fastabend <john.fastabend@...il.com>,
 Sunil Kovvuri Goutham <sgoutham@...vell.com>, Jamal Hadi Salim
 <jhs@...atatu.com>
Subject: Re: [RFC PATCH] net: introduce HW Rate Limiting Driver API

On Fri, 31 May 2024 11:22:46 +0200 Paolo Abeni wrote:
> On Tue, 2024-05-28 at 10:18 -0700, Jakub Kicinski wrote:
> > > +	u32 priority;	/* scheduling strict priority */
> > > +	u32 weight;	/* scheduling WRR weight*/  
> > 
> > I wonder if we should somehow more clearly specify what a node can do.
> > Like Andrew pointed out, if we have a WRR node, presumably the weights
> > go on the children, since there's only one weigh param. But then the
> > WRR node itself is either empty (no params) or has rate params... which
> > is odd.
> > 
> > Maybe shaping nodes and RR nodes should be separate node classes,
> > somehow?  
> 
> Possibly clarifying the meaning of 'weight' field would help?  
> It means: this node is scheduled WRR according to the specified weight
> among the sibling shapers with the same priority.
> 
> I think it's quite simpler than introducing different node classes with
> separate handling. My understanding is that the latter would help only
> to workaround some H/W limitation and will make the implementation more
> difficult for more capable H/W.
> 
> What kind of problems do you foresee with the above definition?

The problem Andrew mentioned, basically. There may not be a path to
transition one fully offloaded hierarchy to another (e.g. switching
strict prio to WRR). TBH I haven't really grasped your proposal in:
https://lore.kernel.org/all/db51b7ccff835dd5a96293fb84d527be081de062.camel@redhat.com/

> > > + * enum net_shaper_scope - the different scopes where a shaper could be attached
> > > + * @NET_SHAPER_SCOPE_PORT:   The root shaper for the whole H/W.
> > > + * @NET_SHAPER_SCOPE_NETDEV: The main shaper for the given network device.
> > > + * @NET_SHAPER_SCOPE_VF:     The shaper is attached to the given virtual
> > > + * function.
> > > + * @NET_SHAPER_SCOPE_QUEUE_GROUP: The shaper groups multiple queues under the
> > > + * same device.
> > > + * @NET_SHAPER_SCOPE_QUEUE:  The shaper is attached to the given device queue.  
> > 
> > I wonder if we need traffic class? Some devices may have two schedulers,
> > one from the host interfaces (PCIe) into the device buffer. And then
> > from the device buffer independently into the wire.  
> 
> I feel like I'm really missing your point here. How would you use
> traffic class? And how the 2 schedulers come into play here? Each of
> them will be tied to one or more of the scopes above, why exposing H/W
> details that will not expand user visible features?

I was just thinking aloud, I'm not sure anyone would ever use a single
host queue to service (ingress) traffic from multiple TCs.
Or if any HW actually supports that. 

We can add it later.

> > > + * NET_SHAPER_SCOPE_PORT and NET_SHAPER_SCOPE_VF are only available on
> > > + * PF devices, usually inside the host/hypervisor.
> > > + * NET_SHAPER_SCOPE_NETDEV, NET_SHAPER_SCOPE_QUEUE_GROUP and
> > > + * NET_SHAPER_SCOPE_QUEUE are available on both PFs and VFs devices.
> > > + */
> > > +enum net_shaper_scope {
> > > +	NET_SHAPER_SCOPE_PORT,
> > > +	NET_SHAPER_SCOPE_NETDEV,
> > > +	NET_SHAPER_SCOPE_VF,  
> > 
> > I realized now that we do indeed need this VF node (if we want to
> > internally express the legacy SRIOV NDOs in this API), as much 
> > as I hate it. Could you annotate somehow my nack on ever exposing
> > the ability to hook on the VF to user space?  
> 
> This work sparked from the need to allow configuring a shaper on
> specific queues of some VF from the host. I hope this is not what you
> are nacking here? Could you please elaborate a bit what concern you
> with 'hook on the VF to user space'? Would the ability of attaching a
> shaper to the VF from the host hit your nack?

Queue configuration, for the VF, from the hypervisor?
I thought it was from the VF.
In any case, hypervisor has the representors. 
Use the representor's NETDEV scope?

> > > +	NET_SHAPER_SCOPE_QUEUE_GROUP,  
> > 
> > We may need a definition for a queue group. Did I suggest this?  
> 
> I think this was mentioned separately by you, John Fastabend and Intel.

Oh ugh, I can't type. I think I meant to say "Why do we need..."

> > Isn't queue group just a bunch of queues feeding a trivial RR node?
> > Why does it need to be a "scope"?  
> 
> The goal is allowing arbitrary manipulation at the queue group level.
> e.g. you can have different queue groups with different priority, or
> weigh or shaping, and below them arbitrary shaping at the queue level.
> 
> Note that a similar concept could be introduced for device (or VFs)
> groups.
> 
> Why would you like to constraint the features avail at the queue
> groups?

Wait! You don't have a way to create pure RR nodes other than queue
group now? Perhaps that's what I'm missing...

> > > +	NET_SHAPER_SCOPE_QUEUE,
> > > +};
> > > +
> > > +/**
> > > + * struct net_shaper_ops - Operations on device H/W shapers
> > > + * @add: Creates a new shaper in the specified scope.  
> > 
> > "in a scope"? Isn't the scope just defining the ingress and egress
> > points of the scheduling hierarchy?  
> 
> This is purely lexical matter, right? The scope, and more specifically
> the full 'handle' comprising more scoped-related information, specifies
> 'where' the shaper is located. Do you have a suggested alternative
> wording?

... and also what confused me here.

How are you going to do 2 layers of grouping with arbitrary shaping?
We need arbitrary inner nodes. Unless I'm missing a trick.

> > Also your example moves schedulers from queue scope to queue group
> > scope.  
> 
> In the example, this part creates/enables a shaper at the queue level:
> 
> 	u32 ghandle = shaper_make_handle(NET_SHAPER_SCOPE_QUEUE_GROUP, 0, 0);
> 	dev->shaper_ops->add(dev, ghandle, &ginfo);
> 
> and this:
> 	
> 	u32 handle = shaper_make_handle(NET_SHAPER_SCOPE_QUEUE, 0, queue_id);
> 	//...
> 	dev->netshaper_ops->move(dev, qhandle, ghandle, NULL);
> 
> changes the _parent_ of qhandle setting it to the previously creates
> queue group shaper. qhandle initial/implicit/default parent was the
> device scope shaper.
> 
> The scope of the queue shaper remains unchanged. An I misunderstanding
> your point?
> 

> > Why "if it exists" ? Core should make sure it exists, no?
> > 
> > In addition to ops and state, the device will likely need to express
> > capabilities of some sort. So that the core can do some work for the
> > drivers and in due course we can expose them to user space for
> > discoverability.  
> 
> What kind of capabilities are you thinking about? supported scopes?
> supported metrics? What else? I feel like there is a lot of
> mixed/partial kind of support which is hard to express in a formal way
> but would fit nicely an extended ack for a failing op - as the SP/WRR
> constrains Andrew reported.
> 
> Do we need to introduce this introspection support from the start? I
> think that having a few H/W implementations around would help (at least
> me) understanding which properties could relevant here.

Try to write good tests which can run on HW for more than one
vendor. The introspection and capabilities will become apparent.