lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 31 May 2024 09:00:57 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, Jiri Pirko <jiri@...nulli.us>, Madhu Chittim
 <madhu.chittim@...el.com>, Sridhar Samudrala <sridhar.samudrala@...el.com>,
 Simon Horman <horms@...nel.org>, John Fastabend <john.fastabend@...il.com>,
 Sunil Kovvuri Goutham <sgoutham@...vell.com>, Jamal Hadi Salim
 <jhs@...atatu.com>
Subject: Re: [RFC PATCH] net: introduce HW Rate Limiting Driver API

On Fri, 31 May 2024 11:22:46 +0200 Paolo Abeni wrote:
> On Tue, 2024-05-28 at 10:18 -0700, Jakub Kicinski wrote:
> > > +	u32 priority;	/* scheduling strict priority */
> > > +	u32 weight;	/* scheduling WRR weight*/  
> > 
> > I wonder if we should somehow more clearly specify what a node can do.
> > Like Andrew pointed out, if we have a WRR node, presumably the weights
> > go on the children, since there's only one weigh param. But then the
> > WRR node itself is either empty (no params) or has rate params... which
> > is odd.
> > 
> > Maybe shaping nodes and RR nodes should be separate node classes,
> > somehow?  
> 
> Possibly clarifying the meaning of 'weight' field would help?  
> It means: this node is scheduled WRR according to the specified weight
> among the sibling shapers with the same priority.
> 
> I think it's quite simpler than introducing different node classes with
> separate handling. My understanding is that the latter would help only
> to workaround some H/W limitation and will make the implementation more
> difficult for more capable H/W.
> 
> What kind of problems do you foresee with the above definition?

The problem Andrew mentioned, basically. There may not be a path to
transition one fully offloaded hierarchy to another (e.g. switching
strict prio to WRR). TBH I haven't really grasped your proposal in:
https://lore.kernel.org/all/db51b7ccff835dd5a96293fb84d527be081de062.camel@redhat.com/

> > > + * enum net_shaper_scope - the different scopes where a shaper could be attached
> > > + * @NET_SHAPER_SCOPE_PORT:   The root shaper for the whole H/W.
> > > + * @NET_SHAPER_SCOPE_NETDEV: The main shaper for the given network device.
> > > + * @NET_SHAPER_SCOPE_VF:     The shaper is attached to the given virtual
> > > + * function.
> > > + * @NET_SHAPER_SCOPE_QUEUE_GROUP: The shaper groups multiple queues under the
> > > + * same device.
> > > + * @NET_SHAPER_SCOPE_QUEUE:  The shaper is attached to the given device queue.  
> > 
> > I wonder if we need traffic class? Some devices may have two schedulers,
> > one from the host interfaces (PCIe) into the device buffer. And then
> > from the device buffer independently into the wire.  
> 
> I feel like I'm really missing your point here. How would you use
> traffic class? And how the 2 schedulers come into play here? Each of
> them will be tied to one or more of the scopes above, why exposing H/W
> details that will not expand user visible features?

I was just thinking aloud, I'm not sure anyone would ever use a single
host queue to service (ingress) traffic from multiple TCs.
Or if any HW actually supports that. 

We can add it later.

> > > + * NET_SHAPER_SCOPE_PORT and NET_SHAPER_SCOPE_VF are only available on
> > > + * PF devices, usually inside the host/hypervisor.
> > > + * NET_SHAPER_SCOPE_NETDEV, NET_SHAPER_SCOPE_QUEUE_GROUP and
> > > + * NET_SHAPER_SCOPE_QUEUE are available on both PFs and VFs devices.
> > > + */
> > > +enum net_shaper_scope {
> > > +	NET_SHAPER_SCOPE_PORT,
> > > +	NET_SHAPER_SCOPE_NETDEV,
> > > +	NET_SHAPER_SCOPE_VF,  
> > 
> > I realized now that we do indeed need this VF node (if we want to
> > internally express the legacy SRIOV NDOs in this API), as much 
> > as I hate it. Could you annotate somehow my nack on ever exposing
> > the ability to hook on the VF to user space?  
> 
> This work sparked from the need to allow configuring a shaper on
> specific queues of some VF from the host. I hope this is not what you
> are nacking here? Could you please elaborate a bit what concern you
> with 'hook on the VF to user space'? Would the ability of attaching a
> shaper to the VF from the host hit your nack?

Queue configuration, for the VF, from the hypervisor?
I thought it was from the VF.
In any case, hypervisor has the representors. 
Use the representor's NETDEV scope?

> > > +	NET_SHAPER_SCOPE_QUEUE_GROUP,  
> > 
> > We may need a definition for a queue group. Did I suggest this?  
> 
> I think this was mentioned separately by you, John Fastabend and Intel.

Oh ugh, I can't type. I think I meant to say "Why do we need..."

> > Isn't queue group just a bunch of queues feeding a trivial RR node?
> > Why does it need to be a "scope"?  
> 
> The goal is allowing arbitrary manipulation at the queue group level.
> e.g. you can have different queue groups with different priority, or
> weigh or shaping, and below them arbitrary shaping at the queue level.
> 
> Note that a similar concept could be introduced for device (or VFs)
> groups.
> 
> Why would you like to constraint the features avail at the queue
> groups?

Wait! You don't have a way to create pure RR nodes other than queue
group now? Perhaps that's what I'm missing...

> > > +	NET_SHAPER_SCOPE_QUEUE,
> > > +};
> > > +
> > > +/**
> > > + * struct net_shaper_ops - Operations on device H/W shapers
> > > + * @add: Creates a new shaper in the specified scope.  
> > 
> > "in a scope"? Isn't the scope just defining the ingress and egress
> > points of the scheduling hierarchy?  
> 
> This is purely lexical matter, right? The scope, and more specifically
> the full 'handle' comprising more scoped-related information, specifies
> 'where' the shaper is located. Do you have a suggested alternative
> wording?

... and also what confused me here.

How are you going to do 2 layers of grouping with arbitrary shaping?
We need arbitrary inner nodes. Unless I'm missing a trick.

> > Also your example moves schedulers from queue scope to queue group
> > scope.  
> 
> In the example, this part creates/enables a shaper at the queue level:
> 
> 	u32 ghandle = shaper_make_handle(NET_SHAPER_SCOPE_QUEUE_GROUP, 0, 0);
> 	dev->shaper_ops->add(dev, ghandle, &ginfo);
> 
> and this:
> 	
> 	u32 handle = shaper_make_handle(NET_SHAPER_SCOPE_QUEUE, 0, queue_id);
> 	//...
> 	dev->netshaper_ops->move(dev, qhandle, ghandle, NULL);
> 
> changes the _parent_ of qhandle setting it to the previously creates
> queue group shaper. qhandle initial/implicit/default parent was the
> device scope shaper.
> 
> The scope of the queue shaper remains unchanged. An I misunderstanding
> your point?
> 

> > Why "if it exists" ? Core should make sure it exists, no?
> > 
> > In addition to ops and state, the device will likely need to express
> > capabilities of some sort. So that the core can do some work for the
> > drivers and in due course we can expose them to user space for
> > discoverability.  
> 
> What kind of capabilities are you thinking about? supported scopes?
> supported metrics? What else? I feel like there is a lot of
> mixed/partial kind of support which is hard to express in a formal way
> but would fit nicely an extended ack for a failing op - as the SP/WRR
> constrains Andrew reported.
> 
> Do we need to introduce this introspection support from the start? I
> think that having a few H/W implementations around would help (at least
> me) understanding which properties could relevant here.

Try to write good tests which can run on HW for more than one
vendor. The introspection and capabilities will become apparent.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ