lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f7fa91a89f16e45de56c1aa8d2c533c6f94648ba.camel@redhat.com>
Date: Thu, 09 May 2024 17:43:33 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>, Jiri Pirko
 <jiri@...nulli.us>, Madhu Chittim <madhu.chittim@...el.com>, Sridhar
 Samudrala <sridhar.samudrala@...el.com>, Simon Horman <horms@...nel.org>,
 John Fastabend <john.fastabend@...il.com>, Sunil Kovvuri Goutham
 <sgoutham@...vell.com>,  Jamal Hadi Salim <jhs@...atatu.com>
Subject: Re: [RFC PATCH] net: introduce HW Rate Limiting Driver API

On Thu, 2024-05-09 at 17:00 +0200, Andrew Lunn wrote:
> On Thu, May 09, 2024 at 04:19:52PM +0200, Paolo Abeni wrote:
> > On Wed, 2024-05-08 at 23:47 +0200, Andrew Lunn wrote:
> > > > + * struct net_shaper_info - represents a shaping node on the NIC H/W
> > > > + * @metric: Specify if the bw limits refers to PPS or BPS
> > > > + * @bw_min: Minimum guaranteed rate for this shaper
> > > > + * @bw_max: Maximum peak bw allowed for this shaper
> > > > + * @burst: Maximum burst for the peek rate of this shaper
> > > > + * @priority: Scheduling priority for this shaper
> > > > + * @weight: Scheduling weight for this shaper
> > > > + */
> > > > +struct net_shaper_info {
> > > > +	enum net_shaper_metric metric;
> > > > +	u64 bw_min;	/* minimum guaranteed bandwidth, according to metric */
> > > > +	u64 bw_max;	/* maximum allowed bandwidth */
> > > > +	u32 burst;	/* maximum burst in bytes for bw_max */
> > > > +	u32 priority;	/* scheduling strict priority */
> > > 
> > > Above it say priority. Here is strict priority? Is there a difference?
> > 
> > the 'priority' field is the strict priority for scheduling. We will
> > clarify the first comment.
> > 
> > > 
> > > > +	u32 weight;	/* scheduling WRR weight*/
> > > > +};
> > > 
> > > Are there any special semantics for weight? Looking at the hardware i
> > > have, which has 8 queues for a switch port, i can either set it to
> > > strict priority, meaning queue 7 needs to be empty before it look at
> > > queue 6, and it will only look at queue 5 when 6 is empty etc. Or i
> > > can set weights per queue. How would i expect strict priority?
> > 
> > The expected behaviour is that schedulers at the same level with the
> > same priority will be served according to their weight.
> > 
> > I'm unsure if I read your question correctly. My understanding is that
> > the your H/W don't allow strict priority and WRR simultaneously.
> 
> Correct. There is one bit to select between the two. If WRR is
> enabled, it then becomes possible for some generations of the hardware
> to configure the weights, for others the weights are fixed in silicon.
> 
> > In
> > such case, the set/create operations should reject calls setting both
> > non zero weight and priority values.
> 
> So in order to set strict priority, i need to set the priority field
> to 7, 6, 5, 4, 3, 2, 1, 0, for my 8 queues, and weight to 0. For WRR,
> i need to set the priority fields to 0, and the weight values, either
> to what is hard coded in the silicon, or valid weights when it is
> configurable.
> 
> Now the question is, how do i get between these two states? It is not
> possible to mix WRR and strict priority. Any kAPI which only modifies
> one queue at once will go straight into an invalid state, and the
> driver will need to return -EOPNOTSUPP. So it seems like there needs
> to be an atomic set N queue configuration at once, so i can cleanly go
> from strict priority across 8 queues to WRR across 8 queues. Is that
> foreseen?

You could delete all the WRR shapers and then create/add SP based ones.

> > It sounds correct. You can avoid creating the queue group and set the
> > rate at the NETDEV scope, or possibly not even set/create such shaper:
> > the transmission is expected to be limited by the line rate.
> 
> Well, the hardware exists, i probably should just create the shapers
> to match the hardware. However, if i set the hardware equivalent of
> bw_max to 0, it then uses line rate. Is this something we want to
> document? You can disable a shaper from shaping by setting the
> bandwidth to 0? Or do we want a separate enable/disable bit in the
> structure?

I documenting that '0' means 'unlimited' for bw_max fields should be
good enough. If the NIC can't support any kind of shaping it will
reject with an appropriate extended ack any configuration change with
bw != 0.

I guess we need another comment update here :)


> > > > + * NET_SHAPER_SCOPE_NETDEV, NET_SHAPER_SCOPE_QUEUE_GROUP and
> > > > + * NET_SHAPER_SCOPE_QUEUE are available on both PFs and VFs devices.
> > > 
> > > Are they also available on plain boring devices which don't have PFs
> > > or VFs?
> > 
> > Yes, a driver is free to implement/support an arbitrary subset of the
> > available feature. Operations attempting to set an unsupported state
> > should fail with a relevant extended ack.
> 
> Great. Please update the comment.

Sure, will do.

> > > In my case, only one field appears to be relevant in
> > > each shaper, and maybe we want to give a hint about that to userspace?
> > 
> > Any suggestion on how to expose that in reasonable way more then
> > welcome!
> 
> We might first want to gather knowledge on what these shapers actually
> look like in different hardwares. There are going to be some which are
> very limited fixed functions as in the hardware i have, probably most
> SOHO switches are like this. And then we have TOR and smart NICs which
> might be able to create and destroy shapers on the fly, and are a lot
> less restrictive.

As a reference the initial configuration status has been discussed
here:

https://lore.kernel.org/netdev/20240410075745.4637c537@kernel.org/

(grep for 'Transforming arbitrary pre-existing driver hierarchy')

The conclusion was (to my understanding) to reduce the core complexity
enforcing an uniform views of the NIC defaults.

The 'create' op is just an abstraction to tell the NIC to switch from
the default configuration to the specified one.

The default configuration means 'no restrictions/shaping beyond the
link rate'. If the NIC has buildin shapers it can't disable, it must
initialize them to match the above. In both case the kernel view of the
default will be 'no shapers added'

The 'delete' op will restore the default for the specified scope
location. If the NIC has a persistent shaper there, it will update the
setting to the default, or actually delete the H/W shaper if possible.
In both case the kernel view of such scope location with return to be
'no shapers added'

My understading/guess/hope is that is possible to implement the above
default with any H/W.

Does the above solve your doubt/answer your question?

Thanks!

Paolo


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ