Message-ID: <bf24862c280a97c98c69d22aa1a7cc5109878164.camel@redhat.com>
Date: Wed, 10 Apr 2024 11:41:06 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: Simon Horman <horms@...nel.org>, netdev@...r.kernel.org, Jakub Kicinski
 <kuba@...nel.org>, Jiri Pirko <jiri@...nulli.us>, Madhu Chittim
 <madhu.chittim@...el.com>, Sridhar Samudrala <sridhar.samudrala@...el.com>,
  John Fastabend <john.fastabend@...il.com>
Subject: Re: [RFC] HW TX Rate Limiting Driver API

On Sat, 2024-04-06 at 09:48 -0400, Jamal Hadi Salim wrote:
> On Fri, Apr 5, 2024 at 1:06 PM Paolo Abeni <pabeni@...hat.com> wrote:
> > 
> > On Fri, 2024-04-05 at 09:33 -0400, Jamal Hadi Salim wrote:
> > > On Fri, Apr 5, 2024 at 6:25 AM Simon Horman <horms@...nel.org> wrote:
> > > > This is follow-up to the ongoing discussion started by Intel to extend the
> > > > support for TX shaping H/W offload [1].
> > > > 
> > > > The goal is allowing the user-space to configure TX shaping offload on a
> > > > per-queue basis with min guaranteed B/W, max B/W limit and burst size on a
> > > > VF device.
> > > > 
> > > > 
> > > > In the past few months several different solutions were attempted and
> > > > discussed, without finding a perfect fit:
> > > > 
> > > > - devlink_rate APIs are not appropriate to control TX shaping on netdevs
> > > > - No existing TC qdisc offload covers the required feature set
> > > > - HTB does not allow direct queue configuration
> > > > - MQPRIO imposes constraint on the maximum number of TX queues
> > > > - TBF does not support max B/W limit
> > > > - ndo_set_tx_maxrate() only controls the max B/W limit
> > > > 
> > > > A new H/W offload API is needed, but offload API proliferation should be
> > > > avoided.
> > > > 
> > > > The following proposal intends to cover the above specified requirement and
> > > > provide a possible base to unify all the shaping offload APIs mentioned above.
> > > > 
> > > > The following only defines the in-kernel interface between the core and
> > > > drivers. The intention is to expose the feature to user-space via Netlink.
> > > > Hopefully the latter part should be straight-forward after agreement
> > > > on the in-kernel interface.
> > > > 
> > > > All feedback and comments are more than welcome!
> > > > 
> > > > [1] https://lore.kernel.org/netdev/20230808015734.1060525-1-wenjun1.wu@intel.com/
> > > > 
> > > 
> > > My 2 cents:
> > > I did peruse the lore quoted thread but i am likely to have missed something.
> > > It sounds like the requirement is for egress-from-host (which to a
> > > device internal looks like ingress-from-host on the device). Doesn't
> > > existing HTB offload already support this? I didnt see this being
> > > discussed in the thread.
> > 
> > Yes, HTB has been one of the possible options discussed, but not in that
> > thread, let me find the reference:
> > 
> > https://lore.kernel.org/netdev/131da9645be5ef6ea584da27ecde795c52dfbb00.camel@redhat.com/
> > 
> > it turns out that HTB does not allow configuring TX shaping on a per
> > (existing, direct) queue basis. It could, with some small functional
> > changes, but then we will be in the suboptimal scenario I mentioned in
> > my previous email: quite similar to creating a new offload type,
> > and will not be 'future proof'.
> > 
> > > Also, IIUC, there is no hierarchy
> > > requirement. That is something you can teach HTB but there's probably
> > > something i missed because i didnt understand the context of "HTB does
> > > not allow direct queue configuration". If HTB is confusing from a
> > > config pov then it seems what Paolo was describing in the thread on
> > > TBF is a reasonable approach too. I couldnt grok why that TBF
> > > extension for max bw was considered a bad idea.
> > 
> > TBF was also in the category 'near enough but not a 100% fit'
> > 
> > > On config:
> > > Could we not introduce skip_sw/hw semantics for qdiscs? IOW, skip_sw
> > > means the config is only subjected to hw and you have DIRECT
> > > semantics, etc.
> > > I understand the mlnx implementation of HTB does a lot of things in
> > > the driver but the one nice thing they had was ability to use classid
> > > X:Y to select a egress h/w queue. The driver resolution of all the
> > > hierarchy is not needed at all here if i understood the requirement
> > > above.
> > > You still need to have a classifier in s/w (which could be attached to
> > > clsact egress) to select the queue. That is something the mlnx
> > > implementation allowed. So there is no "double queueing"
> > 
> > AFAICS the current status of qdisc H/W offload implementation is a bit
> > mixed-up. e.g. HTB requires explicit syntax on the command line to
> > enable H/W offload, TBF doesn't.
> > 
> > H/W offload enabled on MQPRIO implies skipping the software path, while
> > for HTB and TBF it doesn't.
> > 
> > > If this is about totally bypassing s/w config then its a different ballgame..
> > 
> > Yes, this does not have a s/w counterpart. It limits itself to
> > configuring/exposing H/W features.
> > 
> 
> I think this changes the dynamics altogether. Would IDPF[1] be a fit for this?

Thanks for the pointer, I admit we did not look closely at that before.

Anyway, it looks like quite a different beast: we are looking for a way
to configure H/W shaping that could be applied to as much existing H/W
as possible, while AFAICS the above dictates a specification for
'standard' high-perf netdev H/W. Only matching H/W will be affected.

> > My take is that by configuring the shapers on a queue/device/queue
> > group/vfs group basis, the admin is enforcing a shared resources
> > reservation: we don't strictly need a software counterpart.
> 
> I am assuming then, if the hw allows it one could run offloaded TC/htb
> on the queue/device/queue

HTB is one of the things we looked at. The existing offload interface
proved not to be enough for the use case at hand.

We could extend it - or others, notably ndo_set_tx_maxrate() would be
simpler to adapt - but one request here is to avoid a delta to the H/W
offload that is dictated by one specific request and covers only that
request, since that would lead to exactly this same process again in
some (little?) time.

The idea is to cover a reasonably large spectrum of the existing
capabilities related to H/W shaping - "once and for all" I would say,
if I were not sure that such a statement will backfire in the most
painful way ;)
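
To make "a reasonably large spectrum" a bit more concrete, below is a
minimal sketch of how a single shaper description could carry the
parameters named in this thread (min guaranteed B/W, max B/W limit,
burst size) together with the scope it applies to (queue, queue group,
device, VFs group), plus a capability check. Every name here
(shaper_scope, shaper_cfg, shaper_caps, shaper_cfg_supported) is
hypothetical and chosen only for illustration: this is not the
interface proposed in the RFC and not an existing kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the object a shaper is attached to. */
enum shaper_scope {
	SHAPER_SCOPE_QUEUE,
	SHAPER_SCOPE_QUEUE_GROUP,
	SHAPER_SCOPE_NETDEV,
	SHAPER_SCOPE_VF_GROUP,
};

/* Hypothetical: one shaper configuration; 0 means "not set". */
struct shaper_cfg {
	enum shaper_scope scope;
	uint32_t id;		/* index of the queue/group within the scope */
	uint64_t bw_min_bps;	/* min guaranteed B/W, bits per second */
	uint64_t bw_max_bps;	/* max B/W limit, bits per second */
	uint64_t burst_bytes;	/* burst size, bytes */
};

/* Hypothetical: what a given piece of H/W is able to do. */
struct shaper_caps {
	uint32_t scopes;	/* bitmask of (1u << enum shaper_scope) */
	bool bw_min;
	bool bw_max;
	bool burst;
};

/* Return true if the H/W described by @caps can honor @cfg as-is. */
static bool shaper_cfg_supported(const struct shaper_caps *caps,
				 const struct shaper_cfg *cfg)
{
	if (!(caps->scopes & (1u << cfg->scope)))
		return false;
	if (cfg->bw_min_bps && !caps->bw_min)
		return false;
	if (cfg->bw_max_bps && !caps->bw_max)
		return false;
	if (cfg->burst_bytes && !caps->burst)
		return false;
	/* A guaranteed rate above the limit can never be satisfied. */
	if (cfg->bw_min_bps && cfg->bw_max_bps &&
	    cfg->bw_min_bps > cfg->bw_max_bps)
		return false;
	return true;
}
```

In this sketch, a device that can only do what ndo_set_tx_maxrate()
covers today would advertise the per-queue scope with bw_max only, and
a request carrying a min guaranteed B/W would simply be rejected rather
than needing a separate offload type.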

Thanks,

Paolo


