netdev - Re: [RFC] HW TX Rate Limiting Driver API

Message-ID: <b32bab8ee1468647b4b9d93407cf8287bcffc67f.camel@redhat.com>
Date: Fri, 05 Apr 2024 19:06:10 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Jamal Hadi Salim <jhs@...atatu.com>, Simon Horman <horms@...nel.org>
Cc: netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>, Jiri Pirko
 <jiri@...nulli.us>, Madhu Chittim <madhu.chittim@...el.com>, Sridhar
 Samudrala <sridhar.samudrala@...el.com>, John Fastabend
 <john.fastabend@...il.com>
Subject: Re: [RFC] HW TX Rate Limiting Driver API

On Fri, 2024-04-05 at 09:33 -0400, Jamal Hadi Salim wrote:
> On Fri, Apr 5, 2024 at 6:25 AM Simon Horman <horms@...nel.org> wrote:
> > This is a follow-up to the ongoing discussion started by Intel to extend the
> > support for TX shaping H/W offload [1].
> > 
> > The goal is to allow user-space to configure TX shaping offload on a
> > per-queue basis with min guaranteed B/W, max B/W limit and burst size on a
> > VF device.
> > 
> > 
> > In the past few months several different solutions were attempted and
> > discussed, without finding a perfect fit:
> > 
> > - devlink_rate APIs are not appropriate to control TX shaping on netdevs
> > - No existing TC qdisc offload covers the required feature set
> > - HTB does not allow direct queue configuration
> > - MQPRIO imposes a constraint on the maximum number of TX queues
> > - TBF does not support max B/W limit
> > - ndo_set_tx_maxrate() only controls the max B/W limit
> > 
> > A new H/W offload API is needed, but offload API proliferation should be
> > avoided.
> > 
> > The following proposal intends to cover the above requirements and
> > provide a possible base to unify all the shaping offload APIs mentioned above.
> > 
> > The following only defines the in-kernel interface between the core and
> > drivers. The intention is to expose the feature to user-space via Netlink.
> > Hopefully the latter part will be straightforward once the in-kernel
> > interface is agreed upon.
> > 
> > All feedback and comments are more than welcome!
> > 
> > [1] https://lore.kernel.org/netdev/20230808015734.1060525-1-wenjun1.wu@intel.com/
> > 
> 
> My 2 cents:
> I did peruse the quoted lore thread, but I am likely to have missed something.
> It sounds like the requirement is for egress-from-host (which, internally
> on the device, looks like ingress-from-host). Doesn't the
> existing HTB offload already support this? I didn't see this being
> discussed in the thread.

Yes, HTB has been one of the possible options discussed, but not in that
thread. Here is the reference:

https://lore.kernel.org/netdev/131da9645be5ef6ea584da27ecde795c52dfbb00.camel@redhat.com/

It turns out that HTB does not allow configuring TX shaping on a per
(existing, direct) queue basis. It could, with some small functional
changes, but then we would be in the suboptimal scenario I mentioned in
my previous email: quite similar to creating a new offload type,
and not 'future proof'.

> Also, IIUC, there is no hierarchy
> requirement. That is something you can teach HTB, but there's probably
> something I missed because I didn't understand the context of "HTB does
> not allow direct queue configuration". If HTB is confusing from a
> config pov, then what Paolo was describing in the thread on
> TBF seems a reasonable approach too. I couldn't grok why that TBF
> extension for max bw was considered a bad idea.

TBF was also in the 'near enough but not a 100% fit' category.

> On config:
> Could we not introduce skip_sw/hw semantics for qdiscs? IOW, skip_sw
> means the config is applied only to hw and you have DIRECT
> semantics, etc.
> I understand the mlnx implementation of HTB does a lot of things in
> the driver, but the one nice thing they had was the ability to use classid
> X:Y to select an egress h/w queue. The driver resolution of the whole
> hierarchy is not needed at all here, if I understood the requirement
> above.
> You still need to have a classifier in s/w (which could be attached to
> clsact egress) to select the queue. That is something the mlnx
> implementation allowed. So there is no "double queueing".

AFAICS the current state of qdisc H/W offload implementations is a bit
mixed up, e.g. HTB requires explicit syntax on the command line to
enable H/W offload, while TBF doesn't.

Enabling H/W offload on MQPRIO implies skipping the software path, while
for HTB and TBF it doesn't.

> If this is about totally bypassing s/w config, then it's a different ballgame...

Yes, this has no s/w counterpart. It limits itself to
configuring/exposing H/W features.

My take is that, by configuring the shapers on a queue/device/queue
group/VF group basis, the admin is enforcing a reservation of shared
resources: we don't strictly need a software counterpart.
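To make the "shared resources reservation" point more concrete, here is a minimal C sketch of what per-object shaper parameters and a core-side sanity check could look like. It is purely illustrative and is not the API proposed in the RFC: the `hyp_` names, the scope enum, the units (bits per second for rates, bytes for burst), the convention that `max_bw == 0` means "no cap", and the rule that the min guarantees of sibling objects must not exceed their parent's capacity are all assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scopes a shaper could be attached to, mirroring the
 * queue / device / queue group / VF group granularities discussed. */
enum hyp_shaper_scope {
	HYP_SHAPER_SCOPE_QUEUE,
	HYP_SHAPER_SCOPE_NETDEV,
	HYP_SHAPER_SCOPE_QUEUE_GROUP,
	HYP_SHAPER_SCOPE_VF_GROUP,
};

/* Hypothetical shaping parameters: rates in bits/s, burst in bytes.
 * By assumption, max_bw == 0 means "no max B/W limit". */
struct hyp_shaper_info {
	enum hyp_shaper_scope scope;
	uint32_t id;     /* queue index, group id, ... */
	uint64_t min_bw; /* min guaranteed B/W */
	uint64_t max_bw; /* max B/W limit, 0 == unlimited */
	uint32_t burst;  /* burst size in bytes */
};

/* Reject self-inconsistent settings before they reach a driver. */
static int hyp_shaper_validate(const struct hyp_shaper_info *s)
{
	if (s->max_bw && s->min_bw > s->max_bw)
		return -EINVAL; /* guarantee above the cap */
	if (s->max_bw && s->burst == 0)
		return -EINVAL; /* assumed: a capped shaper needs a burst */
	return 0;
}

/* Treat the min guarantees of n sibling objects as a reservation of a
 * shared resource: their sum must fit in the parent's capacity. */
static int hyp_shaper_check_reservation(const struct hyp_shaper_info *s,
					size_t n, uint64_t capacity)
{
	uint64_t reserved = 0; /* invariant: reserved <= capacity */
	size_t i;

	for (i = 0; i < n; i++) {
		int err = hyp_shaper_validate(&s[i]);

		if (err)
			return err;
		/* Compare against the remaining room to avoid overflow. */
		if (s[i].min_bw > capacity - reserved)
			return -ENOSPC; /* would overcommit the guarantees */
		reserved += s[i].min_bw;
	}
	return 0;
}
```

Under these assumptions, two queues guaranteeing 4 Gbit/s and 5 Gbit/s are accepted under a 10 Gbit/s capacity and rejected with `-ENOSPC` under 8 Gbit/s. Since no software path is involved, such a check only guards what gets handed to the H/W; whether it belongs in the core or in each driver is left open here.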

Thanks for the feedback!

Paolo