Message-ID: <CAM0EoMkeErAmjksQKqOjL6uz00XXahvbgoHJWpCRjR-S6CkSGA@mail.gmail.com>
Date: Sat, 6 Apr 2024 09:48:19 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: Simon Horman <horms@...nel.org>, netdev@...r.kernel.org, 
	Jakub Kicinski <kuba@...nel.org>, Jiri Pirko <jiri@...nulli.us>, 
	Madhu Chittim <madhu.chittim@...el.com>, Sridhar Samudrala <sridhar.samudrala@...el.com>, 
	John Fastabend <john.fastabend@...il.com>
Subject: Re: [RFC] HW TX Rate Limiting Driver API

On Fri, Apr 5, 2024 at 1:06 PM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On Fri, 2024-04-05 at 09:33 -0400, Jamal Hadi Salim wrote:
> > On Fri, Apr 5, 2024 at 6:25 AM Simon Horman <horms@...nel.org> wrote:
> > > This is a follow-up to the ongoing discussion started by Intel to
> > > extend the support for TX shaping H/W offload [1].
> > >
> > > The goal is to allow user-space to configure TX shaping offload on
> > > a per-queue basis, with a min guaranteed B/W, a max B/W limit and a
> > > burst size, on a VF device.
> > >
> > >
> > > In the past few months several different solutions were attempted and
> > > discussed, without finding a perfect fit:
> > >
> > > - devlink_rate APIs are not appropriate to control TX shaping on netdevs
> > > - No existing TC qdisc offload covers the required feature set
> > > - HTB does not allow direct queue configuration
> > > - MQPRIO imposes a constraint on the maximum number of TX queues
> > > - TBF does not support max B/W limit
> > > - ndo_set_tx_maxrate() only controls the max B/W limit
> > >
> > > A new H/W offload API is needed, but offload API proliferation should be
> > > avoided.
> > >
> > > The following proposal intends to cover the above-specified
> > > requirements and provide a possible base to unify all the shaping
> > > offload APIs mentioned above.
> > >
> > > The following only defines the in-kernel interface between the core and
> > > drivers. The intention is to expose the feature to user-space via Netlink.
> > > Hopefully the latter part should be straightforward after agreement
> > > on the in-kernel interface.
> > >
> > > All feedback and comments are more than welcome!
> > >
> > > [1] https://lore.kernel.org/netdev/20230808015734.1060525-1-wenjun1.wu@intel.com/
> > >
> >
> > My 2 cents:
> > I did peruse the quoted lore thread but I am likely to have missed
> > something. It sounds like the requirement is for egress-from-host
> > (which, from the device's internal point of view, looks like
> > ingress-from-host). Doesn't existing HTB offload already support
> > this? I didn't see it being discussed in the thread.
>
> Yes, HTB has been one of the possible options discussed, but not in
> that thread; let me find the reference:
>
> https://lore.kernel.org/netdev/131da9645be5ef6ea584da27ecde795c52dfbb00.camel@redhat.com/
>
> It turns out that HTB does not allow configuring TX shaping on a per
> (existing, direct) queue basis. It could, with some small functional
> changes, but then we would be in the suboptimal scenario I mentioned
> in my previous email: quite similar to creating a new offload type,
> and it would not be 'future proof'.
>
> > Also, IIUC, there is no hierarchy requirement. That is something you
> > can teach HTB, but there's probably something I missed because I
> > didn't understand the context of "HTB does not allow direct queue
> > configuration". If HTB is confusing from a config PoV, then what
> > Paolo was describing in the thread on TBF seems a reasonable
> > approach too. I couldn't grok why that TBF extension for max B/W was
> > considered a bad idea.
>
> TBF was also in the category 'near enough, but not a 100% fit'.
>
> > On config:
> > Could we not introduce skip_sw/skip_hw semantics for qdiscs? IOW,
> > skip_sw would mean the config is only applied to the hardware and
> > you get DIRECT semantics, etc.
> > I understand the mlnx implementation of HTB does a lot of things in
> > the driver, but the one nice thing it had was the ability to use
> > classid X:Y to select an egress h/w queue. The driver resolution of
> > the whole hierarchy is not needed at all here, if I understood the
> > requirement above. You still need a classifier in s/w (which could
> > be attached to clsact egress) to select the queue. That is something
> > the mlnx implementation allowed, so there is no "double queueing".
>
> AFAICS the current status of qdisc H/W offload implementations is a
> bit mixed up: e.g. HTB requires explicit syntax on the command line
> to enable H/W offload, while TBF doesn't.
>
> Enabling H/W offload on MQPRIO implies skipping the software path,
> while for HTB and TBF it doesn't.
>
> > If this is about totally bypassing the s/w config, then it's a
> > different ballgame.
>
> Yes, this does not have a s/w counterpart. It limits itself to
> configuring/exposing H/W features.
>

I think this changes the dynamics altogether. Would IDPF[1] be a fit for this?
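
Separately, on the 'no s/w counterpart' point: the existing per-queue
hook from the summary, ndo_set_tx_maxrate(), already works exactly that
way. A minimal illustration (hypothetical "foo" driver; signature quoted
from memory, so double-check against include/linux/netdevice.h):

#include <linux/netdevice.h>

/* The existing hook is a member of struct net_device_ops:
 *     int (*ndo_set_tx_maxrate)(struct net_device *dev,
 *                               int queue_index, u32 maxrate);
 * maxrate is a cap only (Mbps, driven by the per-queue tx_maxrate sysfs
 * knob); there is no min guarantee and no burst, which is what the RFC
 * wants to add.
 */
static int foo_set_tx_maxrate(struct net_device *dev, int queue_index,
			      u32 maxrate)
{
	/* hypothetical driver: push the per-queue cap to firmware here */
	return 0;
}

static const struct net_device_ops foo_netdev_ops = {
	.ndo_set_tx_maxrate	= foo_set_tx_maxrate,
};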

> My take is that, by configuring the shapers on a queue/device/queue
> group/VF group basis, the admin is enforcing shared resource
> reservation: we don't strictly need a software counterpart.
>

I am assuming then that, if the hw allows it, one could run offloaded
TC/HTB on the queue/device/queue group.
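
To make that concrete, here is a rough sketch of what such a per-scope
shaper interface could look like on the driver side. To be clear, the
names, types and layout below are invented for illustration and are not
the actual proposal:

#include <linux/netdevice.h>
#include <linux/netlink.h>

/* One config struct carrying min/max B/W and burst, applied at queue,
 * queue-group, netdev or VF-group scope; purely H/W-facing, no s/w
 * datapath counterpart.
 */
enum tx_shaper_scope {
	TX_SHAPER_SCOPE_QUEUE,
	TX_SHAPER_SCOPE_QUEUE_GROUP,
	TX_SHAPER_SCOPE_NETDEV,
	TX_SHAPER_SCOPE_VF_GROUP,
};

struct tx_shaper_cfg {
	enum tx_shaper_scope scope;
	u32 id;		/* queue index, group id, ... depending on scope */
	u64 bw_min;	/* guaranteed bandwidth, bytes/s; 0 == none */
	u64 bw_max;	/* bandwidth cap, bytes/s; 0 == unlimited */
	u32 burst;	/* burst size, bytes */
};

struct tx_shaper_ops {
	/* apply (or update) a shaper on the given scope/id */
	int (*set)(struct net_device *dev, const struct tx_shaper_cfg *cfg,
		   struct netlink_ext_ack *extack);
	/* read back the current H/W state */
	int (*get)(struct net_device *dev, enum tx_shaper_scope scope,
		   u32 id, struct tx_shaper_cfg *cfg);
	/* remove the shaper and restore the H/W default */
	int (*del)(struct net_device *dev, enum tx_shaper_scope scope,
		   u32 id, struct netlink_ext_ack *extack);
};

An offloaded HTB (or a new netlink front-end) could then translate its
per-class/per-queue config into such calls, assuming the driver exposes
them.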

cheers,
jamal

[1] https://netdevconf.info/0x16/sessions/workshop/infrastructure-datapath-functionidpf-workshop.html
> Thanks for the feedback!
>
> Paolo
>
