Date: Mon, 18 Dec 2023 13:33:19 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Paolo Abeni <pabeni@...hat.com>
Cc: Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org,
 anthony.l.nguyen@...el.com, intel-wired-lan@...ts.osuosl.org,
 qi.z.zhang@...el.com, Wenjun Wu <wenjun1.wu@...el.com>,
 maxtram95@...il.com, "Chittim, Madhu" <madhu.chittim@...el.com>,
 "Samudrala, Sridhar" <sridhar.samudrala@...el.com>, Simon Horman
 <simon.horman@...hat.com>
Subject: Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and
 devlink rate support

On Mon, 18 Dec 2023 21:12:35 +0100 Paolo Abeni wrote:
> On Fri, 2023-12-15 at 14:41 -0800, Jakub Kicinski wrote:
> > I explained before (perhaps on the netdev call) - Qdiscs have two
> > different offload models. "local" and "switchdev", here we want "local"
> > AFAIU and TBF only has "switchdev" offload (take a look at the enqueue
> > method and which drivers support it today).  
> 
> I must admit the above is not yet clear to me.
> 
> I initially thought you meant that "local" offloads properly
> reconfigure the S/W datapath so that locally generated traffic would go
> through the expected processing (e.g. shaping) just once, while with
> "switchdev" offload locally generated traffic will see shaping done
> both by the S/W and the H/W[1].
> 
> Reading the above, I now think you mean that local offloads have an
> effect only on locally generated traffic but not on traffic forwarded
> via the eswitch, and vice versa[2].
> 
> The drivers I looked at did not show any clue (to me).
> 
> FTR, I think that [1] is a bug worth fixing and [2] is evil ;)
> 
> Could you please clarify which is the difference exactly between them?

The practical difference, which you can see in the code, is that
"locally offloaded" qdiscs act like a FIFO in the SW path (at least
to some extent), while "switchdev"-offloaded qdiscs act exactly the
same regardless of the offload.

Neither is wrong; they are offloading different things. Qdisc offload
on a representor (switchdev) offloads from the switch perspective, i.e.
"ingress to host". Only the fallback goes through the SW path, and it
should be negligible.

"Local" offload can be implemented as admission control (and is
sometimes work conserving), it's on the "real" interface, it's egress,
and doesn't take part in forwarding.
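
To make that concrete, here is a rough user-space sketch of the
difference (names and structure entirely made up, this is not the
actual qdisc or driver code) - what the SW path does once the config
has been pushed to HW:

/*
 * Illustrative sketch only -- not kernel code, all names are made up.
 * It just contrasts how the two offload models treat the SW datapath.
 */
#include <stdbool.h>
#include <stdio.h>

struct fake_qdisc {
	bool hw_offloaded;	/* config has been pushed to the NIC */
};

static int sw_shape_and_enqueue(const char *pkt)
{
	printf("SW shaper handles %s\n", pkt);		/* full SW shaping */
	return 0;
}

static int plain_fifo_enqueue(const char *pkt)
{
	printf("SW is a FIFO, HW shapes %s\n", pkt);	/* NIC does the work */
	return 0;
}

/*
 * "switchdev" model: the SW datapath is unchanged; the offload only
 * mirrors the config into the switch, which shapes "ingress to host".
 */
static int enqueue_switchdev_model(struct fake_qdisc *q, const char *pkt)
{
	(void)q;				/* same path with or without HW */
	return sw_shape_and_enqueue(pkt);
}

/*
 * "local" model: once offloaded, the SW side degenerates to (roughly)
 * a FIFO and the NIC does the admission control on egress of the
 * "real" interface.
 */
static int enqueue_local_model(struct fake_qdisc *q, const char *pkt)
{
	return q->hw_offloaded ? plain_fifo_enqueue(pkt)
			       : sw_shape_and_enqueue(pkt);
}

int main(void)
{
	struct fake_qdisc q = { .hw_offloaded = true };

	enqueue_switchdev_model(&q, "pkt0");
	enqueue_local_model(&q, "pkt1");
	return 0;
}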

> > I question whether something as basic as scheduling and ACLs should
> > follow the "offload SW constructs" mantra. You are exposed to more
> > diverse users so please don't hesitate to disagree, but AFAICT
> > the transparent offload (the user installs SW constructs and, if
> > offload is available, we offload; otherwise plain SW is good enough)
> > has not played out as we had hoped.
> > 
> > Let's figure out what is the abstract model of scheduling / shaping
> > within a NIC that we want to target. And then come up with a way of
> > representing it in SW. Not which uAPI we can shoehorn into the use
> > case.  
> 
> I thought the model had been quite well defined since the initial
> submission from Intel, and is quite simple: expose TX shaping on a
> per-Tx-queue basis, with min rate, max rate (in bps) and burst (in
> bytes).

For some definition of a model, I guess. Given the confusion about
switchdev vs local (ingress vs egress) - I can't agree that the model
is well defined :(

What I mean is - given a piece of functionality like "Tx queue shaping"
you can come up with a reasonable uAPI that you can hijack and that
makes sense to you. But someone else (switchdev ingress) can choose the
same API to implement a different offload. Not to mention that yet
another person will choose a different API to implement the same thing
as you :(
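
And the narrow model being proposed really only covers one of the
stages below. Restated purely as an illustration (this is not an
existing struct or uAPI, names are made up), it boils down to three
knobs per Tx queue:

/* Purely illustrative, not an existing uAPI or kernel struct:
 * the per-Tx-queue shaping knobs described above.
 */
#include <stdint.h>

struct txq_shaper_cfg {
	uint64_t min_rate_bps;	/* guaranteed rate, bits per second */
	uint64_t max_rate_bps;	/* ceiling rate, bits per second */
	uint32_t burst_bytes;	/* allowed burst above max_rate, bytes */
};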

Off the top of my head we have at least:

 - Tx DMA admission control / scheduling (which Tx DMA queue the NIC
   will pull from)
 - Rx DMA scheduling (which Rx queue the NIC will push to)

 - buffer/queue configuration (how to deal with buildup of packets in
   NIC SRAM, usually mostly for ingress)
 - NIC buffer configuration (how the SRAM is allocated to queues)

 - policers in the NIC forwarding logic


Let's extend this list so that it covers all reasonable NIC designs,
and then work on mapping how each of them is configured?
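
A strawman starting point for that mapping - again, names made up,
nothing here is existing uAPI - could be as simple as enumerating the
stages:

/* Hypothetical enumeration of the NIC scheduling/shaping stages listed
 * above; purely a strawman to anchor the discussion.
 */
enum nic_sched_stage {
	NIC_SCHED_TX_DMA,	/* which Tx DMA queue the NIC pulls from */
	NIC_SCHED_RX_DMA,	/* which Rx queue the NIC pushes to */
	NIC_SCHED_QUEUE_BUFFER,	/* packet buildup per queue in NIC SRAM */
	NIC_SCHED_NIC_BUFFER,	/* how the SRAM is allocated to queues */
	NIC_SCHED_FWD_POLICER,	/* policers in the forwarding logic */
};

The mapping exercise would then be: for each stage, which (if any)
existing uAPI configures it.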

> I think that by making it more complex (e.g. with nesting, pkt
> overhead, etc.) we will still not cover every possible use case and
> will add considerable complexity.
