Message-ID: <Z7MCxTDyVWGpRtOv@shredder>
Date: Mon, 17 Feb 2025 11:35:01 +0200
From: Ido Schimmel <idosch@...dia.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Simon Horman <horms@...nel.org>, Amit Cohen <amcohen@...dia.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Petr Machata <petrm@...dia.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Andrew Lunn <andrew+netdev@...n.ch>,
Network Development <netdev@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
bpf <bpf@...r.kernel.org>, mlxsw <mlxsw@...dia.com>
Subject: Re: [PATCH net-next 00/12] mlxsw: Preparations for XDP support
On Sat, Feb 15, 2025 at 08:10:43AM -0800, Jakub Kicinski wrote:
> On Sat, 15 Feb 2025 14:02:52 +0000 Simon Horman wrote:
> > > TBH I also feel a little ambivalent about adding advanced software
> > > features to mlxsw. You have a dummy device off which you hang the NAPIs,
> > > the page pools, and now the RXQ objects. That already works poorly with
> > > our APIs. How are you going to handle the XDP side? Program per port,
> > > I hope? But the basic fact remains that only fallback traffic goes thru
> > > the XDP program which is not the normal Linux model, routing is after
> > > XDP.
> > >
> > > On one hand it'd be great if upstream switch drivers could benefit from
> > > the advanced features. On the other the HW is clearly not capable of
> > > delivering in line with how NICs work, so we're signing up for a stream
> > > of corner cases, bugs and incompatibility. Dunno.
> >
> > FWIW, I do think that as this driver is actively maintained by the vendor,
> > and this is a grey zone, it is reasonable to allow the vendor to decide if
> > they want the burden of this complexity to gain some performance.
>
> Yes, I left this series in PW for an extra couple of days expecting
> a discussion but I suppose my email was taken as a final judgment.
Yes.
> The object separation can be faked more accurately, and analyzed
> (in the cover letter) to give us more confidence that the divergence
> won't create problems.
Unlike regular NICs, this device has more ports than Rx queues, so we
cannot associate each Rx queue with a single net device. As you said,
this is why the NAPI instances and RXQ objects are associated with a
dummy net device. Note that there are already drivers such as mtk that
have the same problem and solve it the same way. The only API change we
made in this regard is adding a net device argument to
xdp_build_skb_from_buff() instead of having it use rxq->dev.
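
For concreteness, a minimal sketch of that association, using the stock
alloc_netdev_dummy(), netif_napi_add() and xdp_rxq_info_reg() APIs.
This is not the actual driver code; "q", mlxsw_pci_napi_poll() and the
error handling are placeholders:

	struct net_device *napi_dev;
	int err;

	/* Rx queues are not tied to a single port netdev, so hang the
	 * NAPI instance and RXQ info off a dummy net device instead.
	 */
	napi_dev = alloc_netdev_dummy(0);
	if (!napi_dev)
		return -ENOMEM;

	netif_napi_add(napi_dev, &q->napi, mlxsw_pci_napi_poll);
	napi_enable(&q->napi);

	err = xdp_rxq_info_reg(&q->xdp_rxq, napi_dev, q->num,
			       q->napi.napi_id);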
Regarding XDP programs, they are of course invoked on a per-port basis.
It's just that the driver first needs to look up the program in an
internal array based on the Rx port in the completion info.
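
Roughly, that lookup on the Rx path would look like this (a sketch
only; "xdp_progs" and "local_port" are illustrative names, not existing
mlxsw fields):

	struct bpf_prog *prog;
	u32 act = XDP_PASS;

	rcu_read_lock();
	/* local_port was parsed from the completion info */
	prog = rcu_dereference(mlxsw_pci->xdp_progs[local_port]);
	if (prog)
		act = bpf_prog_run_xdp(prog, &xdp);
	rcu_read_unlock();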
Regarding motivation, one use case we thought about is telemetry. For
example, today you can configure a tc filter with a sample action that
will mirror one out of N packets to the CPU. The driver identifies such
packets according to the trap ID in the completion info and then passes
them to the psample module together with various metadata extracted
from the completion info (e.g., latency and egress queue occupancy, if
sampled on egress). Some users don't want to process these packets
locally, but instead want them sent, along with the metadata, to a
server for processing. If XDP programs had access to this metadata, we
could do this on the CPU with relatively low overhead. However, this is
not supported with tc-bpf, so you might tell me that it shouldn't be
supported with XDP either.
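
For reference, the existing sampling path boils down to something like
this (simplified; the group, rate and extracted values are
placeholders, but psample_sample_packet() and struct psample_metadata
are the in-tree psample API):

	struct psample_metadata md = {};

	/* Values extracted from the completion info */
	md.latency = latency;
	md.latency_valid = 1;
	md.out_tc_occ = tc_occupancy;
	md.out_tc_occ_valid = 1;

	psample_sample_packet(psample_group, skb, sample_rate, &md);

An XDP program with access to the same metadata could instead forward
the packet and metadata straight to a collector, without going through
psample locally.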