netdev - Re: [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs

Message-ID: <5aeda7f6bb26b20cb74ef21ae9c28ac91d57fae6.camel@siemens.com>
Date:   Mon, 31 Oct 2022 14:10:32 +0000
From:   "Bezdeka, Florian" <florian.bezdeka@...mens.com>
To:     "kuba@...nel.org" <kuba@...nel.org>,
        "john.fastabend@...il.com" <john.fastabend@...il.com>
CC:     "alexandr.lobakin@...el.com" <alexandr.lobakin@...el.com>,
        "anatoly.burakov@...el.com" <anatoly.burakov@...el.com>,
        "sdf@...gle.com" <sdf@...gle.com>,
        "song@...nel.org" <song@...nel.org>,
        "Deric, Nemanja" <nemanja.deric@...mens.com>,
        "andrii@...nel.org" <andrii@...nel.org>,
        "Kiszka, Jan" <jan.kiszka@...mens.com>,
        "magnus.karlsson@...il.com" <magnus.karlsson@...il.com>,
        "willemb@...gle.com" <willemb@...gle.com>,
        "ast@...nel.org" <ast@...nel.org>,
        "brouer@...hat.com" <brouer@...hat.com>, "yhs@...com" <yhs@...com>,
        "martin.lau@...ux.dev" <martin.lau@...ux.dev>,
        "kpsingh@...nel.org" <kpsingh@...nel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        "mtahhan@...hat.com" <mtahhan@...hat.com>,
        "xdp-hints@...-project.net" <xdp-hints@...-project.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "jolsa@...nel.org" <jolsa@...nel.org>,
        "haoluo@...gle.com" <haoluo@...gle.com>
Subject: Re: [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs

Hi all,

I have been closely following this discussion for some time now. It
seems we have reached the point where it is getting interesting for me.

On Fri, 2022-10-28 at 18:14 -0700, Jakub Kicinski wrote:
> On Fri, 28 Oct 2022 16:16:17 -0700 John Fastabend wrote:
> > > > And it's actually harder to abstract away inter HW generation
> > > > differences if the user space code has to handle all of it.  
> > 
> > I don't see how its any harder in practice though?
> 
> You need to find out what HW/FW/config you're running, right?
> And all you have is a pointer to a blob of unknown type.
> 
> Take timestamps for example, some NICs support adjusting the PHC 
> or doing SW corrections (with different versions of hw/fw/server
> platforms being capable of both/one/neither).
> 
> Sure you can extract all this info with tracing and careful
> inspection via uAPI. But I don't think that's _easier_.
> And the vendors can't run the results thru their validation 
> (for whatever that's worth).
> 
> > > I've had the same concern:
> > > 
> > > Until we have some userspace library that abstracts all these details,
> > > it's not really convenient to use. IIUC, with a kptr, I'd get a blob
> > > of data and I need to go through the code and see what particular type
> > > it represents for my particular device and how the data I need is
> > > represented there. There are also these "if this is device v1 -> use
> > > v1 descriptor format; if it's a v2->use this another struct; etc"
> > > complexities that we'll be pushing onto the users. With kfuncs, we put
> > > this burden on the driver developers, but I agree that the drawback
> > > here is that we actually have to wait for the implementations to catch
> > > up.  
> > 
> > I agree with everything there, you will get a blob of data and then
> > will need to know what field you want to read using BTF. But, we
> > already do this for BPF programs all over the place so its not a big
> > lift for us. All other BPF tracing/observability requires the same
> > logic. I think users of BPF in general perhaps XDP/tc are the only
> > place left to write BPF programs without thinking about BTF and
> > kernel data structures.
> > 
> > But, with proposed kptr the complexity lives in userspace and can be
> > fixed, added, updated without having to bother with kernel updates, etc.
> > From my point of view of supporting Cilium its a win and much preferred
> > to having to deal with driver owners on all cloud vendors, distributions,
> > and so on.
> > 
> > If vendor updates firmware with new fields I get those immediately.
> 
> Conversely it's a valid concern that those who *do* actually update
> their kernel regularly will have more things to worry about.
> 
> > > Jakub mentions FW and I haven't even thought about that; so yeah, bpf
> > > programs might have to take a lot of other state into consideration
> > > when parsing the descriptors; all those details do seem like they
> > > belong to the driver code.  
> > 
> > I would prefer to avoid being stuck on requiring driver writers to
> > be involved. With just a kptr I can support the device and any
> > firwmare versions without requiring help.
> 
> 1) where are you getting all those HW / FW specs :S
> 2) maybe *you* can but you're not exactly not an ex-driver developer :S
> 
> > > Feel free to send it early with just a handful of drivers implemented;
> > > I'm more interested about bpf/af_xdp/user api story; if we have some
> > > nice sample/test case that shows how the metadata can be used, that
> > > might push us closer to the agreement on the best way to proceed.  
> > 
> > I'll try to do a intel and mlx implementation to get a cross section.
> > I have a good collection of nics here so should be able to show a
> > couple firmware versions. It could be fine I think to have the raw
> > kptr access and then also kfuncs for some things perhaps.
> > 
> > > > I'd prefer if we left the door open for new vendors. Punting descriptor
> > > > parsing to user space will indeed result in what you just said - major
> > > > vendors are supported and that's it.  
> > 
> > I'm not sure about why it would make it harder for new vendors? I think
> > the opposite, 
> 
> TBH I'm only replying to the email because of the above part :)
> I thought this would be self evident, but I guess our perspectives 
> are different.
> 
> Perhaps you look at it from the perspective of SW running on someone
> else's cloud, an being able to move to another cloud, without having 
> to worry if feature X is available in xdp or just skb.
> 
> I look at it from the perspective of maintaining a cloud, with people
> writing random XDP applications. If I swap a NIC from an incumbent to a
> (superior) startup, and cloud users are messing with raw descriptor -
> I'd need to go find every XDP program out there and make sure it
> understands the new descriptors.

Here is another perspective:

As an AF_XDP application developer I don't want to deal with the
underlying hardware in detail. I would like to request a feature from
the OS (in this case rx/tx timestamping). If the feature is available I
will simply use it; if not, I might have to work around it, maybe by
falling back to SW timestamping.
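
A minimal sketch of that fallback in C, under loud assumptions: the
struct rx_hint layout and the rx_timestamp_ns() helper are hypothetical
and exist only for illustration. They are not part of the RFC, of any
driver, or of any existing kernel or libxdp API. The only point is that
the application checks "is the HW timestamp there?" and otherwise takes
a SW timestamp, without knowing which NIC produced the hint.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, HW-independent view of an RX hint (illustration only). */
struct rx_hint {
	bool     hw_ts_valid;   /* true if the device supplied a timestamp */
	uint64_t hw_ts_ns;      /* HW RX timestamp in nanoseconds, if valid */
};

/* SW fallback: read the system clock when no HW timestamp is available. */
static uint64_t sw_timestamp_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Return the RX timestamp for a packet. Prefer the HW value; fall back
 * to a SW timestamp otherwise. If from_hw is non-NULL, it reports which
 * source was used.
 */
static uint64_t rx_timestamp_ns(const struct rx_hint *hint, bool *from_hw)
{
	bool have_hw = hint != NULL && hint->hw_ts_valid;

	if (from_hw)
		*from_hw = have_hw;

	return have_hw ? hint->hw_ts_ns : sw_timestamp_ns();
}
```

The caller never branches on device type or descriptor format; whether
the hint is filled by a kfunc, a library, or something else is left
open here.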

No part of my application (BPF program included) should have to be
optimized/adjusted for all the different HW variants out there.

My application might be run on bare metal/cloud/virtual systems. I do
not want to handle these scenarios differently.

I followed the idea of having a library for parsing the driver-specific
meta information. That would mean that this library has to be kept in sync
with the kernel, right? It doesn't help if a newer kernel provides XDP
hints support for more devices/drivers but the library is not updated.
That might be relevant for all the device update strategies out there.

In addition - and maybe even contrary - we care about zero copy (ZC)
support. Our current use case has to deal with a lot of small packets,
so we hope to benefit from that. If XDP hints support requires a copy
of the metadata - maybe to drive a HW-independent interface - that
might be a bottleneck for us.

> 
> There is a BPF foundation or whatnot now - what about starting a
> certification program for cloud providers and making it clear what
> features must be supported to be compatible with XDP 1.0, XDP 2.0 etc?
> 
> > it would be easier because I don't need vendor support at all.
> 
> Can you support the enfabrica NIC on day 1? :) To an extent, its just
> shifting the responsibility from the HW vendor to the middleware vendor.
> 
> > Thinking it over seems there could be room for both.
> 
> Are you thinking more or less Stan's proposal but with one of 
> the callbacks being "give me the raw thing"? Probably as a ro dynptr?
> Possible, but I don't think we need to hold off Stan's work.