Date: Mon, 31 Oct 2022 12:36:42 -0700
From: Yonghong Song <yhs@...a.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>,
 "Bezdeka, Florian" <florian.bezdeka@...mens.com>,
 "kuba@...nel.org" <kuba@...nel.org>,
 "john.fastabend@...il.com" <john.fastabend@...il.com>
Cc: "alexandr.lobakin@...el.com" <alexandr.lobakin@...el.com>,
 "anatoly.burakov@...el.com" <anatoly.burakov@...el.com>,
 "sdf@...gle.com" <sdf@...gle.com>,
 "song@...nel.org" <song@...nel.org>,
 "Deric, Nemanja" <nemanja.deric@...mens.com>,
 "andrii@...nel.org" <andrii@...nel.org>,
 "Kiszka, Jan" <jan.kiszka@...mens.com>,
 "magnus.karlsson@...il.com" <magnus.karlsson@...il.com>,
 "willemb@...gle.com" <willemb@...gle.com>,
 "ast@...nel.org" <ast@...nel.org>,
 "brouer@...hat.com" <brouer@...hat.com>,
 "yhs@...com" <yhs@...com>,
 "martin.lau@...ux.dev" <martin.lau@...ux.dev>,
 "kpsingh@...nel.org" <kpsingh@...nel.org>,
 "daniel@...earbox.net" <daniel@...earbox.net>,
 "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
 "mtahhan@...hat.com" <mtahhan@...hat.com>,
 "xdp-hints@...-project.net" <xdp-hints@...-project.net>,
 "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
 "jolsa@...nel.org" <jolsa@...nel.org>,
 "haoluo@...gle.com" <haoluo@...gle.com>
Subject: Re: [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs

On 10/31/22 8:28 AM, Toke Høiland-Jørgensen wrote:
> "Bezdeka, Florian" <florian.bezdeka@...mens.com> writes:
>
>> Hi all,
>>
>> I was closely following this discussion for some time now. Seems we
>> reached the point where it's getting interesting for me.
>>
>> On Fri, 2022-10-28 at 18:14 -0700, Jakub Kicinski wrote:
>>> On Fri, 28 Oct 2022 16:16:17 -0700 John Fastabend wrote:
>>>>>> And it's actually harder to abstract away inter HW generation
>>>>>> differences if the user space code has to handle all of it.
>>>>
>>>> I don't see how it's any harder in practice though?
>>>
>>> You need to find out what HW/FW/config you're running, right?
>>> And all you have is a pointer to a blob of unknown type.
>>>
>>> Take timestamps for example, some NICs support adjusting the PHC
>>> or doing SW corrections (with different versions of hw/fw/server
>>> platforms being capable of both/one/neither).
>>>
>>> Sure you can extract all this info with tracing and careful
>>> inspection via uAPI. But I don't think that's _easier_.
>>> And the vendors can't run the results thru their validation
>>> (for whatever that's worth).
>>>
>>>>> I've had the same concern:
>>>>>
>>>>> Until we have some userspace library that abstracts all these details,
>>>>> it's not really convenient to use. IIUC, with a kptr, I'd get a blob
>>>>> of data and I need to go through the code and see what particular type
>>>>> it represents for my particular device and how the data I need is
>>>>> represented there. There are also these "if this is device v1 -> use
>>>>> v1 descriptor format; if it's a v2->use this another struct; etc"
>>>>> complexities that we'll be pushing onto the users. With kfuncs, we put
>>>>> this burden on the driver developers, but I agree that the drawback
>>>>> here is that we actually have to wait for the implementations to catch
>>>>> up.
>>>>
>>>> I agree with everything there, you will get a blob of data and then
>>>> will need to know what field you want to read using BTF. But, we
>>>> already do this for BPF programs all over the place so it's not a big
>>>> lift for us. All other BPF tracing/observability requires the same
>>>> logic. I think users of BPF in general perhaps XDP/tc are the only
>>>> place left to write BPF programs without thinking about BTF and
>>>> kernel data structures.
>>>>
>>>> But, with proposed kptr the complexity lives in userspace and can be
>>>> fixed, added, updated without having to bother with kernel updates, etc.
>>>> From my point of view of supporting Cilium it's a win and much preferred
>>>> to having to deal with driver owners on all cloud vendors, distributions,
>>>> and so on.
>>>>
>>>> If vendor updates firmware with new fields I get those immediately.
>>>
>>> Conversely it's a valid concern that those who *do* actually update
>>> their kernel regularly will have more things to worry about.
>>>
>>>>> Jakub mentions FW and I haven't even thought about that; so yeah, bpf
>>>>> programs might have to take a lot of other state into consideration
>>>>> when parsing the descriptors; all those details do seem like they
>>>>> belong to the driver code.
>>>>
>>>> I would prefer to avoid being stuck on requiring driver writers to
>>>> be involved. With just a kptr I can support the device and any
>>>> firmware versions without requiring help.
>>>
>>> 1) where are you getting all those HW / FW specs :S
>>> 2) maybe *you* can but you're not exactly not an ex-driver developer :S
>>>
>>>>> Feel free to send it early with just a handful of drivers implemented;
>>>>> I'm more interested about bpf/af_xdp/user api story; if we have some
>>>>> nice sample/test case that shows how the metadata can be used, that
>>>>> might push us closer to the agreement on the best way to proceed.
>>>>
>>>> I'll try to do an intel and mlx implementation to get a cross section.
>>>> I have a good collection of nics here so should be able to show a
>>>> couple firmware versions. It could be fine I think to have the raw
>>>> kptr access and then also kfuncs for some things perhaps.
>>>>
>>>>>> I'd prefer if we left the door open for new vendors. Punting descriptor
>>>>>> parsing to user space will indeed result in what you just said - major
>>>>>> vendors are supported and that's it.
>>>>
>>>> I'm not sure about why it would make it harder for new vendors? I think
>>>> the opposite,
>>>
>>> TBH I'm only replying to the email because of the above part :)
>>> I thought this would be self evident, but I guess our perspectives
>>> are different.
>>>
>>> Perhaps you look at it from the perspective of SW running on someone
>>> else's cloud, and being able to move to another cloud, without having
>>> to worry if feature X is available in xdp or just skb.
>>>
>>> I look at it from the perspective of maintaining a cloud, with people
>>> writing random XDP applications. If I swap a NIC from an incumbent to a
>>> (superior) startup, and cloud users are messing with raw descriptor -
>>> I'd need to go find every XDP program out there and make sure it
>>> understands the new descriptors.
>>
>> Here is another perspective:
>>
>> As an AF_XDP application developer I don't want to deal with the
>> underlying hardware in detail. I like to request a feature from the OS
>> (in this case rx/tx timestamping). If the feature is available I will
>> simply use it, if not I might have to work around it - maybe by falling
>> back to SW timestamping.
>>
>> All parts of my application (BPF program included) should not be
>> optimized/adjusted for all the different HW variants out there.
>
> Yes, absolutely agreed. Abstracting away those kinds of hardware
> differences is the whole *point* of having an OS/driver model. I.e.,
> it's what the kernel is there for! If people want to bypass that and get
> direct access to the hardware, they can already do that by using DPDK.
>
> So in other words, 100% agreed that we should not expect the BPF
> developers to deal with hardware details as would be required with a
> kptr-based interface.
>
> As for the kfunc-based interface, I think it shows some promise.
> Exposing a list of function names to retrieve individual metadata items
> instead of a struct layout is sorta comparable in terms of developer UI
> accessibility etc (IMO).

Looks like there are quite some use cases for hw_timestamp.
Do you think we could add it to the uapi like struct xdp_md?
The following is the current xdp_md:

struct xdp_md {
        __u32 data;
        __u32 data_end;
        __u32 data_meta;
        /* Below access go through struct xdp_rxq_info */
        __u32 ingress_ifindex; /* rxq->dev->ifindex */
        __u32 rx_queue_index;  /* rxq->queue_index  */

        __u32 egress_ifindex;  /* txq->dev->ifindex */
};

We could add __u64 hw_timestamp to the xdp_md so the user
can just do xdp_md->hw_timestamp to get the value.
xdp_md->hw_timestamp == 0 means hw_timestamp is not
available.

Inside the kernel, the ctx rewriter can generate code
to call a driver-specific function to retrieve the data.

The kfunc approach can be used for *less* common use cases?

>
> There are three main drawbacks, AFAICT:
>
> 1. It requires driver developers to write and maintain the code that
> generates the unrolled BPF bytecode to access the metadata fields, which
> is a non-trivial amount of complexity. Maybe this can be abstracted away
> with some internal helpers though (like, e.g., a
> bpf_xdp_metadata_copy_u64(dst, src, offset) helper which would spit out
> the required JMP/MOV/LDX instructions)?
>
> 2. AF_XDP programs won't be able to access the metadata without using a
> custom XDP program that calls the kfuncs and puts the data into the
> metadata area. We could solve this with some code in libxdp, though; if
> this code can be made generic enough (so it just dumps the available
> metadata functions from the running kernel at load time), it may be
> possible to make it generic enough that it will be forward-compatible
> with new versions of the kernel that add new fields, which should
> alleviate Florian's concern about keeping things in sync.
>
> 3. It will make it harder to consume the metadata when building SKBs. I
> think the CPUMAP and veth use cases are also quite important, and that
> we want metadata to be available for building SKBs in this path. Maybe
> this can be resolved by having a convenient kfunc for this that can be
> used for programs doing such redirects. E.g., you could just call
> xdp_copy_metadata_for_skb() before doing the bpf_redirect, and that
> would recursively expand into all the kfunc calls needed to extract the
> metadata supported by the SKB path?
>
> -Toke
>
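
For illustration only, the hw_timestamp proposal in the reply above could be modelled in plain userspace C roughly as follows. This is a sketch under stated assumptions, not kernel code: struct xdp_md_proposed, rx_hw_timestamp_fn, fill_hw_timestamp(), pick_rx_timestamp() and mock_drv_rx_hw_timestamp() are all hypothetical names. The real struct xdp_md has no hw_timestamp field, and the real ctx rewriter emits BPF instructions rather than calling a C function pointer. The sketch only shows the "0 means not available" convention and the fallback to SW timestamping that Florian describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mock of the proposed context layout.
 * This is NOT the real uapi struct xdp_md. */
struct xdp_md_proposed {
    uint32_t data;
    uint32_t data_end;
    uint32_t data_meta;
    uint32_t ingress_ifindex;
    uint32_t rx_queue_index;
    uint32_t egress_ifindex;
    uint64_t hw_timestamp; /* proposed field: 0 means "not available" */
};

/* Hypothetical driver hook, standing in for the driver-specific
 * function that the ctx rewriter would arrange to call. */
typedef uint64_t (*rx_hw_timestamp_fn)(const void *rx_desc);

/* Fill the proposed field. A driver without timestamp support
 * (drv_fn == NULL) leaves it at 0. */
static void fill_hw_timestamp(struct xdp_md_proposed *ctx,
                              rx_hw_timestamp_fn drv_fn,
                              const void *rx_desc)
{
    assert(ctx != NULL);
    ctx->hw_timestamp = drv_fn ? drv_fn(rx_desc) : 0;
}

/* Consumer side: use the HW timestamp when present, otherwise fall
 * back to a SW timestamp supplied by the caller. */
static uint64_t pick_rx_timestamp(const struct xdp_md_proposed *ctx,
                                  uint64_t sw_timestamp)
{
    assert(ctx != NULL);
    return ctx->hw_timestamp != 0 ? ctx->hw_timestamp : sw_timestamp;
}

/* Mock driver: pretends the RX descriptor is just a 64-bit timestamp. */
static uint64_t mock_drv_rx_hw_timestamp(const void *rx_desc)
{
    return *(const uint64_t *)rx_desc;
}
```

One consequence of the sentinel convention is that a genuine timestamp of 0 cannot be told apart from "not available"; the sketch inherits that limitation from the proposal as written.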