[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87k08eo7qy.fsf@toke.dk>
Date: Fri, 15 Jul 2022 14:55:33 +0200
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Stanislav Fomichev <sdf@...gle.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Björn Töpel <bjorn@...nel.org>,
Magnus Karlsson <magnus.karlsson@...el.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
Jonathan Lemon <jonathan.lemon@...il.com>,
Mykola Lysenko <mykolal@...com>,
Kumar Kartikeya Dwivedi <memxor@...il.com>,
netdev@...r.kernel.org, bpf@...r.kernel.org,
Freysteinn Alfredsson <freysteinn.alfredsson@....se>,
Cong Wang <xiyou.wangcong@...il.com>
Subject: Re: [RFC PATCH 00/17] xdp: Add packet queueing and scheduling
capabilities
Alexei Starovoitov <alexei.starovoitov@...il.com> writes:
> On Thu, Jul 14, 2022 at 12:46:54PM +0200, Toke Høiland-Jørgensen wrote:
>> >
>> > Maybe a related question here: with the way you do
>> > BPF_MAP_TYPE_PIFO_GENERIC vs BPF_MAP_TYPE_PIFO_XDP, how hard it would
>> > be have support for storing xdp_frames/skb in any map? Let's say we
>> > have generic BPF_MAP_TYPE_RBTREE, where the key is
>> > priority/timestamp/whatever, can we, based on the value's btf_id,
>> > figure out the rest? (that the value is kernel structure and needs
>> > special care and more constraints - can't be looked up from user space
>> > and so on)
>> >
>> > Seems like we really need to have two special cases: where we transfer
>> > ownership of xdp_frame/skb to/from the map, any other big
>> > complications?
>> >
>> > That way we can maybe untangle the series a bit: we can talk about
>> > efficient data structures for storing frames/skbs independently of
>> > some generic support for storing them in the maps. Any major
>> > complications with that approach?
>>
>> I've had discussions with Kartikeya on this already (based on his 'kptr
>> in map' work). That may well end up being feasible, which would be
>> fantastic. The reason we didn't use it for this series is that there's
>> still some work to do on the generic verifier/infrastructure support
>> side of this (the PIFO map is the oldest part of this series), and I
>> didn't want to hold up the rest of the queueing work until that landed.
>
> Certainly makes sense for RFC to be sent out earlier,
> but Stan's point stands. We have to avoid type-specific maps when
> generic will do. kptr infra is getting close to be that answer.
ACK, I'll iterate on the map types and see how far we can get with the
kptr approach.
Do people feel a generic priority queue type would be generally useful?
Because in that case I can split out this work into a separate series...
>> >> Maybe I'm missing something, though; could you elaborate on how you'd
>> >> use kfuncs instead?
>> >
>> > I was thinking about the approach in general. In networking bpf, we've
>> > been adding new program types, new contexts and new explicit hooks.
>> > This all requires a ton of boiler plate (converting from uapi ctx to
>> > the kernel, exposing hook points, etc, etc). And looking at Benjamin's
>> > HID series, it's so much more elegant: there is no uapi, just kernel
>> > function that allows it to be overridden and a bunch of kfuncs
>> > exposed. No uapi, no helpers, no fake contexts.
>> >
>> > For networking and xdp the ship might have sailed, but I was wondering
>> > whether we should be still stuck in that 'old' boilerplate world or we
>> > have a chance to use new nice shiny things :-)
>> >
>> > (but it might be all moot if we'd like to have stable upis?)
>>
>> Right, I see what you mean. My immediate feeling is that having an
>> explicit stable UAPI for XDP has served us well. We do all kinds of
>> rewrite tricks behind the scenes (things like switching between xdp_buff
>> and xdp_frame, bulking, direct packet access, reading ifindexes by
>> pointer walking txq->dev, etc) which are important ways to improve
>> performance without exposing too many nitty-gritty details into the API.
>>
>> There's also consistency to consider: I think the addition of queueing
>> should work as a natural extension of the existing programming model for
>> XDP. So I feel like this is more a case of "if we were starting from
>> scratch today we might do things differently (like the HID series), but
>> when extending things let's keep it consistent"?
>
> "consistent" makes sense when new feature follows established path.
> The programmable packet scheduling in TX is just as revolutionary as
> XDP in RX was years ago :)
> This feature can be done similar to hid-bpf without cast-in-stone uapi
> and hooks. Such patches would be much easier to land and iterate on top.
> The amount of bike shedding will be 10 times less.
> No need for new program type, no new hooks, no new FDs and attach uapi-s.
> Even libbpf won't need any changes.
> Add few kfuncs and __weak noinline "hooks" in TX path.
> Only new map type would be necessary.
> If it can be made with kptr then it will be the only uapi exposure that
> will be heavily scrutinized.
I'm not quite convinced it's that obvious that it can be implemented
this way; but I don't mind trying it out either, if nothing else it'll
give us something to compare against...
> It doesn't mean that it will stay unstable-api forever. Once it demonstrates
> that it is on par with fq/fq_codel/cake feature-wise we can bake it into uapi.
In any case, I don't think we should merge anything until we've shown
this :)
-Toke
Powered by blists - more mailing lists