[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YtRfG5YNtNHZXOUc@pop-os.localdomain>
Date: Sun, 17 Jul 2022 12:12:27 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>
Cc: Stanislav Fomichev <sdf@...gle.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Björn Töpel <bjorn@...nel.org>,
Magnus Karlsson <magnus.karlsson@...el.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
Jonathan Lemon <jonathan.lemon@...il.com>,
Mykola Lysenko <mykolal@...com>,
Kumar Kartikeya Dwivedi <memxor@...il.com>,
netdev@...r.kernel.org, bpf@...r.kernel.org,
Freysteinn Alfredsson <freysteinn.alfredsson@....se>
Subject: Re: [RFC PATCH 00/17] xdp: Add packet queueing and scheduling
capabilities
On Thu, Jul 14, 2022 at 12:46:54PM +0200, Toke Høiland-Jørgensen wrote:
> Stanislav Fomichev <sdf@...gle.com> writes:
>
> > On Wed, Jul 13, 2022 at 2:52 PM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
> >>
> >> Stanislav Fomichev <sdf@...gle.com> writes:
> >>
> >> > On Wed, Jul 13, 2022 at 4:14 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
> >> >>
> >> >> Packet forwarding is an important use case for XDP, which offers
> >> >> significant performance improvements compared to forwarding using the
> >> >> regular networking stack. However, XDP currently offers no mechanism to
> >> >> delay, queue or schedule packets, which limits the practical uses for
> >> >> XDP-based forwarding to those where the capacity of input and output links
> >> >> always match each other (i.e., no rate transitions or many-to-one
> >> >> forwarding). It also prevents an XDP-based router from doing any kind of
> >> >> traffic shaping or reordering to enforce policy.
> >> >>
> >> >> This series represents a first RFC of our attempt to remedy this lack. The
> >> >> code in these patches is functional, but needs additional testing and
> >> >> polishing before being considered for merging. I'm posting it here as an
> >> >> RFC to get some early feedback on the API and overall design of the
> >> >> feature.
> >> >>
> >> >> DESIGN
> >> >>
> >> >> The design consists of three components: A new map type for storing XDP
> >> >> frames, a new 'dequeue' program type that will run in the TX softirq to
> >> >> provide the stack with packets to transmit, and a set of helpers to dequeue
> >> >> packets from the map, optionally drop them, and to schedule an interface
> >> >> for transmission.
> >> >>
> >> >> The new map type is modelled on the PIFO data structure proposed in the
> >> >> literature[0][1]. It represents a priority queue where packets can be
> >> >> enqueued in any priority, but is always dequeued from the head. From the
> >> >> XDP side, the map is simply used as a target for the bpf_redirect_map()
> >> >> helper, where the target index is the desired priority.
> >> >
> >> > I have the same question I asked on the series from Cong:
> >> > Any considerations for existing carousel/edt-like models?
> >>
> >> Well, the reason for the addition in patch 5 (continuously increasing
> >> priorities) is exactly to be able to implement EDT-like behaviour, where
> >> the priority is used as time units to clock out packets.
> >
> > Ah, ok, I didn't read the patches closely enough. I saw some limits
> > for the ranges and assumed that it wasn't capable of efficiently
> > storing 64-bit timestamps..
>
> The goal is definitely to support full 64-bit priorities. Right now you
> have to start out at 0 but can go on for a full 64 bits, but that's a
> bit of an API wart that I'd like to get rid of eventually...
>
> >> > Can we make the map flexible enough to implement different qdisc
> >> > policies?
> >>
> >> That's one of the things we want to be absolutely sure about. We are
> >> starting out with the PIFO map type because the literature makes a good
> >> case that it is flexible enough to implement all conceivable policies.
> >> The goal of the test harness linked as note [4] is to actually examine
> >> this; Frey is our PhD student working on this bit.
> >>
> >> Thus far we haven't hit any limitations on this, but we'll need to add
> >> more policies before we are done with this. Another consideration is
> >> performance, of course, so we're also planning to do a comparison with a
> >> more traditional "bunch of FIFO queues" type data structure for at least
> >> a subset of the algorithms. Kartikeya also had an idea for an
> >> alternative way to implement a priority queue using (semi-)lockless
> >> skiplists, which may turn out to perform better.
> >>
> >> If there's any particular policy/algorithm you'd like to see included in
> >> this evaluation, please do let us know, BTW! :)
> >
> > I honestly am not sure what the bar for accepting this should be. But
> > on the Cong's series I mentioned Martin's CC bpf work as a great
> > example of what we should be trying to do for qdisc-like maps. Having
> > a bpf version of fq/fq_codel/whatever_other_complex_qdisc might be
> > very convincing :-)
>
> Just doing flow queueing is quite straight forward with PIFOs. We're
> working on fq_codel. Personally I also want to implement something that
> has feature parity with sch_cake (which includes every feature and the
> kitchen sink already) :)
And how exactly would you plan to implement Least Slack Time First with
PIFOs? See https://www.usenix.org/system/files/nsdi20-paper-sharma.pdf.
Can you be as specific as possible ideally with a pesudo code?
BTW, this is very easy to do with my approach as no FO limitations.
Thanks!
Powered by blists - more mailing lists