[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <649b581ded8c1_75d8a208c@john.notmuch>
Date: Tue, 27 Jun 2023 14:43:57 -0700
From: John Fastabend <john.fastabend@...il.com>
To: Stanislav Fomichev <sdf@...gle.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Donald Hunter <donald.hunter@...il.com>,
bpf <bpf@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Song Liu <song@...nel.org>,
Yonghong Song <yhs@...com>,
KP Singh <kpsingh@...nel.org>,
Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>,
Network Development <netdev@...r.kernel.org>
Subject: Re: [RFC bpf-next v2 11/11] net/mlx5e: Support TX timestamp metadata
Stanislav Fomichev wrote:
> On Mon, Jun 26, 2023 at 3:37 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Mon, Jun 26, 2023 at 2:36 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> > >
> > > > >
> > > > > I'd think HW TX csum is actually simpler than dealing with time,
> > > > > will you change your mind if Stan posts Tx csum within a few days? :)
> >
> > Absolutely :) Happy to change my mind.
> >
> > > > > The set of offloads is barely changing, the lack of clarity
> > > > > on what is needed seems overstated. IMHO AF_XDP is getting no use
> > > > > today, because everything remotely complex was stripped out of
> > > > > the implementation to get it merged. Aren't we hand waving the
> > > > > complexity away simply because we don't want to deal with it?
> > > > >
> > > > > These are the features today's devices support (rx/tx is a mirror):
> > > > > - L4 csum
> > > > > - segmentation
> > > > > - time reporting
> > > > >
> > > > > Some may also support:
> > > > > - forwarding md tagging
> > > > > - Tx launch time
> > > > > - no fcs
> > > > > Legacy / irrelevant:
> > > > > - VLAN insertion
> > > >
> > > > Right, the goal of the series is to lay out the foundation to support
> > > > AF_XDP offloads. I'm starting with tx timestamp because that's more
> > > > pressing. But, as I mentioned in another thread, we do have other
> > > > users that want to adopt AF_XDP, but due to missing tx offloads, they
> > > > aren't able to.
> > > >
> > > > IMHO, with pre-tx/post-tx hooks, it's pretty easy to go from TX
> > > > timestamp to TX checksum offload, we don't need a lot:
> > > > - define another generic kfunc bpf_request_tx_csum(from, to)
> > > > - drivers implement it
> > > > - af_xdp users call this kfunc from devtx hook
> > > >
> > > > We seem to be arguing over start-with-my-specific-narrow-use-case vs
> > > > start-with-generic implementation, so maybe time for the office hours?
> > > > I can try to present some cohesive plan of how we start with the framework
> > > > plus tx-timestamp and expand with tx-checksum/etc. There is a lot of
> > > > commonality in these offloads, so I'm probably not communicating it
> > > > properly..
> > >
> > > Or, maybe a better suggestion: let me try to implement TX checksum
> > > kfunc in the v3 (to show how to build on top this series).
> > > Having code is better than doing slides :-D
> >
> > That would certainly help :)
> > What I got out of your lsfmmbpf talk is that timestamp is your
> > main and only use case. tx checksum for af_xdp is the other use case,
> > but it's not yours, so you sort-of use it as an extra justification
> > for timestamp. Hence my negative reaction to 'generality'.
> > I think we'll have better results in designing an api
> > when we look at these two use cases independently.
> > And implement them in patches solving specifically timestamp
> > with normal skb traffic and tx checksum for af_xdp as two independent apis.
> > If it looks like we can extract a common framework out of them. Great.
> > But trying to generalize before truly addressing both cases
> > is likely to cripple both apis.
>
> I need timestamps for the af_xdp case and I don't really care about skb :-(
> I brought skb into the picture mostly to cover John's cases.
> So maybe let's drop the skb case for now and focus on af_xdp?
> skb is convenient testing-wise though (with veth), but maybe I can
> somehow carve-out af_xdp skbs only out of it..
I'm ok if your drop my use case but I read above and I seem to have a
slightly different opinion/approach in mind.
What I think would be the most straight-forward thing and most flexible
is to create a <drvname>_devtx_submit_skb(<drivname>descriptor, sk_buff)
and <drvname>_devtx_submit_xdp(<drvname>descriptor, xdp_frame) and then
corresponding calls for <drvname>_devtx_complete_{skb|xdp}() Then you
don't spend any cycles building the metadata thing or have to even
worry about read kfuncs. The BPF program has read access to any
fields they need. And with the skb, xdp pointer we have the context
that created the descriptor and generate meaningful metrics.
I'm clearly sacrificing usability in some sense of a general user that
doesn't know about drivers, hardware and so on for performance,
flexibility and simplicity of implementation. In general I'm OK with
this. I have trouble understanding who the dev is that is coding at
this layer, but can't read kernel driver code or at least has a good
understanding of the hardware. We are deep in optimization and
performance world once we get to putting hooks in the driver we should
expect a certain amount of understanding before we let folks just plant
hooks here. Its going to be very easy to cause all sort of damage
even if we go to the other extreme and make a unified interface and
push the complexity onto kernel devs to maintain. I really believe
folks writing AF_XDP code (for DPDK or otherwise) have a really good
handle on networking, hardware, and drivers. I also expect that
AF_XDP users will mostly be abstracted away from AF_XDP internals
by DPDK and other libs or applications. My $.02 is these will be
primarily middle box type application built for special purpose l2/l3/l4
firewalling, DDOS, etc and telco protocols. Rant off.
But I can admit <drvname>_devtx_submit_xdp(<drvname>descriptor, ...)
is a bit raw. For one its going to require an object file per
driver/hardware and maybe worse multiple objects per driver/hw to
deal with hw descriptor changes with features. My $.02 is we could
solve this with better attach time linking. Now you have to at
compile time know what you are going to attach to and how to parse
the descriptor. If we had attach time linking we could dynamically
link to the hardware specific code at link time. And then its up
to us here or others who really understand the hardware to write
a ice_read_ts, mlx_read_tx but that can all just be normal BPF code.
Also I hand waved through a step where at attach time we have
some way to say link the thing that is associated with the
driver I'm about to attach to. As a first step a loader could
do this.
Its maybe more core work and less wrangling drivers then and
it means kfuncs become just blocks of BPF that anyone can
write. The big win in my mind is I don't need to know now
what I want tomorrow because I should have access. Also we push
the complexity and maintenance out of driver/kernel and into
BPF libs and users. Then we don't have to touch BPF core just
to add new things.
Last bit that complicates things is I need a way to also write
allowed values into the descriptor. We don't have anything that
can do this now. So maybe kfuncs for the write tstamp flags and
friends?
Anyways, my $.02.
>
> Regarding timestamp vs checksum: timestamp is more pressing, but I do
One request would be to also include a driver that doesn't have
always on timestamps so some writer is needed. CSUM enabling
I'm interested to see what the signature looks like? On the
skb side we use the skb a fair amount to build the checksum
it seems so I guess AF_XDP needs to push down the csum_offset?
In the SKB case its less interesting I think becuase the
stack is already handling it.
> have people around that want to use af_xdp but need multibuf + tx
> offloads, so I was hoping to at least have a path for more tx offloads
> after we're done with tx timestamp "offload"..
>
> > It doesn't have to be only two use cases.
> > I completely agree with Kuba that:
> > - L4 csum
> > - segmentation
> > - time reporting
> > are universal HW NIC features and we need to have an api
> > that exposes these features in programmable way to bpf progs in the kernel
> > and through af_xdp to user space.
> > I mainly suggest addressing them one by one and look
> > for common code bits and api similarities later.
>
> Ack, let me see if I can fit tx csum into the picture. I still feel
> like we need these dev-bound tracing programs if we want to trigger
> kfuncs safely, but maybe we can simplify further..
Its not clear to me how you get to a dev specific attach here
without complicating the hot path more. I think we need to
really be careful to not make hotpath more complex. Will
follow along for sure to see what gets created.
Thanks,
John
Powered by blists - more mailing lists