[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20230703123303.220ee6ef@kernel.org>
Date: Mon, 3 Jul 2023 12:33:03 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: John Fastabend <john.fastabend@...il.com>
Cc: Toke Høiland-Jørgensen <toke@...hat.com>,
Stanislav Fomichev <sdf@...gle.com>, Alexei Starovoitov
<alexei.starovoitov@...il.com>, Donald Hunter <donald.hunter@...il.com>,
bpf <bpf@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>, Daniel
Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Song Liu <song@...nel.org>,
Yonghong Song <yhs@...com>, KP Singh <kpsingh@...nel.org>, Hao Luo
<haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, Network Development
<netdev@...r.kernel.org>
Subject: Re: [RFC bpf-next v2 11/11] net/mlx5e: Support TX timestamp
metadata
On Mon, 03 Jul 2023 11:30:44 -0700 John Fastabend wrote:
> Jakub Kicinski wrote:
> > On Fri, 30 Jun 2023 17:52:05 -0700 John Fastabend wrote:
> > > I would expect BPF/driver experts would write the libraries for the
> > > datapath API that the network/switch developer is going to use. I would
> > > even put the BPF programs in kernel and ship them with the release
> > > if that helps.
> > >
> > > We have different visions on who the BPF user is that writes XDP
> > > programs I think.
> >
> > Yes, crucially. What I've seen talking to engineers working on TC/XDP
> > BPF at Meta (and I may not be dealing with experts, Martin would have
> > a broader view) is that they don't understand basics like s/g or
> > details of checksums.
>
> Interesting data point. But these same engineers will want to get
> access to the checksum, but don't understand it? Seems if your
> going to start reading/writing descriptors even through kfuncs
> we need to get some docs/notes on how to use them correctly then.
> We certainly wont put guardrails on the read/writes for performance
> reasons.
Dunno about checksum, but it's definitely the same kind of person
that'd want access to timestamps.
> > I don't think it is reasonable to call you, Maxim, Nik and co. "users".
> > We're risking building system so complex normal people will _need_ an
> > overlay on top to make it work.
>
> I consider us users. We write networking CNI and observability/sec
> tooling on top of BPF. Most of what we create is driven by customer
> environments and performance. Maybe not typical users I guess, but
> also Meta users are not typical and have their own set of constraints
> and insights.
One thing Meta certainly does (and I think is a large part of success
of BPF) is delegating the development of applications away from the core
kernel team. Meta is different than a smaller company in that it _has_
a kernel team, but the "network application" teams I suspect are fairly
typical.
> > > Its pushing complexity into the kernel that we maintain in kernel
> > > when we could push the complexity into BPF and maintain as user
> > > space code and BPF codes. Its a choice to make I think.
> >
> > Right, and I believe having the code in the kernel, appropriately
> > integrated with the drivers is beneficial. The main argument against
> > it is that in certain environments kernels are old. But that's a very
> > destructive argument.
>
> My main concern here is we forget some kfunc that we need and then
> we are stuck. We don't have the luxury of upgrading kernels easily.
> It doesn't need to be an either/or discussion if we have a ctx()
> call we can drop into BTF over the descriptor and use kfuncs for
> the most common things. Other option is to simply write a kfunc
> for every field I see that could potentially have some use even
> if I don't fully understand it at the moment.
>
> I suspect I am less concerned about raw access because we already
> have BTF infra built up around our network observability/sec
> solution so we already handle per kernel differences and desc.
> just looks like another BTF object we want to read. And we
> know what dev and types we are attaching to so we don't have
> issues with is this a mlx or intel or etc device.
>
> Also as a more practical concern how do we manage nic specific
> things?
What are the NIC specific things?
> Have nic spcific kfuncs? Per descriptor tx_flags and
> status flags. Other things we need are ptr to skb and access
> to the descriptor ring so we can pull stats off the ring. I'm
> not arguing it can't be done with kfuncs, but if we go kfunc
> route be prepared for a long list of kfuncs and driver specific
> ones.
IDK why you say that, I gave the base list of offloads in an earlier
email.
Powered by blists - more mailing lists