[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANP3RGeNVSwSfb9T_6Xp8GyggbwnY7YQjv1Fw5L2wTtqiFJbpw@mail.gmail.com>
Date: Tue, 21 Dec 2021 11:46:40 -0800
From: Maciej Żenczykowski <maze@...gle.com>
To: "Tyler Wear (QUIC)" <quic_twear@...cinc.com>
Cc: Martin KaFai Lau <kafai@...com>, Yonghong Song <yhs@...com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>
Subject: Re: [PATCH] Bpf Helper Function BPF_FUNC_skb_change_dsfield
On Tue, Dec 21, 2021 at 11:16 AM Tyler Wear (QUIC)
<quic_twear@...cinc.com> wrote:
> > On Mon, Dec 20, 2021 at 07:18:42PM -0800, Yonghong Song wrote:
> > > On 12/20/21 12:40 PM, Tyler Wear wrote:
> > > > New bpf helper function BPF_FUNC_skb_change_dsfield "int
> > > > bpf_skb_change_dsfield(struct sk_buff *skb, u8 mask, u8 value)".
> > > > BPF_PROG_TYPE_CGROUP_SKB typed bpf_prog which currently can be
> > > > attached to the ingress and egress path. The helper is needed
> > > > because this type of bpf_prog cannot modify the skb directly.
> > > >
> > > > Used by a bpf_prog to specify DS field values on egress or ingress.
> > >
> > > Maybe you can expand a little bit here for your use case?
> > > I know DS field might help but a description of your actual use case
> > > will make adding this helper more compelling.
> > +1. More details on the use case is needed.
> > Also, having an individual helper for each particular header field is too specific.
> >
> > For egress, there is bpf_setsockopt() for IP_TOS and IPV6_TCLASS and it can be called in other cgroup hooks. e.g.
> > BPF_PROG_TYPE_SOCK_OPS during tcp ESTABLISHED event.
> > There is an example in tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c.
> > Is it enough for egress?
>
> Using bpf_setsockopt() has 2 issues: 1) it changes the userspace visible state 2) won't work with udp sendmsg cmsg
Right, so to clarify since I've been working with Tyler on a project
of which this patch is a small component.
Note, I may be wrong here, I don't fully understand how all of this
works... but:
ad 1) AFAIK if bpf calls bpf_setsockopt on the socket in question,
then userspace's view of the socket settings via
getsockopt(IP_TOS/IPV6_TCLASS) will also be affected - this may be
undesirable (it's technically userspace visible change in behaviour
and could, as unlikely as it is, lead to application misbehaviour).
This can be worked around via also overriding getsockopt/setsockopt
with bpf, but then you need to store the value to return to userspace
somewhere... AFAICT it all ends up being pretty ugly and very complex.
I wouldn't be worried about needing to override each individual field,
as the only other field that looks likely to be potentially beneficial
to override would be the ipv6 flowlabel.
ad 2) I don't think the bpf_setsockopt(IP_TOS/IPV6_TCLASS) approach
works for packets generated via udp sendmsg where cmsg is being used
to set tos.
3) I also think the bpf_setsockopt(IP_TOS/IPV6_TCLASS) might be too
late, since it would be in response to an already built packet, and
would thus presumably only take effect on the next packet, and not for
this one, no?
Technically this could be done by attaching the programs to tc egress
instead of the cgroup hook, but then it's per interface, which is
potentially the wrong granularity...
As for what is driving this? Upcoming wifi standard to allow access
points to inform client devices how to dscp mark individual flows.
As for the patch itself, I wonder if the return value shouldn't be
reversed, currently '1 if the DS field is set, 0 if it is not set.'
But I think returning 0 on success and an error on failure is more in
line with what other bpf helpers do?
OTOH, it does match bpf_skb_ecn_set_ce() returning 0 on failure...
- Maciej
Powered by blists - more mailing lists