Message-ID: <CANP3RGfJ2G8P40hN2F=PGDYUc3pm84=SNppHp_J0V+YiDkLM_A@mail.gmail.com>
Date: Fri, 14 Jan 2022 13:18:44 -0800
From: Maciej Żenczykowski <maze@...gle.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: "Tyler Wear (QUIC)" <quic_twear@...cinc.com>,
Network Development <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, Yonghong Song <yhs@...com>,
Martin KaFai Lau <kafai@...com>,
Toke Høiland-Jørgensen <toke@...hat.com>,
Daniel Borkmann <daniel@...earbox.net>,
Song Liu <song@...nel.org>
Subject: Re: [PATCH bpf-next v6 1/2] Add skb_store_bytes() for BPF_PROG_TYPE_CGROUP_SKB
> > > > This is wrong.
> > > > CGROUP_INET_EGRESS bpf prog cannot arbitrary change packet data.
I agree with this sentiment, which is why the original proposal was
simply to add a helper that can only modify the tos/tclass/dscp field,
and not arbitrary bytes. (Note: there is already such a helper for
setting the ECN bits, so there is something of a precedent.)
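To illustrate what "only the dscp field" means at the byte level, here is a minimal userspace sketch, not kernel code and not the proposed helper. It assumes the standard layout of the IPv4 TOS / IPv6 traffic class byte (DSCP in the top 6 bits, ECN in the bottom 2); the name set_dscp is hypothetical.

```python
DSCP_MASK = 0xFC  # top 6 bits of the TOS / traffic class byte
ECN_MASK = 0x03   # bottom 2 bits of the same byte


def set_dscp(tos: int, dscp: int) -> int:
    """Return tos with its DSCP bits replaced and its ECN bits untouched."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("tos must fit in one byte")
    if not 0 <= dscp <= 0x3F:
        raise ValueError("dscp must fit in 6 bits")
    # Shift the new DSCP into the top 6 bits, keep the old ECN bits.
    return (dscp << 2) | (tos & ECN_MASK)
```

A helper restricted in this way cannot touch addresses, lengths, or any other header field, which is the distinction from an arbitrary skb_store_bytes().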
> > > > The networking stack populated the IP header at that point.
> > > > If the prog changes it to something else it will be confusing other
> > > > layers of stack. neigh(L2) will be wrong, etc.
> > > > We can still change certain things in the packet, but not arbitrary bytes.
> > > >
> > > > We cannot change the DS field directly in the packet either.
This part I don't agree with. In most cases there is no DSCP-based
routing decision, in which case it seems perfectly reasonable to
change the DSCP bits here. Indeed, last I checked (though this was a
few years ago), the ipv4 tos routing code wasn't even capable of making
sane decisions, because it looks at the bottom 4 bits of the TOS
field instead of the top 6 bits, i.e. you can route on ECN bits, but
you can't route on the full DSCP field. Additionally, afaik the ipv6
tclass routing simply wasn't implemented. However, I last had to deal
with this probably half a decade ago, on even older kernels, so
perhaps the situation has changed.
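The bit overlap described above can be sketched as follows. This takes "bottom 4 bits" literally as a mask of 0x0F, which is an assumption based only on the recollection above and has not been checked against any kernel version; the names LOW4_MASK, overlap and route_key are hypothetical.

```python
DSCP_MASK = 0xFC  # top 6 bits: the full DSCP field
ECN_MASK = 0x03   # bottom 2 bits: ECN
LOW4_MASK = 0x0F  # assumption: "bottom 4 bits" taken literally


def overlap(mask: int) -> dict:
    """Count how many DSCP bits and ECN bits a given mask covers."""
    return {
        "dscp_bits": bin(mask & DSCP_MASK).count("1"),
        "ecn_bits": bin(mask & ECN_MASK).count("1"),
    }


def route_key(tos: int, mask: int = LOW4_MASK) -> int:
    """The part of the TOS byte a lookup using this mask would see."""
    return tos & mask


EF = 46 << 2    # DSCP 46 (EF) as a TOS byte: 0xB8
AF11 = 10 << 2  # DSCP 10 (AF11) as a TOS byte: 0x28
```

Under that assumption the mask covers both ECN bits but only two of the six DSCP bits, so EF and AF11 produce the same key and cannot be routed differently.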
Additionally DSCP bits may affect transmit queue selection (for
something like wifi qos / traffic prioritization across multiple
transmit queues with different air-time behaviours - which can use
dscp), so ideally we need dscp to be set *before* the mq qdisc /
dispatch. I think this implies it needs to happen before tc (though
again, I'm not too certain of the ordering here).
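As a rough illustration of why ordering matters for wifi, the sketch below maps a TOS byte to a WMM access category. It assumes one common default (user priority taken from the top three DSCP bits, then the usual user-priority-to-access-category table); real drivers may apply a different QoS map, and the function name access_category is hypothetical.

```python
# Common default mapping from 802.1D user priority (0..7) to WMM access
# category: BK = background, BE = best effort, VI = video, VO = voice.
UP_TO_AC = ("BE", "BK", "BK", "BE", "VI", "VI", "VO", "VO")


def access_category(tos: int) -> str:
    """Pick a transmit queue (access category) from the TOS byte."""
    dscp = (tos & 0xFC) >> 2      # drop the two ECN bits
    user_priority = dscp >> 3     # top three bits of the DSCP
    return UP_TO_AC[user_priority]


# A packet queued while it still carried TOS 0x00 was classified as "BE".
# Rewriting it to CS6 (0xC0) afterwards changes the header, not the queue
# it already sits in, hence the wish to set dscp before queue selection.
queued_as = access_category(0x00)
would_have_been = access_category(0xC0)
```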
> > > > It can only be changed by changing its value in the socket.
Changing it directly in the socket has two problems:
- it becomes visible to userspace, which is undesirable (i.e. I've run
across userspace code which will set tos to A, then read it back and
exit/fail/crash if it doesn't see A)
- if the tos bits themselves are an input to the decision about what
tos bits to actually use, then this becomes recursive and basically
impossible to get right (for example, ssh sets tos to different
values for interactive and bulk (i.e. copy) traffic, so using the
application-selected tos to select the wire tos is perfectly reasonable)
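Both problems can be shown with a small simulation. This is a sketch under stated assumptions: FakeSocket is a stand-in for the socket's stored tos and not a real socket API, and the POLICY table and its values are invented purely for illustration.

```python
# Hypothetical policy: application-chosen TOS -> TOS to put on the wire.
POLICY = {0x10: 0xB8, 0x08: 0x20}
DEFAULT_WIRE_TOS = 0x00


def wire_tos(app_tos: int) -> int:
    """Decide the on-the-wire TOS from the TOS the application asked for."""
    return POLICY.get(app_tos, DEFAULT_WIRE_TOS)


class FakeSocket:
    """Stand-in for a socket's stored TOS; not a real socket API."""

    def __init__(self, tos: int) -> None:
        self.tos = tos


def send_rewriting_packet(sock: FakeSocket) -> int:
    """Per-packet rewrite: the socket state is left alone."""
    return wire_tos(sock.tos)


def send_rewriting_socket(sock: FakeSocket) -> int:
    """Socket rewrite: the policy output overwrites its own input."""
    sock.tos = wire_tos(sock.tos)
    return sock.tos
```

With the per-packet variant, three sends from a socket whose tos is 0x10 all go out as 0xB8 and the application still reads back 0x10. With the socket variant, the first send goes out as 0xB8, the application now reads back 0xB8, and later sends feed 0xB8 into the policy and fall through to the default.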
> > > Why is the DS field unchangeable, but ecn is changeable?
> >
> > Per spec the requirement is to modify the ds field of egress packets with DSCP value. Setting ds field on socket will not suffice here.
> > Another case is where device is a middle-man and needs to modify the packets of a connected tethered client with the DSCP value, using a sock will not be able to change the packet here.
>
> If DS field needs to be changed differently for every packet
> it's better to use TC layer for this task.
> qdiscs may send packets with different DSs to different queues.