Message-ID: <148E4177-8064-4824-898A-997C145E19B6@fb.com>
Date: Sun, 24 Mar 2019 01:14:57 +0000
From: Lawrence Brakmo <brakmo@...com>
To: Eric Dumazet <eric.dumazet@...il.com>,
netdev <netdev@...r.kernel.org>
CC: Martin Lau <kafai@...com>, Alexei Starovoitov <ast@...com>,
"Daniel Borkmann" <daniel@...earbox.net>,
Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf-next 0/7] bpf: Propagate cn to TCP
On 3/23/19, 10:12 AM, "Eric Dumazet" <eric.dumazet@...il.com> wrote:

> On 03/23/2019 01:05 AM, brakmo wrote:
>> This patchset adds support for propagating congestion notifications (cn)
>> to TCP from cgroup inet skb egress BPF programs.
>>
>> Current cgroup skb BPF programs cannot trigger TCP congestion window
>> reductions, even when they drop a packet. This patch-set adds support
>> for cgroup skb BPF programs to send congestion notifications in the
>> return value when the packets are TCP packets. Rather than the
>> current 1 for keeping the packet and 0 for dropping it, they can
>> now return:
>>     NET_XMIT_SUCCESS (0) - continue with packet output
>>     NET_XMIT_DROP (1)    - drop packet and do cn
>>     NET_XMIT_CN (2)      - continue with packet output and do cn
>>     -EPERM               - drop packet
>>
> I believe I already mentioned this model is broken, if you have any virtual
> device before the cgroup BPF program.

Qdiscs can already return values 0 to 2; how is this different from a cgroup
BPF program returning the same values? I understand that virtual devices
placed before the cgroup or the qdisc may not propagate these values (and I
would argue the problem then lies with the virtual device), but not everyone
uses virtual devices that way. For them, HBM is not appropriate if they are
using Cubic (however, it works with DCTCP or with fq's EDT).
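
For context, TCP already reacts to these return codes when they come from a
qdisc. From memory (so treat the exact lines as approximate), the relevant
handling in tcp_transmit_skb() in net/ipv4/tcp_output.c looks roughly like:

	/* A positive return from the IP layer (NET_XMIT_DROP or
	 * NET_XMIT_CN) makes TCP enter CWR, i.e. reduce its congestion
	 * window; net_xmit_eval() then maps NET_XMIT_CN back to 0 so
	 * the caller does not also treat the packet as lost. */
	err = icsk->icsk_af_ops->queue_xmit(sk, skb, &inet->cork.fl);

	if (unlikely(err > 0)) {
		tcp_enter_cwr(sk);
		err = net_xmit_eval(err);
	}

All the series does is let a cgroup skb egress program feed the same signal
into that existing path.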

> Please think about offloading the pacing/throttling in the NIC,
> there is no way we will report back to tcp stack instant notifications.

Not everyone has the ability to offload to the NIC.

> This patch series is going way too far for my taste.

Too far in which way? I'm simply extending to cgroup egress BPF programs a
mechanism that has been present in qdiscs for a while. I understand if it
has no use in your environment, but we believe it does in ours.

> This idea is not new, you were at Google when it was experimented by Nandita and
> others, and we know it is not worth the pain.

There was no eBPF at that time. We like the flexibility we get by programming
the algorithms in eBPF.
These are not intrusive changes; they simply extend the currently limited
return values from cgroup skb egress BPF programs to be more in line with
qdiscs.
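
To make that concrete, here is a minimal sketch of a cgroup skb egress
program using the extended return values. This is hypothetical, not code
from the series: the map, the budget, and the program name are made up for
illustration, and the NET_XMIT_* defines mirror the kernel's values since
they are not exported to BPF programs.

	#include <linux/bpf.h>
	#include "bpf_helpers.h"

	#define NET_XMIT_SUCCESS	0	/* continue with packet output */
	#define NET_XMIT_DROP		1	/* drop packet and do cn */
	#define NET_XMIT_CN		2	/* continue with output and do cn */

	#define BUDGET_BYTES	(1 << 20)	/* made-up byte budget */

	struct bpf_map_def SEC("maps") tx_bytes = {
		.type = BPF_MAP_TYPE_ARRAY,
		.key_size = sizeof(__u32),
		.value_size = sizeof(__u64),
		.max_entries = 1,
	};

	SEC("cgroup_skb/egress")
	int egress_cn(struct __sk_buff *skb)
	{
		__u32 key = 0;
		__u64 *bytes = bpf_map_lookup_elem(&tx_bytes, &key);

		if (!bytes)
			return NET_XMIT_SUCCESS;

		/* Account every packet; the atomic add keeps the
		 * counter consistent across CPUs. */
		__sync_fetch_and_add(bytes, skb->len);
		if (*bytes <= BUDGET_BYTES)
			return NET_XMIT_SUCCESS;

		/* Over budget: keep sending, but ask TCP to reduce
		 * cwnd. For non-TCP packets the stack treats this
		 * the same as an accept. */
		return NET_XMIT_CN;
	}

	char _license[] SEC("license") = "GPL";

A program like this needs the verifier to accept the larger return range for
BPF_CGROUP_INET_EGRESS and the stack to propagate the value up to TCP, which
is what the series adds.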