netdev - Re: [PATCH bpf-next 0/7] bpf: Propagate cn to TCP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5aec97f1-545a-f898-fdd9-c5821d5c6e39@gmail.com>
Date:   Mon, 25 Mar 2019 01:33:49 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     brakmo <brakmo@...com>, netdev <netdev@...r.kernel.org>,
        Martin Lau <kafai@...com>, Alexei Starovoitov <ast@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf-next 0/7] bpf: Propagate cn to TCP



On 03/24/2019 09:19 AM, Alexei Starovoitov wrote:
> On Sat, Mar 23, 2019 at 10:36:24PM -0700, Eric Dumazet wrote:
>>
>>
>> On 03/23/2019 08:41 AM, Alexei Starovoitov wrote:
>>> On Sat, Mar 23, 2019 at 02:12:39AM -0700, Eric Dumazet wrote:
>>>>
>>>>
>>>> On 03/23/2019 01:05 AM, brakmo wrote:
>>>>> This patchset adds support for propagating congestion notifications (cn)
>>>>> to TCP from cgroup inet skb egress BPF programs.
>>>>>
>>>>> Current cgroup skb BPF programs cannot trigger TCP congestion window
>>>>> reductions, even when they drop a packet. This patch-set adds support
>>>>> for cgroup skb BPF programs to send congestion notifications in the
>>>>> return value when the packets are TCP packets. Rather than the
>>>>> current 1 for keeping the packet and 0 for dropping it, they can
>>>>> now return:
>>>>>     NET_XMIT_SUCCESS    (0)    - continue with packet output
>>>>>     NET_XMIT_DROP       (1)    - drop packet and do cn
>>>>>     NET_XMIT_CN         (2)    - continue with packet output and do cn
>>>>>     -EPERM                     - drop packet
>>>>>
>>>>
>>>> I believe I already mentioned this model is broken, if you have any virtual
>>>> device before the cgroup BPF program.
>>>>
>>>> Please think about offloading the pacing/throttling in the NIC,
>>>> there is no way we will report back to tcp stack instant notifications.
>>>
>>> I don't think 'offload to google proprietary nic' is a suggestion
>>> that folks can practically follow.
>>> Very few NICs can offload pacing to hw and there are plenty of limitations.
>>> This patch set represents a pure sw solution that works and scales to millions of flows.
>>>
>>>> This patch series is going way too far for my taste.
>>>
>>> I would really appreciate if you can do a technical review of the patches.
>>> Our previous approach didn't quite work due to complexity around locked/non-locked socket.
>>> This is a cleaner approach.
>>> Either we go with this one or will add a bpf hook into __tcp_transmit_skb.
>>> This approach is better since it works for other protocols and can be
>>> used by qdiscs w/o any bpf.
>>>
>>>> This idea is not new, you were at Google when it was experimented by Nandita and
>>>> others, and we know it is not worth the pain.
>>>
>>> google networking needs are different from the rest of the world.
>>>
>>
>> This has nothing to do with Google against Facebook really, it is a bit sad you react like this Alexei.
>>
>> We just upstreamed bpf_skb_ecn_set_ce(), so I doubt you already have numbers to show that this strategy
>> is not enough.
>>
>> All recent discussions about ECN (TCP Prague and SCE) do not _require_ instant feedback to the sender.
>>
>> Please show us experimental results before we have to carry these huge hacks.
> 
> I have a feeling we're looking at different patches.
> All of your comments seems to be talking about something else.
> I have a hard time connecting them to this patch set.
> Have you read the cover letter and patches of _this_ set?
> 
> The results are in the cover letter and there is no change in behavior in networking stack.
> Patches 1, 2, 3 are simple refactoring of bpf-cgroup return codes.
> Patch 4 is the only one that touches net/ipv4/ip_output.c only to pass
> the return code to tcp/udp layer.
> The concept works for both despite of what you're claiming being tcp only.
> 
> Cover letter also explains why bpf_skb_ecn_set_ce is not enough.
> Please realize that existing qdiscs already doing this.
> The patchset allows bpf-cgroup to do the same.

Not the same thing I am afraid.

> 
> If you were arguing about sysctl knob in patch 5 that I would understand.
> Instead of a knob we can hard code the value for now. But please explain
> what issues you see with that knob.
> 
> Also to be fair I applied your
> commit f11216b24219 ("bpf: add skb->tstamp r/w access from tc clsact and cg skb progs")
> without demanding 'experimental results' because the feature makes sense.
> Yet, you folks didn't produce a _single_ example or test result since then.
> This is not cool.

This patch did not change fast path at all Alexei

Look at [PATCH bpf-next 4/7] bpf: Update BPF_CGROUP_RUN_PROG_INET_EGRESS calls

This part changes ip stacks.