netdev - Re: [net v5 2/3] net: sched: add check tc_skip

Message-ID: <1ba06b2f-6c78-cec1-4ba4-98494a402d0e@iogearbox.net>
Date:   Thu, 16 Dec 2021 13:37:17 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Tonghao Zhang <xiangxia.m.yue@...il.com>
Cc:     John Fastabend <john.fastabend@...il.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        KP Singh <kpsingh@...nel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Antoine Tenart <atenart@...nel.org>,
        Alexander Lobakin <alexandr.lobakin@...el.com>,
        Wei Wang <weiwan@...gle.com>, Arnd Bergmann <arnd@...db.de>
Subject: Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress

On 12/11/21 1:37 AM, Tonghao Zhang wrote:
> On Sat, Dec 11, 2021 at 4:11 AM Daniel Borkmann <daniel@...earbox.net> wrote:
>> On 12/10/21 8:54 PM, Tonghao Zhang wrote:
>>> On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@...il.com> wrote:
>>>> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@...il.com> wrote:
>>>>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
>>>>> <john.fastabend@...il.com> wrote:
>>>>>> xiangxia.m.yue@ wrote:
[...]
>>>>> Hi John
>>>>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
>>>>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
>>>>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
>>>>> The packets loop back, which means bpf_redirect doesn't work with an
>>>>> ifb netdev, right?
>>>>> So in sch_handle_egress, I add the skb_skip_tc_classify() check.
>>
>> But why would you do that? Usage like this is just broken by design..
> As I understand, we can redirect packets to a target device either at
> ingress or at *egress
> 
> The commit ID: 3896d655f4d491c67d669a15f275a39f713410f8
> Allow eBPF programs attached to classifier/actions to call
> bpf_clone_redirect(skb, ifindex, flags) helper which will mirror or
> redirect the packet by dynamic ifindex selection from within the
> program to a target device either at ingress or at egress. Can be used
> for various scenarios, for example, to load balance skbs into veths,
> split parts of the traffic to local taps, etc.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=3896d655f4d491c67d669a15f275a39f713410f8
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=27b29f63058d26c6c1742f1993338280d5a41dc6
> 
> But at egress the bpf_redirect doesn't work with ifb.
>> If you need to loop anything back to RX, just use bpf_redirect() with
> We don't use it to loop packets back; the flags argument of bpf_redirect is 0. For example:
> 
> tc filter add dev veth1 \
> egress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
> https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
>> BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
> We load balance the packets to different ifb netdevices at egress. On
> ifb, we install filters, rate limit police,

I guess this part here is what I don't quite follow. Could you walk me through
the packet flow in this case? So you go from bpf@...egress@...s-dev to do the
redirect to bpf@...egress@ifb, and then again to bpf@...egress@...s-dev (same
dev or a different one, I presume)? Why not do the load-balancing, apply the
policy, and do the rate-limiting (e.g. EDT with sch_fq) directly at the initial
bpf@...egress@...s-dev location, given bpf is perfectly capable of doing all of it
there w/o the extra detour & overhead through ifb? The issue I see here is adding
extra overhead to support such a narrow case that nobody else is using and that
can be achieved already with existing infra as I understood it; the justification
right now to add the extra checks to the critical fast path is very thin.
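
To make the EDT (earliest departure time) suggestion above concrete, here is a
minimal sketch of the pacing arithmetic in plain userspace C, not a loadable
BPF program. The names edt_state and edt_stamp, and the rate in bytes per
second, are hypothetical and chosen only for illustration. In an actual tc BPF
program the state would presumably live in a map, the returned value would be
written to skb->tstamp, and sch_fq on the device would hold the packet until
that time.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical pacing state for one flow or aggregate (assumption:
 * a real BPF program would keep this in a map). */
struct edt_state {
	uint64_t next_tstamp_ns; /* earliest time the next packet may depart */
};

/* Return the departure time for a packet of pkt_len bytes and advance
 * the state by that packet's transmission time at rate_Bps (bytes/s). */
static uint64_t edt_stamp(struct edt_state *st, uint64_t now_ns,
			  uint32_t pkt_len, uint64_t rate_Bps)
{
	uint64_t tstamp, delay_ns;

	assert(rate_Bps > 0);

	/* Never schedule in the past: start from max(now, next slot). */
	tstamp = st->next_tstamp_ns > now_ns ? st->next_tstamp_ns : now_ns;

	/* Time this packet occupies the link at the configured rate. */
	delay_ns = (uint64_t)pkt_len * NSEC_PER_SEC / rate_Bps;

	st->next_tstamp_ns = tstamp + delay_ns;
	return tstamp;
}
```

Back-to-back packets get increasing departure times spaced by their size
divided by the rate, while a packet arriving after an idle period departs
immediately, so the rate limit is enforced without a detour through another
device.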

Thanks,
Daniel