netdev - Re: [net v5 2/3] net: sched: add check tc_skip

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAMDZJNVJqpnPjq3+v7XxTjsRjp-rzRmS-cKuQZufEuoDjUVOBw@mail.gmail.com>
Date:   Mon, 10 Jan 2022 09:34:46 +0800
From:   Tonghao Zhang <xiangxia.m.yue@...il.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     John Fastabend <john.fastabend@...il.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        KP Singh <kpsingh@...nel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Antoine Tenart <atenart@...nel.org>,
        Alexander Lobakin <alexandr.lobakin@...el.com>,
        Wei Wang <weiwan@...gle.com>, Arnd Bergmann <arnd@...db.de>
Subject: Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress

 t

On Fri, Dec 17, 2021 at 11:21 AM Tonghao Zhang <xiangxia.m.yue@...il.com> wrote:
>
> On Thu, Dec 16, 2021 at 8:37 PM Daniel Borkmann <daniel@...earbox.net> wrote:
> >
> > On 12/11/21 1:37 AM, Tonghao Zhang wrote:
> > > On Sat, Dec 11, 2021 at 4:11 AM Daniel Borkmann <daniel@...earbox.net> wrote:
> > >> On 12/10/21 8:54 PM, Tonghao Zhang wrote:
> > >>> On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@...il.com> wrote:
> > >>>> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@...il.com> wrote:
> > >>>>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> > >>>>> <john.fastabend@...il.com> wrote:
> > >>>>>> xiangxia.m.yue@ wrote:
> > [...]
> > >>>>> Hi John
> > >>>>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> > >>>>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> > >>>>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> > >>>>> the packets loopbacks, that means bpf_redirect doesn't work with ifb
> > >>>>> netdev, right ?
> > >>>>> so in sch_handle_egress, I add the check skb_skip_tc_classify().
> > >>
> > >> But why would you do that? Usage like this is just broken by design..
> > > As I understand, we can redirect packets to a target device either at
> > > ingress or at *egress
> > >
> > > The commit ID: 3896d655f4d491c67d669a15f275a39f713410f8
> > > Allow eBPF programs attached to classifier/actions to call
> > > bpf_clone_redirect(skb, ifindex, flags) helper which will mirror or
> > > redirect the packet by dynamic ifindex selection from within the
> > > program to a target device either at ingress or at egress. Can be used
> > > for various scenarios, for example, to load balance skbs into veths,
> > > split parts of the traffic to local taps, etc.
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=3896d655f4d491c67d669a15f275a39f713410f8
> > > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=27b29f63058d26c6c1742f1993338280d5a41dc6
> > >
> > > But at egress the bpf_redirect doesn't work with ifb.
> > >> If you need to loop anything back to RX, just use bpf_redirect() with
> > > Not use it to loop packets back. the flags of bpf_redirect is 0. for example:
> > >
> > > tc filter add dev veth1 \
> > > egress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
> > > https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
> > >> BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
> > > We load balance the packets to different ifb netdevices at egress. On
> > > ifb, we install filters, rate limit police,
> Hi Daniel, I try to answer your question. thanks
> > I guess this part here is what I don't quite follow. Could you walk me through
> > the packet flow in this case? So you go from bpf@...egress@...s-dev to do the
> > redirect to bpf@...egress@ifb, and then again to bpf@...egress@...s-dev (same
> ifb doesn't install bpf prog, ifb will redirect packets to netdevice
> which packets from.
> ifb_ri_tasklet
> 1. skb->dev = dev_get_by_index_rcu(dev_net(txp->dev), skb->skb_iif);
> // skb_iif is phys-dev
> 2. dev_queue_xmit
>
> > dev or different one I presume)? Why not doing the load-balancing, applying the
> > policy, and doing the rate-limiting (e.g. EDT with sch_fq) directly at the initial
> > bpf@...egress@...s-dev location given bpf is perfectly capable to do all of it
> > there w/o the extra detour & overhead through ifb? The issue I see here is adding
> This is not easy to explain. We have two solution for our production,
> one is this, other one is my
> other patchset:
> https://patchwork.kernel.org/project/netdevbpf/list/?series=593409&archive=both&state=*
>
> Why I need those solutions, please see my patch 1/2 commit message.
> https://patchwork.kernel.org/project/netdevbpf/patch/20211210023626.20905-2-xiangxia.m.yue@gmail.com/
>
> The different of two solution is that this one use ifb to do the
> rate-limiting/load-blance for applications which
> key is high throughput. Then for  applications which key is  the low
> latency of data access, can use the
> mq or fifo qdisc in the phys-dev. This is very useful in k8s
> environment. Anyway my other patchset is better
> for this one sulotion. but this one solution is easy to used because
> we can use the tc cmd(not bpf) to install filters.
>
> This patch try to fix the bug in bpf.
> > extra overhead to support such a narrow case that nobody else is using and that
> I don't agree that, I think it's very useful in k8s. If the pods of
> k8s are sharing the network resource, this is really a narrow case.
> But part of pods want low latency, and other part of pods want high
> throughput, this solution is selectable.
> > can be achieved already with existing infra as I understood it; the justification
> > right now to add the extra checks to the critical fast path is very thin..
> note that:
> For tc cmd, there is the check "skb_skip_tc_classify" in
> tcf_action_exec, This patch only want to move check to
> sch_handle_egress to avoid the
> unessential tc filter lookup.
> For bpf_redirect, we don't run tcf_action_exec, the packets loopback
> if there is no check. This patch try to fix bug(bpf), and improve
> performance(tc)
>
> I use the netperf/iperf to measure performance, It's not easy to
> figure out performance drop.
Hi Daniel
any process ? Did I answer your questions?
 if you think we should not add(only move skb_skip_tc_classify from
tcf_action_exec to sch_handle_egress) this check in fastpath. I'am OK
to drop this patch.
but we can wait for other users to report this issue in the future. If
so, we can fix it with this patch. Can you help me to review other
patch 1/3 and 3/3.
> > Thanks,
> > Daniel
>
>
>
> --
> Best regards, Tonghao



-- 
Best regards, Tonghao