Message-ID: <CAMDZJNUcOo94i0gbDG-a9+Y=x04cZx_7KSedKEUh6gvCsa+k0g@mail.gmail.com>
Date:   Fri, 17 Dec 2021 11:21:07 +0800
From:   Tonghao Zhang <xiangxia.m.yue@...il.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     John Fastabend <john.fastabend@...il.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        KP Singh <kpsingh@...nel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Antoine Tenart <atenart@...nel.org>,
        Alexander Lobakin <alexandr.lobakin@...el.com>,
        Wei Wang <weiwan@...gle.com>, Arnd Bergmann <arnd@...db.de>
Subject: Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress

On Thu, Dec 16, 2021 at 8:37 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> On 12/11/21 1:37 AM, Tonghao Zhang wrote:
> > On Sat, Dec 11, 2021 at 4:11 AM Daniel Borkmann <daniel@...earbox.net> wrote:
> >> On 12/10/21 8:54 PM, Tonghao Zhang wrote:
> >>> On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@...il.com> wrote:
> >>>> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@...il.com> wrote:
> >>>>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> >>>>> <john.fastabend@...il.com> wrote:
> >>>>>> xiangxia.m.yue@ wrote:
> [...]
> >>>>> Hi John
> >>>>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> >>>>> -> execute BPF program on ethx with bpf_redirect(ifb0)
> >>>>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> >>>>> the packets loop back, which means bpf_redirect doesn't work with an
> >>>>> ifb netdev, right?
> >>>>> So in sch_handle_egress, I add the check skb_skip_tc_classify().
> >>
> >> But why would you do that? Usage like this is just broken by design..
> > As I understand it, we can redirect packets to a target device either at
> > ingress or at *egress*.
> >
> > The commit ID: 3896d655f4d491c67d669a15f275a39f713410f8
> > Allow eBPF programs attached to classifier/actions to call
> > bpf_clone_redirect(skb, ifindex, flags) helper which will mirror or
> > redirect the packet by dynamic ifindex selection from within the
> > program to a target device either at ingress or at egress. Can be used
> > for various scenarios, for example, to load balance skbs into veths,
> > split parts of the traffic to local taps, etc.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=3896d655f4d491c67d669a15f275a39f713410f8
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=27b29f63058d26c6c1742f1993338280d5a41dc6
> >
> > But at egress the bpf_redirect doesn't work with ifb.
> >> If you need to loop anything back to RX, just use bpf_redirect() with
> > We don't use it to loop packets back; the flags argument of
> > bpf_redirect is 0. For example:
> >
> > tc filter add dev veth1 \
> > egress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
> > https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
> >> BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
> > We load balance the packets to different ifb netdevices at egress. On
> > the ifb devices, we install filters and rate-limit policers.
Hi Daniel, I'll try to answer your question below. Thanks.
> I guess this part here is what I don't quite follow. Could you walk me through
> the packet flow in this case? So you go from bpf@...egress@...s-dev to do the
> redirect to bpf@...egress@ifb, and then again to bpf@...egress@...s-dev (same
No BPF prog is installed on the ifb device; ifb redirects the packets back
to the netdevice they came from. In ifb_ri_tasklet:
1. skb->dev = dev_get_by_index_rcu(dev_net(txp->dev), skb->skb_iif);
   // skb_iif is the phys-dev
2. dev_queue_xmit()
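For context, the relevant part of ifb_ri_tasklet() in drivers/net/ifb.c looks
roughly like the simplified fragment below (an approximation for illustration,
not a verbatim quote): ifb marks the skb with tc_skip_classify, points skb->dev
back at the phys-dev recorded in skb_iif, and re-transmits it, which is why the
packet re-enters __dev_queue_xmit()/sch_handle_egress() on the phys-dev.

        while ((skb = __skb_dequeue(&txp->tq)) != NULL) {
                /* ifb flags the skb so tc actions skip it on re-entry */
                skb->tc_skip_classify = 1;

                /* skb_iif still holds the ifindex of the phys-dev */
                skb->dev = dev_get_by_index_rcu(dev_net(txp->dev),
                                                skb->skb_iif);
                if (!skb->dev) {
                        dev_kfree_skb(skb);
                        continue;
                }

                /* re-transmit: this runs __dev_queue_xmit() ->
                 * sch_handle_egress() on the phys-dev again; without the
                 * proposed check, the egress BPF prog redirects it to ifb
                 * once more, i.e. the loop. */
                dev_queue_xmit(skb);
        }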

> dev or different one I presume)? Why not doing the load-balancing, applying the
> policy, and doing the rate-limiting (e.g. EDT with sch_fq) directly at the initial
> bpf@...egress@...s-dev location given bpf is perfectly capable to do all of it
> there w/o the extra detour & overhead through ifb? The issue I see here is adding
This is not easy to explain. We have two solutions for our production
environment: one is this patchset, the other is my other patchset:
https://patchwork.kernel.org/project/netdevbpf/list/?series=593409&archive=both&state=*

For why we need these solutions, please see the commit message of patch 1/2:
https://patchwork.kernel.org/project/netdevbpf/patch/20211210023626.20905-2-xiangxia.m.yue@gmail.com/

The difference between the two solutions is that this one uses ifb to do the
rate-limiting/load-balancing for applications whose key requirement is high
throughput, while applications whose key requirement is low latency of data
access can use the mq or fifo qdisc on the phys-dev. This is very useful in a
k8s environment. My other patchset is better than this solution, but this one
is easier to use because we can install the filters with the tc cmd (not bpf).
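A minimal sketch of the egress BPF side of this setup, assuming two ifb
devices with hypothetical ifindexes 10 and 11 (the section name matches the
"sec redirect_ifb" used in the tc command quoted above; this is an
illustration, not the exact object from the patchset):

        // SPDX-License-Identifier: GPL-2.0
        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>

        #define IFB0_IFINDEX 10 /* assumed ifindex of ifb0 */
        #define IFB1_IFINDEX 11 /* assumed ifindex of ifb1 */

        SEC("redirect_ifb")
        int lb_to_ifb(struct __sk_buff *skb)
        {
                /* spread flows across the ifb devices by flow hash;
                 * flags == 0 redirects to the egress path of the target */
                __u32 hash = bpf_get_hash_recalc(skb);

                return bpf_redirect(hash & 1 ? IFB1_IFINDEX : IFB0_IFINDEX, 0);
        }

        char _license[] SEC("license") = "GPL";

The filters and policers are then installed on the ifb devices with plain tc.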

This patch tries to fix the bug on the bpf side.
> extra overhead to support such a narrow case that nobody else is using and that
I don't agree with that; I think it's very useful in k8s, where the pods
share the network resources. Some pods want low latency and other pods want
high throughput, and with this solution that is selectable, so I don't think
this is really a narrow case.
> can be achieved already with existing infra as I understood it; the justification
> right now to add the extra checks to the critical fast path is very thin..
Note that:
For the tc cmd path, there is already the skb_skip_tc_classify() check in
tcf_action_exec(); this patch only moves the check to sch_handle_egress() to
avoid the unnecessary tc filter lookup.
For bpf_redirect, tcf_action_exec() is not run, so without the check the
packets loop back. So this patch tries to fix a bug (bpf) and improve
performance (tc).
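For reference, skb_skip_tc_classify() (include/net/sch_generic.h) returns true
and clears the flag when skb->tc_skip_classify is set, which ifb does before
re-injecting the skb. A rough sketch of where the proposed check would sit in
sch_handle_egress() in net/core/dev.c (an approximation of the idea, not the
exact hunk from this patch):

        static struct sk_buff *
        sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
        {
                struct mini_Qdisc *miniq = rcu_dereference_bh(dev->miniq_egress);

                if (!miniq)
                        return skb;

                /* proposed: bail out early for skbs that ifb has already
                 * handled, so they neither hit the egress BPF prog again
                 * (no loop via bpf_redirect to ifb) nor pay for the tc
                 * filter lookup. */
                if (skb_skip_tc_classify(skb))
                        return skb;

                /* ... existing classification / action execution ... */
                return skb;
        }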

I used netperf/iperf to measure the performance; it's hard to observe any
performance drop.
> Thanks,
> Daniel



-- 
Best regards, Tonghao
