netdev - Re: [PATCH] net: sched: check tc_skip

Message-ID: <CAMDZJNWhZjMe1MSfZYuOWcstzkhjTutxizdzq6S1M9=M_x_VMA@mail.gmail.com>
Date:   Fri, 29 Oct 2021 08:04:13 +0800
From:   Tonghao Zhang <xiangxia.m.yue@...il.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Willem de Bruijn <willemb@...gle.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jakub Kicinski <kuba@...nel.org>
Subject: Re: [PATCH] net: sched: check tc_skip_classify as far as possible

On Thu, Oct 28, 2021 at 10:28 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> On 10/28/21 3:56 PM, xiangxia.m.yue@...il.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@...il.com>
> >
> > We look up and only then check the tc_skip_classify flag in the net
> > sched layer, even when the skb does not want to be classified.
> > That case may consume a lot of CPU cycles.
> >
> > Install the rules as below:
> > $ for id in $(seq 1 100); do
> > $     tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> > $ done
>
> Do you actually have such a case in practice or is this just hypothetical?
Hi Daniel, I did some research on this for k8s in production. There
are not that many tc prios there (~5 different prios),
but in this test I used 100 prios.

I reviewed the code. For the tx path, I think we check
tc_skip_classify too late. In the rx path, we check it
in __netif_receive_skb_core.
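
To illustrate the point about checking the flag too late, here is a minimal
userspace sketch. It is not kernel code: struct toy_skb, struct toy_filter and
the toy_* functions are hypothetical stand-ins, and it assumes the worst case
where every installed filter is visited during lookup (how many are really
walked depends on the filter match, which is elided in the rules above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct sk_buff: only the flag discussed in this thread. */
struct toy_skb {
	bool tc_skip_classify;
};

/* Toy stand-in for one tc filter; visits counts how often it was evaluated. */
struct toy_filter {
	unsigned long visits;
};

/* Late check: walk every filter first, test the flag only at action time. */
static int toy_egress_late(const struct toy_skb *skb,
			   struct toy_filter *filters, size_t n)
{
	size_t i;

	assert(filters || n == 0);
	for (i = 0; i < n; i++)
		filters[i].visits++;	/* lookup cost is paid per filter */

	if (skb->tc_skip_classify)
		return 0;		/* action skipped, lookup already paid */
	return 1;			/* action would run */
}

/* Early check: test the flag before any filter is touched. */
static int toy_egress_early(const struct toy_skb *skb,
			    struct toy_filter *filters, size_t n)
{
	if (skb->tc_skip_classify)
		return 0;		/* no lookup at all */
	return toy_egress_late(skb, filters, n);
}

/* Sum of all filter evaluations, used to compare the two orderings. */
static unsigned long toy_total_visits(const struct toy_filter *filters,
				      size_t n)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += filters[i].visits;
	return total;
}
```

With 100 toy filters and the flag set, toy_egress_late() touches all 100
filters before it returns, while toy_egress_early() touches none. Packets
without the flag behave the same in both variants.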

> Asking as this feels rather broken to begin with.
> > netperf:
> > $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
> > $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32
> >
> > Without this patch:
> > 10662.33 tps
> > 108.95 Mbit/s
> >
> > With this patch:
> > 12434.48 tps
> > 145.89 Mbit/s
> >
> > For TCP_RR, there is a 16.6% improvement; for TCP_STREAM, 33.9%.
> >
> > Cc: Willem de Bruijn <willemb@...gle.com>
> > Cc: Cong Wang <xiyou.wangcong@...il.com>
> > Cc: Jakub Kicinski <kuba@...nel.org>
> > Signed-off-by: Tonghao Zhang <xiangxia.m.yue@...il.com>
> > ---
> >   net/core/dev.c      | 3 ++-
> >   net/sched/act_api.c | 3 ---
> >   2 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index eb61a8821b3a..856ac1fb75b4 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4155,7 +4155,8 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >   #ifdef CONFIG_NET_CLS_ACT
> >       skb->tc_at_ingress = 0;
> >   # ifdef CONFIG_NET_EGRESS
> > -     if (static_branch_unlikely(&egress_needed_key)) {
> > +     if (static_branch_unlikely(&egress_needed_key) &&
> > +         !skb_skip_tc_classify(skb)) {
> >               skb = sch_handle_egress(skb, &rc, dev);
> >               if (!skb)
> >                       goto out;
> > diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> > index 7dd3a2dc5fa4..bd66f27178be 100644
> > --- a/net/sched/act_api.c
> > +++ b/net/sched/act_api.c
> > @@ -722,9 +722,6 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
> >       int i;
> >       int ret = TC_ACT_OK;
> >
> > -     if (skb_skip_tc_classify(skb))
> > -             return TC_ACT_OK;
> > -
>
> I think this might imply a change in behavior which could have the potential
> to break setups in the wild.
We may leave this code unchanged. I will send a v2 if there are no further comments.
> >   restart_act_graph:
> >       for (i = 0; i < nr_actions; i++) {
> >               const struct tc_action *a = actions[i];
> >
>


-- 
Best regards, Tonghao