netdev - Re: [PATCH v3 net-next] net: sched: refine software bypass handling in tc_run

Message-ID: <CADvbK_fkcw4XkT4zhb0Db5qH9q_yFMWmRgKMrHQvmVH+CMY=7g@mail.gmail.com>
Date: Fri, 17 Jan 2025 13:32:53 -0500
From: Xin Long <lucien.xin@...il.com>
To: Asbjørn Sloth Tønnesen <ast@...erby.net>
Cc: network dev <netdev@...r.kernel.org>, "David S . Miller" <davem@...emloft.net>, 
	Jakub Kicinski <kuba@...nel.org>, Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, 
	Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang <xiyou.wangcong@...il.com>, 
	Jiri Pirko <jiri@...nulli.us>, Marcelo Ricardo Leitner <marcelo.leitner@...il.com>, 
	Shuang Li <shuali@...hat.com>
Subject: Re: [PATCH v3 net-next] net: sched: refine software bypass handling
 in tc_run

On Fri, Jan 17, 2025 at 10:08 AM Asbjørn Sloth Tønnesen <ast@...erby.net> wrote:
>
> On 1/15/25 2:27 PM, Xin Long wrote:
> > This patch addresses issues with filter counting in block (tcf_block),
> > particularly for software bypass scenarios, by introducing a more
> > accurate mechanism using useswcnt.
> >
> > [...]
> >    The improvement can be demonstrated using the following script:
> >
> >    # cat insert_tc_rules.sh
> >
> >      tc qdisc add dev ens1f0np0 ingress
> >      for i in $(seq 16); do
> >          taskset -c $i tc -b rules_$i.txt &
> >      done
> >      wait
> >
> >    Each of rules_$i.txt files above includes 100000 tc filter rules to a
> >    mlx5 driver NIC ens1f0np0.
> >
> >    Without this patch:
> >
> >    # time sh insert_tc_rules.sh
> >
> >      real    0m50.780s
> >      user    0m23.556s
> >      sys     4m13.032s
> >
> >    With this patch:
> >
> >    # time sh insert_tc_rules.sh
> >
> >      real    0m17.718s
> >      user    0m7.807s
> >      sys     3m45.050s
>
> I assume that you have tested that these numbers are still roughly the same for v3?
Yup, roughly the same.

>
> > [...]
> >   DEFINE_STATIC_KEY_FALSE(netstamp_needed_key);
> > @@ -4045,10 +4045,13 @@ static int tc_run(struct tcx_entry *entry, struct sk_buff *skb,
> >       if (!miniq)
> >               return ret;
> >
> > -     if (static_branch_unlikely(&tcf_bypass_check_needed_key)) {
> > -             if (tcf_block_bypass_sw(miniq->block))
> > -                     return ret;
> > -     }
> > +     /* Global bypass */
> > +     if (!static_branch_likely(&tcf_sw_enabled_key))
> > +             return ret;
>
> I have tested with both static_branch_likely() and static_branch_unlikely(),
> but my results are inconclusive: I don't see a significant difference in my
> tests, although it causes a lot of changes in the object code.
>
> $ diff -Naur <(objdump --no-addresses -d dev_likely.o) \
>               <(objdump --no-addresses -d dev_unlikely.o) | diffstat
>   62 |  156 ++++++++++++++++++++++++++++++++++----------------------------------
>   1 file changed, 79 insertions(+), 77 deletions(-)
>
> > +
> > +     /* Block-wise bypass */
> > +     if (tcf_block_bypass_sw(miniq->block))
> > +             return ret;
> >
> >       tc_skb_cb(skb)->mru = 0;
> >       tc_skb_cb(skb)->post_ct = false;
> > [...]
>
> When I run the benchmark tests from my original bypass patch last year,
> I don't see any significant differences in the forwarding performance.
> (Xeon D-1518, single 8-core CPU, no parallel rule updates).
>
> Reviewed-by: Asbjørn Sloth Tønnesen <ast@...erby.net>
> Tested-by: Asbjørn Sloth Tønnesen <ast@...erby.net>
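
For readers following the thread, the two-level check in the hunk above can
be modelled in userspace roughly as follows. This is only a sketch under
stated assumptions: the model_* names are hypothetical, a plain atomic
counter stands in for the tcf_sw_enabled_key static key, and the way the
counters are maintained on filter add/delete is assumed, since that part of
the patch is elided ("[...]") in the quote.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tcf_block: only the count of filters
 * that need software processing (useswcnt in the patch) is modelled. */
struct model_block {
	atomic_int useswcnt;
};

/* Assumed stand-in for tcf_sw_enabled_key: the number of blocks that
 * currently hold at least one software filter. */
static atomic_int sw_enabled_blocks;

/* Assumption: the first software filter on a block enables the global key. */
static void model_filter_add(struct model_block *b)
{
	if (atomic_fetch_add(&b->useswcnt, 1) == 0)
		atomic_fetch_add(&sw_enabled_blocks, 1);
}

/* Assumption: removing the last software filter releases the global key. */
static void model_filter_del(struct model_block *b)
{
	if (atomic_fetch_sub(&b->useswcnt, 1) == 1)
		atomic_fetch_sub(&sw_enabled_blocks, 1);
}

/* Mirrors the order of the checks in the tc_run() hunk: returns true when
 * the software classifiers would run, false when they are bypassed. */
static bool model_run_classifiers(struct model_block *b)
{
	/* Global bypass: no block anywhere has a software filter. */
	if (atomic_load(&sw_enabled_blocks) == 0)
		return false;

	/* Block-wise bypass: this block has no software filter. */
	if (b == NULL || atomic_load(&b->useswcnt) == 0)
		return false;

	return true;
}
```

In this model, a block with no software filters is always bypassed, and the
global check lets the common "nothing in software anywhere" case return
before the block is looked at. The kernel's static key patches the branch
itself rather than loading a counter, which the model does not capture.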