Message-ID: <741cd05e-e62a-4d72-b85f-ebc627b1e4d3@fiberby.net>
Date: Fri, 17 Jan 2025 15:07:57 +0000
From: Asbjørn Sloth Tønnesen <ast@...erby.net>
To: Xin Long <lucien.xin@...il.com>, network dev <netdev@...r.kernel.org>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski
<kuba@...nel.org>, Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>, Jamal Hadi Salim <jhs@...atatu.com>,
Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
Shuang Li <shuali@...hat.com>
Subject: Re: [PATCH v3 net-next] net: sched: refine software bypass handling
in tc_run
On 1/15/25 2:27 PM, Xin Long wrote:
> This patch addresses issues with filter counting in block (tcf_block),
> particularly for software bypass scenarios, by introducing a more
> accurate mechanism using useswcnt.
>
> [...]
> The improvement can be demonstrated using the following script:
>
> # cat insert_tc_rules.sh
>
> tc qdisc add dev ens1f0np0 ingress
> for i in $(seq 16); do
> taskset -c $i tc -b rules_$i.txt &
> done
> wait
>
> Each of rules_$i.txt files above includes 100000 tc filter rules to a
> mlx5 driver NIC ens1f0np0.
>
> Without this patch:
>
> # time sh insert_tc_rules.sh
>
> real 0m50.780s
> user 0m23.556s
> sys 4m13.032s
>
> With this patch:
>
> # time sh insert_tc_rules.sh
>
> real 0m17.718s
> user 0m7.807s
> sys 3m45.050s
I assume that you have tested that these numbers are still roughly the same for v3?
> [...]
> DEFINE_STATIC_KEY_FALSE(netstamp_needed_key);
> @@ -4045,10 +4045,13 @@ static int tc_run(struct tcx_entry *entry, struct sk_buff *skb,
> if (!miniq)
> return ret;
>
> - if (static_branch_unlikely(&tcf_bypass_check_needed_key)) {
> - if (tcf_block_bypass_sw(miniq->block))
> - return ret;
> - }
> + /* Global bypass */
> + if (!static_branch_likely(&tcf_sw_enabled_key))
> + return ret;
I have tested with both static_branch_likely() and static_branch_unlikely(),
but my results are inconclusive. I don't see a significant difference in my tests,
although switching between the two causes a lot of changes in the object code:
$ diff -Naur <(objdump --no-addresses -d dev_likely.o) \
<(objdump --no-addresses -d dev_unlikely.o) | diffstat
62 | 156 ++++++++++++++++++++++++++++++++++----------------------------------
1 file changed, 79 insertions(+), 77 deletions(-)
> +
> + /* Block-wise bypass */
> + if (tcf_block_bypass_sw(miniq->block))
> + return ret;
>
> tc_skb_cb(skb)->mru = 0;
> tc_skb_cb(skb)->post_ct = false;
> [...]
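For readers following the hunk above, the order of the two checks can be modeled in userspace roughly as below. This is only a sketch under stated assumptions: struct model_block, model_sw_enabled and both functions are hypothetical names, a plain bool stands in for the tcf_sw_enabled_key static key, and the per-block rule (bypass when no filter in the block needs software) is my reading of the useswcnt description, not code copied from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a filter block (not struct tcf_block). */
struct model_block {
	int usesw_cnt;	/* assumed: number of filters that need software handling */
};

/* Plain flag standing in for the tcf_sw_enabled_key static key. */
static bool model_sw_enabled;

/* Assumed per-block rule: bypass when the block has no software filters. */
static bool model_block_bypass_sw(const struct model_block *block)
{
	return block && block->usesw_cnt == 0;
}

/* Returns true when the software classifier would be entered. */
static bool model_tc_run_classifies(const struct model_block *block)
{
	/* Global bypass: no block anywhere needs software classification. */
	if (!model_sw_enabled)
		return false;

	/* Block-wise bypass: this particular block is hardware-only. */
	if (model_block_bypass_sw(block))
		return false;

	return true;
}
```

In this model a block with usesw_cnt == 0 is skipped even when model_sw_enabled is set, and nothing is classified at all while the flag is clear. It says nothing about the cost of the branch itself, which is what the likely/unlikely question is about.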
When I run the benchmark tests from my original bypass patch last year,
I don't see any significant differences in the forwarding performance.
(Xeon D-1518, single 8-core CPU, no parallel rule updates).
Reviewed-by: Asbjørn Sloth Tønnesen <ast@...erby.net>
Tested-by: Asbjørn Sloth Tønnesen <ast@...erby.net>