Message-ID: <e20ce5c1-9cd4-4719-9c3b-93ca8a947298@redhat.com>
Date: Thu, 9 Jan 2025 11:46:07 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Xin Long <lucien.xin@...il.com>, network dev <netdev@...r.kernel.org>
Cc: davem@...emloft.net, kuba@...nel.org, Eric Dumazet <edumazet@...gle.com>,
Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>,
Marcelo Ricardo Leitner <marcelo.leitner@...il.com>, ast@...erby.net,
Shuang Li <shuali@...hat.com>
Subject: Re: [PATCH net] net: sched: refine software bypass handling in tc_run
On 1/6/25 4:08 PM, Xin Long wrote:
> This patch addresses issues with filter counting in block (tcf_block),
> particularly for software bypass scenarios, by introducing a more
> accurate mechanism using useswcnt.
>
> Previously, filtercnt and skipswcnt were introduced by:
>
> Commit 2081fd3445fe ("net: sched: cls_api: add filter counter") and
> Commit f631ef39d819 ("net: sched: cls_api: add skip_sw counter")
>
> filtercnt tracked all tp (tcf_proto) objects added to a block, and
> skipswcnt counted tp objects with the skipsw attribute set.
>
> The problem is that a single tp can contain multiple filters, some with
> skipsw and others without. The current implementation fails in the
> following case:
>
> When the first filter in a tp has skipsw, both skipswcnt and filtercnt
> are incremented, then adding a second filter without skipsw to the same
> tp does not modify these counters because tp->counted is already set.
>
> This results in software bypass being decided solely on whether skipswcnt
> equals filtercnt, even when the block includes filters without skipsw.
> Consequently, filters without skipsw are inadvertently bypassed.
>
> To address this, the patch introduces useswcnt in block to explicitly count
> tp objects containing at least one filter without skipsw. Key changes
> include:
>
> Whenever a filter without skipsw is added, its tp is marked with usesw
> and counted in useswcnt. tc_run() now uses useswcnt to determine software
> bypass, eliminating reliance on filtercnt and skipswcnt.
>
> This refined approach prevents software bypass for blocks containing
> mixed filters, ensuring correct behavior in tc_run().
>
> Additionally, as atomic operations on useswcnt ensure thread safety and
> tp->lock guards access to tp->usesw and tp->counted, the broader lock
> down_write(&block->cb_lock) is no longer required in tc_new_tfilter().
> This resolves a performance regression caused by the filter counting
> mechanism during parallel filter insertions.
>
> The improvement can be demonstrated using the following script:
>
> # cat insert_tc_rules.sh
>
> tc qdisc add dev ens1f0np0 ingress
> for i in $(seq 16); do
> taskset -c $i tc -b rules_$i.txt &
> done
> wait
>
> Each of the rules_$i.txt files above adds 100000 tc filter rules on an
> mlx5 NIC, ens1f0np0.
>
> Without this patch:
>
> # time sh insert_tc_rules.sh
>
> real 0m50.780s
> user 0m23.556s
> sys 4m13.032s
>
> With this patch:
>
> # time sh insert_tc_rules.sh
>
> real 0m17.718s
> user 0m7.807s
> sys 3m45.050s
>
> Fixes: 047f340b36fc ("net: sched: make skip_sw actually skip software")
> Reported-by: Shuang Li <shuali@...hat.com>
> Signed-off-by: Xin Long <lucien.xin@...il.com>
> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@...il.com>
Given the quite large scope of this change and the functional and
performance implications, I think it's more suited for net-next.
> ---
> include/net/pkt_cls.h | 18 +++++++-------
> include/net/sch_generic.h | 5 ++--
> net/core/dev.c | 11 ++-------
> net/sched/cls_api.c | 52 +++++++++------------------------------
> net/sched/cls_bpf.c | 2 ++
> net/sched/cls_flower.c | 2 ++
> net/sched/cls_matchall.c | 2 ++
> net/sched/cls_u32.c | 2 ++
> 8 files changed, 32 insertions(+), 62 deletions(-)
>
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index cf199af85c52..d66cb315a6b5 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -74,15 +74,6 @@ static inline bool tcf_block_non_null_shared(struct tcf_block *block)
> return block && block->index;
> }
>
> -#ifdef CONFIG_NET_CLS_ACT
> -DECLARE_STATIC_KEY_FALSE(tcf_bypass_check_needed_key);
I think it would be better, if possible, to preserve this static key; it
will reduce the delta and avoid additional tests in the fast path for
S/W-only setups.
> -
> -static inline bool tcf_block_bypass_sw(struct tcf_block *block)
> -{
> - return block && block->bypass_wanted;
> -}
> -#endif
> -
> static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
> {
> WARN_ON(tcf_block_shared(block));
> @@ -760,6 +751,15 @@ tc_cls_common_offload_init(struct flow_cls_common_offload *cls_common,
> cls_common->extack = extack;
> }
>
> +static inline void tcf_proto_update_usesw(struct tcf_proto *tp, u32 flags)
> +{
> + if (tp->usesw)
> + return;
> + if (tc_skip_sw(flags) && tc_in_hw(flags))
> + return;
> + tp->usesw = true;
> +}
It looks like 'usesw' is never cleared. Can't user-space change the
skipsw flag for an existing tp?
[...]
> diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
> index d3a03c57545b..5e8f191fd820 100644
> --- a/net/sched/cls_u32.c
> +++ b/net/sched/cls_u32.c
> @@ -1164,6 +1164,8 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
> if (!tc_in_hw(n->flags))
> n->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
>
> + tcf_proto_update_usesw(tp, n->flags);
Why don't you also need this hook in the 'key exists' branch on line 909?
Thanks!
Paolo