netdev - Re: [PATCH net] net: sched: refine software bypass handling in tc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e20ce5c1-9cd4-4719-9c3b-93ca8a947298@redhat.com>
Date: Thu, 9 Jan 2025 11:46:07 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Xin Long <lucien.xin@...il.com>, network dev <netdev@...r.kernel.org>
Cc: davem@...emloft.net, kuba@...nel.org, Eric Dumazet <edumazet@...gle.com>,
 Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang <xiyou.wangcong@...il.com>,
 Jiri Pirko <jiri@...nulli.us>,
 Marcelo Ricardo Leitner <marcelo.leitner@...il.com>, ast@...erby.net,
 Shuang Li <shuali@...hat.com>
Subject: Re: [PATCH net] net: sched: refine software bypass handling in tc_run

On 1/6/25 4:08 PM, Xin Long wrote:
> This patch addresses issues with filter counting in block (tcf_block),
> particularly for software bypass scenarios, by introducing a more
> accurate mechanism using useswcnt.
> 
> Previously, filtercnt and skipswcnt were introduced by:
> 
>   Commit 2081fd3445fe ("net: sched: cls_api: add filter counter") and
>   Commit f631ef39d819 ("net: sched: cls_api: add skip_sw counter")
> 
>   filtercnt tracked all tp (tcf_proto) objects added to a block, and
>   skipswcnt counted tp objects with the skipsw attribute set.
> 
> The problem is: a single tp can contain multiple filters, some with skipsw
> and others without. The current implementation fails in the case:
> 
>   When the first filter in a tp has skipsw, both skipswcnt and filtercnt
>   are incremented, then adding a second filter without skipsw to the same
>   tp does not modify these counters because tp->counted is already set.
> 
>   This results in bypass software behavior based solely on skipswcnt
>   equaling filtercnt, even when the block includes filters without
>   skipsw. Consequently, filters without skipsw are inadvertently bypassed.
> 
> To address this, the patch introduces useswcnt in block to explicitly count
> tp objects containing at least one filter without skipsw. Key changes
> include:
> 
>   Whenever a filter without skipsw is added, its tp is marked with usesw
>   and counted in useswcnt. tc_run() now uses useswcnt to determine software
>   bypass, eliminating reliance on filtercnt and skipswcnt.
> 
>   This refined approach prevents software bypass for blocks containing
>   mixed filters, ensuring correct behavior in tc_run().
> 
> Additionally, as atomic operations on useswcnt ensure thread safety and
> tp->lock guards access to tp->usesw and tp->counted, the broader lock
> down_write(&block->cb_lock) is no longer required in tc_new_tfilter(),
> and this resolves a performance regression caused by the filter counting
> mechanism during parallel filter insertions.
> 
>   The improvement can be demonstrated using the following script:
> 
>   # cat insert_tc_rules.sh
> 
>     tc qdisc add dev ens1f0np0 ingress
>     for i in $(seq 16); do
>         taskset -c $i tc -b rules_$i.txt &
>     done
>     wait
> 
>   Each of rules_$i.txt files above includes 100000 tc filter rules to a
>   mlx5 driver NIC ens1f0np0.
> 
>   Without this patch:
> 
>   # time sh insert_tc_rules.sh
> 
>     real    0m50.780s
>     user    0m23.556s
>     sys	    4m13.032s
> 
>   With this patch:
> 
>   # time sh insert_tc_rules.sh
> 
>     real    0m17.718s
>     user    0m7.807s
>     sys     3m45.050s
> 
> Fixes: 047f340b36fc ("net: sched: make skip_sw actually skip software")
> Reported-by: Shuang Li <shuali@...hat.com>
> Signed-off-by: Xin Long <lucien.xin@...il.com>
> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@...il.com>

Given the quite large scope of this change and the functional and
performance implications, I think it's more suited for net-next.

> ---
>  include/net/pkt_cls.h     | 18 +++++++-------
>  include/net/sch_generic.h |  5 ++--
>  net/core/dev.c            | 11 ++-------
>  net/sched/cls_api.c       | 52 +++++++++------------------------------
>  net/sched/cls_bpf.c       |  2 ++
>  net/sched/cls_flower.c    |  2 ++
>  net/sched/cls_matchall.c  |  2 ++
>  net/sched/cls_u32.c       |  2 ++
>  8 files changed, 32 insertions(+), 62 deletions(-)
> 
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index cf199af85c52..d66cb315a6b5 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -74,15 +74,6 @@ static inline bool tcf_block_non_null_shared(struct tcf_block *block)
>  	return block && block->index;
>  }
>  
> -#ifdef CONFIG_NET_CLS_ACT
> -DECLARE_STATIC_KEY_FALSE(tcf_bypass_check_needed_key);

I think it would be better, if possible, to preserve this static key;
will reduce the delta and avoid additional tests in fast-path for S/W
only setup.

> -
> -static inline bool tcf_block_bypass_sw(struct tcf_block *block)
> -{
> -	return block && block->bypass_wanted;
> -}
> -#endif
> -
>  static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
>  {
>  	WARN_ON(tcf_block_shared(block));
> @@ -760,6 +751,15 @@ tc_cls_common_offload_init(struct flow_cls_common_offload *cls_common,
>  		cls_common->extack = extack;
>  }
>  
> +static inline void tcf_proto_update_usesw(struct tcf_proto *tp, u32 flags)
> +{
> +	if (tp->usesw)
> +		return;
> +	if (tc_skip_sw(flags) && tc_in_hw(flags))
> +		return;
> +	tp->usesw = true;
> +}

It looks like 'usesw' is never cleared. Can't user-space change the
skipsw flag for an existing tp?

[...]
> diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
> index d3a03c57545b..5e8f191fd820 100644
> --- a/net/sched/cls_u32.c
> +++ b/net/sched/cls_u32.c
> @@ -1164,6 +1164,8 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
>  		if (!tc_in_hw(n->flags))
>  			n->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
>  
> +		tcf_proto_update_usesw(tp, n->flags);

Why don't you need to hook also in the 'key existing' branch on line 909?

Thanks!

Paolo