[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CADvbK_chW7898xRLXeJkis3dLnDjP72MZQ__5GB57R1OHW6Z3w@mail.gmail.com>
Date: Thu, 9 Jan 2025 10:47:30 -0500
From: Xin Long <lucien.xin@...il.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: network dev <netdev@...r.kernel.org>, davem@...emloft.net, kuba@...nel.org,
Eric Dumazet <edumazet@...gle.com>, Jamal Hadi Salim <jhs@...atatu.com>,
Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
Marcelo Ricardo Leitner <marcelo.leitner@...il.com>, ast@...erby.net, Shuang Li <shuali@...hat.com>
Subject: Re: [PATCH net] net: sched: refine software bypass handling in tc_run
On Thu, Jan 9, 2025 at 5:46 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On 1/6/25 4:08 PM, Xin Long wrote:
> > This patch addresses issues with filter counting in block (tcf_block),
> > particularly for software bypass scenarios, by introducing a more
> > accurate mechanism using useswcnt.
> >
> > Previously, filtercnt and skipswcnt were introduced by:
> >
> > Commit 2081fd3445fe ("net: sched: cls_api: add filter counter") and
> > Commit f631ef39d819 ("net: sched: cls_api: add skip_sw counter")
> >
> > filtercnt tracked all tp (tcf_proto) objects added to a block, and
> > skipswcnt counted tp objects with the skipsw attribute set.
> >
> > The problem is: a single tp can contain multiple filters, some with skipsw
> > and others without. The current implementation fails in the case:
> >
> > When the first filter in a tp has skipsw, both skipswcnt and filtercnt
> > are incremented, then adding a second filter without skipsw to the same
> > tp does not modify these counters because tp->counted is already set.
> >
> > This results in bypass software behavior based solely on skipswcnt
> > equaling filtercnt, even when the block includes filters without
> > skipsw. Consequently, filters without skipsw are inadvertently bypassed.
> >
> > To address this, the patch introduces useswcnt in block to explicitly count
> > tp objects containing at least one filter without skipsw. Key changes
> > include:
> >
> > Whenever a filter without skipsw is added, its tp is marked with usesw
> > and counted in useswcnt. tc_run() now uses useswcnt to determine software
> > bypass, eliminating reliance on filtercnt and skipswcnt.
> >
> > This refined approach prevents software bypass for blocks containing
> > mixed filters, ensuring correct behavior in tc_run().
> >
> > Additionally, as atomic operations on useswcnt ensure thread safety and
> > tp->lock guards access to tp->usesw and tp->counted, the broader lock
> > down_write(&block->cb_lock) is no longer required in tc_new_tfilter(),
> > and this resolves a performance regression caused by the filter counting
> > mechanism during parallel filter insertions.
> >
> > The improvement can be demonstrated using the following script:
> >
> > # cat insert_tc_rules.sh
> >
> > tc qdisc add dev ens1f0np0 ingress
> > for i in $(seq 16); do
> > taskset -c $i tc -b rules_$i.txt &
> > done
> > wait
> >
> > Each of rules_$i.txt files above includes 100000 tc filter rules to a
> > mlx5 driver NIC ens1f0np0.
> >
> > Without this patch:
> >
> > # time sh insert_tc_rules.sh
> >
> > real 0m50.780s
> > user 0m23.556s
> > sys 4m13.032s
> >
> > With this patch:
> >
> > # time sh insert_tc_rules.sh
> >
> > real 0m17.718s
> > user 0m7.807s
> > sys 3m45.050s
> >
> > Fixes: 047f340b36fc ("net: sched: make skip_sw actually skip software")
> > Reported-by: Shuang Li <shuali@...hat.com>
> > Signed-off-by: Xin Long <lucien.xin@...il.com>
> > Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@...il.com>
>
> Given the quite large scope of this change and the functional and
> performance implications, I think it's more suited for net-next.
>
Sounds fine to me.
> > ---
> > include/net/pkt_cls.h | 18 +++++++-------
> > include/net/sch_generic.h | 5 ++--
> > net/core/dev.c | 11 ++-------
> > net/sched/cls_api.c | 52 +++++++++------------------------------
> > net/sched/cls_bpf.c | 2 ++
> > net/sched/cls_flower.c | 2 ++
> > net/sched/cls_matchall.c | 2 ++
> > net/sched/cls_u32.c | 2 ++
> > 8 files changed, 32 insertions(+), 62 deletions(-)
> >
> > diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> > index cf199af85c52..d66cb315a6b5 100644
> > --- a/include/net/pkt_cls.h
> > +++ b/include/net/pkt_cls.h
> > @@ -74,15 +74,6 @@ static inline bool tcf_block_non_null_shared(struct tcf_block *block)
> > return block && block->index;
> > }
> >
> > -#ifdef CONFIG_NET_CLS_ACT
> > -DECLARE_STATIC_KEY_FALSE(tcf_bypass_check_needed_key);
>
> I think it would be better, if possible, to preserve this static key;
> will reduce the delta and avoid additional tests in fast-path for S/W
> only setup.
That's difficult. This static key will in/decrement according to
block->useswcnt, and we have to hold down_write(&block->cb_lock)
for its update when adding a filter, and the performance issue
will come back again.
>
> > -
> > -static inline bool tcf_block_bypass_sw(struct tcf_block *block)
> > -{
> > - return block && block->bypass_wanted;
> > -}
> > -#endif
> > -
> > static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
> > {
> > WARN_ON(tcf_block_shared(block));
> > @@ -760,6 +751,15 @@ tc_cls_common_offload_init(struct flow_cls_common_offload *cls_common,
> > cls_common->extack = extack;
> > }
> >
> > +static inline void tcf_proto_update_usesw(struct tcf_proto *tp, u32 flags)
> > +{
> > + if (tp->usesw)
> > + return;
> > + if (tc_skip_sw(flags) && tc_in_hw(flags))
> > + return;
> > + tp->usesw = true;
> > +}
>
> It looks like 'usesw' is never cleared. Can't user-space change the
> skipsw flag for an existing tp?
skipsw flag belongs to a tc rule/filter, and a tp may include multiple
rules/filters, and a tp doesn't have flags for skipsw directly.
Now we are adding a tp->usesw to reflect if any rule without skipsw flag
is ever added in this tp. And yes, this tp->usesw will NOT be cleared
even if this rule without the skipsw flag is deleted. I guess that's
all we can do now.
If we want to use a tp->skipswcnt to dynamically track the skipsw flag
for a tp, some common code/functions for rule adding and deleting will
be needed. However,there's no such code in tc for a rule deleting, so
it's not doable unless we touch all cls_*.c files.
>
> [...]
> > diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
> > index d3a03c57545b..5e8f191fd820 100644
> > --- a/net/sched/cls_u32.c
> > +++ b/net/sched/cls_u32.c
> > @@ -1164,6 +1164,8 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
> > if (!tc_in_hw(n->flags))
> > n->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
> >
> > + tcf_proto_update_usesw(tp, n->flags);
>
> Why don't you need to hook also in the 'key existing' branch on line 909?
>
Good catch, didn't notice that cls_u32 calls to replace hw filter in
two places.
Thanks.
Powered by blists - more mailing lists