Message-ID: <CAM0EoMngVoBcbX7cqTdbW8dG1v_ysc1SZK+4y-9j-5Tbq6gaYw@mail.gmail.com>
Date: Fri, 16 Feb 2024 10:07:34 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Jiri Pirko <jiri@...nulli.us>
Cc: Asbjørn Sloth Tønnesen <ast@...erby.net>, 
	Cong Wang <xiyou.wangcong@...il.com>, Daniel Borkmann <daniel@...earbox.net>, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, llu@...erby.dk, Vlad Buslov <vladbu@...dia.com>, 
	Marcelo Ricardo Leitner <mleitner@...hat.com>
Subject: Re: [PATCH net-next 3/3] net: sched: make skip_sw actually skip software

On Fri, Feb 16, 2024 at 7:57 AM Jiri Pirko <jiri@...nulli.us> wrote:
>
> Thu, Feb 15, 2024 at 06:49:05PM CET, jhs@...atatu.com wrote:
> >On Thu, Feb 15, 2024 at 11:06 AM Asbjørn Sloth Tønnesen <ast@...erby.net> wrote:
> >>
> >> TC filters come in 3 variants:
> >> - no flag (no opinion, process wherever possible)
> >> - skip_hw (do not process filter by hardware)
> >> - skip_sw (do not process filter by software)
> >>
> >> However, skip_sw is implemented such that the skip_sw
> >> flag is only checked after the filter has already been matched.
> >>
> >> IMHO it's common, when using skip_sw, to use it on all rules.
> >>
> >> So if all filters in a block are skip_sw filters, then
> >> we can bail out early, and thus avoid having to match
> >> the filters just to check for the skip_sw flag.
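
[For reference, the three variants look roughly like this from userspace;
illustrative rules only, not the ones from the test setup below:]

  tc filter add dev eth0 ingress protocol ip flower \
      dst_ip 198.51.100.1 action drop    # no flag: offload if possible, else software
  tc filter add dev eth0 ingress protocol ip flower skip_hw \
      dst_ip 198.51.100.2 action drop    # software only
  tc filter add dev eth0 ingress protocol ip flower skip_sw \
      dst_ip 198.51.100.3 action drop    # hardware only
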
> >>
> >>  +----------------------------+--------+--------+--------+
> >>  | Test description           | Pre    | Post   | Rel.   |
> >>  |                            | kpps   | kpps   | chg.   |
> >>  +----------------------------+--------+--------+--------+
> >>  | basic forwarding + notrack | 1264.9 | 1277.7 |  1.01x |
> >>  | switch to eswitch mode     | 1067.1 | 1071.0 |  1.00x |
> >>  | add ingress qdisc          | 1056.0 | 1059.1 |  1.00x |
> >>  +----------------------------+--------+--------+--------+
> >>  | 1 non-matching rule        |  927.9 | 1057.1 |  1.14x |
> >>  | 10 non-matching rules      |  495.8 | 1055.6 |  2.13x |
> >>  | 25 non-matching rules      |  280.6 | 1053.5 |  3.75x |
> >>  | 50 non-matching rules      |  162.0 | 1055.7 |  6.52x |
> >>  | 100 non-matching rules     |   87.7 | 1019.0 | 11.62x |
> >>  +----------------------------+--------+--------+--------+
> >>
> >> perf top (100 n-m skip_sw rules - pre patch):
> >>   25.57%  [kernel]  [k] __skb_flow_dissect
> >>   20.77%  [kernel]  [k] rhashtable_jhash2
> >>   14.26%  [kernel]  [k] fl_classify
> >>   13.28%  [kernel]  [k] fl_mask_lookup
> >>    6.38%  [kernel]  [k] memset_orig
> >>    3.22%  [kernel]  [k] tcf_classify
> >>
> >> perf top (100 n-m skip_sw rules - post patch):
> >>    4.28%  [kernel]  [k] __dev_queue_xmit
> >>    3.80%  [kernel]  [k] check_preemption_disabled
> >>    3.68%  [kernel]  [k] nft_do_chain
> >>    3.08%  [kernel]  [k] __netif_receive_skb_core.constprop.0
> >>    2.59%  [kernel]  [k] mlx5e_xmit
> >>    2.48%  [kernel]  [k] mlx5e_skb_from_cqe_mpwrq_nonlinear
> >>
> >
> >The concept makes sense - but I am wondering, when you have a mix of
> >skip_sw and skip_hw, whether it makes more sense to just avoid looking up
> >skip_sw at all in the s/w datapath, potentially by separating the
> >hashes for skip_sw/hw. I know it's deeper surgery - but it would be
>
> Yeah, there could be 2 hashes: skip_sw/rest.
> rest is the only one that needs to be looked up in the kernel datapath.
> skip_sw is just for the control path.
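
[To make sure we mean the same thing, the split would be roughly the
following - a sketch only, names invented, not the existing cls_flower
structures:]

  #include <linux/rhashtable.h>

  /* Sketch: keep skip_sw entries out of the table the software datapath
   * walks, so fl_classify() never pays for them. */
  struct tcf_block_tables {
          struct rhashtable datapath_ht; /* filters the kernel may match (no skip_sw) */
          struct rhashtable skip_sw_ht;  /* skip_sw filters: offload/control path only */
  };
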
>
> But is it worth the effort? I mean, until now, nobody seemed to care. If
> this patchset solves the problem for this use case, I think it is enough.
>

It may not be worth the effort - and this is a reasonable use case. The
approach is a hack nonetheless, but it kills at least some insects. To
address the issues Vlad brought up, perhaps we should wrap it under
some Kconfig option.
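
Something along these lines, maybe (symbol name invented, just to
illustrate):

  config NET_CLS_SKIP_SW_BYPASS
          bool "Skip software classification for skip_sw-only blocks"
          depends on NET_CLS
          help
            Bail out of tc_run() early when every filter in a block
            carries the skip_sw flag, avoiding the per-packet filter
            walk for blocks that only program hardware.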

cheers,
jamal

> In that case, I'm fine with this patch:
>
> Reviewed-by: Jiri Pirko <jiri@...dia.com>
>
>
>
> >more general purpose... unless I am missing something.
> >
> >> Test setup:
> >>  DUT: Intel Xeon D-1518 (2.20GHz) w/ Nvidia/Mellanox ConnectX-6 Dx 2x100G
> >>  Data rate measured on the switch (Extreme X690), with the DUT connected as
> >>  a router on a stick, and pktgen and pktsink on separate VLANs.
> >>  Pktgen was in range 12.79 - 12.95 Mpps across all tests.
> >>
> >
> >Hrm. Those are "tiny" numbers (25G @ 64B is about 3x that). What are
> >the packet sizes?
> >Perhaps the traffic generator is the limitation here?
> >It also feels like you are doing exact matches? A sample flower rule
> >would have helped.
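
For example, whether the rules are exact 5-tuple matches along these
lines (values made up):

  tc filter add dev eth0 ingress protocol ip pref 1 flower skip_sw \
      src_ip 10.0.0.1 dst_ip 10.0.1.1 ip_proto udp dst_port 5001 \
      action mirred egress redirect dev eth1
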
> >
> >cheers,
> >jamal
> >> Signed-off-by: Asbjørn Sloth Tønnesen <ast@...erby.net>
> >> ---
> >>  include/net/pkt_cls.h | 5 +++++
> >>  net/core/dev.c        | 3 +++
> >>  2 files changed, 8 insertions(+)
> >>
> >> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> >> index a4ee43f493bb..a065da4df7ff 100644
> >> --- a/include/net/pkt_cls.h
> >> +++ b/include/net/pkt_cls.h
> >> @@ -74,6 +74,11 @@ static inline bool tcf_block_non_null_shared(struct tcf_block *block)
> >>         return block && block->index;
> >>  }
> >>
> >> +static inline bool tcf_block_has_skip_sw_only(struct tcf_block *block)
> >> +{
> >> +       return block && atomic_read(&block->filtercnt) == atomic_read(&block->skipswcnt);
> >> +}
> >> +
> >>  static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
> >>  {
> >>         WARN_ON(tcf_block_shared(block));
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index d8dd293a7a27..7cd014e5066e 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -3910,6 +3910,9 @@ static int tc_run(struct tcx_entry *entry, struct sk_buff *skb,
> >>         if (!miniq)
> >>                 return ret;
> >>
> >> +       if (tcf_block_has_skip_sw_only(miniq->block))
> >> +               return ret;
> >> +
> >>         tc_skb_cb(skb)->mru = 0;
> >>         tc_skb_cb(skb)->post_ct = false;
> >>         tcf_set_drop_reason(skb, *drop_reason);
> >> --
> >> 2.43.0
> >>
