Message-ID: <CAM0EoM=9CdeZoHnFDAGjtmm07B=QBrrTNzoZVWUXf6+1Y4LdYg@mail.gmail.com>
Date: Mon, 28 Nov 2022 13:26:32 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Pedro Tammela <pctammela@...il.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, pabeni@...hat.com, xiyou.wangcong@...il.com,
jiri@...nulli.us, Pedro Tammela <pctammela@...atatu.com>
Subject: Re: [PATCH RFC net-next 0/3] net/sched: retpoline wrappers for tc
You forgot to add the RFC tag. Also add my reviewed-by:
cheers,
jamal
On Fri, Nov 25, 2022 at 12:52 PM Pedro Tammela <pctammela@...il.com> wrote:
>
> In tc, all qdiscs, classifiers and actions can be compiled as modules.
> Today this results in indirect calls at every transition in the tc hierarchy.
> Due to CONFIG_RETPOLINE, CPUs with mitigations=on might pay an extra cost on
> indirect calls. For newer Intel CPUs with IBRS the extra cost is
> nonexistent, but AMD Zen CPUs and older x86 CPUs still go through the
> retpoline thunk.
>
> Known built-in symbols can be optimized into direct calls, thus
> avoiding the retpoline thunk. So far, tc has not been leveraging this
> build information, leaving a performance optimization on the table for some
> CPUs. In this series we wire up 'tcf_classify()' and 'tcf_action_exec()'
> with direct calls when known modules are compiled as built-in, as an
> opt-in optimization.
>
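For readers following along, the direct-call idea described above can be
sketched in userspace C. This is only an illustration under stated
assumptions, not the contents of tc_wrapper.h: every type and function name
below (struct pkt, struct filter, demo_classify and the demo_* classifiers)
is hypothetical and stands in for the real tc structures and callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real tc structures. */
struct pkt { int len; };
struct filter;
typedef int (*classify_fn)(const struct pkt *p, const struct filter *f);
struct filter { classify_fn classify; };

/* Two "built-in" classifiers whose addresses are known at link time. */
static int demo_matchall_classify(const struct pkt *p, const struct filter *f)
{
	(void)p; (void)f;
	return 1; /* matches every packet */
}

static int demo_small_classify(const struct pkt *p, const struct filter *f)
{
	(void)f;
	return p->len <= 64; /* matches packets of at most 64 bytes */
}

/* Stands in for a classifier that was built as a module. */
static int demo_module_classify(const struct pkt *p, const struct filter *f)
{
	(void)p; (void)f;
	return -1;
}

/* Counts how often the indirect fallback was taken (illustration only). */
static int indirect_calls;

/*
 * Compare the callback pointer against the known built-in symbols and call
 * the match directly. Only unknown callbacks take the indirect call, which
 * is the one that would go through the retpoline thunk in the kernel.
 */
static int demo_classify(const struct pkt *p, const struct filter *f)
{
	assert(p != NULL && f != NULL && f->classify != NULL);

	if (f->classify == demo_matchall_classify)
		return demo_matchall_classify(p, f);
	if (f->classify == demo_small_classify)
		return demo_small_classify(p, f);

	indirect_calls++;
	return f->classify(p, f);
}
```

Each pointer comparison is an extra branch on the fast path, which is the
trade-off the cover letter mentions for CPUs where indirect calls are already
cheap.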
> We measured these changes on one AMD Zen 3 CPU (Retpoline), one Intel 10th
> Gen CPU (IBRS), one Intel 3rd Gen CPU (Retpoline) and one Intel Xeon CPU (IBRS)
> using pktgen with 64-byte UDP packets. Our test setup is a dummy device with
> clsact and matchall in a kernel compiled with every tc module as built-in.
> We observed a 6-10% speed up on the retpoline CPUs when going through 1
> tc filter, and a 60-100% speed up when going through 100 filters.
> For the IBRS CPUs we observed a 1-2% degradation in both scenarios. We believe
> the extra branch checks introduced a small overhead, so we added
> a Kconfig option to make these changes opt-in even in CONFIG_RETPOLINE kernels.
>
> We are continuing to test on other hardware variants as we find them:
>
> 1 filter:
> CPU        | before (pps) | after (pps) | diff
> R9 5950X   | 4237838      | 4412241     | +4.1%
> R9 5950X   | 4265287      | 4413757     | +3.4% [*]
> i5-3337U   | 1580565      | 1682406     | +6.4%
> i5-10210U  | 3006074      | 3006857     | +0.0%
> i5-10210U  | 3160245      | 3179945     | +0.6% [*]
> Xeon 6230R | 3196906      | 3197059     | +0.0%
> Xeon 6230R | 3190392      | 3196153     | +0.1% [*]
>
> 100 filters:
> CPU        | before (pps) | after (pps) | diff
> R9 5950X   | 313469       | 633303      | +102.03%
> R9 5950X   | 313797       | 633150      | +101.77% [*]
> i5-3337U   | 127454       | 211210      | +65.71%
> i5-10210U  | 389259       | 381765      | -1.9%
> i5-10210U  | 408812       | 412730      | +0.9% [*]
> Xeon 6230R | 415420       | 406612      | -2.1%
> Xeon 6230R | 416705       | 405869      | -2.6% [*]
>
> [*] In these tests we ran pktgen with clone set to 1000.
>
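The diff column in the tables above appears to be the relative change in
packets per second. A minimal sketch of that arithmetic, assuming this is how
the figures were derived (the function name pps_diff_percent is hypothetical):

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double pps_diff_percent(unsigned long before, unsigned long after)
{
	assert(before > 0); /* avoid dividing by zero */
	return ((double)after - (double)before) * 100.0 / (double)before;
}
```

For example, the first R9 5950X row with 100 filters (313469 to 633303 pps)
works out to roughly +102.03%, and the first i5-10210U row with 100 filters
(389259 to 381765 pps) to roughly -1.9%.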
> Pedro Tammela (3):
> net/sched: add retpoline wrapper for tc
> net/sched: avoid indirect act functions on retpoline kernels
> net/sched: avoid indirect classify functions on retpoline kernels
>
> include/net/tc_wrapper.h | 274 +++++++++++++++++++++++++++++++++++++
> net/sched/Kconfig | 13 ++
> net/sched/act_api.c | 3 +-
> net/sched/act_bpf.c | 6 +-
> net/sched/act_connmark.c | 6 +-
> net/sched/act_csum.c | 6 +-
> net/sched/act_ct.c | 4 +-
> net/sched/act_ctinfo.c | 6 +-
> net/sched/act_gact.c | 6 +-
> net/sched/act_gate.c | 6 +-
> net/sched/act_ife.c | 6 +-
> net/sched/act_ipt.c | 6 +-
> net/sched/act_mirred.c | 6 +-
> net/sched/act_mpls.c | 6 +-
> net/sched/act_nat.c | 7 +-
> net/sched/act_pedit.c | 6 +-
> net/sched/act_police.c | 6 +-
> net/sched/act_sample.c | 6 +-
> net/sched/act_simple.c | 6 +-
> net/sched/act_skbedit.c | 6 +-
> net/sched/act_skbmod.c | 6 +-
> net/sched/act_tunnel_key.c | 6 +-
> net/sched/act_vlan.c | 6 +-
> net/sched/cls_api.c | 3 +-
> net/sched/cls_basic.c | 6 +-
> net/sched/cls_bpf.c | 6 +-
> net/sched/cls_cgroup.c | 6 +-
> net/sched/cls_flow.c | 6 +-
> net/sched/cls_flower.c | 6 +-
> net/sched/cls_fw.c | 6 +-
> net/sched/cls_matchall.c | 6 +-
> net/sched/cls_route.c | 6 +-
> net/sched/cls_rsvp.c | 2 +
> net/sched/cls_rsvp.h | 7 +-
> net/sched/cls_rsvp6.c | 2 +
> net/sched/cls_tcindex.c | 7 +-
> net/sched/cls_u32.c | 6 +-
> 37 files changed, 417 insertions(+), 67 deletions(-)
> create mode 100644 include/net/tc_wrapper.h
>
> --
> 2.34.1
>