Date:   Mon, 28 Nov 2022 13:26:32 -0500
From:   Jamal Hadi Salim <jhs@...atatu.com>
To:     Pedro Tammela <pctammela@...il.com>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, edumazet@...gle.com,
        kuba@...nel.org, pabeni@...hat.com, xiyou.wangcong@...il.com,
        jiri@...nulli.us, Pedro Tammela <pctammela@...atatu.com>
Subject: Re: [PATCH RFC net-next 0/3] net/sched: retpoline wrappers for tc

You forgot to add the RFC tag. Also add my reviewed-by:

cheers,
jamal

On Fri, Nov 25, 2022 at 12:52 PM Pedro Tammela <pctammela@...il.com> wrote:
>
> In tc, all qdiscs, classifiers and actions can be compiled as modules.
> Today this results in indirect calls at every transition in the tc hierarchy.
> With CONFIG_RETPOLINE, CPUs running with mitigations=on may pay an extra
> cost on indirect calls. On newer Intel CPUs with IBRS the extra cost is
> nonexistent, but AMD Zen CPUs and older x86 CPUs still go through the
> retpoline thunk.
>
> Calls to symbols known to be built in can be turned into direct calls,
> avoiding the retpoline thunk. So far, tc has not leveraged this build
> information, leaving a performance optimization on the table for some
> CPUs. In this series we wire up 'tcf_classify()' and 'tcf_action_exec()'
> to use direct calls when the known modules are compiled as built-in, as an
> opt-in optimization.
>
> We measured these changes on one AMD Zen 3 CPU (retpoline), one Intel 10th
> Gen CPU (IBRS), one Intel 3rd Gen CPU (retpoline) and one Intel Xeon CPU
> (IBRS) using pktgen with 64-byte UDP packets. Our test setup is a dummy
> device with clsact and matchall, in a kernel compiled with every tc module
> as built-in. On the retpoline CPUs we observed a speedup of roughly 3-6%
> when going through 1 tc filter, and of 65-100% when going through 100
> filters. On the IBRS CPUs we saw no meaningful gain with 1 filter and
> changes between about -2.6% and +1% with 100 filters. We believe the extra
> branch checks introduce a small overhead, so we added a Kconfig option to
> make these changes opt-in even in CONFIG_RETPOLINE kernels.
>
> We are continuing to test on other hardware variants as we find them:
>
> 1 filter:
> CPU        | before (pps) | after (pps) | diff
> R9 5950X   | 4237838      | 4412241     | +4.12%
> R9 5950X   | 4265287      | 4413757     | +3.48%  [*]
> i5-3337U   | 1580565      | 1682406     | +6.44%
> i5-10210U  | 3006074      | 3006857     | +0.03%
> i5-10210U  | 3160245      | 3179945     | +0.62%  [*]
> Xeon 6230R | 3196906      | 3197059     | +0.00%
> Xeon 6230R | 3190392      | 3196153     | +0.18%  [*]
>
> 100 filters:
> CPU        | before (pps) | after (pps) | diff
> R9 5950X   | 313469       | 633303      | +102.03%
> R9 5950X   | 313797       | 633150      | +101.77% [*]
> i5-3337U   | 127454       | 211210      | +65.71%
> i5-10210U  | 389259       | 381765      | -1.93%
> i5-10210U  | 408812       | 412730      | +0.96%   [*]
> Xeon 6230R | 415420       | 406612      | -2.12%
> Xeon 6230R | 416705       | 405869      | -2.60%   [*]
>
> [*] In these tests we ran pktgen with clone set to 1000.
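The diff column is the relative change in throughput, (after - before) / before × 100. For example, the first R9 5950X row with 100 filters gives (633303 - 313469) / 313469 ≈ +102.03%. The minimal C sketch below recomputes the column from the raw pps values. The `struct result` type and the `pps_diff_percent()` and `print_diffs()` helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* One row of a before/after throughput table. */
struct result {
	const char *cpu;
	double before_pps;
	double after_pps;
};

/* Relative change in packets per second, in percent. */
static double pps_diff_percent(double before_pps, double after_pps)
{
	assert(before_pps > 0.0);
	return (after_pps - before_pps) / before_pps * 100.0;
}

/* Print each CPU with its signed change rounded to two decimals. */
static void print_diffs(const struct result *rows, size_t n)
{
	for (size_t i = 0; i < n; i++)
		printf("%-10s | %+.2f%%\n", rows[i].cpu,
		       pps_diff_percent(rows[i].before_pps, rows[i].after_pps));
}
```

Feeding the rows of either table to `print_diffs()` gives a quick cross-check of hand-computed percentages.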
>
> Pedro Tammela (3):
>   net/sched: add retpoline wrapper for tc
>   net/sched: avoid indirect act functions on retpoline kernels
>   net/sched: avoid indirect classify functions on retpoline kernels
>
>  include/net/tc_wrapper.h   | 274 +++++++++++++++++++++++++++++++++++++
>  net/sched/Kconfig          |  13 ++
>  net/sched/act_api.c        |   3 +-
>  net/sched/act_bpf.c        |   6 +-
>  net/sched/act_connmark.c   |   6 +-
>  net/sched/act_csum.c       |   6 +-
>  net/sched/act_ct.c         |   4 +-
>  net/sched/act_ctinfo.c     |   6 +-
>  net/sched/act_gact.c       |   6 +-
>  net/sched/act_gate.c       |   6 +-
>  net/sched/act_ife.c        |   6 +-
>  net/sched/act_ipt.c        |   6 +-
>  net/sched/act_mirred.c     |   6 +-
>  net/sched/act_mpls.c       |   6 +-
>  net/sched/act_nat.c        |   7 +-
>  net/sched/act_pedit.c      |   6 +-
>  net/sched/act_police.c     |   6 +-
>  net/sched/act_sample.c     |   6 +-
>  net/sched/act_simple.c     |   6 +-
>  net/sched/act_skbedit.c    |   6 +-
>  net/sched/act_skbmod.c     |   6 +-
>  net/sched/act_tunnel_key.c |   6 +-
>  net/sched/act_vlan.c       |   6 +-
>  net/sched/cls_api.c        |   3 +-
>  net/sched/cls_basic.c      |   6 +-
>  net/sched/cls_bpf.c        |   6 +-
>  net/sched/cls_cgroup.c     |   6 +-
>  net/sched/cls_flow.c       |   6 +-
>  net/sched/cls_flower.c     |   6 +-
>  net/sched/cls_fw.c         |   6 +-
>  net/sched/cls_matchall.c   |   6 +-
>  net/sched/cls_route.c      |   6 +-
>  net/sched/cls_rsvp.c       |   2 +
>  net/sched/cls_rsvp.h       |   7 +-
>  net/sched/cls_rsvp6.c      |   2 +
>  net/sched/cls_tcindex.c    |   7 +-
>  net/sched/cls_u32.c        |   6 +-
>  37 files changed, 417 insertions(+), 67 deletions(-)
>  create mode 100644 include/net/tc_wrapper.h
>
> --
> 2.34.1
>
