Message-ID: <2e0a2888c89db8226578106b0a7a3eeda7c94582.camel@redhat.com>
Date:   Thu, 01 Dec 2022 12:05:49 +0100
From:   Paolo Abeni <pabeni@...hat.com>
To:     Pedro Tammela <pctammela@...il.com>, netdev@...r.kernel.org
Cc:     davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
        jhs@...atatu.com, xiyou.wangcong@...il.com, jiri@...nulli.us,
        kuniyu@...zon.com, Pedro Tammela <pctammela@...atatu.com>
Subject: Re: [PATCH net-next v2 0/3] net/sched: retpoline wrappers for tc

On Mon, 2022-11-28 at 12:44 -0300, Pedro Tammela wrote:
> In tc all qdiscs, classifiers and actions can be compiled as modules.
> This results today in indirect calls in all transitions in the tc hierarchy.
> Due to CONFIG_RETPOLINE, CPUs with mitigations=on might pay an extra cost on
> indirect calls. For newer Intel cpus with IBRS the extra cost is
> nonexistent, but AMD Zen cpus and older x86 cpus still go through the
> retpoline thunk.
> 
> Known built-in symbols can be optimized into direct calls, thus
> avoiding the retpoline thunk. So far, tc has not been leveraging this
> build information and leaving out a performance optimization for some
> CPUs. In this series we wire up 'tcf_classify()' and 'tcf_action_exec()'
> with direct calls when known modules are compiled as built-in as an
> opt-in optimization.
> 
> We measured these changes in one AMD Zen 3 cpu (Retpoline), one Intel 10th
> Gen CPU (IBRS), one Intel 3rd Gen cpu (Retpoline) and one Intel Xeon CPU (IBRS)
> using pktgen with 64b udp packets. Our test setup is a dummy device with
> clsact and matchall in a kernel compiled with every tc module as built-in.
> We observed a 3-6% speed up on the retpoline CPUs, when going through 1
> tc filter, 

Do you have all the existing filters enabled at build time in your test
kernel? The reported figures are quite a bit higher than expected,
considering there are 7 new unlikely branches in between.

Also, it would be nice to have some figures for the last filter in the if
chain. I fear we could have some regressions there even for 'retpoline'
CPUs, given the long if chain, and u32 is AFAIK (which is not much,
actually) still quite widely used.

Finally, it looks like the filter order in patch 1/3 is quite relevant,
and it looks like you used lexicographic order. I guess it would be
better to sort them by 'relevance', if someone could provide a
reasonable 'relevance' order. I personally would move ife, ipt and
simple towards the bottom.
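
To make the ordering concern concrete, here is a minimal userspace sketch
of such an if chain. It is not the code from patch 1/3: every name in it
(dispatch, cls_a_classify, cls_b_classify, cls_u32_classify,
cls_mod_classify, the compares counter) is hypothetical and exists only
for illustration. A known built-in handler is matched by pointer
comparison and called directly; anything else falls back to the indirect
call, which is the path that would go through the retpoline thunk.

```c
#include <stddef.h>

struct pkt { int len; };

typedef int (*classify_fn)(const struct pkt *);

/* Hypothetical stand-ins for classifiers that are built in. */
static int cls_a_classify(const struct pkt *p)   { (void)p; return 1; }
static int cls_b_classify(const struct pkt *p)   { (void)p; return 2; }
static int cls_u32_classify(const struct pkt *p) { (void)p; return 3; }
/* Hypothetical classifier that is not part of the chain (e.g. a module). */
static int cls_mod_classify(const struct pkt *p) { (void)p; return 4; }

/* Number of pointer comparisons done by the last dispatch() call. */
static int compares;

static int same_fn(classify_fn a, classify_fn b)
{
	compares++;
	return a == b;
}

/*
 * Compare the handler against each known built-in symbol, in source
 * order, and call the match directly. Entries near the bottom pay for
 * every comparison above them.
 */
static int dispatch(classify_fn fn, const struct pkt *p)
{
	compares = 0;

	if (same_fn(fn, cls_a_classify))
		return cls_a_classify(p);
	if (same_fn(fn, cls_b_classify))
		return cls_b_classify(p);
	if (same_fn(fn, cls_u32_classify))
		return cls_u32_classify(p);

	/* Unknown handler: plain indirect call. */
	return fn(p);
}
```

In this sketch the first entry is reached after one comparison, while the
last entry and the fallback both pay for all three, which is why the
position of a frequently used classifier in the chain matters.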

Thanks,

Paolo

