Date: Thu, 01 Dec 2022 12:05:49 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Pedro Tammela <pctammela@...il.com>, netdev@...r.kernel.org
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, jhs@...atatu.com, xiyou.wangcong@...il.com, jiri@...nulli.us, kuniyu@...zon.com, Pedro Tammela <pctammela@...atatu.com>
Subject: Re: [PATCH net-next v2 0/3] net/sched: retpoline wrappers for tc

On Mon, 2022-11-28 at 12:44 -0300, Pedro Tammela wrote:
> In tc all qdics, classifiers and actions can be compiled as modules.
> This results today in indirect calls in all transitions in the tc hierarchy.
> Due to CONFIG_RETPOLINE, CPUs with mitigations=on might pay an extra cost on
> indirect calls. For newer Intel cpus with IBRS the extra cost is
> nonexistent, but AMD Zen cpus and older x86 cpus still go through the
> retpoline thunk.
>
> Known built-in symbols can be optimized into direct calls, thus
> avoiding the retpoline thunk. So far, tc has not been leveraging this
> build information and leaving out a performance optimization for some
> CPUs. In this series we wire up 'tcf_classify()' and 'tcf_action_exec()'
> with direct calls when known modules are compiled as built-in as an
> opt-in optimization.
>
> We measured these changes in one AMD Zen 3 cpu (Retpoline), one Intel 10th
> Gen CPU (IBRS), one Intel 3rd Gen cpu (Retpoline) and one Intel Xeon CPU (IBRS)
> using pktgen with 64b udp packets. Our test setup is a dummy device with
> clsact and matchall in a kernel compiled with every tc module as built-in.
> We observed a 3-6% speed up on the retpoline CPUs, when going through 1
> tc filter,

Do you have all the existing filters enabled at build time in your test
kernel? The reported figures are quite a bit higher than expected,
considering there are 7 new unlikely branches in between.

Also it would be nice to have some figures for the last filter in the if
chain. I fear we could have some regressions there even for 'retpoline'
CPUs - given the long if chain - and u32 is AFAIK (not much actually)
still quite widely used.

Finally, it looks like the filter order in patch 1/3 is quite relevant,
and it looks like you used the lexicographic order. I guess it would be
better to sort them by 'relevance', if someone could provide a
reasonable 'relevance' order. I personally would move ife, ipt and
simple towards the bottom.

Thanks,

Paolo
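
For readers of the archive, the "if chain" concern raised above can be illustrated with a minimal user-space sketch. This is not the code from the patch series: the classifier names (cls_a, cls_b, cls_c, cls_mod), the simplified signature and the comparison counter are all hypothetical, chosen only to show why a target late in the chain, or one that is not in the chain at all, pays for every earlier pointer comparison before it is called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified classifier signature; the real tc hooks differ. */
typedef int (*classify_fn)(int pkt);

/* Stand-ins for classifiers known to be built-in at compile time. */
static int cls_a(int pkt) { return pkt + 1; }
static int cls_b(int pkt) { return pkt + 2; }
static int cls_c(int pkt) { return pkt + 3; }

/* Stand-in for a classifier that is not in the chain (e.g. built as a module). */
static int cls_mod(int pkt) { return pkt + 100; }

/*
 * Compare the function pointer against each known target in turn and make a
 * direct call on a match; otherwise fall back to the indirect call.
 * *ncmp reports how many comparisons were performed, purely for illustration.
 */
static int classify_wrapper(classify_fn fn, int pkt, int *ncmp)
{
    assert(fn != NULL && ncmp != NULL);

    *ncmp = 1;
    if (fn == cls_a)
        return cls_a(pkt);      /* direct call, first in the chain */
    *ncmp = 2;
    if (fn == cls_b)
        return cls_b(pkt);      /* direct call, one extra comparison */
    *ncmp = 3;
    if (fn == cls_c)
        return cls_c(pkt);      /* direct call, last in the chain */

    return fn(pkt);             /* indirect call after all comparisons failed */
}
```

In this sketch the first entry costs one comparison, the last entry costs three, and the fallback target pays for all three comparisons and the indirect call as well. That is why the order of the entries matters, and why sorting by expected usage rather than by name could be preferable.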