Date: Mon, 18 Oct 2021 22:44:20 +0800
From: Luming Yu <luming.yu@...il.com>
To: Borislav Petkov <bp@...en8.de>
Cc: JY Ni <jiayu.ni@...ux.alibaba.com>, wujinhua <wujinhua@...ux.alibaba.com>,
    x86 <x86@...nel.org>, "zelin.deng" <zelin.deng@...ux.alibaba.com>,
    ak <ak@...ux.intel.com>, "luming.yu" <luming.yu@...el.com>,
    "fan.du" <fan.du@...el.com>, "artie.ding" <artie.ding@...ux.alibaba.com>,
    "tony.luck" <tony.luck@...el.com>, tglx <tglx@...utronix.de>,
    linux-kernel <linux-kernel@...r.kernel.org>,
    "pawan.kumar.gupta" <pawan.kumar.gupta@...ux.intel.com>,
    "fenghua.yu" <fenghua.yu@...el.com>, hpa <hpa@...or.com>,
    "ricardo.neri-calderon" <ricardo.neri-calderon@...ux.intel.com>,
    peterz <peterz@...radead.org>
Subject: Re: 回复:[PATCH] perf: optimize clear page in Intel specified model with movq instruction

On Mon, Oct 18, 2021 at 8:43 PM Borislav Petkov <bp@...en8.de> wrote:
>
> On Mon, Oct 18, 2021 at 03:43:46PM +0800, JY Ni wrote:
> > _*Precondition:*__*do tests on a Intel CPX server.*_ CPU information of my
> > test machine is in backup part._*
>
> My machine:
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 106
> stepping : 4
>
> That's a SKYLAKE_X.
>
> I ran
>
> ./tools/perf/perf stat --repeat 5 --sync --pre=/root/bin/pre-build-kernel.sh -- make -s -j96 bzImage
>
> on -rc6, building allmodconfig each of the 10 times.
>
> pre-build-kernel.sh is
>
> ---
> #!/bin/bash
>
> make -s clean
> echo 3 > /proc/sys/vm/drop_caches
> ---
>
> Results are below but to me that's all "in the noise" with around one
> percent if I can trust the stddev. Which is not even close to 40%.
>
> So basically you're wasting your time.
>
> 5.15-rc6
> --------
>
> # ./tools/perf/perf stat --repeat 5 --sync --pre=/root/bin/pre-build-kernel.sh -- make -s -j96 bzImage
>
> Performance counter stats for 'make -s -j96 bzImage' (5 runs):
>
>       3,072,392.92 msec task-clock         #   51.109 CPUs utilized      ( +- 0.05% )
>          1,351,534      context-switches   #  440.257 /sec               ( +- 0.99% )
>            224,862      cpu-migrations     #   73.248 /sec               ( +- 1.39% )
>         85,073,723      page-faults        #   27.712 K/sec              ( +- 0.01% )
>  8,743,357,421,495      cycles             #    2.848 GHz                ( +- 0.06% )
>  7,643,946,991,468      instructions       #     0.88 insn per cycle     ( +- 0.00% )
>  1,705,128,638,240      branches           #  555.440 M/sec              ( +- 0.00% )
>     37,637,576,027      branch-misses      #    2.21% of all branches    ( +- 0.03% )
> 22,511,903,971,150      slots              #    7.333 G/sec              ( +- 0.03% )
>  7,377,211,958,188      topdown-retiring   #    32.5% retiring           ( +- 0.02% )
>  3,145,247,374,138      topdown-bad-spec   #    13.9% bad speculation    ( +- 0.27% )
>  8,018,664,899,041      topdown-fe-bound   #    35.2% frontend bound     ( +- 0.07% )
>  4,167,103,609,622      topdown-be-bound   #    18.3% backend bound      ( +- 0.09% )
>
>             60.114 +- 0.112 seconds time elapsed  ( +- 0.19% )
>
>
>
> 5.15-rc6 + patch
> ----------------
>
> Performance counter stats for 'make -s -j96 bzImage' (5 runs):
>
>       3,033,250.65 msec task-clock         #   51.243 CPUs utilized      ( +- 0.05% )
>          1,329,033      context-switches   #  438.210 /sec               ( +- 0.64% )
>            225,550      cpu-migrations     #   74.369 /sec               ( +- 1.36% )
>         85,080,938      page-faults        #   28.053 K/sec              ( +- 0.00% )
>  8,629,663,367,477      cycles             #    2.845 GHz                ( +- 0.05% )
>  7,696,237,813,803      instructions       #     0.89 insn per cycle     ( +- 0.00% )
>  1,709,909,494,107      branches           #  563.793 M/sec              ( +- 0.00% )
>     37,719,552,337      branch-misses      #    2.21% of all branches    ( +- 0.02% )
> 22,214,249,023,820      slots              #    7.325 G/sec              ( +- 0.06% )
>  7,412,342,725,008      topdown-retiring   #    33.0% retiring           ( +- 0.01% )
>  3,141,090,408,028      topdown-bad-spec   #    14.1% bad speculation    ( +- 0.17% )
>  7,996,077,873,517      topdown-fe-bound   #    35.6% frontend bound     ( +- 0.03% )
>  3,862,154,886,962      topdown-be-bound   #    17.3% backend bound      ( +- 0.28% )
>
>             59.193 +- 0.302 seconds time elapsed  ( +- 0.51% )

I'm trying to reproduce the difference, and I noticed that time and
perf stat can report noticeably different wall-clock times for the same
job. Also, jiayu.ni's time diff was best at 32 jobs and worst at 96 jobs.

[linux-5.15-rc6]# time make -s bzImage -j96

real 1m8.922s
user 55m25.750s
sys 7m30.666s

[linux-5.15-rc6]# make -s clean
[linux-5.15-rc6]# perf stat make -s bzImage -j96
..
      61.461679693 seconds time elapsed

    2756.927852000 seconds user
     369.365209000 seconds sys

If the kbuild times that jiayu.ni shared are not solid enough proof for
the optimization to be accepted, we can try other clear_page-heavy
workloads.

>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
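
For anyone checking the arithmetic in this message, here is a minimal Python sketch of the two comparisons: the relative change in elapsed time between the two quoted perf stat runs, and the gap between the `real` value printed by time and the elapsed value printed by perf stat for the -j96 build. The helper names relative_change and parse_time_real are made up for illustration; the numbers are copied from the output above. It only restates the figures and is not a statistical test.

```python
import re


def relative_change(baseline, patched):
    """Fractional change of patched relative to baseline (negative = faster)."""
    return (patched - baseline) / baseline


def parse_time_real(text):
    """Convert a bash `time` value such as '1m8.922s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if m is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# Mean elapsed seconds from the two quoted perf stat runs (5 runs each).
baseline_elapsed = 60.114   # 5.15-rc6
patched_elapsed = 59.193    # 5.15-rc6 + patch
delta = relative_change(baseline_elapsed, patched_elapsed)
print(f"patched vs. baseline elapsed time: {delta:+.2%}")

# Wall-clock time for the -j96 build as reported by the two tools.
# These were two separate builds, so the gap includes run-to-run variation.
time_real = parse_time_real("1m8.922s")
perf_elapsed = 61.461679693
print(f"time real:         {time_real:.3f} s")
print(f"perf stat elapsed: {perf_elapsed:.3f} s")
print(f"gap:               {time_real - perf_elapsed:.3f} s")
```

With the quoted figures, the elapsed-time change comes out to roughly a 1.5% reduction, which is nowhere near the 40% mentioned above.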