Message-ID: <9dbdfec2-0a2d-4110-b8ef-b9d16900ff87@intel.com>
Date: Thu, 19 Jun 2025 21:21:37 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Yangyu Chen <cyy@...self.name>, Tim Chen <tim.c.chen@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, "K
Prateek Nayak" <kprateek.nayak@....com>, "Gautham R . Shenoy"
<gautham.shenoy@....com>
CC: Juri Lelli <juri.lelli@...hat.com>, Dietmar Eggemann
<dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, Ben Segall
<bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Valentin Schneider
<vschneid@...hat.com>, Tim Chen <tim.c.chen@...el.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Libo Chen <libo.chen@...cle.com>, Abel Wu
<wuyun.abel@...edance.com>, Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
Hillf Danton <hdanton@...a.com>, Len Brown <len.brown@...el.com>,
<linux-kernel@...r.kernel.org>
Subject: Re: [RFC patch v3 00/20] Cache aware scheduling
On 6/19/2025 2:39 PM, Yangyu Chen wrote:
> Nice work!
>
> I've tested your patch based on commit fb4d33ab452e and found it
> incredibly helpful for Verilator with large RTL simulations like
> XiangShan [1] on AMD EPYC Genoa.
>
> I've created a simple benchmark [2] using a static build of an
> 8-thread Verilator of XiangShan. Simply clone the repository and
> run `make run`.
>
> In a statically allocated 8-CCX KVM guest (with a total of 128 vCPUs)
> on EPYC 9T24, the simulation time before the patch was 49.348 ms.
> This was because the threads were spread across all the CCXs,
> resulting in extremely high core-to-core latency. After applying the
> patch, the entire 8-thread Verilator run is placed on a single CCX.
> Consequently, the simulation time drops to 24.196 ms, which is a
> remarkable 2.03x speedup. We don't need numactl anymore!
>
> [1] https://github.com/OpenXiangShan/XiangShan
> [2] https://github.com/cyyself/chacha20-xiangshan
>
> Tested-by: Yangyu Chen <cyy@...self.name>
>
Thanks, Yangyu, for your test. May I know whether these 8 threads share
any data with each other, or whether each thread has its own dedicated
data? Or is there one main thread, while the other 7 threads do the
chacha20 rotations and pass their results back to the main thread?
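
Just to make sure I understand the structure, below is a rough,
hypothetical pthread sketch of the "1 main thread + 7 workers" case I
have in mind; the names and the placeholder computation are my own
assumptions, not taken from the chacha20-xiangshan benchmark:

/*
 * Hypothetical sketch only: 7 workers compute into thread-private
 * buffers and hand the blocks back to the main thread (shared data).
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NR_WORKERS	7
#define BLOCK_SIZE	64

/* written by the workers, read by the main thread -> shared data */
static unsigned char results[NR_WORKERS][BLOCK_SIZE];

static void *worker(void *arg)
{
	long id = (long)arg;
	unsigned char buf[BLOCK_SIZE];		/* thread-private working set */

	memset(buf, (int)id, BLOCK_SIZE);	/* stand-in for the chacha20 rounds */
	memcpy(results[id], buf, BLOCK_SIZE);	/* hand the block back to main */
	return NULL;
}

int main(void)
{
	pthread_t tid[NR_WORKERS];

	for (long i = 0; i < NR_WORKERS; i++)
		pthread_create(&tid[i], NULL, worker, (void *)i);
	for (long i = 0; i < NR_WORKERS; i++)
		pthread_join(tid[i], NULL);

	printf("first byte from worker 0: %u\n", results[0][0]);
	return 0;
}

If instead each thread works on fully dedicated data with no sharing,
the trade-off for aggregating them onto one LLC looks quite different,
which is why I am curious about this detail.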
Anyway, I tested it on a Xeon EMR with turbo disabled and saw a ~20%
reduction in the total time.
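
Regarding the numactl point above: before this series, the usual
workaround was to confine the whole process to one CCX by hand, either
with numactl or with an affinity call along the lines of the rough
sketch below (treating CPUs 0-7 as one CCX is just an assumption for
illustration; the real CPU numbering depends on the topology):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	for (int cpu = 0; cpu < 8; cpu++)	/* assume CPUs 0-7 share one L3 */
		CPU_SET(cpu, &set);

	/* pin the calling process; threads created later inherit the mask */
	if (sched_setaffinity(0, sizeof(set), &set)) {
		perror("sched_setaffinity");
		return 1;
	}

	/* launch the 8-thread Verilator workload from here ... */
	printf("pinned to %d CPUs\n", CPU_COUNT(&set));
	return 0;
}

With cache-aware scheduling the aggregation onto one LLC happens
automatically, which matches what you observed.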
Thanks,
Chenyu