Message-ID: <9e7b0c92-5a3b-8099-8c69-83a9d62aced4@amd.com>
Date:   Wed, 20 Oct 2021 08:12:51 -0500
From:   Tom Lendacky <thomas.lendacky@....com>
To:     linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org
Cc:     Tim Chen <tim.c.chen@...ux.intel.com>,
        Barry Song <song.bao.hua@...ilicon.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>, x86@...nel.org
Subject: Re: [tip: sched/core] sched: Add cluster scheduler level for x86

On 10/15/21 4:44 AM, tip-bot2 for Tim Chen wrote:
> The following commit has been merged into the sched/core branch of tip:
> 
> Commit-ID:     66558b730f2533cc2bf2b74d51f5f80b81e2bad0
> Gitweb:        https://git.kernel.org/tip/66558b730f2533cc2bf2b74d51f5f80b81e2bad0
> Author:        Tim Chen <tim.c.chen@...ux.intel.com>
> AuthorDate:    Fri, 24 Sep 2021 20:51:04 +12:00
> Committer:     Peter Zijlstra <peterz@...radead.org>
> CommitterDate: Fri, 15 Oct 2021 11:25:16 +02:00
> 
> sched: Add cluster scheduler level for x86
> 
> There are x86 CPU architectures (e.g. Jacobsville) where L2 cache is
> shared among a cluster of cores instead of being exclusive to one
> single core.
> 
> To prevent oversubscription of L2 cache, load should be balanced
> between such L2 clusters, especially for tasks with no shared data.
> On benchmarks such as the SPECrate mcf test, this change provides a boost
> to performance, especially on a medium-load system on Jacobsville.  On a
> Jacobsville that has 24 Atom cores, arranged into 6 clusters of 4
> cores each, the benchmark numbers are as follows:
> 
>   Improvement over baseline kernel for mcf_r
>   copies		run time	base rate
>   1		-0.1%		-0.2%
>   6		25.1%		25.1%
>   12		18.8%		19.0%
>   24		0.3%		0.3%
> 
> So this looks pretty good. In terms of the system's task distribution,
> some pretty bad clumping can be seen for the vanilla kernel without
> the L2 cluster domain for the 6 and 12 copies cases. With the extra
> domain for cluster, the load does get evened out between the clusters.
> 
> Note this patch isn't a universal win, as spreading isn't necessarily
> a win, particularly for those workloads that can benefit from packing.
> 
> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
> Signed-off-by: Barry Song <song.bao.hua@...ilicon.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> Link: https://lore.kernel.org/r/20210924085104.44806-4-21cnbao@gmail.com

I've bisected to this patch, which now causes my EPYC systems to issue a
lot of:

[    4.788480] BUG: arch topology borken
[    4.789578]      the SMT domain not a subset of the CLS domain

messages (one for each CPU in the system).
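
For readers unfamiliar with this message: its wording suggests that, for each CPU, the span of a lower topology level (here SMT) is expected to be contained in the span of the next level up (here CLS). The following is only a rough sketch of that kind of containment check, not the kernel's code. The function names, CPU numbers and masks are all made up for illustration, and the thread does not say what the masks on the affected EPYC systems actually look like.

```python
# Hypothetical illustration only: Python sets stand in for CPU masks.

def is_subset(child_span, parent_span):
    """Return True if every CPU in child_span is also in parent_span."""
    return set(child_span) <= set(parent_span)


def check_levels(levels):
    """Check one CPU's topology levels, given lowest level first.

    `levels` is a list of (name, span) pairs. Returns the list of
    (child_name, parent_name) pairs whose child span is not contained
    in the parent span.
    """
    problems = []
    for (child_name, child_span), (parent_name, parent_span) in zip(levels, levels[1:]):
        if not is_subset(child_span, parent_span):
            problems.append((child_name, parent_name))
    return problems


# Made-up masks for "CPU 0": the SMT siblings are CPUs 0 and 64, but the
# cluster mask in the first case contains only CPU 0 itself.
broken = [("SMT", {0, 64}), ("CLS", {0}), ("MC", {0, 1, 2, 3, 64, 65, 66, 67})]
consistent = [("SMT", {0, 64}), ("CLS", {0, 64}), ("MC", {0, 1, 2, 3, 64, 65, 66, 67})]

for child, parent in check_levels(broken):
    print(f"the {child} domain not a subset of the {parent} domain")
```

With the made-up `broken` masks the sketch reports the SMT/CLS pair, while `consistent` reports nothing. Whether the real cause on EPYC is a cluster mask that is narrower than the SMT mask is an open question in this thread.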

I haven't had a chance to dig deeper and understand everything. Does
anyone have some quick insights/ideas?

Thanks,
Tom
