Date:	Wed, 2 Jul 2014 15:41:21 +0900
From:	Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>
To:	<tglx@...utronix.de>, <mingo@...hat.com>, <hpa@...or.com>
CC:	<x86@...nel.org>, <toshi.kani@...com>, <imammedo@...hat.com>,
	<bp@...en8.de>, <huawei.libin@...wei.com>,
	<paul.gortmaker@...driver.com>, <linux-kernel@...r.kernel.org>,
	<srivatsa.bhat@...ux.vnet.ibm.com>
Subject: [PATCH] x86,cpu-hotplug: clear llc_shared_mask at CPU hotplug

llc_shared_mask is not cleared when a CPU is offlined or hot-removed,
so after a CPU is hot-added again the mask holds a wrong value. The CFS
scheduler uses this mask, so the stale value breaks the CFS scheduler.

Here is an example from my system.
It has 4 sockets, each socket has 15 cores, and HT is enabled.
In this case, the CPUs of each socket are numbered as follows:

         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-44, 90-104
Socket#3 | 45-59, 105-119

In this state, llc_shared_mask of CPU#30 is 0x3fff80000001fffc0000000.
It means that the cache of Socket#2 is shared by CPU#30-44 and 90-104.

After hot-removing Socket#2 and #3, the CPUs of each socket are
numbered as follows:

         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89

But llc_shared_mask is not cleared, so llc_shared_mask of CPU#30 is
still 0x3fff80000001fffc0000000.

After that, when Socket#2 and #3 are hot-added again, the CPUs of each
socket are numbered as follows:

         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-59
Socket#3 | 90-119

Then llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000.
It means that the cache of Socket#2 is shared by CPU#30-59 and 90-104.
But CPU#90-104 now belong to Socket#3, so the mask has a wrong value.

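This patch fixes the above problem by clearing the offlined CPU's bit
in the llc_shared_mask of every CPU that shared the cache with it, and
by clearing the offlined CPU's own llc_shared_mask.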
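
The effect can be reproduced in a small userspace model. The following
is only a sketch under stated assumptions, not kernel code: it uses 8
CPUs and plain 64-bit masks, and every toy_* name is hypothetical. It
assumes that the online path only ever sets bits, which is why a stale
bit survives a remove/add cycle unless the offline path clears it.

```c
#include <stdint.h>

#define TOY_NR_CPUS 8			/* assumption: tiny system for illustration */
#define TOY_BIT(n) (UINT64_C(1) << (n))

/* Hypothetical stand-in for the per-CPU llc_shared_mask. */
static uint64_t toy_llc_mask[TOY_NR_CPUS];

/* Online path: link cpu with itself and each CPU in llc_peers, both ways. */
static void toy_cpu_online(int cpu, uint64_t llc_peers)
{
	int o;

	toy_llc_mask[cpu] |= TOY_BIT(cpu);
	for (o = 0; o < TOY_NR_CPUS; o++) {
		if (!(llc_peers & TOY_BIT(o)))
			continue;
		toy_llc_mask[cpu] |= TOY_BIT(o);
		toy_llc_mask[o] |= TOY_BIT(cpu);
	}
}

/* Offline path: with_fix != 0 mimics the two steps this patch adds. */
static void toy_cpu_offline(int cpu, int with_fix)
{
	int o;

	if (!with_fix)
		return;			/* old behaviour: mask left untouched */

	/* Drop cpu from the mask of every CPU that shared the LLC with it. */
	for (o = 0; o < TOY_NR_CPUS; o++)
		if (toy_llc_mask[cpu] & TOY_BIT(o))
			toy_llc_mask[o] &= ~TOY_BIT(cpu);

	/* Clear the offlined CPU's own mask. */
	toy_llc_mask[cpu] = 0;
}

/* Replay remove/add with renumbering; returns the final mask of CPU 2. */
static uint64_t toy_replay(int with_fix)
{
	int i;

	for (i = 0; i < TOY_NR_CPUS; i++)
		toy_llc_mask[i] = 0;

	/* Boot: CPU 2 and CPU 6 share a last-level cache. */
	toy_cpu_online(2, 0);
	toy_cpu_online(6, TOY_BIT(2));

	/* Hot-remove both CPUs. */
	toy_cpu_offline(6, with_fix);
	toy_cpu_offline(2, with_fix);

	/* Hot-add with new numbering: CPU 2 now shares its cache with CPU 3. */
	toy_cpu_online(2, 0);
	toy_cpu_online(3, TOY_BIT(2));

	return toy_llc_mask[2];
}
```

In this model toy_replay(0) leaves the stale bit of CPU 6 in the mask of
CPU 2, while toy_replay(1) leaves only the bits of CPU 2 and CPU 3.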

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>
---
 arch/x86/kernel/smpboot.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 5492798..893cd2b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1279,6 +1279,7 @@ __init void prefill_possible_map(void)
 static void remove_siblinginfo(int cpu)
 {
 	int sibling;
+	int llc_shared;
 	struct cpuinfo_x86 *c = &cpu_data(cpu);

 	for_each_cpu(sibling, cpu_core_mask(cpu)) {
@@ -1290,9 +1291,12 @@ static void remove_siblinginfo(int cpu)
 			cpu_data(sibling).booted_cores--;
 	}

+	for_each_cpu(llc_shared, cpu_llc_shared_mask(cpu))
+		cpumask_clear_cpu(cpu, cpu_llc_shared_mask(llc_shared));
 	for_each_cpu(sibling, cpu_sibling_mask(cpu))
 		cpumask_clear_cpu(cpu, cpu_sibling_mask(sibling));
 	cpumask_clear(cpu_sibling_mask(cpu));
+	cpumask_clear(cpu_llc_shared_mask(cpu));
 	cpumask_clear(cpu_core_mask(cpu));
 	c->phys_proc_id = 0;
 	c->cpu_core_id = 0;
