Message-ID: <2CE44BD3DBCF9541909CCB42F11CA3921C6FAACA@SFO1EXC-MBXP06.nbttech.com>
Date:	Fri, 10 May 2013 18:42:28 +0000
From:	Ming Lei <Ming.Lei@...erbed.com>
To:	"Luck, Tony" <tony.luck@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:	"mchehab@...hat.com" <mchehab@...hat.com>,
	"bp@...en8.de" <bp@...en8.de>
Subject: RE: x86_mce: mce_start uses number of phsical cores instead of
 logical cores

With hyperthreading turned on, num_online_cpus() reports the number of all logical cores. What I found in testing is that only half the cores receive the MCE broadcast, so I assume only the physical cores get the broadcast. I have two sockets 5646 onboard. num_online_cpus() returns 24, but only 12 cores enter do_machine_check. I used both EDAC error injection and a hardware EDAC error injector in my testing.

cpumask_weight(cpu_core_mask(0)) / cpu_data(0).booted_cores gives the ratio of logical cores to physical cores. In my case it is two.

Here is the Intel spec:
Processor Number  E5645 
# of Cores  6 
# of Threads  12
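
To make the arithmetic concrete, here is a rough user-space sketch of the calculation, not kernel code. The struct and function names are made up for illustration, and it assumes cpumask_weight(cpu_core_mask(0)) counts all logical CPUs in package 0 while booted_cores counts the physical cores in that package.

```c
#include <assert.h>

/* Hypothetical user-space model of the values discussed above. */
struct topo {
	unsigned int online_cpus;     /* stands in for num_online_cpus() */
	unsigned int package_threads; /* stands in for cpumask_weight(cpu_core_mask(0)) */
	unsigned int booted_cores;    /* stands in for cpu_data(0).booted_cores */
};

/* Ratio of logical CPUs to physical cores within one package. */
unsigned int threads_per_core(const struct topo *t)
{
	assert(t->booted_cores > 0);
	return t->package_threads / t->booted_cores;
}

/* Number of CPUs the patch would make mce_start wait for. */
unsigned int expected_mce_cpus(const struct topo *t)
{
	unsigned int ratio = threads_per_core(t);

	assert(ratio > 0);
	return t->online_cpus / ratio;
}
```

With the numbers from my box (24 online CPUs, 12 threads and 6 cores per package), the ratio comes out to 2 and the expected count to 12, which matches the number of cores I see entering do_machine_check.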

Ming

-----Original Message-----
From: Luck, Tony [mailto:tony.luck@...el.com] 
Sent: Friday, May 10, 2013 11:14 AM
To: Ming Lei; linux-kernel@...r.kernel.org
Cc: mchehab@...hat.com; bp@...en8.de
Subject: RE: x86_mce: mce_start uses number of phsical cores instead of logical cores

> +#if NR_CPUS > 1
> +	cpus /= cpumask_weight(cpu_core_mask(0)) / cpu_data(0).booted_cores; 
> +#endif

Not entirely sure what you are trying to do here (apart from making "cpus" be a smaller number).  What is the reasoning behind the right hand side of this expression?

Is this problem more related to how EDAC is injecting an error?  When I've used other methods (e.g. ACPI/EINJ) I end up with a machine check that is broadcast to all processors ... so "cpus = num_online_cpus()" is the correct[1] number of processors to wait for.

-Tony

[1] Andi may point me (again) to a fix to help deal with the case that Linux has taken some cpus offline. In that case this code is wrong as the "offline" cpus will still show up for machine checks.  But there are troubling corner cases with the fix.