Message-ID: <2CE44BD3DBCF9541909CCB42F11CA3921C6FAACA@SFO1EXC-MBXP06.nbttech.com>
Date: Fri, 10 May 2013 18:42:28 +0000
From: Ming Lei <Ming.Lei@...erbed.com>
To: "Luck, Tony" <tony.luck@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: "mchehab@...hat.com" <mchehab@...hat.com>,
"bp@...en8.de" <bp@...en8.de>
Subject: RE: x86_mce: mce_start uses number of phsical cores instead of
logical cores
With hyperthreading turned on, num_online_cpus() reports the number of all logical cores. What I found in testing is that only half the cores receive the MCE broadcast, so I assume only the physical cores get the broadcast. I have two sockets of 5646 on board. num_online_cpus() returns 24, but only 12 cores enter do_machine_check. I used both EDAC error injection and a hardware EDAC error injector in my testing.
cpumask_weight(cpu_core_mask(0)) / cpu_data(0).booted_cores evaluates to the ratio of logical cores to physical cores. In my case it is two.
Here is the Intel spec:
Processor Number E5645
# of Cores 6
# of Threads 12
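
The arithmetic above can be checked with a small userspace sketch. It is only a model, not kernel code: the struct and both function names are hypothetical, and the three fields merely stand in for num_online_cpus(), cpumask_weight(cpu_core_mask(0)) and cpu_data(0).booted_cores. The per-socket values (12 threads, 6 cores) are assumed from the spec lines above.

```c
#include <assert.h>

/* Hypothetical userspace model of the topology values used in the patch. */
struct topo_model {
    unsigned int online_cpus;    /* stands in for num_online_cpus() */
    unsigned int threads_in_pkg; /* stands in for cpumask_weight(cpu_core_mask(0)) */
    unsigned int booted_cores;   /* stands in for cpu_data(0).booted_cores */
};

/* Logical CPUs per physical core, i.e. the divisor used in the patch. */
static unsigned int threads_per_core(const struct topo_model *t)
{
    assert(t->booted_cores > 0);
    return t->threads_in_pkg / t->booted_cores;
}

/* Number of CPUs the patch would wait for after dividing by the ratio. */
static unsigned int expected_mce_cpus(const struct topo_model *t)
{
    unsigned int ratio = threads_per_core(t);

    assert(ratio > 0);
    return t->online_cpus / ratio;
}
```

With online_cpus = 24, threads_in_pkg = 12 and booted_cores = 6, the ratio is 2 and the adjusted count is 12, which matches the number of cores I see entering do_machine_check.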
Ming
-----Original Message-----
From: Luck, Tony [mailto:tony.luck@...el.com]
Sent: Friday, May 10, 2013 11:14 AM
To: Ming Lei; linux-kernel@...r.kernel.org
Cc: mchehab@...hat.com; bp@...en8.de
Subject: RE: x86_mce: mce_start uses number of phsical cores instead of logical cores
> +#if NR_CPUS > 1
> + cpus /= cpumask_weight(cpu_core_mask(0)) / cpu_data(0).booted_cores;
> +#endif
Not entirely sure what you are trying to do here (apart from making "cpus"
be a smaller number). What is the reasoning behind the right hand side of this expression?
Is this problem more related to how EDAC is injecting an error? When I've used other methods (e.g. ACPI/EINJ) I end up with a machine check that is broadcast to all processors ... so "cpus = num_online_cpus()" is the correct[1] number of processors to wait for.
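
To illustrate why the count matters, here is a rough userspace sketch of a rendezvous that waits for an expected number of check-ins. It is a simplified model and not the mce_start code: the function name, the polling budget, and the one-check-in-per-poll behaviour are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a rendezvous: on each poll one more CPU checks in,
 * until `arriving` CPUs have done so. The waiter gives up after `max_polls`.
 * Returns true only if the check-in count reached `expected`.
 */
static bool rendezvous_completes(unsigned int arriving,
                                 unsigned int expected,
                                 unsigned int max_polls)
{
    unsigned int callin = 0;

    assert(expected > 0);

    for (unsigned int poll = 0; poll < max_polls; poll++) {
        if (callin == expected)
            return true;   /* everyone we waited for has arrived */
        if (callin < arriving)
            callin++;      /* one more CPU checks in */
    }
    return callin == expected;
}
```

In this model, waiting for 24 CPUs when only 12 arrive never completes, while waiting for 24 when all 24 arrive does. Which of the two cases applies on real hardware depends on how the error is delivered.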
-Tony
[1] Andi may point me (again) to a fix to help deal with the case that Linux has taken some cpus offline. In that case this code is wrong as the "offline"
cpus will still show up for machine checks. But there are troubling corner cases with the fix.