Message-ID: <51C41D94.1030201@linux.vnet.ibm.com>
Date:	Fri, 21 Jun 2013 15:02:04 +0530
From:	"Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>
To:	Borislav Petkov <bp@...en8.de>
CC:	tony.luck@...el.com, ananth@...ibm.com, masbock@...ux.vnet.ibm.com,
	lcm@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
	linux-acpi@...r.kernel.org, ying.huang@...el.com
Subject: Re: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA
 banks listed in APEI HEST CMC

On 06/21/2013 02:06 PM, Borislav Petkov wrote:
> On Fri, Jun 21, 2013 at 01:16:50PM +0530, Naveen N. Rao wrote:
>> Yes, but I'm afraid this won't work either - mce_banks_owned is
>> cleared during cpu offline. This is necessary since a cmci
>> rediscover is triggered on cpu offline, so that if this bank is
>> shared across cores, a different cpu can claim ownership of this
>> bank.
>
> What for? Sounds strange to me.

Look at section "15.5.1 CMCI Local APIC Interface" from Intel SDM Vol. 
3, and the subsequent section on "System Software Recommendation for 
Managing CMCI and Machine Check Resources":
"For example, if a corrected bit error in a cache shared by two logical 
processors caused a CMCI, the interrupt will be delivered to both 
logical processors sharing that microarchitectural sub-system."

In other words, some of the MC banks are shared across logical cpus in a 
core and some across all cores in a package. During initialization, the 
first cpu in a core ends up owning most of the banks specific to the 
core/package. When this cpu is offlined, we would want the second cpu in 
that core to discover and enable CMCI for those MC banks which it shares 
with the first cpu.

As an example, consider a hypothetical single-core Intel processor with
Hyperthreading. On init, let's say the first cpu ends up owning banks 1,
2, 3 and 4, and the second cpu ends up owning banks 1 and 2. This would
mean that MC banks 1 and 2 are per-hyperthread (each logical cpu has its
own copy), while banks 3 and 4 are shared. Now, if we offline the first
cpu, it disables CMCI on all 4 banks it owns, including the shared banks
3 and 4. So, if we now do a cmci rediscovery, the second cpu will see
that banks 3 and 4 don't have CMCI enabled and will then claim ownership
of those so that we can continue to receive and process CMCIs from those
subsystems.
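
The scenario above can be sketched as a small toy model. This is Python
for illustration only, not kernel code: the class, the method names and
the way CMCI enable state is tracked are all assumptions made for this
sketch, and the bank numbers are the hypothetical ones from the example.

```python
# Toy model of CMCI bank ownership on one hyperthreaded core.
# Assumption: banks 1 and 2 exist once per logical cpu, banks 3 and 4
# exist once per core and are visible to both logical cpus.
PER_THREAD_BANKS = {1, 2}
SHARED_BANKS = {3, 4}


class Core:
    def __init__(self, cpus):
        self.online = []
        # Stand-in for a per-cpu "banks owned" set (hypothetical layout).
        self.owned = {cpu: set() for cpu in cpus}
        # Set of banks that currently have CMCI enabled. Shared banks
        # have one entry per core, per-thread banks one entry per cpu.
        self.cmci_enabled = set()

    def _key(self, cpu, bank):
        return ("shared", bank) if bank in SHARED_BANKS else (cpu, bank)

    def discover(self, cpu):
        """Claim every bank visible to cpu that has no CMCI enabled yet."""
        for bank in sorted(PER_THREAD_BANKS | SHARED_BANKS):
            key = self._key(cpu, bank)
            if key not in self.cmci_enabled:
                self.cmci_enabled.add(key)
                self.owned[cpu].add(bank)

    def bring_online(self, cpu):
        self.online.append(cpu)
        self.discover(cpu)

    def offline(self, cpu):
        """Disable CMCI on the banks cpu owns, then let the rest rediscover."""
        for bank in self.owned[cpu]:
            self.cmci_enabled.discard(self._key(cpu, bank))
        self.owned[cpu].clear()  # ownership is dropped on offline
        self.online.remove(cpu)
        for other in self.online:
            self.discover(other)

    def owners(self):
        return {cpu: sorted(banks) for cpu, banks in self.owned.items()}


core = Core(cpus=[0, 1])
core.bring_online(0)
core.bring_online(1)
print(core.owners())  # {0: [1, 2, 3, 4], 1: [1, 2]}

core.offline(0)
print(core.owners())  # {0: [], 1: [1, 2, 3, 4]}
```

If offline() did not drop ownership and trigger the rediscovery, banks 3
and 4 would be left with CMCI disabled and no owner, which is the
situation the rediscover is meant to avoid.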

Makes sense now?


Thanks,
Naveen
