lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Fri, 24 Apr 2009 09:27:38 +0900 From: Hidetoshi Seto <seto.hidetoshi@...fujitsu.com> To: Huang Ying <ying.huang@...el.com> CC: Andi Kleen <andi@...stfloor.org>, "hpa@...or.com" <hpa@...or.com>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "mingo@...e.hu" <mingo@...e.hu>, "tglx@...utronix.de" <tglx@...utronix.de> Subject: Re: [PATCH] [4/4] x86: MCE: Fix EIPV behaviour with !PCC Huang Ying wrote: > Add some description for the patch, hope that to be more clear. > > Best Regards, > Huang Ying > --------------------------------------------------> > Impact: Spec compliance > > Tolerant level 0 means: always panic on uncorrected errors, that is, > panic even for recoverable uncorrected errors. This is a useful option > for someone think panic is the better hardware error containment > mechanism than trying to recover. > > Current implementation does not comply with the tolerant == 0 spec, > that is, it tries to recover (by killing related processes) for > recoverable uncorrected errors (errors triggered in userspace) when > tolerant == 0. This patch fixes this by going panic for that case. > > Signed-off-by: Huang Ying <ying.huang@...el.com> > Signed-off-by: Andi Kleen <ak@...ux.intel.com> > > --- > arch/x86/kernel/cpu/mcheck/mce_64.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > --- a/arch/x86/kernel/cpu/mcheck/mce_64.c > +++ b/arch/x86/kernel/cpu/mcheck/mce_64.c > @@ -400,7 +400,7 @@ void do_machine_check(struct pt_regs * r > * force_sig() takes an awful lot of locks and has a slight > * risk of deadlocking. > */ > - if (user_space) { > + if (user_space && tolerant > 0) { > force_sig(SIGBUS, current); > } else if (panic_on_oops || tolerant < 2) { > mce_panic("Uncorrected machine check", > Wait, I want confirmation. Given: * Tolerant levels: * 0: always panic on uncorrected errors, log corrected errors Let's walk do_machine_check(): 266 void do_machine_check(struct pt_regs * regs, long error_code) 267 { : 302 for (i = 0; i < banks; i++) { : 311 rdmsrl(MSR_IA32_MC0_STATUS + i*4, m.status); 312 if ((m.status & MCI_STATUS_VAL) == 0) 313 continue; : 319 if ((m.status & MCI_STATUS_UC) == 0) 320 continue; : # Now we start checking status with VAL and UC : 329 if (m.status & MCI_STATUS_EN) { 330 /* if PCC was set, there's no way out */ 331 no_way_out |= !!(m.status & MCI_STATUS_PCC); 332 /* 333 * If this error was uncorrectable and there was 334 * an overflow, we're in trouble. If no overflow, 335 * we might get away with just killing a task. 336 */ 337 if (m.status & MCI_STATUS_UC) { 338 if (tolerant < 1 || m.status & MCI_STATUS_OVER) 339 no_way_out = 1; 340 kill_it = 1; 341 } 342 } else { 343 /* 344 * Machine check event was not enabled. Clear, but 345 * ignore. 346 */ 347 continue; 348 } : # Humm, second UC check should be removed... # Anyway, in case of tolerant == 0, no_way_out == 1 if the event is enabled. # And kill_it == 1 unless there are no event enabled. # Therefore, in case of tolerant == 0, always "no_way_out == kill_it". : 364 } 365 } : 376 if (no_way_out && tolerant < 3) 377 mce_panic("Machine check", &panicm, mcestart); : # in case of tolerant == 0, we usually hit here. : 385 if (kill_it && tolerant < 3) { 386 int user_space = 0; 387 388 /* 389 * If the EIPV bit is set, it means the saved IP is the 390 * instruction which caused the MCE. 391 */ 392 if (m.mcgstatus & MCG_STATUS_EIPV) 393 user_space = panicm.ip && (panicm.cs & 3); 394 395 /* 396 * If we know that the error was in user space, send a 397 * SIGBUS. Otherwise, panic if tolerance is low. 398 * 399 * force_sig() takes an awful lot of locks and has a slight 400 * risk of deadlocking. 401 */ 402 if (user_space) { 403 force_sig(SIGBUS, current); 404 } else if (panic_on_oops || tolerant < 2) { 405 mce_panic("Uncorrected machine check", 406 &panicm, mcestart); 407 } 408 } : # Then, when we enter here with tolerant == 0 ? : 421 } Or, should this patch be applied after committing some of Andi's patches? It means this patch targets a bug in Andi's patch set and the bug is not in 2.6.30-rc* yet. Thanks, H.Seto -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists