Message-ID: <20141112173048.GI16807@pd.tnic>
Date:	Wed, 12 Nov 2014 18:30:48 +0100
From:	Borislav Petkov <bp@...en8.de>
To:	"Luck, Tony" <tony.luck@...el.com>
Cc:	Andy Lutomirski <luto@...capital.net>,
	Andi Kleen <andi@...stfloor.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	X86 ML <x86@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
	Oleg Nesterov <oleg@...hat.com>
Subject: Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from
 userspace

On Wed, Nov 12, 2014 at 05:17:55PM +0000, Luck, Tony wrote:
> > Not that easy for testing the #MC path - there we have to inject real
> > MCEs and then noodle through the memory_failure() code. I'd be very much
> > interested to see what would happen if two MCEs happen back-to-back with
> > your change, the second one being raised when we're on the kernel stack
> > and in memory_failure()...
> 
> If the second one hits before we clear MCG_STATUS, then the processor resets.
> 
> If the second one is caused by the recovery thread somewhere in memory_failure(),
> then Andy won't switch stacks - but we will declare this a fatal error and panic (we have
> no recovery from machine checks in the kernel).
> 
> Otherwise the memory_failure() thread is the innocent bystander. If the affected thread
> decides to do recovery, then the first thread will be allowed to return and continue.
> 
> I might worry a bit if the second error is another thread hitting the *same* page which
> hasn't finished processing yet ... then the second will chase along behind the first trying
> to fix the same problem.  I *think* the first will complete and the second will just end
> up here:
> 
> 	if (TestSetPageHWPoison(p)) {
> 		printk(KERN_ERR "MCE %#lx: already hardware poisoned\n", pfn);
> 		return 0;
> 	}
> 
> which is really early in memory_failure().

Yeah, I meant this case: when we have switched stacks, exited
do_machine_check() and are running the recovery code. Exactly then we
get another MCE. The code might handle it, as you say, but I'd like to
see this in action first to be sure - it is not trivial code.
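
For illustration only, here is a minimal userspace sketch of the
test-and-set pattern in the snippet quoted above. It is not the kernel
code: struct fake_page, test_set_hwpoison() and fake_memory_failure()
are hypothetical stand-ins, and a C11 atomic_flag plays the role of the
HWPoison page flag. It only shows why a second caller on the same page
would bail out early; it says nothing about stacks or nested MCEs.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page: models only the HWPoison flag. */
struct fake_page {
	atomic_flag hwpoison;
};

/*
 * Hypothetical analogue of TestSetPageHWPoison(): atomically set the
 * flag and return whether it was already set before this call.
 */
static bool test_set_hwpoison(struct fake_page *p)
{
	return atomic_flag_test_and_set(&p->hwpoison);
}

/*
 * Returns 1 if this caller is the first one and has to do the recovery
 * work, 0 if the page was already marked poisoned by someone else.
 */
static int fake_memory_failure(struct fake_page *p, unsigned long pfn)
{
	if (test_set_hwpoison(p)) {
		printf("MCE %#lx: already hardware poisoned\n", pfn);
		return 0;
	}

	/* The first caller would continue with the actual recovery here. */
	return 1;
}
```

Calling fake_memory_failure() twice on the same fake_page initialized
with ATOMIC_FLAG_INIT makes the first call return 1 and the second
return 0 after printing the "already hardware poisoned" line.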

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.