linux-kernel - Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALCETrWsWGf+WRf7gUWs5x=6Ts5KwqWpecrp-Q61jL7usTOvAQ@mail.gmail.com>
Date:	Wed, 12 Nov 2014 07:48:15 -0800
From:	Andy Lutomirski <luto@...capital.net>
To:	Borislav Petkov <bp@...en8.de>
Cc:	Andi Kleen <andi@...stfloor.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	X86 ML <x86@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
	Tony Luck <tony.luck@...el.com>,
	Oleg Nesterov <oleg@...hat.com>
Subject: Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

On Nov 12, 2014 2:30 AM, "Borislav Petkov" <bp@...en8.de> wrote:
>
> On Tue, Nov 11, 2014 at 06:06:48PM -0800, Tony Luck wrote:
> > > Innocent bystanders have RIPV=1, EIPV=0 in MCG_STATUS ... so they
> > > are quite easy to spot.
> >
> > Bother ... except for the SRAO cases where *everyone* is an innocent
> > bystander - but someone should go look for the error and queue up
> > a page offline event.  Perhaps for this we'd do the self-ipi trick and
> > then scan the banks to look for SRAO signatures to process as well
> > as clearing MCIP
>
> First of all, good morning :-)
>
> So I'm reading all those notes again and I can't say I'm convinced about
> the change. And all those "but but, what might happen if" cases are what
> bother me.
>
> First, the stack: switching to a possibly almost full kernel stack
> where we have other stuff already allocated is a bad idea IMO. There's
> a good reason to have a known-good, solely-for-our-purpose stack which
> we get to use. It is always there and it gives us the optimal conditions
> possible for recovery. Switching back to the possibly crowded kernel
> stack and causing stack overflows while handling an MCE is not an
> improvement in my book.

I only switch stacks on entry from userspace, and the kernel stack is
completely empty if that happens.

>
> Tony:
> > The current code has an ugly hole at the moment. End of
> > do_machine_check() clears MCG_STATUS. At that point we are still
> > running on the magic stack for machine check exceptions ... if we take
> > a machine check in the small window from there until we get off this
> > stack (iret) ... then we will enter the handler back on the same stack
> > that we haven't finished using yet. Bad things ensue.
>
> I'm not crazy about it either but how often does this actually happen in
> practice? And if it does happen, then, well, we die, so be it. It can
> happen with other MCEs coming in back-to-back like poisoned data (1st
> MCE) being consumed on another core (2nd MCE). I know, Intel does the
> broadcasting but that is going to change, I hear and AMD raises an MCE
> on the core which detects it only.
>
> Another big issue I see with this is verifying the locking works and
> generally testing this serious change. MCA is core kernel infrastructure
> and we cannot break this left and right. And changing stacks we're
> running on and the whole dynamics of the code might open a whole other
> can of worms.

One nice thing for testing is that my patch applies to int3 from
userspace as well, and that's easy to test.

>
> Oh, and we shouldn't forget why we're doing this: just to save two
> members to task_struct and getting rid of paranoid_userspace. Yep, still
> not convinced the effort is worth it.

I think I want to make this change anyway, though, since it may
simplify fsgsbase support enough to justify it solely on that account.
I don't think that the machine check code needs to change at all to
accommodate a stack switch, but I think it makes some simplifications
possible.

>
> Now, I think Andy had a much better/simpler/cleaner suggestion
> yesterday: task_work_add(). It is basically what _TIF_MCE_NOTIFY does
> and it is called in the same path - do_notify_resume() - but generic and
> we can drop _TIF_MCE_NOTIFY then and use the generic thing.
>
> Much less intrusive change.

Less intrusive is certainly true.

--Andy

>
> --
> Regards/Gruss,
>     Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/