Message-ID: <53272D79.5050605@eu.citrix.com>
Date: Mon, 17 Mar 2014 17:14:33 +0000
From: George Dunlap <george.dunlap@...citrix.com>
To: Jan Beulich <JBeulich@...e.com>, "H. Peter Anvin" <hpa@...or.com>
CC: David Vrabel <david.vrabel@...rix.com>,
Thomas Gleixner <tglx@...utronix.de>,
"xen-devel@...ts.xen.org" <xen-devel@...ts.xen.org>,
Sarah Newman <srn@...mr.com>, Ingo Molnar <mingo@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception
On 03/17/2014 05:05 PM, Jan Beulich wrote:
>>>> On 17.03.14 at 17:55, "H. Peter Anvin" <hpa@...or.com> wrote:
>> On 03/17/2014 05:19 AM, George Dunlap wrote:
>>> On Mon, Mar 17, 2014 at 3:33 AM, H. Peter Anvin <hpa@...or.com> wrote:
>>>> No, the right thing is to unf*ck the Xen braindamage and use eagerfpu as a
>>>> workaround for the legacy hypervisor versions.
>>> The interface wasn't an accident. In the most common case you'll want
>>> to clear the bit anyway. In PV mode clearing it would require an extra
>>> trip up into the hypervisor. So this saves one trip up into the
>>> hypervisor on every context switch which involves an FPU, at the
>>> expense of not being able to context-switch away when handling the
>>> trap.
>> The interface was a complete faceplant, because it caused failures.
>> You're not infinitely unconstrained since you want to play in the same
>> sandbox as the native architecture, and if you want to have a hope of
>> avoiding these kinds of failures you really need to avoid making random
>> "improvements", certainly not without an explicit guest opt-in (the same
>> we do for the native CPU architecture when adding new features.)
>>
>> So if this interface wasn't an accident it was active negligence and
>> incompetence.
> I don't think so - while it (as we now see) disallows certain things
> inside the guest, back at the time when this was designed there was
> no sign of any sort of allocation/scheduling being done inside the
> #NM handler. And furthermore, a PV specification is by its nature
> allowed to define deviations from real hardware behavior, or else it
> wouldn't be needed in the first place.
But it's certainly the case that deviating from the hardware in *this*
way by default was always very likely to cause the exact kind of bug
we've seen here. It is an "interface trap" that was bound to be tripped
over (much like Intel's infamous sysret vulnerability).
Making it opt-in would have been a much better idea. But the people who
made that decision are long gone, and we now need to deal with the
situation as we have it.
-George