Message-Id: <0630921a-a99f-4577-bc8e-0aaf08b3175d@app.fastmail.com>
Date: Thu, 30 Oct 2025 10:35:09 -0700
From: "Andy Lutomirski" <luto@...nel.org>
To: "H. Peter Anvin" <hpa@...or.com>, "Dave Hansen" <dave.hansen@...el.com>,
 "Rick P Edgecombe" <rick.p.edgecombe@...el.com>,
 "Sohil Mehta" <sohil.mehta@...el.com>,
 "Thomas Gleixner" <tglx@...utronix.de>, "Ingo Molnar" <mingo@...hat.com>,
 "Borislav Petkov" <bp@...en8.de>,
 "the arch/x86 maintainers" <x86@...nel.org>,
 "Dave Hansen" <dave.hansen@...ux.intel.com>
Cc: "Jonathan Corbet" <corbet@....net>, "Ard Biesheuvel" <ardb@...nel.org>,
 "david.laight.linux@...il.com" <david.laight.linux@...il.com>,
 "jpoimboe@...nel.org" <jpoimboe@...nel.org>,
 "Andrew Cooper" <andrew.cooper3@...rix.com>,
 "Tony Luck" <tony.luck@...el.com>,
 "Alexander Shishkin" <alexander.shishkin@...ux.intel.com>,
 "Kirill A . Shutemov" <kas@...nel.org>,
 "Sean Christopherson" <seanjc@...gle.com>,
 "Randy Dunlap" <rdunlap@...radead.org>,
 "David Woodhouse" <dwmw@...zon.co.uk>,
 "Vegard Nossum" <vegard.nossum@...cle.com>, "Xin Li" <xin@...or.com>,
 "Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
 "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
 "Kees Cook" <kees@...nel.org>,
 "Peter Zijlstra (Intel)" <peterz@...radead.org>,
 "linux-efi@...r.kernel.org" <linux-efi@...r.kernel.org>,
 "Geert Uytterhoeven" <geert@...ux-m68k.org>
Subject: Re: [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code

On Thu, Oct 30, 2025, at 10:22 AM, H. Peter Anvin wrote:
> On October 30, 2025 9:58:02 AM PDT, Andy Lutomirski <luto@...nel.org> wrote:
>>
>>
>>On Tue, Oct 7, 2025, at 11:48 AM, Dave Hansen wrote:
>>> On 10/7/25 11:37, Edgecombe, Rick P wrote:
>>>>>  	/*
>>>>>  	 * No point in checking CS -- the only way to get here is a user mode
>>>>>  	 * trap to a high address, which means that we're in 64-bit user code.
>>>> I don't know. Is this as true any more? We are now sometimes guessing based on
>>>> regs->ip of a #GP. What if the kernel accidentally tries to jump to the vsyscall
>>>> address? Then we are reading the kernel stack and strange things. Maybe it's
>>>> worth replacing the comment with a check? Feel free to call this paranoid.
>>>
>>> The first check in emulate_vsyscall() is:
>>>
>>>        /* Write faults or kernel-privilege faults never get fixed up. */
>>>        if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
>>>                return false;
>>>
>>> If the kernel jumped to the vsyscall page, it would end up there, return
>>> false, and never reach the code near the "No point in checking CS" comment.
>>>
>>> Right? Or am I misunderstanding the scenario you're calling out?
>>>
>>> If I'm understanding it right, I'd be a bit reluctant to add a CS check
>>> as well.
>>
>>IMO it should boil down to exactly the same thing as the current code for the #PF case and, for #GP, there are two logical conditions that we care about:
>>
>>1. Are we in user mode?
>>
>>2. Are we using a 64-bit CS such that vsyscall emulation makes sense?
>>
>>Now I'd be a tiny bit surprised if a CPU allows you to lretq or similar to a 32-bit CS with >2^63 RIP, but what do I know?  One could test this on a variety of machines, both Intel and AMD, to see what actually happens.
>>
>>But the kernel wraps all this up as user_64bit_mode(regs).  If user_64bit_mode(regs) is true and RIP points to a vsyscall, then ISTM there aren't a whole lot of options.  Somehow we're in user mode, either via an exit from kernel mode or via CALL/JMP/whatever from user mode, and RIP is pointing at the vsyscall page, and CS is such that, in the absence of LASS, we would execute the vsyscall.  I suppose the #GP could be from some other cause than a LASS violation, but that doesn't seem worth worrying about.
>>
>>So I think all that's needed is to update "[PATCH v10 10/15] x86/vsyscall: Add vsyscall emulation for #GP" to check user_64bit_mode(regs) for the vsyscall case.  (As submitted, unless I missed something while composing the patches in my head, it's only checking user_mode(regs), and I think it's worth the single extra line of code to make the result a tiny bit more robust.)
>
> user_64bit_mode() is a CS check :)
>
> There is that one extra check for PARAVIRT_XXL that *could* be gotten 
> rid of by making the PV code report its 64-bit selector and patching it 
> into the test, but it is on the error path anyway...

In the hopefully unlikely event that anyone cares about #GP performance, they should probably care far, far more about the absurd PASID fixup than anything else :)
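
For anyone following along, here is a rough userspace model of the two gates discussed in this thread: the error-code test quoted from emulate_vsyscall() for the #PF path, and a user_64bit_mode()-style CS test for the #GP path. It is only a sketch, not the patch. Every sketch_* name is hypothetical, the selector value 0x33 and the vsyscall page address are assumptions about a native (non-PARAVIRT_XXL) x86-64 kernel, and the PARAVIRT_XXL extra selector is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits used by the quoted check. */
#define X86_PF_WRITE (1UL << 1)
#define X86_PF_USER  (1UL << 2)

/* Assumptions for this sketch only (native kernel, no PARAVIRT_XXL). */
#define SKETCH_USER_CS       0x33                  /* assumed 64-bit user CS */
#define SKETCH_VSYSCALL_ADDR 0xffffffffff600000ULL /* assumed vsyscall page  */
#define SKETCH_PAGE_MASK     (~0xfffULL)

/* Hypothetical stand-in for the two pt_regs fields that matter here. */
struct sketch_regs {
	uint64_t ip;
	uint16_t cs;
};

/* #PF path: write faults or kernel-privilege faults never get fixed up. */
static bool sketch_pf_may_emulate(unsigned long error_code)
{
	return (error_code & (X86_PF_WRITE | X86_PF_USER)) == X86_PF_USER;
}

/* Model of user_64bit_mode(): a CS check, as hpa points out. */
static bool sketch_user_64bit_mode(const struct sketch_regs *regs)
{
	return regs->cs == SKETCH_USER_CS;
}

/* #GP path: emulate only for 64-bit user code with RIP in the vsyscall page. */
static bool sketch_gp_may_emulate(const struct sketch_regs *regs)
{
	return sketch_user_64bit_mode(regs) &&
	       (regs->ip & SKETCH_PAGE_MASK) == SKETCH_VSYSCALL_ADDR;
}
```

With this model, a kernel-mode CS or a 32-bit user CS fails sketch_gp_may_emulate() even when RIP points into the vsyscall page, which is the extra robustness the user_64bit_mode(regs) check is meant to buy.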