linux-kernel - Re: [PATCH] x86/sev-es: Do not use copy_from_kernel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0d8a35c9-82ca-188a-529d-65fd01c40149@amd.com>
Date:   Fri, 8 Sep 2023 08:13:57 -0500
From:   Tom Lendacky <thomas.lendacky@....com>
To:     Dave Hansen <dave.hansen@...el.com>,
        Adam Dunlap <acdunlap@...gle.com>
Cc:     linux-kernel@...r.kernel.org, x86@...nel.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        David Hildenbrand <david@...hat.com>,
        Mike Rapoport <rppt@...nel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Nikunj A Dadhania <nikunj@....com>,
        Dionna Glaze <dionnaglaze@...gle.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Joerg Roedel <jroedel@...e.de>, Jacob Xu <jacobhxu@...gle.com>
Subject: Re: [PATCH] x86/sev-es: Do not use copy_from_kernel_nofault in early
 #VC handler

On 9/7/23 14:12, Dave Hansen wrote:
> On 9/7/23 11:03, Adam Dunlap wrote:
>> On Thu, Sep 7, 2023 at 10:06 AM Dave Hansen <dave.hansen@...el.com> wrote:
>>> How about something like the attached patch?
>>
>> I think that's a much better idea, but unfortunately we can't rely on
>> boot_cpu_data being 0'd out. While it is a static global variable that C says
>> should be 0'd, the early interrupt happens before .bss is cleared (Note if it
>> happens to be 0, then the __is_canonical_address check succeeds anyway -- the
>> boot failure only happens when that variable happens to be other random values).
>> If there's another check we could do, I'd agree that'd end up being a much
>> better patch. For example, maybe we could add a field to cpuinfo_x86 is_valid
>> that is manually (i.e. not part of the regular .bss clearing) set to false and
>> is only set to true after early_identify_cpu is finished. Or the
>> simplest thing would  be to just manually set x86_virt_bits to 0 somewhere super early.
> 
> Gah, what a mess.  So the guilty CPUID in question is this:
> 
>>          /* Setup and Load IDT */
>>          call    early_setup_idt
>>
>>          /* Check if nx is implemented */
>>          movl    $0x80000001, %eax
>>          cpuid
>>          movl    %edx,%edi
> 
> which is _barely_ after we have an IDT with which to generate
> exceptions.  What happens before this?  This isn't the first CPUID
> invocation.  Does this one just happen to #VC and all the others before
> don't?
> 
> In any case, the most straightforward way out of this mess is to just
> move boot_cpu_data out of .bss and explicitly initialize it along with
> some documentation explaining the situation.

Agreed, putting this in the .data section will ensure it is zero from the 
start.

> 
>>> Also, what's the root cause here?  What's causing the early exception?
>>> It is some silly CPUID leaf?  Should we be more careful to just avoid
>>> these exceptions?
>>
>> Yes, it is a CPUID instruction in secondary_startup_64 with the comment /* Check
>> if nx is implemented */. It appears to be pretty important. Potentially we could
>> paravirtualize the CPUID direclty (i.e. mess with the GHCB and make the vmgexit)
>> instead of taking the #VC, but I'm not sure that's a great idea.
> 
> What TDX does here is actually pretty nice.  It doesn't generate a #VE
> for these.
> 
> But seriously, is it even *possible* to spin up a SEV-SNP VM what
> doesn't have NX?

It is a common path, so while an SEV guest would have NX support, you 
would first have to determine that it is an SEV guest. That would take 
issuing a CPUID instruction in order to determine if a particular MSR can 
be read...

Ultimately, we could probably pass the encryption mask from the 
decompressor to the kernel and avoid some of the checks during early boot 
of the kernel proper. Is it possible to boot an x86 kernel without going 
through the decompressor?

Thanks,
Tom

> 
>> There's a couple of other CPUIDs that are called in early_identify_cpu between
>> when x86_virt_bits is set to 48 and when it is set to its real value (IIUC, it
>> may be set to 57 if 5 level paging is enabled), which could potentially cause
>> spurious failures in those later calls.
> 
> That should be easy enough to fix.
> 
> These things where we initialize a value and then write over it are
> always fragile.  Let's just make one function that does it right and
> does it once.
> 
> Totally untested patch attached.