linux-kernel - Re: [PATCH v5 22/34] x86/fred: FRED initialization code

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5f81c066-d82f-b66f-9c6d-a9e6a0d5aa4f@citrix.com>
Date:   Tue, 21 Mar 2023 01:02:14 +0000
From:   "andrew.cooper3@...rix.com" <andrew.cooper3@...rix.com>
To:     "Li, Xin3" <xin3.li@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Lai Jiangshan <jiangshanlai@...il.com>
Cc:     "H. Peter Anvin" <hpa@...or.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "Christopherson,, Sean" <seanjc@...gle.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "Shankar, Ravi V" <ravi.v.shankar@...el.com>
Subject: Re: [PATCH v5 22/34] x86/fred: FRED initialization code

On 21/03/2023 12:12 am, Li, Xin3 wrote:
>>> If there is no other concrete reason other than overflowing for
>>> assigning NMI and #DB with a stack level > 0, #VE should also be
>>> assigned with a stack level > 0, and #BP too. #VE can happen anytime
>>> and anywhere, so it is subject to overflowing too.
>> So #BP needs the stack-gap (redzone) for text_poke_bp().
>>
>> #BP can end up in kprobes which can then end up in ftrace/perf, depending on
>> how it's all wired up.
>>
>> #VE is currently a trainwreck vs NMI/MCE, but I think FRED solves the worst of
>> that. I'm not exactly sure how deep the #VE handler goes.
>>
> VE under IDT is *not* using an IST, we need some solid rationales here.

#VE, and #VC on AMD, are borderline unusable.  Both under IDT and FRED.

The reason #VE is not IST is because there are plenty of real cases
where a non-malicious outer hypervisor could create reentrant faults
that lose program state.  e.g. hitting an IO instruction, then hitting
an emulated MSR.

There are fewer cases where a non-IST #VE ends up in a re-entrant fault
(IIRC, you can still manage it by unmapping the entry stack), but you're
still trusting the outer hypervisor to not e.g. unmap the SYSCALL entry
point.

FRED gets rid of the "reentrant fault overwriting it on the stack" case,
and removes the syscall gap case, replacing them instead with a stack
overflow in the worst case because there is still no upper bound to how
many times #VE can actually be delivered in the course of servicing a
single #VE.

~Andrew

P.S. While I hate to cite myself, if you haven't read
https://docs.google.com/document/d/1hWejnyDkjRRAW-JEsRjA5c9CKLOPc6VKJQsuvODlQEI/edit?usp=sharing
yet, do so.  It did feed into some of the FRED design.