linux-kernel - RE: [PATCH v5 22/34] x86/fred: FRED initialization code

Message-ID: <SA1PR11MB673408E50152DFAB60800DA0A8819@SA1PR11MB6734.namprd11.prod.outlook.com>
Date:   Tue, 21 Mar 2023 07:49:37 +0000
From:   "Li, Xin3" <xin3.li@...el.com>
To:     "andrew.cooper3@...rix.com" <andrew.cooper3@...rix.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Lai Jiangshan <jiangshanlai@...il.com>
CC:     "H. Peter Anvin" <hpa@...or.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "Christopherson, Sean" <seanjc@...gle.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "Shankar, Ravi V" <ravi.v.shankar@...el.com>
Subject: RE: [PATCH v5 22/34] x86/fred: FRED initialization code

> >>> If there is no other concrete reason other than overflowing for
> >>> assigning NMI and #DB with a stack level > 0, #VE should also be
> >>> assigned with a stack level > 0, and #BP too. #VE can happen anytime
> >>> and anywhere, so it is subject to overflowing too.
> >> So #BP needs the stack-gap (redzone) for text_poke_bp().
> >>
> >> #BP can end up in kprobes which can then end up in ftrace/perf, depending on
> >> how it's all wired up.
> >>
> >> #VE is currently a trainwreck vs NMI/MCE, but I think FRED solves the worst of
> >> that. I'm not exactly sure how deep the #VE handler goes.
> >>
> > #VE under IDT does *not* use an IST; we need a solid rationale here.
> 
> #VE, and #VC on AMD, are borderline unusable.  Both under IDT and FRED.

Oops!

> The reason #VE is not IST is because there are plenty of real cases
> where a non-malicious outer hypervisor could create reentrant faults
> that lose program state.  e.g. hitting an IO instruction, then hitting
> an emulated MSR.
>
> There are fewer cases where a non-IST #VE ends up in a re-entrant fault
> (IIRC, you can still manage it by unmapping the entry stack), but you're
> still trusting the outer hypervisor to not e.g. unmap the SYSCALL entry
> point.
> 
> FRED gets rid of the "reentrant fault overwriting it on the stack" case,
> and removes the syscall gap case, replacing them instead with a stack
> overflow in the worst case because there is still no upper bound to how
> many times #VE can actually be delivered in the course of servicing a
> single #VE.

Exactly, FRED stack levels can make use of the whole regular stack space.
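
To make the trade-off concrete, here is a toy model. It is not kernel code
and not part of this series; toy_stack, deliver_ist(), deliver_same_level()
and the 8-slot capacity are all made up for illustration. It assumes that an
IST-style delivery always restarts at the same stack top, while a FRED
delivery that does not raise the stack level keeps pushing on the current
stack:

```c
#include <assert.h>
#include <stdbool.h>

#define STACK_SLOTS 8	/* hypothetical capacity, counted in event frames */

/* Toy stack: frames[0] is the first frame pushed, depth counts live frames. */
struct toy_stack {
	int frames[STACK_SLOTS];
	int depth;
};

/*
 * IST-style delivery (assumption: always starts at the same stack top).
 * Returns true if an outer frame was still live and got overwritten.
 */
static bool deliver_ist(struct toy_stack *s, int event_id)
{
	bool clobbered = s->depth > 0;

	s->frames[0] = event_id;
	s->depth = 1;
	return clobbered;
}

/*
 * FRED-style delivery without a stack level change: push below the
 * current frame. Returns false once the stack is exhausted (overflow).
 */
static bool deliver_same_level(struct toy_stack *s, int event_id)
{
	assert(s->depth >= 0 && s->depth <= STACK_SLOTS);
	if (s->depth == STACK_SLOTS)
		return false;
	s->frames[s->depth++] = event_id;
	return true;
}

/* Nest deliveries with no returns in between; count how many fit. */
static int nested_deliveries_until_overflow(void)
{
	struct toy_stack s = { .depth = 0 };
	int delivered = 0;

	while (deliver_same_level(&s, delivered))
		delivered++;
	return delivered;
}

/* Two nested IST deliveries: the second overwrites the first frame. */
static bool ist_nesting_loses_outer_frame(void)
{
	struct toy_stack s = { .depth = 0 };

	deliver_ist(&s, 1);
	return deliver_ist(&s, 2) && s.frames[0] == 2;
}
```

In this model the IST case silently loses the outer frame on the first
nested delivery, while the same-level case keeps every frame intact until
the stack runs out, which is the "stack overflow in the worst case" above.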

So I take it you don't support putting #VE on a higher stack level?

> ~Andrew
> 
> P.S. While I hate to cite myself, if you haven't read
> https://docs.google.com/document/d/1hWejnyDkjRRAW-JEsRjA5c9CKLOPc6VKJQsuvODlQEI/edit?usp=sharing
> yet, do so.  It did feed into some of the FRED design.