[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJhGHyC0_1xJD2R03-NoRVpMXFTHR4v8CdzyJOZe_k0rdv=NfQ@mail.gmail.com>
Date: Sat, 18 Mar 2023 14:33:30 +0800
From: Lai Jiangshan <jiangshanlai@...il.com>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: Xin Li <xin3.li@...el.com>, linux-kernel@...r.kernel.org,
x86@...nel.org, kvm@...r.kernel.org, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
peterz@...radead.org, andrew.cooper3@...rix.com, seanjc@...gle.com,
pbonzini@...hat.com, ravi.v.shankar@...el.com
Subject: Re: [PATCH v5 22/34] x86/fred: FRED initialization code
On Sat, Mar 18, 2023 at 5:32 AM H. Peter Anvin <hpa@...or.com> wrote:
>
> On March 17, 2023 6:35:57 AM PDT, Lai Jiangshan <jiangshanlai@...il.com> wrote:
> >Hello
> >
> >
> >Comments in cpu_init_fred_exceptions() seem scarce for understanding.
> >
> >On Tue, Mar 7, 2023 at 11:07 AM Xin Li <xin3.li@...el.com> wrote:
> >
> >> +/*
> >> + * Initialize FRED on this CPU. This cannot be __init as it is called
> >> + * during CPU hotplug.
> >> + */
> >> +void cpu_init_fred_exceptions(void)
> >> +{
> >> + wrmsrl(MSR_IA32_FRED_CONFIG,
> >> + FRED_CONFIG_ENTRYPOINT(fred_entrypoint_user) |
> >> + FRED_CONFIG_REDZONE(8) | /* Reserve for CALL emulation */
> >> + FRED_CONFIG_INT_STKLVL(0));
> >
> >What is it about "Reserve for CALL emulation"?
> >
> >I guess it relates to X86_TRAP_BP. In entry_64.S:
> >
> > .if \vector == X86_TRAP_BP
> > /*
> > * If coming from kernel space, create a 6-word gap to allow the
> > * int3 handler to emulate a call instruction.
> > */
> >
> >> +
> >> + wrmsrl(MSR_IA32_FRED_STKLVLS,
> >> + FRED_STKLVL(X86_TRAP_DB, 1) |
> >> + FRED_STKLVL(X86_TRAP_NMI, 2) |
> >> + FRED_STKLVL(X86_TRAP_MC, 2) |
> >> + FRED_STKLVL(X86_TRAP_DF, 3));
> >
> >Why each exception here needs a stack level > 0?
> >Especially for X86_TRAP_DB and X86_TRAP_NMI.
> >
> >Why does or why does not X86_TRAP_VE have a stack level > 0?
> >
> >X86_TRAP_DF is the highest stack level, is it accidental
> >or deliberate?
> >
> >Thanks
> >Lai
> >
>
> Yes, the extra redzone space is there to allow for the call emulation without having to adjust the stack frame "manually".
>
> In theory we could enable it only while code patching is in progress, but that would probably just result in stack overflows becoming utterly impossible to debug as we have to consider the worst case.
>
> The purpose of separate stacks for NMI, #DB and #MC *in the kernel* (remember that user space faults are always taken on stack level 0) is to avoid overflowing the kernel stack. #DB in the kernel would imply the use of a kernel debugger.
Could you add it to the code, please? I think it can help other reviewers.
If there is no other concrete reason other than overflowing for
assigning NMI and #DB with a stack level > 0, #VE should also
be assigned with a stack level > 0, and #BP too. #VE can happen
anytime and anywhere, so it is subject to overflowing too.
>
>
> #DF is the highest level because a #DF means "something went wrong *while delivering an exception*." The number of cases for which that can happen with FRED is drastically reduced and basically amount to "the stack you pointed me to is broken."
>
> Thus, you basically always want to change stacks on #DF, which means it should be at the highest level.
Powered by blists - more mailing lists