Message-ID: <6aa4a193-e596-9c01-6b36-0c25410ebb82@linux.alibaba.com>
Date: Wed, 2 Jun 2021 08:09:25 +0800
From: Lai Jiangshan <laijs@...ux.alibaba.com>
To: Steven Rostedt <rostedt@...dmis.org>,
Lai Jiangshan <jiangshanlai@...il.com>
Cc: linux-kernel@...r.kernel.org, Andy Lutomirski <luto@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
Juergen Gross <jgross@...e.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Al Viro <viro@...iv.linux.org.uk>,
Arvind Sankar <nivedita@...m.mit.edu>
Subject: Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before
switching to the thread stack
On 2021/6/2 01:05, Steven Rostedt wrote:
> On Tue, 1 Jun 2021 14:52:14 +0800
> Lai Jiangshan <jiangshanlai@...il.com> wrote:
>
>> From: Lai Jiangshan <laijs@...ux.alibaba.com>
>>
>> The current kernel has no code to prevent a data breakpoint from being
>> set on the thread stack. If a data breakpoint covers the top area of
>> the thread stack, problems can arise.
>>
>> For example, when an NMI hits userspace in this setting, the entry code
>> copies the exception frame from the NMI stack to the thread stack, which
>> triggers #DB. After #DB is handled, its IRET unmasks NMIs, so the
>> not-yet-copied portion of the NMI stack is in danger of being corrupted
>> by a nested NMI.
>>
>> Stashing the exception frame on the entry stack before touching the
>> thread stack fixes the problem.
>>
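
For reference, such enforcement could look roughly like the sketch below;
the helper and its placement are hypothetical, not existing kernel code:

#include <linux/sched.h>
#include <linux/sched/task_stack.h>

/* Hypothetical check: refuse a data breakpoint whose watched range
 * overlaps the current thread stack. */
static bool bp_overlaps_thread_stack(unsigned long addr, unsigned long len)
{
        unsigned long stack = (unsigned long)task_stack_page(current);

        /* true if [addr, addr + len) intersects [stack, stack + THREAD_SIZE) */
        return addr + len > stack && addr < stack + THREAD_SIZE;
}
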
>> Signed-off-by: Lai Jiangshan <laijs@...ux.alibaba.com>
>> ---
>> arch/x86/entry/entry_64.S | 22 ++++++++++++++++++++++
>> arch/x86/kernel/asm-offsets.c | 1 +
>> 2 files changed, 23 insertions(+)
>>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index a5f02d03c585..4190e668f346 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -1121,8 +1121,30 @@ SYM_CODE_START(asm_exc_nmi)
>> *
>> * We also must not push anything to the stack before switching
>> * stacks lest we corrupt the "NMI executing" variable.
>> + *
>> + * Switch to the entry stack before switching to the thread stack,
>> + * lest a data breakpoint on the thread stack raise #DB, whose
>> + * IRET would unmask NMIs before the exception frame has been
>> + * fully copied off the NMI stack.
>> */
>>
>> + /* Switch stack to entry stack */
>> + movq %rsp, %rdx
>> + addq $(+6*8 /* to NMI stack top */ \
>> + -EXCEPTION_STKSZ /* to NMI stack bottom */ \
>> + -CPU_ENTRY_AREA_nmi_stack /* to entry area */ \
>
> Just so that I understand this correctly. This "entry area" is not part
> of the NMI stack, but just at the bottom of it? That is, this part of
> the stack will never be touched by an NMI coming in from kernel space,
> correct?
This "entry area" is the pointer of current CPU's struct cpu_entry_area.
This instruction puts %rsp onto the top of the entry/trampoline stack
which is not touched by an NMI coming in from kernel space.
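
In C terms, the addq arithmetic is roughly the following sketch (using
the struct member names that asm-offsets.c exports; not actual kernel
code):

#include <linux/stddef.h>
#include <asm/cpu_entry_area.h>
#include <asm/processor.h>

/* Sketch: what the addq above computes, step by step. */
static unsigned long nmi_rsp_to_entry_stack_top(unsigned long rsp)
{
        /* +6*8: skip the saved %rdx and the 5-word hardware frame,
         * landing at the top of the NMI IST stack */
        unsigned long nmi_top = rsp + 6 * 8;
        /* -EXCEPTION_STKSZ: down to the bottom of the NMI stack */
        unsigned long nmi_bottom = nmi_top - EXCEPTION_STKSZ;
        /* -offsetof(): back to the base of this CPU's cpu_entry_area */
        struct cpu_entry_area *cea = (void *)(nmi_bottom -
                        offsetof(struct cpu_entry_area, estacks.NMI_stack));
        /* +entry_stack_page offset, +size: top of the entry stack */
        return (unsigned long)&cea->entry_stack_page +
                        sizeof(struct entry_stack);
}
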
>
> -- Steve
>
>
>> + +CPU_ENTRY_AREA_entry_stack /* to entry stack bottom */\
>> + +SIZEOF_entry_stack /* to entry stack top */ \
>> + ), %rsp
>> +
>> + /* Stash exception frame and %rdx to entry stack */
>> + pushq 5*8(%rdx) /* pt_regs->ss */
>> + pushq 4*8(%rdx) /* pt_regs->rsp */
>> + pushq 3*8(%rdx) /* pt_regs->flags */
>> + pushq 2*8(%rdx) /* pt_regs->cs */
>> + pushq 1*8(%rdx) /* pt_regs->rip */
>> + pushq 0*8(%rdx) /* %rdx */
>> +
>> swapgs
>> cld
>> FENCE_SWAPGS_USER_ENTRY
>> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
>> index ecd3fd6993d1..dfafa0c7e887 100644
>> --- a/arch/x86/kernel/asm-offsets.c
>> +++ b/arch/x86/kernel/asm-offsets.c
>> @@ -88,6 +88,7 @@ static void __used common(void)
>> OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
>> DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
>> DEFINE(MASK_entry_stack, (~(sizeof(struct entry_stack) - 1)));
>> + OFFSET(CPU_ENTRY_AREA_nmi_stack, cpu_entry_area, estacks.NMI_stack);
>>
>> /* Offset for fields in tss_struct */
>> OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
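
For readers unfamiliar with asm-offsets.c: OFFSET() and DEFINE() emit
marker strings that the build post-processes into #define constants
usable from assembly. Roughly, from include/linux/kbuild.h:

/* Emit "->SYM value" markers into the compiler's assembly output. */
#define DEFINE(sym, val) \
        asm volatile("\n.ascii \"->" #sym " %0 " #val "\"" : : "i" (val))

#define OFFSET(sym, str, mem) \
        DEFINE(sym, offsetof(struct str, mem))

/* The markers are then turned into include/generated/asm-offsets.h,
 * so entry_64.S can use CPU_ENTRY_AREA_nmi_stack as a plain constant. */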