lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2cdc163a-12fc-49e1-ab87-bba6df0ae345@zytor.com>
Date: Wed, 27 Aug 2025 15:18:51 -0700
From: Xin Li <xin@...or.com>
To: Dave Hansen <dave.hansen@...el.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, linux-doc@...r.kernel.org
Cc: pbonzini@...hat.com, seanjc@...gle.com, corbet@....net, tglx@...utronix.de,
        mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
        x86@...nel.org, hpa@...or.com, luto@...nel.org, peterz@...radead.org,
        andrew.cooper3@...rix.com, chao.gao@...el.com, hch@...radead.org
Subject: Re: [PATCH v6 04/20] x86/cea: Export an API to get per CPU exception
 stacks for KVM to use

>> Suggested-by: Christoph Hellwig <hch@...radead.org>
>> Suggested-by: Dave Hansen <dave.hansen@...el.com>
> 
> Nit: I wouldn't use Suggested-by unless the person basically asked for
> the *entire* patch. Christoph and I were asking for specific bits of
> this, but neither of us asked for this patch as a whole.

I did it because the patch is almost rewritten to export accessors instead
of raw data, IOW, the way of doing it is completely changed.

But I will remove Suggested-by.

> 
>> diff --git a/arch/x86/coco/sev/sev-nmi.c b/arch/x86/coco/sev/sev-nmi.c
>> index d8dfaddfb367..73e34ad7a1a9 100644
>> --- a/arch/x86/coco/sev/sev-nmi.c
>> +++ b/arch/x86/coco/sev/sev-nmi.c
>> @@ -30,7 +30,7 @@ static __always_inline bool on_vc_stack(struct pt_regs *regs)
>>   	if (ip_within_syscall_gap(regs))
>>   		return false;
>>   
>> -	return ((sp >= __this_cpu_ist_bottom_va(VC)) && (sp < __this_cpu_ist_top_va(VC)));
>> +	return ((sp >= __this_cpu_ist_bottom_va(ESTACK_VC)) && (sp < __this_cpu_ist_top_va(ESTACK_VC)));
>>   }
> 
> This rename is one of those things that had me scratching my head for a
> minute. It wasn't obvious at _all_ why the VC=>ESTACK_VC "rename" is
> necessary.
> 
> This needs to have been mentioned in the changelog.
> 
> Better yet would have been to do this in a separate patch because a big
> chunk of this patch is just rename noise.

Sure, will do.

>> diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
>> index 462fc34f1317..8e17f0ca74e6 100644
>> --- a/arch/x86/include/asm/cpu_entry_area.h
>> +++ b/arch/x86/include/asm/cpu_entry_area.h
>> @@ -46,7 +46,7 @@ struct cea_exception_stacks {
>>    * The exception stack ordering in [cea_]exception_stacks
>>    */
>>   enum exception_stack_ordering {
>> -	ESTACK_DF,
>> +	ESTACK_DF = 0,
>>   	ESTACK_NMI,
>>   	ESTACK_DB,
>>   	ESTACK_MCE,
> 
> Is this really required? I thought the first enum was always 0? Is this
> just trying to ensure that ESTACKS_MEMBERS() defines a matching number
> of N_EXCEPTION_STACKS stacks?
> 
> If that's the case, shouldn't this be represented with a BUILD_BUG_ON()?

Will do BUILD_BUG_ON().

> 
>> @@ -58,18 +58,15 @@ enum exception_stack_ordering {
>>   #define CEA_ESTACK_SIZE(st)					\
>>   	sizeof(((struct cea_exception_stacks *)0)->st## _stack)
>>   
>> -#define CEA_ESTACK_BOT(ceastp, st)				\
>> -	((unsigned long)&(ceastp)->st## _stack)
>> -
>> -#define CEA_ESTACK_TOP(ceastp, st)				\
>> -	(CEA_ESTACK_BOT(ceastp, st) + CEA_ESTACK_SIZE(st))
>> -
>>   #define CEA_ESTACK_OFFS(st)					\
>>   	offsetof(struct cea_exception_stacks, st## _stack)
>>   
>>   #define CEA_ESTACK_PAGES					\
>>   	(sizeof(struct cea_exception_stacks) / PAGE_SIZE)
>>   
>> +extern unsigned long __this_cpu_ist_top_va(enum exception_stack_ordering stack);
>> +extern unsigned long __this_cpu_ist_bottom_va(enum exception_stack_ordering stack);
>> +
>>   #endif
>>   
>>   #ifdef CONFIG_X86_32
>> @@ -144,10 +141,4 @@ static __always_inline struct entry_stack *cpu_entry_stack(int cpu)
>>   	return &get_cpu_entry_area(cpu)->entry_stack_page.stack;
>>   }
>>   
>> -#define __this_cpu_ist_top_va(name)					\
>> -	CEA_ESTACK_TOP(__this_cpu_read(cea_exception_stacks), name)
>> -
>> -#define __this_cpu_ist_bottom_va(name)					\
>> -	CEA_ESTACK_BOT(__this_cpu_read(cea_exception_stacks), name)
>> -
>>   #endif
>> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>> index 34a054181c4d..cb14919f92da 100644
>> --- a/arch/x86/kernel/cpu/common.c
>> +++ b/arch/x86/kernel/cpu/common.c
>> @@ -2307,12 +2307,12 @@ static inline void setup_getcpu(int cpu)
>>   static inline void tss_setup_ist(struct tss_struct *tss)
>>   {
>>   	/* Set up the per-CPU TSS IST stacks */
>> -	tss->x86_tss.ist[IST_INDEX_DF] = __this_cpu_ist_top_va(DF);
>> -	tss->x86_tss.ist[IST_INDEX_NMI] = __this_cpu_ist_top_va(NMI);
>> -	tss->x86_tss.ist[IST_INDEX_DB] = __this_cpu_ist_top_va(DB);
>> -	tss->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
>> +	tss->x86_tss.ist[IST_INDEX_DF] = __this_cpu_ist_top_va(ESTACK_DF);
>> +	tss->x86_tss.ist[IST_INDEX_NMI] = __this_cpu_ist_top_va(ESTACK_NMI);
>> +	tss->x86_tss.ist[IST_INDEX_DB] = __this_cpu_ist_top_va(ESTACK_DB);
>> +	tss->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(ESTACK_MCE);
> 
> If you respin this, please vertically align these.

NP.

> 
>> +/*
>> + * FRED introduced new fields in the host-state area of the VMCS for
>> + * stack levels 1->3 (HOST_IA32_FRED_RSP[123]), each respectively
>> + * corresponding to per CPU stacks for #DB, NMI and #DF.  KVM must
>> + * populate these each time a vCPU is loaded onto a CPU.
>> + *
>> + * Called from entry code, so must be noinstr.
>> + */
>> +noinstr unsigned long __this_cpu_ist_top_va(enum exception_stack_ordering stack)
>> +{
>> +	unsigned long base = (unsigned long)&(__this_cpu_read(cea_exception_stacks)->DF_stack);
>> +	return base + EXCEPTION_STKSZ + stack * (EXCEPTION_STKSZ + PAGE_SIZE);
>> +}
>> +EXPORT_SYMBOL(__this_cpu_ist_top_va);
>> +
>> +noinstr unsigned long __this_cpu_ist_bottom_va(enum exception_stack_ordering stack)
>> +{
>> +	unsigned long base = (unsigned long)&(__this_cpu_read(cea_exception_stacks)->DF_stack);
>> +	return base + stack * (EXCEPTION_STKSZ + PAGE_SIZE);
>> +}
> 
> These are basically treating 'struct exception_stacks' like an array.
> There's no type safety or anything here. It's just an open-coded array
> access.
> 
> Also, starting with ->DF_stack is a bit goofy looking. It's not obvious
> (or enforced) that it is stack #0 or at the beginning of the structure.
> 
> Shouldn't we be _trying_ to make this look like:
> 
> 	struct cea_exception_stacks *s;
> 	s = __this_cpu_read(cea_exception_stacks);
> 
> 	return &s[stack_nr].stack;
> 
> ?
> 
> Where 'cea_exception_stacks' is an actual array:
> 
> 	struct cea_exception_stacks[N_EXCEPTION_STACKS];
> 
> which might need to be embedded in a larger structure to get the
> 'IST_top_guard' without wasting allocating space for an extra full stack.
> 

Good suggestion!

Thanks!
     Xin

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ