Message-ID: <34600873-3716-eedd-84d0-dada88dc1a70@intel.com>
Date: Tue, 10 Jan 2023 17:01:59 -0800
From: "Chen, Yian" <yian.chen@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: <linux-kernel@...r.kernel.org>, <x86@...nel.org>,
Andy Lutomirski <luto@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Ravi Shankar <ravi.v.shankar@...el.com>,
Tony Luck <tony.luck@...el.com>,
Sohil Mehta <sohil.mehta@...el.com>,
Paul Lai <paul.c.lai@...el.com>
Subject: Re: [PATCH 3/7] x86/cpu: Disable kernel LASS when patching kernel
alternatives
On 1/10/2023 1:04 PM, Peter Zijlstra wrote:
> On Mon, Jan 09, 2023 at 09:52:00PM -0800, Yian Chen wrote:
>> Most of the kernel is mapped at virtual addresses
>> in the upper half of the address range. But kernel
>> deliberately initialized a temporary mm area
>> within the lower half of the address range
>> for text poking, see commit 4fc19708b165
>> ("x86/alternatives: Initialize temporary mm
>> for patching").
>>
>> LASS stops access to a lower half address in kernel,
>> and this can be deactivated if AC bit in EFLAGS
>> register is set. Hence use stac and clac instructions
>> around access to the address to avoid triggering a
>> LASS #GP fault.
>>
>> Kernel objtool validation warns if the binary calls
>> to a non-whitelisted function that exists outside of
>> the stac/clac guard, or references any function with a
>> dynamic function pointer inside the guard; see section
>> 9 in the document tools/objtool/Documentation/objtool.txt.
>>
>> For these reasons, also considering text poking size is
>> usually small, simple modifications have been done
>> in function text_poke_memcpy() and text_poke_memset() to
>> avoid non-whitelisted function calls inside the stac/clac
>> guard.
>>
>> Gcc may detect and replace the target with its built-in
>> functions. However, the replacement would break the
>> objtool validation criteria. Hence, add compiler option
>> -fno-builtin for the file.
>
> Please reflow to 72 characters consistently, this is silly.
>
Sure. I will reflow the commit message to follow the guideline.
>> Co-developed-by: Tony Luck <tony.luck@...el.com>
>> Signed-off-by: Tony Luck <tony.luck@...el.com>
>> Signed-off-by: Yian Chen <yian.chen@...el.com>
>> ---
>> arch/x86/include/asm/smap.h | 13 +++++++++++++
>> arch/x86/kernel/Makefile | 2 ++
>> arch/x86/kernel/alternative.c | 21 +++++++++++++++++++--
>> tools/objtool/arch/x86/special.c | 2 ++
>> 4 files changed, 36 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
>> index bab490379c65..6f7ac0839b10 100644
>> --- a/arch/x86/include/asm/smap.h
>> +++ b/arch/x86/include/asm/smap.h
>> @@ -39,6 +39,19 @@ static __always_inline void stac(void)
>> alternative("", __ASM_STAC, X86_FEATURE_SMAP);
>> }
>>
>> +/* Deactivate/activate LASS via AC bit in EFLAGS register */
>> +static __always_inline void low_addr_access_begin(void)
>> +{
>> + /* Note: a barrier is implicit in alternative() */
>> + alternative("", __ASM_STAC, X86_FEATURE_LASS);
>> +}
>> +
>> +static __always_inline void low_addr_access_end(void)
>> +{
>> + /* Note: a barrier is implicit in alternative() */
>> + alternative("", __ASM_CLAC, X86_FEATURE_LASS);
>> +}
>
> Can't say I like the name.
Indeed, there are alternative ways to name the functions, for example
enable_kernel_lass()/disable_kernel_lass(), or simply keeping
stac()/clac() unchanged.
I chose this name because it is straightforward about the purpose and
helps in understanding when to use the functions.
> Also if you look at bit 63 as a sign bit,
> it's actively wrong since -1 is lower than 0.
>
This could be a trade-off. For address manipulation and calculation,
an address is most likely treated as unsigned. I would be happy to get
input on better naming.
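To illustrate the signed/unsigned point, here is a minimal user-space
sketch. The helper names are hypothetical and not part of the patch; it
only shows that the same two bit patterns order differently depending
on whether bit 63 is read as a sign bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when bit 63 is clear (lower-half address). */
static bool addr_in_lower_half(uint64_t addr)
{
	return (addr >> 63) == 0;
}

/* Compare two addresses as unsigned values, as address arithmetic does. */
static bool addr_below_unsigned(uint64_t a, uint64_t b)
{
	return a < b;
}

/*
 * Compare the same bit patterns with bit 63 treated as a sign bit.
 * The uint64_t -> int64_t conversion is assumed to wrap (two's
 * complement), which is what gcc and clang do.
 */
static bool addr_below_signed(uint64_t a, uint64_t b)
{
	return (int64_t)a < (int64_t)b;
}
```

With the unsigned view, a user-half address such as 0x1000 compares
below a kernel-half address such as 0xffff800000000000; with the signed
view the order flips, which is the objection to "low" in the name.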
>> +
>> static __always_inline unsigned long smap_save(void)
>> {
>> unsigned long flags;
>> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
>> index 96d51bbc2bd4..f8a455fc56a2 100644
>> --- a/arch/x86/kernel/Makefile
>> +++ b/arch/x86/kernel/Makefile
>> @@ -7,6 +7,8 @@ extra-y += vmlinux.lds
>>
>> CPPFLAGS_vmlinux.lds += -U$(UTS_MACHINE)
>>
>> +CFLAGS_alternative.o += -fno-builtin
>> +
>> ifdef CONFIG_FUNCTION_TRACER
>> # Do not profile debug and lowlevel utilities
>> CFLAGS_REMOVE_tsc.o = -pg
>> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
>> index 7d8c3cbde368..4de8b54fb5f2 100644
>> --- a/arch/x86/kernel/alternative.c
>> +++ b/arch/x86/kernel/alternative.c
>> @@ -1530,14 +1530,31 @@ __ro_after_init unsigned long poking_addr;
>>
>> static void text_poke_memcpy(void *dst, const void *src, size_t len)
>> {
>> - memcpy(dst, src, len);
>> + const char *s = src;
>> + char *d = dst;
>> +
>> + /* The parameter dst ends up referencing to the global variable
>> + * poking_addr, which is mapped to the low half address space.
>> + * In kernel, accessing the low half address range is prevented
>> + * by LASS. So relax LASS prevention while accessing the memory
>> + * range.
>> + */
>> + low_addr_access_begin();
>> + while (len-- > 0)
>> + *d++ = *s++;
>> + low_addr_access_end();
>> }
>>
>> static void text_poke_memset(void *dst, const void *src, size_t len)
>> {
>> int c = *(const int *)src;
>> + char *d = dst;
>>
>> - memset(dst, c, len);
>> + /* The same comment as it is in function text_poke_memcpy */
>> + low_addr_access_begin();
>> + while (len-- > 0)
>> + *d++ = c;
>> + low_addr_access_end();
>> }
>
> This is horrific tinkering :/
>
It seems difficult to find a perfect solution for this part, since a
function call or a function pointer inside the stac/clac guard triggers
an objtool warning (the reasons are stated in the commit message).
To avoid the warning, I considered this acceptable since the poked
text is usually only a few bytes.
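As a rough user-space sketch of the pattern (not the kernel code
itself): guard_begin()/guard_end() below are hypothetical stand-ins for
low_addr_access_begin()/low_addr_access_end(), and they only track
nesting instead of executing stac/clac. The point is that the guarded
region contains nothing but the open-coded byte loop.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the stac/clac based guard in the patch. */
static int guard_depth;

static inline void guard_begin(void)
{
	guard_depth++;
}

static inline void guard_end(void)
{
	guard_depth--;
}

/* Byte-wise copy with no function call between begin and end. */
static void poke_copy(void *dst, const void *src, size_t len)
{
	const char *s = src;
	char *d = dst;

	guard_begin();
	while (len-- > 0)
		*d++ = *s++;
	guard_end();
}

/* Byte-wise fill, again with only the open-coded loop inside the guard. */
static void poke_fill(void *dst, int c, size_t len)
{
	char *d = dst;

	guard_begin();
	while (len-- > 0)
		*d++ = (char)c;
	guard_end();
}
```

Note that a compiler may still recognize such loops and emit a call to
memcpy()/memset(), which is why the patch also adds -fno-builtin for
the file.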
> Also, what about the EFI mm? IIRC EFI also lives in the user address
> space.
I didn't encounter any EFI mm related problem while testing the
implementation. I will update you after I investigate the EFI mm
further.
Thanks,
Yian