Message-ID: <aAp_87-Xr6gn_hD7@debug.ba.rivosinc.com>
Date: Thu, 24 Apr 2025 11:16:19 -0700
From: Deepak Gupta <debug@...osinc.com>
To: Radim Krčmář <rkrcmar@...tanamicro.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>,
Andrew Morton <akpm@...ux-foundation.org>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>,
Albert Ou <aou@...s.berkeley.edu>, Conor Dooley <conor@...nel.org>,
Rob Herring <robh@...nel.org>,
Krzysztof Kozlowski <krzk+dt@...nel.org>,
Arnd Bergmann <arnd@...db.de>,
Christian Brauner <brauner@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Oleg Nesterov <oleg@...hat.com>,
Eric Biederman <ebiederm@...ssion.com>, Kees Cook <kees@...nel.org>,
Jonathan Corbet <corbet@....net>, Shuah Khan <shuah@...nel.org>,
Jann Horn <jannh@...gle.com>, Conor Dooley <conor+dt@...nel.org>,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-mm@...ck.org, linux-riscv@...ts.infradead.org,
devicetree@...r.kernel.org, linux-arch@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kselftest@...r.kernel.org,
alistair.francis@....com, richard.henderson@...aro.org,
jim.shu@...ive.com, andybnac@...il.com, kito.cheng@...ive.com,
charlie@...osinc.com, atishp@...osinc.com, evan@...osinc.com,
cleger@...osinc.com, alexghiti@...osinc.com,
samitolvanen@...gle.com, broonie@...nel.org,
rick.p.edgecombe@...el.com,
linux-riscv <linux-riscv-bounces@...ts.infradead.org>
Subject: Re: [PATCH v12 12/28] riscv: Implements arch agnostic shadow stack
prctls
On Thu, Apr 24, 2025 at 03:36:54PM +0200, Radim Krčmář wrote:
>2025-04-23T21:44:09-07:00, Deepak Gupta <debug@...osinc.com>:
>> On Thu, Apr 10, 2025 at 11:45:58AM +0200, Radim Krčmář wrote:
>>>2025-03-14T14:39:31-07:00, Deepak Gupta <debug@...osinc.com>:
>>>> diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h
>>>> @@ -14,7 +15,8 @@ struct kernel_clone_args;
>>>> struct cfi_status {
>>>> unsigned long ubcfi_en : 1; /* Enable for backward cfi. */
>>>> - unsigned long rsvd : ((sizeof(unsigned long) * 8) - 1);
>>>> + unsigned long ubcfi_locked : 1;
>>>> + unsigned long rsvd : ((sizeof(unsigned long) * 8) - 2);
>>>
>>>The rsvd field shouldn't be necessary as the container for the bitfield
>>>is 'unsigned long' sized.
>>>
>>>Why don't we use bools here, though?
>>>It might produce a better binary and we're not hurting for struct size.
>>
>> If you remember one of the previous patch discussions, this goes into
>> `thread_info`. I don't want to bloat it. Even if we end up shoving it into
>> task_struct, I don't want to bloat that either. I can just convert it into
>> a bitmask if bitfields are an eyesore here.
>
> "unsigned long rsvd : ((sizeof(unsigned long) * 8) - 2);"
>
>is an eyesore that defines exactly the same layout as the two lines alone
>
> unsigned long ubcfi_en : 1;
> unsigned long ubcfi_locked : 1;
>
>That one should be removed.
>
>If we have only 4 bits in 4/8 bytes, then bitfields do generate worse
>code than 4 bools and a 0/4 byte hole. The struct size stays the same.
>
>I don't care much about the switch to bools, though, because this code
>is not called often.
I'll remove the bitfields, have a single `unsigned long cfi_control_state`,
and do `#define RISCV_UBCFI_EN 1` and so on.
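
A minimal sketch of what that could look like, assuming one bit per flag.
Only `cfi_control_state` and `RISCV_UBCFI_EN` come from the text above;
`RISCV_UBCFI_LOCKED`, the struct name and the `sketch_` helpers are
hypothetical names used for illustration, not what the next revision will
necessarily contain.

```c
#include <assert.h>
#include <stdbool.h>

#define RISCV_UBCFI_EN      (1UL << 0)  /* value 1, as proposed above */
#define RISCV_UBCFI_LOCKED  (1UL << 1)  /* hypothetical name for the lock bit */

/* Hypothetical replacement for the bitfield-based struct cfi_status. */
struct cfi_status_sketch {
	unsigned long cfi_control_state;
};

/* The mask form occupies exactly one unsigned long, same as the bitfield form. */
static_assert(sizeof(struct cfi_status_sketch) == sizeof(unsigned long),
	      "state must stay one unsigned long");

static inline bool sketch_shstk_enabled(const struct cfi_status_sketch *s)
{
	return (s->cfi_control_state & RISCV_UBCFI_EN) != 0;
}

static inline bool sketch_shstk_locked(const struct cfi_status_sketch *s)
{
	return (s->cfi_control_state & RISCV_UBCFI_LOCKED) != 0;
}

/* Set or clear the enable bit without touching the other bits. */
static inline void sketch_set_shstk_enabled(struct cfi_status_sketch *s, bool enable)
{
	if (enable)
		s->cfi_control_state |= RISCV_UBCFI_EN;
	else
		s->cfi_control_state &= ~RISCV_UBCFI_EN;
}

/* Set the lock bit; this sketch has no unlock operation. */
static inline void sketch_lock_shstk(struct cfi_status_sketch *s)
{
	s->cfi_control_state |= RISCV_UBCFI_LOCKED;
}
```

The struct stays at one `unsigned long`, so neither `thread_info` nor
`task_struct` grows compared to the bitfield version.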
>
>>>> @@ -262,3 +292,83 @@ void shstk_release(struct task_struct *tsk)
>>>> +int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status)
>>>> +{
>>>> + /* Request is to enable shadow stack and shadow stack is not enabled already */
>>>> + if (enable_shstk && !is_shstk_enabled(t)) {
>>>> + /* shadow stack was allocated and enable request again
>>>> + * no need to support such usecase and return EINVAL.
>>>> + */
>>>> + if (is_shstk_allocated(t))
>>>> + return -EINVAL;
>>>> +
>>>> + size = calc_shstk_size(0);
>>>> + addr = allocate_shadow_stack(0, size, 0, false);
>>>
>>>Why don't we use the userspace-allocated stack?
>>>
>>>I'm completely missing the design idea here... Userspace has absolute
>>>control over the shadow stack pointer CSR, so we don't need to do much in Linux:
>>>
>>>1. interface to set up page tables with -W- PTE and
>>>2. interface to control senvcfg.SSE.
>>>
>>>Userspace can do the rest.
>>
>> The design is as follows:
>>
>> When a user task wants to enable shadow stack for itself, it has to issue
>> a syscall to the kernel (like this prctl). This can be done step by step by
>> the user task: first issue `map_shadow_stack`, then ask the kernel to light
>> up the envcfg bit, and finally, once the return to usermode happens, write
>> to the CSR. That is no different from doing all of the above together in a
>> single `prctl` call. The two approaches are equivalent in that sense.
>>
>> The background is that x86 went this way because x86 had workloads/binaries
>> with (deeply) recursive functions and was thus forced by default to always
>> allocate a shadow stack of the same size as the data stack. To reduce the
>> burden on userspace of determining and then allocating a shadow stack of
>> that size (the size of the data stack), the prctl does the job of calculating
>> the default shadow stack size (and reduces programming errors in usermode).
>> arm64 followed suit. I don't want to find out what compatibility issues
>> we will see, and thus I am just following suit (given that both approaches
>> are equivalent). Take a look at static `calc_shstk_size(unsigned long size)`.
>>
>> Coming back to your question of why not allow userspace to manage its
>> own shadow stack: the answer is that it can manage its own shadow stack. If
>> it does, it just has to be aware of the size it's allocating for the shadow stack.
>
>It's just that userspace cannot prevent allocation of the default stack
>when enabling it, which is the weird part to me.
>The allocate and enable syscalls could have been nicely composable.
>
>> There is already a patch series going on to manage this using clone3.
>> https://lore.kernel.org/all/20250408-clone3-shadow-stack-v15-4-3fa245c6e3be@kernel.org/
>
>A new ioctl does seem to solve most of the practical issues, thanks.
>
>> I fully expect green thread implementations in rust/go or swapcontext
>> based thread management doing this on their own.
>>
>> The current design ensures that existing apps don't have to change a lot
>> in userspace and that the kernel provides compatibility by default. Anyone
>> wanting to optimize shadow stack usage can do so with the current design.
>
>Right, changing rlimit_stack around shadow stack allocation is not the
>most elegant way, but it does work.
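
For context on the rlimit point, here is a rough userspace model of how the
default size might be derived. It is a sketch under assumptions: that the
default follows RLIMIT_STACK, is rounded up to a page, and is capped. The
4 KiB page size, the 4 GiB cap and every `sketch_` name are illustrative
and are not taken from the patch; the real logic is in `calc_shstk_size()`.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE    4096ULL        /* assumed page size */
#define SKETCH_MAX_DEFAULT  (4ULL << 30)   /* assumed cap on the default size */

static_assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");

/* Round v up to the next multiple of the assumed page size. */
static inline uint64_t sketch_page_align(uint64_t v)
{
	return (v + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * requested != 0: caller chose a size, only page-align it.
 * requested == 0: derive the default from the data stack limit,
 *                 bounded by the assumed cap.
 */
static inline uint64_t sketch_calc_shstk_size(uint64_t requested,
					      uint64_t stack_rlimit)
{
	uint64_t size;

	if (requested)
		return sketch_page_align(requested);

	size = stack_rlimit < SKETCH_MAX_DEFAULT ? stack_rlimit
						  : SKETCH_MAX_DEFAULT;
	return sketch_page_align(size);
}
```

Under this model, lowering the soft RLIMIT_STACK before the enabling prctl
and restoring it afterwards is what shrinks the default allocation.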
>
>>>> +int arch_lock_shadow_stack_status(struct task_struct *task,
>>>> + unsigned long arg)
>>>> +{
>>>> + /* If shtstk not supported or not enabled on task, nothing to lock here */
>>>> + if (!cpu_supports_shadow_stack() ||
>>>> + !is_shstk_enabled(task) || arg != 0)
>>>> + return -EINVAL;
>>>
>>>The task might want to prevent shadow stack from being enabled?
>>
>> But why would it want to do that? The task can simply not issue the prctl.
>> There are glibc tunables as well, with which it can be disabled.
>
>The task might do it as some last resort to prevent buggy code from
>enabling shadow stacks that would just crash. Or whatever complicated
>reason userspace can think of.
>
>It's more the other way around. I wonder why we're removing this option
>when we don't really care what userspace does to itself.
>I think it's complicating the kernel without an obvious gain.
It just feels weird. There isn't anything like this for other features lit up
via envcfg. Does hwprobe allow this on a per-task basis? I'll look into it.
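
To make the two options concrete, here is a small standalone model. It is a
sketch, not kernel code: `struct lock_model` and the `model_` functions are
hypothetical, and refusing status changes with -EINVAL once locked is an
assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-task state used only by this model. */
struct lock_model {
	bool supported;  /* CPU supports shadow stack */
	bool enabled;    /* shadow stack enabled for the task */
	bool locked;     /* status can no longer be changed */
};

/* Check from the quoted hunk: locking requires shadow stack to be enabled. */
static inline int model_lock_current(struct lock_model *t, unsigned long arg)
{
	if (!t->supported || !t->enabled || arg != 0)
		return -EINVAL;
	t->locked = true;
	return 0;
}

/* Alternative raised in review: the disabled state may be locked too. */
static inline int model_lock_permissive(struct lock_model *t, unsigned long arg)
{
	if (!t->supported || arg != 0)
		return -EINVAL;
	t->locked = true;
	return 0;
}

/* Enable/disable request; assumed to fail with -EINVAL once locked. */
static inline int model_set_status(struct lock_model *t, bool enable)
{
	if (!t->supported || t->locked)
		return -EINVAL;
	t->enabled = enable;
	return 0;
}

/* Walks through the case discussed above: lock while disabled. */
static inline void model_demo(void)
{
	struct lock_model t = { .supported = true, .enabled = false, .locked = false };

	assert(model_lock_current(&t, 0) == -EINVAL);   /* current check refuses */
	assert(model_lock_permissive(&t, 0) == 0);      /* alternative accepts */
	assert(model_set_status(&t, true) == -EINVAL);  /* later enable is blocked */
}
```

The only difference between the two lock variants is the `!t->enabled`
term, which is the check being questioned in the review.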