[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b5624bba-9917-421b-8ef0-4515d442f80b@ghiti.fr>
Date: Tue, 19 Mar 2024 17:51:54 +0100
From: Alexandre Ghiti <alex@...ti.fr>
To: Samuel Holland <samuel.holland@...ive.com>,
Alexandre Ghiti <alexghiti@...osinc.com>
Cc: Palmer Dabbelt <palmer@...belt.com>, linux-riscv@...ts.infradead.org,
Albert Ou <aou@...s.berkeley.edu>, Andrew Morton
<akpm@...ux-foundation.org>, Charlie Jenkins <charlie@...osinc.com>,
Guo Ren <guoren@...nel.org>, Jisheng Zhang <jszhang@...nel.org>,
Kemeng Shi <shikemeng@...weicloud.com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
"Mike Rapoport (IBM)" <rppt@...nel.org>,
Paul Walmsley <paul.walmsley@...ive.com>, Xiao Wang <xiao.w.wang@...el.com>,
Yangyu Chen <cyy@...self.name>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] riscv: Define TASK_SIZE_MAX for __access_ok()
Hi Samuel,
On 18/03/2024 22:29, Samuel Holland wrote:
> Hi Alex,
>
> On 2024-03-18 3:50 PM, Alexandre Ghiti wrote:
>> On Wed, Mar 13, 2024 at 7:00 PM Samuel Holland
>> <samuel.holland@...ive.com> wrote:
>>> TASK_SIZE_MAX should be set to the largest userspace address under any
>>> runtime configuration. This optimizes the check in __access_ok(), which
>>> no longer needs to compute the current value of TASK_SIZE. It is still
>>> safe because addresses between TASK_SIZE and TASK_SIZE_MAX are invalid
>>> at the hardware level.
>>>
>>> This removes about half of the references to pgtable_l[45]_enabled.
>>>
>>> Signed-off-by: Samuel Holland <samuel.holland@...ive.com>
>>> ---
>>>
>>> arch/riscv/include/asm/pgtable-64.h | 1 +
>>> arch/riscv/include/asm/pgtable.h | 1 +
>>> 2 files changed, 2 insertions(+)
>>>
>>> diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
>>> index b99bd66107a6..a677ef3c0fe2 100644
>>> --- a/arch/riscv/include/asm/pgtable-64.h
>>> +++ b/arch/riscv/include/asm/pgtable-64.h
>>> @@ -17,6 +17,7 @@ extern bool pgtable_l5_enabled;
>>> #define PGDIR_SHIFT_L4 39
>>> #define PGDIR_SHIFT_L5 48
>>> #define PGDIR_SIZE_L3 (_AC(1, UL) << PGDIR_SHIFT_L3)
>>> +#define PGDIR_SIZE_L5 (_AC(1, UL) << PGDIR_SHIFT_L5)
>>>
>>> #define PGDIR_SHIFT (pgtable_l5_enabled ? PGDIR_SHIFT_L5 : \
>>> (pgtable_l4_enabled ? PGDIR_SHIFT_L4 : PGDIR_SHIFT_L3))
>>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>>> index 6066822e7396..2032f8ac5fc5 100644
>>> --- a/arch/riscv/include/asm/pgtable.h
>>> +++ b/arch/riscv/include/asm/pgtable.h
>>> @@ -867,6 +867,7 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>>> #ifdef CONFIG_64BIT
>>> #define TASK_SIZE_64 (PGDIR_SIZE * PTRS_PER_PGD / 2)
>>> #define TASK_SIZE_MIN (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
>>> +#define TASK_SIZE_MAX (PGDIR_SIZE_L5 * PTRS_PER_PGD / 2)
>>>
>>> #ifdef CONFIG_COMPAT
>>> #define TASK_SIZE_32 (_AC(0x80000000, UL))
>>> --
>>> 2.43.1
>>>
>> I think you also need to change the check in handle_page_fault() by
>> using TASK_SIZE_MAX instead of TASK_SIZE, otherwise the fixup can't
>> happen (https://elixir.bootlin.com/linux/latest/source/arch/riscv/mm/fault.c#L273).
> It is not necessary to change that check in fault.c unless we expect to handle
> exceptions (outside of userspace access routines) for addresses between
> TASK_SIZE and TASK_SIZE_MAX.
Which I think could be the case if some code is only using the "new"
access_ok() without the uaccess routines (which is wrong yes, but such
code is at the origin of this check).
> It looks like the call to fixup_exception() [added
> in 416721ff05fd ("riscv, mm: Perform BPF exhandler fixup on page fault")] is
> only intended to catch null pointer dereferences. So making the change wouldn't
> have any functional impact, but it would still be a valid optimization.
>
>> Or I was wondering if it would not be better to do like x86 and use an
>> alternative, it would be more correct (even though I believe your
>> solution works)
>> https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/page_64.h#L82.
> What would be the benefit of using an alternative? Any access to an address
> between TASK_SIZE and TASK_SIZE_MAX is guaranteed to generate a page fault, so
> the only benefit I see is returning -EFAULT slightly faster at the cost of
> applying a few hundred alternatives at boot. But it's possible I'm missing
> something.
The use of alternatives allows to return right away if the buffer is
beyond the usable user address space, and it's not just "slightly
faster" for some cases (a very large buffer with only a few bytes being
beyond the limit or someone could fault-in all the user pages and fail
very late...etc). access_ok() is here to guarantee that such situations
don't happen, so actually it makes more sense to use an alternative to
avoid that.
Alex
> Regards,
> Samuel
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@...ts.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Powered by blists - more mailing lists