Message-ID: <757b0f1c-b8ff-4a1a-8edc-8dc651a348fb@sifive.com>
Date: Mon, 25 Mar 2024 14:20:46 -0500
From: Samuel Holland <samuel.holland@...ive.com>
To: Mark Rutland <mark.rutland@....com>, Arnd Bergmann <arnd@...db.de>
Cc: Alexandre Ghiti <alex@...ti.fr>, David Laight <David.Laight@...lab.com>,
Alexandre Ghiti <alexghiti@...osinc.com>, Palmer Dabbelt
<palmer@...belt.com>,
"linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>,
Albert Ou <aou@...s.berkeley.edu>, Andrew Morton
<akpm@...ux-foundation.org>, Charlie Jenkins <charlie@...osinc.com>,
guoren <guoren@...nel.org>, Jisheng Zhang <jszhang@...nel.org>,
Kemeng Shi <shikemeng@...weicloud.com>, Matthew Wilcox
<willy@...radead.org>, Mike Rapoport <rppt@...nel.org>,
Paul Walmsley <paul.walmsley@...ive.com>, Xiao W Wang
<xiao.w.wang@...el.com>, Yangyu Chen <cyy@...self.name>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] riscv: Define TASK_SIZE_MAX for __access_ok()

On 2024-03-25 1:30 PM, Mark Rutland wrote:
> On Mon, Mar 25, 2024 at 07:02:13PM +0100, Arnd Bergmann wrote:
>> On Mon, Mar 25, 2024, at 17:39, Mark Rutland wrote:
>>
>>> Using a compile-time constant TASK_SIZE_MAX allows the compiler to generate
>>> much better code for access_ok(), and on arm64 we use a compile-time constant
>>> even when our page table depth can change at runtime (and when native/compat
>>> task sizes differ). The only absolute boundary that needs to be maintained is
>>> that access_ok() fails for kernel addresses.
>>
>> As I understand, this works on arm64 and x86 because the kernel
>> mapping starts on negative 64-bit addresses, so the highest user
>> address (TASK_SIZE = 0x000fffffffffffff) is still smaller than the
>> lowest kernel address (PAGE_OFFSET = 0xfff0000000000000).
>
> Yep; the highest possible user address is always below the lowest possible
> kernel address, and any "non-canonical" address between the two ranges faults.
> There's some fun with TBI (Top Byte Ignore) and MTE, but that only affects how
> to mangle the pointer before the check, and doesn't affect the definition of
> the VA boundary.
>
> In general, so long as TASK_SIZE_MAX is <= the lowest possible kernel address
> and TASK_SIZE_MAX > the highest possible user address, it all works out.
>
>> If an architecture ignores all the top bits of a virtual address,
>> the largest TASK_SIZE would be higher than the smallest (positive,
>> unsigned) PAGE_OFFSET, so you need TASK_SIZE_MAX to be dynamic.
>
> Agreed, but do we even support such architectures within Linux?
>
>> It doesn't look like this is the case on riscv, but I'm not sure
>> about this part.
>
> It looks like riscv is in the same bucket as arm64 and x86 per:
>
> https://www.kernel.org/doc/html/next/riscv/vm-layout.html
>
> ... which says:
>
> | The RISC-V privileged architecture document states that the 64bit addresses
> | "must have bits 63-48 all equal to bit 47, or else a page-fault exception
> | will occur.": that splits the virtual address space into 2 halves separated
> | by a very big hole, the lower half is where the userspace resides, the upper
> | half is where the RISC-V Linux Kernel resides.

Right, and while RISC-V has a pointer masking extension[1] similar to arm64's
TBI, it will be handled[2] the same way: by sign extending the address prior to
checking against TASK_SIZE_MAX. So we maintain the property that userspace
addresses are always "positive" and kernel addresses are always "negative".
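
To make the invariant concrete, here is a minimal user-space sketch of the check
discussed above. It is not the kernel's implementation: every `sketch_` name is
hypothetical, the 48-bit split is an assumed example layout, and the tag width
handled by `sketch_untag()` is an assumption (the real width depends on how
pointer masking is configured). The range comparison is modeled on the usual
"size <= limit && addr <= limit - size" form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed example layout: 48-bit virtual addresses, split at bit 47. */
#define SKETCH_VA_BITS       48
/* First address above the user half: 0x0000800000000000. */
#define SKETCH_TASK_SIZE_MAX (UINT64_C(1) << (SKETCH_VA_BITS - 1))
/* Lowest address of the kernel half: 0xffff800000000000. */
#define SKETCH_KERNEL_START  (~UINT64_C(0) << (SKETCH_VA_BITS - 1))

/* The constant limit must not reach into the kernel half. */
static_assert(SKETCH_TASK_SIZE_MAX <= SKETCH_KERNEL_START,
              "user limit must lie at or below the lowest kernel address");

/*
 * Drop any tag in the upper bits by sign-extending from bit 47, so that
 * user pointers stay "positive" and kernel pointers stay "negative".
 */
static inline uint64_t sketch_untag(uint64_t addr)
{
    const uint64_t sign = UINT64_C(1) << (SKETCH_VA_BITS - 1);
    const uint64_t low = addr & ((sign << 1) - 1);  /* keep bits 47..0 */

    return (low ^ sign) - sign;                     /* replicate bit 47 upward */
}

/* Range check against a compile-time constant limit. */
static inline bool sketch_access_ok(uint64_t addr, uint64_t size)
{
    const uint64_t limit = SKETCH_TASK_SIZE_MAX;

    addr = sketch_untag(addr);
    return size <= limit && addr <= limit - size;
}
```

Because the limit is a constant, the compiler can fold `limit - size` for constant
sizes and emit a single compare. A tagged pointer with bit 47 set sign-extends to a
"negative" address and is rejected like any other kernel address.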
Regards,
Samuel

[1]: https://github.com/riscv/riscv-j-extension/raw/a1e68469c60/zjpm-spec.pdf
[2]: https://lore.kernel.org/linux-riscv/20240319215915.832127-1-samuel.holland@sifive.com/