Message-ID: <ZowD3LQT_KTz2g4X@J2N7QTR9R3>
Date: Mon, 8 Jul 2024 16:21:00 +0100
From: Mark Rutland <mark.rutland@....com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Will Deacon <will@...nel.org>, Arnd Bergmann <arnd@...db.de>,
Catalin Marinas <catalin.marinas@....com>,
Jisheng Zhang <jszhang@...nel.org>,
Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>,
Albert Ou <aou@...s.berkeley.edu>,
Linux-Arch <linux-arch@...r.kernel.org>,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/4] riscv: uaccess: optimizations
On Fri, Jul 05, 2024 at 10:58:29AM -0700, Linus Torvalds wrote:
> On Fri, 5 Jul 2024 at 04:25, Will Deacon <will@...nel.org> wrote:
> >
> > we'd probably want to use an address that lives between the two TTBRs
> > (i.e. in the "guard region" you mentioned above), just in case somebody
> > has fscked around with /proc/sys/vm/mmap_min_addr.
>
> Yes, I don't want to use a NULL pointer and rely on mmap_min_addr.
>
> For x86-64, we have two "guard regions" that can be used to generate
> an address that is guaranteed to fault:
>
> - the kernel always lives in the "top bit set" part of the address
> space (and any address tagging bits don't touch that part), and does
> not map the highest virtual address because that's used for error
> pointers, so the "all bits set" address always faults

The same should be true on arm64, though I'm not immediately sure if we
explicitly reserve that VA region -- if we don't, then we should.

> - the region between valid user addresses and kernel addresses is
> also always going to fault, and we don't have them adjacent to each
> other (unlike, for example, 32-bit i386, where the kernel address
> space is directly adjacent to the top of user addresses)

Today we have a gap between the TTBR0 and TTBR1 VA ranges in all
configurations, but in future (with the new FEAT_D128 page table format)
we will have configurations where there's no gap between the two ranges.

> So on x86-64, the simple solution is to just say "we know if the top
> bit is clear, it cannot ever touch kernel code, and if the top bit is
> set we have to make the address fault". So just duplicating the top
> bit (with an arithmetic shift) and or'ing it with the low bits, we get
> exactly what we want.
>
> But my knowledge of arm64 is weak enough that while I am reading
> assembly language and I know that instead of the top bit, it's bit55,
> I don't know what the actual rules for the translation table registers
> are.
>
> If the all-bits-set address is guaranteed to always trap, then arm64
> could just use the same thing x86 does (just duplicating bit 55
> instead of the sign bit)?

I think something of that shape can work (see below). There are a couple
of things that make using all-ones unsafe:

1) Non-faulting parts of a misaligned load/store can occur *before* the
   fault is raised. If you have two pages, one of which is writable and
   the other of which is not (in either order), a store which straddles
   those pages can write to the writable page before raising a fault on
   the non-writable page.

   I've seen this behaviour on real HW, and IIUC this is fairly common.

2) Loads/stores which wrap past 0xFFFF_FFFF_FFFF_FFFF access bytes at
   UNKNOWN addresses. An N-byte store at 0xFFFF_FFFF_FFFF_FFFF may write
   to N-1 bytes at an arbitrary address which is not
   0x0000_0000_0000_0000.

   In the latest ARM ARM (K.a), this is described tersely in section
   K1.2.9 "Out of range virtual address".

   That can be found at:

   https://developer.arm.com/documentation/ddi0487/ka/?lang=en

   I'm aware of implementation styles where that address is not zero and
   can be a TTBR1 (kernel) address.

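
As a rough illustration of when those two hazards apply, here is a small
C sketch. The helper names (access_wraps, access_straddles_pages) are
hypothetical and exist only for this example; they are not kernel
functions. The sketch only classifies an access by its address
arithmetic. It does not model what the hardware does with the wrapped
bytes, since those addresses are UNKNOWN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if an n-byte access starting at addr would
 * run past 0xFFFF_FFFF_FFFF_FFFF. The architecture leaves the addresses
 * of the wrapped bytes UNKNOWN, so this only detects the condition.
 */
static inline bool access_wraps(uint64_t addr, uint64_t n)
{
	/* Unsigned overflow of the last-byte address means the access wraps. */
	return n != 0 && addr + (n - 1) < addr;
}

/*
 * Hypothetical helper: true if the first and last byte of the access
 * land in different pages, i.e. part of the access may complete before
 * the other part faults.
 */
static inline bool access_straddles_pages(uint64_t addr, uint64_t n,
					  unsigned int page_shift)
{
	assert(page_shift < 64);
	return n != 0 &&
	       (addr >> page_shift) != ((addr + (n - 1)) >> page_shift);
}
```

With these definitions, an 8-byte access at all-ones wraps, while a
1-byte access there does not, which matches the "N-1 bytes" wording
above.
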
Given that, we'd need to avoid all-ones, but provided we know that the
first access using the pointer will be limited to PAGE_SIZE bytes past
the pointer, we could round down the bad pointer to be somewhere within
the error pointer page, e.g.

	SBFX <mask>, <ptr>, #55, #1
	ORR  <ptr>, <ptr>, <mask>
	BIC  <ptr>, <ptr>, <mask>, lsr #(64 - PAGE_SHIFT)

That last `BIC` instruction is "BIt Clear", AKA "AND NOT". When bit 55
is one, it clears the low PAGE_SHIFT bits, rounding the all-ones pointer
down to a page boundary; when bit 55 is zero, it has no effect (as it'll
be an AND with all-ones).
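
For anyone who wants to check the bit manipulation without an assembler
to hand, the sequence can be modelled in portable C roughly as follows.
This is only a sketch: mask_user_ptr is a hypothetical name, and
page_shift is passed as a parameter here rather than taken from the
kernel's PAGE_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper modelling the SBFX/ORR/BIC sequence in plain C. */
static inline uint64_t mask_user_ptr(uint64_t ptr, unsigned int page_shift)
{
	assert(page_shift > 0 && page_shift < 64);

	/* SBFX <mask>, <ptr>, #55, #1: replicate bit 55 across all 64 bits. */
	uint64_t mask = (uint64_t)0 - ((ptr >> 55) & 1);

	/* ORR <ptr>, <ptr>, <mask>: a pointer with bit 55 set becomes all-ones. */
	ptr |= mask;

	/*
	 * BIC <ptr>, <ptr>, <mask>, lsr #(64 - PAGE_SHIFT): clear the low
	 * page_shift bits when mask is all-ones; no-op when mask is zero.
	 */
	ptr &= ~(mask >> (64 - page_shift));

	return ptr;
}
```

With page_shift set to 12, any pointer with bit 55 set comes out as
0xFFFF_FFFF_FFFF_F000, and a pointer with bit 55 clear is returned
unchanged. An access of up to 4096 bytes from the masked address ends at
or before 0xFFFF_FFFF_FFFF_FFFF, so it does not wrap.
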
Thanks,
Mark.