Message-ID: <Zs+ZK6Q2U9dm19yR@ghost>
Date: Wed, 28 Aug 2024 14:39:55 -0700
From: Charlie Jenkins <charlie@...osinc.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>
Cc: Arnd Bergmann <arnd@...db.de>, Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>,
Albert Ou <aou@...s.berkeley.edu>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
Michael Ellerman <mpe@...erman.id.au>,
Nicholas Piggin <npiggin@...il.com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Naveen N Rao <naveen@...nel.org>,
Muchun Song <muchun.song@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>,
Huacai Chen <chenhuacai@...nel.org>,
WANG Xuerui <kernel@...0n.name>,
Russell King <linux@...linux.org.uk>,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
"James E.J. Bottomley" <James.Bottomley@...senpartnership.com>,
Helge Deller <deller@....de>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
Heiko Carstens <hca@...ux.ibm.com>,
Vasily Gorbik <gor@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>,
Yoshinori Sato <ysato@...rs.sourceforge.jp>,
Rich Felker <dalias@...c.org>,
John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>,
"David S. Miller" <davem@...emloft.net>,
Andreas Larsson <andreas@...sler.com>,
Shuah Khan <shuah@...nel.org>,
Alexandre Ghiti <alexghiti@...osinc.com>,
linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
Palmer Dabbelt <palmer@...osinc.com>,
linux-riscv@...ts.infradead.org,
linux-arm-kernel@...ts.infradead.org, linuxppc-dev@...ts.ozlabs.org,
linux-mm@...ck.org, loongarch@...ts.linux.dev,
linux-mips@...r.kernel.org, linux-parisc@...r.kernel.org,
linux-s390@...r.kernel.org, linux-sh@...r.kernel.org,
sparclinux@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH 00/16] mm: Introduce MAP_BELOW_HINT

On Wed, Aug 28, 2024 at 01:59:18PM -0700, Charlie Jenkins wrote:
> On Wed, Aug 28, 2024 at 02:31:42PM -0400, Liam R. Howlett wrote:
> > * Charlie Jenkins <charlie@...osinc.com> [240828 01:49]:
> > > > Some applications rely on placing data in the free bits of addresses
> > > > allocated by mmap. Various architectures (e.g. x86, arm64, powerpc)
> > > > restrict the address returned by mmap to be less than the maximum
> > > > address space, unless the hint address is greater than this value.
> >
> > Wait, what arch(s) allows for greater than the max? The passed hint
> > should be where we start searching, but we go to the lower limit then
> > start at the hint and search up (or vice-versa on the directions).
> >
>
> I worded this awkwardly. On arm64 there is a page-table boundary at 48
> bits and at 52 bits. On x86 the boundaries are at 48 bits and 57 bits.
> The max value mmap is able to return on arm64 is 48 bits if the hint
> address uses 48 bits or less, even if the architecture supports 5-level
> paging and thus addresses can be 52 bits. Applications can opt-in to
> using up to 52-bits in an address by using a hint address greater than
> 48 bits. x86 has the same behavior but with 57 bits instead of 52.
>
> The reason this exists is because some applications arbitrarily replace
> bits in virtual addresses with data with an assumption that the address
> will not be using any of the bits above bit 48 in the virtual address.
> As hardware with larger address spaces was released, x86 decided to
> build safety guards into the kernel to allow the applications that made
> these assumptions to continue to work on this different hardware.
>
> This causes all applications that use a hint address to silently be
> restricted to 48-bit addresses. The goal of this flag is to have a way
> for applications to explicitly request how many bits they want mmap to
> use.
>
> > I don't understand how unmapping works on a higher address; we would
> > fail to free it on termination of the application.
> >
> > Also, there are archs that map outside of the VMAs, which are freed by
> > freeing from the prev->vm_end to next->vm_start, so I don't understand
> > what that looks like in this reality as well.
> >
> > >
> > > On arm64 this barrier is at 52 bits and on x86 it is at 56 bits. This
> > > flag allows applications a way to specify exactly how many bits they
> > > want to be left unused by mmap. This eliminates the need for
> > > applications to know the page table hierarchy of the system to be able
> > > to reason which addresses mmap will be allowed to return.
> >
> > But, why do they need to know today? We have a limit for this don't we?
>
> The limit is different for different architectures. On x86 the limit is
> 57 bits, and on arm64 it is 52 bits. So in the theoretical case that an
> application requires 10 bits free in a virtual address, the application
> would always work on arm64 regardless of the hint address, but on x86 if
> the hint address is greater than 48 bits then the application will not
> work.
>
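
The arithmetic behind that example, as a small sketch (free_bits() and
has_room() are hypothetical helpers, and the 52/57 values are just the
limits quoted above):

```c
#include <stdbool.h>

/* Bits left unused at the top of a 64-bit pointer for a given VA width. */
static unsigned free_bits(unsigned va_bits)
{
	return va_bits >= 64 ? 0 : 64 - va_bits;
}

/* True if an application needing `needed` spare bits can use such addresses. */
static bool has_room(unsigned va_bits, unsigned needed)
{
	return free_bits(va_bits) >= needed;
}
```

With 10 bits required, has_room(52, 10) holds (12 bits free) and
has_room(48, 10) holds (16 bits free), but has_room(57, 10) does not
(only 7 bits free).
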
> The goal of this flag is to have consistent and tunable behavior of
> mmap() when it is desired to ensure that mmap() only returns addresses
> that use some number of bits.
>
> >
> > Also, these upper limits are how some archs use the upper bits that you
> > are trying to use.
> >
>
> It does not eliminate the existing behavior of the architectures to
> place these upper limits; it instead provides a way to have consistent
> behavior across all architectures.
>
> > >
> > > ---
> > > riscv made this feature of mmap returning addresses less than the hint
> > > address the default behavior. This was in contrast to the implementation
> > > of x86/arm64 that have a single boundary at the 5-level page table
> > > region. However this restriction proved too great -- the reduced
> > > address space when using a hint address was too small.
> >
> > Yes, the hint is used to group things close together so it would
> > literally be random chance on if you have enough room or not (aslr and
> > all).
> >
> > >
> > > A patch for riscv [1] reverts the behavior that broke userspace. This
> > > series serves to make this feature available to all architectures.
> >
> > I don't fully understand this statement, you say it broke userspace so
> > now you are porting it to everyone? This reads as if you are breaking
> > the userspace on all architectures :)
>
> It was the default for mmap on riscv. The difference here is that it is now
> enabled by a flag instead. Instead of making the flag specific to riscv,
> I figured that other architectures might find it useful as well.
>
> >
> > If you fail to find room below, then your application fails as there is
> > no way to get the upper bits you need. It would be better to fix this
> > in userspace - if your application is returned too high an address, then
> > free it and exit because it's going to fail anyways.
> >
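
For comparison, the userspace-only approach suggested here could look
roughly like the sketch below. mmap_checked() is a hypothetical helper,
not an existing API, and it simply gives up (returns NULL) when the
kernel hands back an address that uses too many bits.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Map `len` bytes and accept the mapping only if every address in it
 * fits in `max_bits` bits. On a too-high address the mapping is freed
 * and NULL is returned, so the caller can exit early.
 */
static void *mmap_checked(size_t len, unsigned max_bits)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	uint64_t last;

	if (p == MAP_FAILED)
		return NULL;

	/* Highest byte address covered by the mapping. */
	last = (uint64_t)(uintptr_t)p + len - 1;
	if (max_bits < 64 && (last >> max_bits) != 0) {
		munmap(p, len);
		return NULL;
	}
	return p;
}
```

This detects the problem after the fact but cannot steer the kernel
towards a lower address, which is the gap the flag is meant to close.
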
>
> This flag is trying to define an API that is more robust than the
> current behavior on x86 and arm64, which implicitly restricts mmap()
> addresses to 48 bits. A solution could be to just write in the docs that
> mmap() will always exhaust all addresses below the hint address before
> returning an address that is above the hint address. However a flag that
> defines this behavior seems more intuitive.
>
> > >
> > > I have only tested on riscv and x86.
> >
> > This should be an RFC then.
>
> Fair enough.
>
> >
> > > There is a tremendous amount of
> > > duplicated code in mmap so the implementations across architectures I
> > > believe should be mostly consistent. I added this feature to all
> > > architectures that implement either
> > > arch_get_mmap_end()/arch_get_mmap_base() or
> > > arch_get_unmapped_area_topdown()/arch_get_unmapped_area(). I also added
> > > it to the default behavior for arch_get_mmap_end()/arch_get_mmap_base().
> >
> > Way too much duplicate code. We should be figuring out how to make this
> > all work with the same code.
> >
> > This is going to make the cloned code problem worse.
>
> That would require standardizing every architecture with the generic
> mmap() framework that arm64 has developed. That is far outside the scope
> of this patch, but would be a great area to research for each of the
> architectures that do not use the generic framework.

Thinking about this again, I could drop support for all architectures
that do not implement arch_get_mmap_base()/arch_get_mmap_end().

>
> - Charlie
>
> >
> > >
> > > Link: https://lore.kernel.org/lkml/20240826-riscv_mmap-v1-2-cd8962afe47f@rivosinc.com/T/ [1]
> > >
> > > To: Arnd Bergmann <arnd@...db.de>
> > > To: Paul Walmsley <paul.walmsley@...ive.com>
> > > To: Palmer Dabbelt <palmer@...belt.com>
> > > To: Albert Ou <aou@...s.berkeley.edu>
> > > To: Catalin Marinas <catalin.marinas@....com>
> > > To: Will Deacon <will@...nel.org>
> > > To: Michael Ellerman <mpe@...erman.id.au>
> > > To: Nicholas Piggin <npiggin@...il.com>
> > > To: Christophe Leroy <christophe.leroy@...roup.eu>
> > > To: Naveen N Rao <naveen@...nel.org>
> > > To: Muchun Song <muchun.song@...ux.dev>
> > > To: Andrew Morton <akpm@...ux-foundation.org>
> > > To: Liam R. Howlett <Liam.Howlett@...cle.com>
> > > To: Vlastimil Babka <vbabka@...e.cz>
> > > To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > > To: Thomas Gleixner <tglx@...utronix.de>
> > > To: Ingo Molnar <mingo@...hat.com>
> > > To: Borislav Petkov <bp@...en8.de>
> > > To: Dave Hansen <dave.hansen@...ux.intel.com>
> > > To: x86@...nel.org
> > > To: H. Peter Anvin <hpa@...or.com>
> > > To: Huacai Chen <chenhuacai@...nel.org>
> > > To: WANG Xuerui <kernel@...0n.name>
> > > To: Russell King <linux@...linux.org.uk>
> > > To: Thomas Bogendoerfer <tsbogend@...ha.franken.de>
> > > To: James E.J. Bottomley <James.Bottomley@...senPartnership.com>
> > > To: Helge Deller <deller@....de>
> > > To: Alexander Gordeev <agordeev@...ux.ibm.com>
> > > To: Gerald Schaefer <gerald.schaefer@...ux.ibm.com>
> > > To: Heiko Carstens <hca@...ux.ibm.com>
> > > To: Vasily Gorbik <gor@...ux.ibm.com>
> > > To: Christian Borntraeger <borntraeger@...ux.ibm.com>
> > > To: Sven Schnelle <svens@...ux.ibm.com>
> > > To: Yoshinori Sato <ysato@...rs.sourceforge.jp>
> > > To: Rich Felker <dalias@...c.org>
> > > To: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>
> > > To: David S. Miller <davem@...emloft.net>
> > > To: Andreas Larsson <andreas@...sler.com>
> > > To: Shuah Khan <shuah@...nel.org>
> > > To: Alexandre Ghiti <alexghiti@...osinc.com>
> > > Cc: linux-arch@...r.kernel.org
> > > Cc: linux-kernel@...r.kernel.org
> > > Cc: Palmer Dabbelt <palmer@...osinc.com>
> > > Cc: linux-riscv@...ts.infradead.org
> > > Cc: linux-arm-kernel@...ts.infradead.org
> > > Cc: linuxppc-dev@...ts.ozlabs.org
> > > Cc: linux-mm@...ck.org
> > > Cc: loongarch@...ts.linux.dev
> > > Cc: linux-mips@...r.kernel.org
> > > Cc: linux-parisc@...r.kernel.org
> > > Cc: linux-s390@...r.kernel.org
> > > Cc: linux-sh@...r.kernel.org
> > > Cc: sparclinux@...r.kernel.org
> > > Cc: linux-kselftest@...r.kernel.org
> > > Signed-off-by: Charlie Jenkins <charlie@...osinc.com>
> > >
> > > ---
> > > Charlie Jenkins (16):
> > > >       mm: Add MAP_BELOW_HINT
> > > >       riscv: mm: Do not restrict mmap address based on hint
> > > >       mm: Add flag and len param to arch_get_mmap_base()
> > > >       mm: Add generic MAP_BELOW_HINT
> > > >       riscv: mm: Support MAP_BELOW_HINT
> > > >       arm64: mm: Support MAP_BELOW_HINT
> > > >       powerpc: mm: Support MAP_BELOW_HINT
> > > >       x86: mm: Support MAP_BELOW_HINT
> > > >       loongarch: mm: Support MAP_BELOW_HINT
> > > >       arm: mm: Support MAP_BELOW_HINT
> > > >       mips: mm: Support MAP_BELOW_HINT
> > > >       parisc: mm: Support MAP_BELOW_HINT
> > > >       s390: mm: Support MAP_BELOW_HINT
> > > >       sh: mm: Support MAP_BELOW_HINT
> > > >       sparc: mm: Support MAP_BELOW_HINT
> > > >       selftests/mm: Create MAP_BELOW_HINT test
> > >
> > > >  arch/arm/mm/mmap.c                           | 10 ++++++++
> > > >  arch/arm64/include/asm/processor.h           | 34 ++++++++++++++++++++++----
> > > >  arch/loongarch/mm/mmap.c                     | 11 +++++++++
> > > >  arch/mips/mm/mmap.c                          |  9 +++++++
> > > >  arch/parisc/include/uapi/asm/mman.h          |  1 +
> > > >  arch/parisc/kernel/sys_parisc.c              |  9 +++++++
> > > >  arch/powerpc/include/asm/task_size_64.h      | 36 +++++++++++++++++++++++-----
> > > >  arch/riscv/include/asm/processor.h           | 32 -------------------------
> > > >  arch/s390/mm/mmap.c                          | 10 ++++++++
> > > >  arch/sh/mm/mmap.c                            | 10 ++++++++
> > > >  arch/sparc/kernel/sys_sparc_64.c             |  8 +++++++
> > > >  arch/x86/kernel/sys_x86_64.c                 | 25 ++++++++++++++++---
> > > >  fs/hugetlbfs/inode.c                         |  2 +-
> > > >  include/linux/sched/mm.h                     | 34 ++++++++++++++++++++++++--
> > > >  include/uapi/asm-generic/mman-common.h       |  1 +
> > > >  mm/mmap.c                                    |  2 +-
> > > >  tools/arch/parisc/include/uapi/asm/mman.h    |  1 +
> > > >  tools/include/uapi/asm-generic/mman-common.h |  1 +
> > > >  tools/testing/selftests/mm/Makefile          |  1 +
> > > >  tools/testing/selftests/mm/map_below_hint.c  | 29 ++++++++++++++++++++++
> > > >  20 files changed, 216 insertions(+), 50 deletions(-)
> > > ---
> > > base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
> > > change-id: 20240827-patches-below_hint_mmap-b13d79ae1c55
> > > --
> > > - Charlie
> > >