Message-ID: <Zs+ZK6Q2U9dm19yR@ghost>
Date: Wed, 28 Aug 2024 14:39:55 -0700
From: Charlie Jenkins <charlie@...osinc.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>
Cc: Arnd Bergmann <arnd@...db.de>, Paul Walmsley <paul.walmsley@...ive.com>,
	Palmer Dabbelt <palmer@...belt.com>,
	Albert Ou <aou@...s.berkeley.edu>,
	Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will@...nel.org>,
	Michael Ellerman <mpe@...erman.id.au>,
	Nicholas Piggin <npiggin@...il.com>,
	Christophe Leroy <christophe.leroy@...roup.eu>,
	Naveen N Rao <naveen@...nel.org>,
	Muchun Song <muchun.song@...ux.dev>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
	"H. Peter Anvin" <hpa@...or.com>,
	Huacai Chen <chenhuacai@...nel.org>,
	WANG Xuerui <kernel@...0n.name>,
	Russell King <linux@...linux.org.uk>,
	Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
	"James E.J. Bottomley" <James.Bottomley@...senpartnership.com>,
	Helge Deller <deller@....de>,
	Alexander Gordeev <agordeev@...ux.ibm.com>,
	Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
	Heiko Carstens <hca@...ux.ibm.com>,
	Vasily Gorbik <gor@...ux.ibm.com>,
	Christian Borntraeger <borntraeger@...ux.ibm.com>,
	Sven Schnelle <svens@...ux.ibm.com>,
	Yoshinori Sato <ysato@...rs.sourceforge.jp>,
	Rich Felker <dalias@...c.org>,
	John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>,
	"David S. Miller" <davem@...emloft.net>,
	Andreas Larsson <andreas@...sler.com>,
	Shuah Khan <shuah@...nel.org>,
	Alexandre Ghiti <alexghiti@...osinc.com>,
	linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
	Palmer Dabbelt <palmer@...osinc.com>,
	linux-riscv@...ts.infradead.org,
	linux-arm-kernel@...ts.infradead.org, linuxppc-dev@...ts.ozlabs.org,
	linux-mm@...ck.org, loongarch@...ts.linux.dev,
	linux-mips@...r.kernel.org, linux-parisc@...r.kernel.org,
	linux-s390@...r.kernel.org, linux-sh@...r.kernel.org,
	sparclinux@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH 00/16] mm: Introduce MAP_BELOW_HINT

On Wed, Aug 28, 2024 at 01:59:18PM -0700, Charlie Jenkins wrote:
> On Wed, Aug 28, 2024 at 02:31:42PM -0400, Liam R. Howlett wrote:
> > * Charlie Jenkins <charlie@...osinc.com> [240828 01:49]:
> > > Some applications rely on placing data in the free bits of addresses allocated
> > > by mmap. Various architectures (e.g. x86, arm64, powerpc) restrict the
> > > address returned by mmap to be less than the maximum address space,
> > > unless the hint address is greater than this value.
> > 
> > Wait, what arch(s) allows for greater than the max?  The passed hint
> > should be where we start searching, but we go to the lower limit then
> > start at the hint and search up (or vice-versa on the directions).
> > 
> 
> I worded this awkwardly. On arm64 there is a page-table boundary at 48
> bits and at 52 bits. On x86 the boundaries are at 48 bits and 57 bits.
> The max value mmap is able to return on arm64 is 48 bits if the hint
> address uses 48 bits or less, even if the architecture supports 5-level
> paging and thus addresses can be 52 bits. Applications can opt-in to
> using up to 52 bits in an address by using a hint address greater than
> 48 bits. x86 has the same behavior but with 57 bits instead of 52.
> 
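To make the opt-in concrete, here is a minimal sketch of probing how wide an
address mmap() hands back for a given hint. The helper names addr_bits() and
map_with_hint() are made up for illustration and are not part of the series;
the only real API used is mmap() itself. It assumes a 64-bit Linux system, and
the hint is advisory because MAP_FIXED is not set.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Count the significant bits in an address (0 for a null address). */
static int addr_bits(uintptr_t addr)
{
    int bits = 0;

    while (addr) {
        bits++;
        addr >>= 1;
    }
    return bits;
}

/*
 * Hypothetical helper: map `len` bytes of anonymous memory, passing `hint`
 * as the suggested address. Without MAP_FIXED the kernel may place the
 * mapping elsewhere. Returns MAP_FAILED on error.
 */
static void *map_with_hint(uintptr_t hint, size_t len)
{
    return mmap((void *)hint, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

Comparing addr_bits() of map_with_hint(0, 4096) against
map_with_hint((uintptr_t)1 << 48, 4096) shows which range the kernel picked.
The second call can only yield a wider address on hardware and kernels that
have the larger address space enabled.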
> The reason this exists is that some applications arbitrarily replace
> bits in virtual addresses with data with an assumption that the address
> will not be using any of the bits above bit 48 in the virtual address.
> As hardware with larger address spaces was released, x86 decided to
> build safety guards into the kernel to allow the applications that made
> these assumptions to continue to work on this different hardware.
> 
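As an illustration of the assumption described above, here is a rough sketch
of pointer tagging that only works while addresses fit in 48 bits. The names
tag_pointer(), untag_pointer(), pointer_tag() and the ADDR_BITS constant are
invented for this example and do not come from any particular application.
Addresses are modelled as uint64_t so that the wider case can be shown without
needing such a mapping to exist.

```c
#include <stdint.h>

/* Assumption baked into the application: addresses never exceed 48 bits. */
#define ADDR_BITS 48
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Store a 16-bit tag in bits 48..63, discarding any address bits there. */
static uint64_t tag_pointer(uint64_t addr, uint16_t tag)
{
    return (addr & ADDR_MASK) | ((uint64_t)tag << ADDR_BITS);
}

/* Recover the low 48 bits, which is the full address only if it fit. */
static uint64_t untag_pointer(uint64_t tagged)
{
    return tagged & ADDR_MASK;
}

/* Recover the 16-bit tag from the upper bits. */
static uint16_t pointer_tag(uint64_t tagged)
{
    return (uint16_t)(tagged >> ADDR_BITS);
}
```

A 47-bit address round-trips through tag_pointer() and untag_pointer()
unchanged, whereas an address with bit 50 set comes back truncated. That
silent truncation is the failure the 48-bit default is meant to prevent.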
> This causes all applications that use a hint address to silently be
> restricted to 48-bit addresses. The goal of this flag is to have a way
> for applications to explicitly request how many bits they want mmap to
> use.
> 
> > I don't understand how unmapping works on a higher address; we would
> > fail to free it on termination of the application.
> > 
> > Also, there are archs that map outside of the VMAs, which are freed by
> > freeing from the prev->vm_end to next->vm_start, so I don't understand
> > what that looks like in this reality as well.
> > 
> > > 
> > > On arm64 this barrier is at 52 bits and on x86 it is at 56 bits. This
> > > flag allows applications a way to specify exactly how many bits they
> > > want to be left unused by mmap. This eliminates the need for
> > > applications to know the page table hierarchy of the system to be able
> > > to reason which addresses mmap will be allowed to return.
> > 
> > But, why do they need to know today?  We have a limit for this don't we?
> 
> The limit is different for different architectures. On x86 the limit is
> 57 bits, and on arm64 it is 52 bits. So in the theoretical case that an
> application requires 10 bits free in a virtual address, the application
> would always work on arm64 regardless of the hint address, but on x86 if
> the hint address is greater than 48 bits then the application will not
> work.
> 
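The 10-bit case above can be checked with a little arithmetic. This is only a
sketch: leaves_free_bits() is a hypothetical helper, and it assumes 64-bit
virtual addresses held in a uint64_t.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the top `free_bits` bits of a 64-bit address are all zero,
 * i.e. the application could reuse them for its own data.
 */
static bool leaves_free_bits(uint64_t addr, unsigned free_bits)
{
    if (free_bits == 0)
        return true;            /* nothing requested, always satisfied */
    if (free_bits >= 64)
        return addr == 0;       /* avoid shifting by a negative amount */
    return (addr >> (64 - free_bits)) == 0;
}
```

With free_bits set to 10, the address must fit in 54 bits. Any 52-bit address
passes, while an address that uses all 57 bits does not.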
> The goal of this flag is to have consistent and tunable behavior of
> mmap() when it is desired to ensure that mmap() only returns addresses
> that use some number of bits.
> 
> > 
> > Also, these upper limits are how some archs use the upper bits that you
> > are trying to use.
> > 
> 
> It does not eliminate the existing behavior of the architectures to
> place these upper limits; it instead provides a way to have consistent
> behavior across all architectures.
> 
> > > 
> > > ---
> > > riscv made this feature of mmap returning addresses less than the hint
> > > address the default behavior. This was in contrast to the implementation
> > > of x86/arm64 that have a single boundary at the 5-level page table
> > > region. However this restriction proved too great -- the reduced
> > > address space when using a hint address was too small.
> > 
> > Yes, the hint is used to group things close together so it would
> > > literally be random chance whether you have enough room or not (ASLR and
> > all).
> > 
> > > 
> > > A patch for riscv [1] reverts the behavior that broke userspace. This
> > > series serves to make this feature available to all architectures.
> > 
> > I don't fully understand this statement, you say it broke userspace so
> > now you are porting it to everyone?  This reads as if you are breaking
> > the userspace on all architectures :)
> 
> It was the default for mmap on riscv. The difference here is that it is now
> enabled by a flag instead. Instead of making the flag specific to riscv,
> I figured that other architectures might find it useful as well.
> 
> > 
> > If you fail to find room below, then your application fails as there is
> > no way to get the upper bits you need.  It would be better to fix this
> > in userspace - if your application is returned too high an address, then
> > free it and exit because it's going to fail anyways.
> > 
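The userspace-only approach suggested above could look roughly like this.
It is a sketch under assumptions: map_checked() and its max_bits parameter are
hypothetical names, and only mmap() and munmap() are real calls. It maps
without a hint, then rejects and frees the mapping if any byte of it lies
above the number of address bits the application can tolerate.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` anonymous bytes and accept the result only
 * if the whole mapping fits in `max_bits` address bits. On rejection the
 * mapping is released and MAP_FAILED is returned so the caller can bail out.
 */
static void *map_checked(size_t len, unsigned max_bits)
{
    void *p;
    uintptr_t last;

    if (len == 0 || max_bits == 0 || max_bits >= sizeof(uintptr_t) * 8)
        return MAP_FAILED;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return MAP_FAILED;

    /* Address of the last byte in the mapping. */
    last = (uintptr_t)p + len - 1;
    if (last >> max_bits) {
        munmap(p, len);     /* too high for the application: free it */
        return MAP_FAILED;
    }
    return p;
}
```

The drawback is that the application can only detect an unsuitable address
after the fact; it has no way to ask the kernel to search a lower range.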
> 
> This flag is trying to define an API that is more robust than the
> current behavior on x86 and arm64, which implicitly restricts mmap()
> addresses to 48 bits. A solution could be to just write in the docs that
> mmap() will always exhaust all addresses below the hint address before
> returning an address that is above the hint address. However a flag that
> defines this behavior seems more intuitive.
> 
> > > 
> > > I have only tested on riscv and x86.
> > 
> > This should be an RFC then.
> 
> Fair enough.
> 
> > 
> > > There is a tremendous amount of
> > > duplicated code in mmap so the implementations across architectures I
> > > believe should be mostly consistent. I added this feature to all
> > > architectures that implement either
> > > arch_get_mmap_end()/arch_get_mmap_base() or
> > > arch_get_unmapped_area_topdown()/arch_get_unmapped_area(). I also added
> > > it to the default behavior for arch_get_mmap_end()/arch_get_mmap_base().
> > 
> > Way too much duplicate code.  We should be figuring out how to make this
> > all work with the same code.
> > 
> > This is going to make the cloned code problem worse.
> 
> That would require standardizing every architecture with the generic
> mmap() framework that arm64 has developed. That is far outside the scope
> of this patch, but would be a great area to research for each of the
> architectures that do not use the generic framework.

Thinking about this again, I could drop support for all architectures
that do not implement arch_get_mmap_base()/arch_get_mmap_end().

> 
> - Charlie
> 
> > 
> > > 
> > > Link: https://lore.kernel.org/lkml/20240826-riscv_mmap-v1-2-cd8962afe47f@rivosinc.com/T/ [1]
> > > 
> > > To: Arnd Bergmann <arnd@...db.de>
> > > To: Paul Walmsley <paul.walmsley@...ive.com>
> > > To: Palmer Dabbelt <palmer@...belt.com>
> > > To: Albert Ou <aou@...s.berkeley.edu>
> > > To: Catalin Marinas <catalin.marinas@....com>
> > > To: Will Deacon <will@...nel.org>
> > > To: Michael Ellerman <mpe@...erman.id.au>
> > > To: Nicholas Piggin <npiggin@...il.com>
> > > To: Christophe Leroy <christophe.leroy@...roup.eu>
> > > To: Naveen N Rao <naveen@...nel.org>
> > > To: Muchun Song <muchun.song@...ux.dev>
> > > To: Andrew Morton <akpm@...ux-foundation.org>
> > > To: Liam R. Howlett <Liam.Howlett@...cle.com>
> > > To: Vlastimil Babka <vbabka@...e.cz>
> > > To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > > To: Thomas Gleixner <tglx@...utronix.de>
> > > To: Ingo Molnar <mingo@...hat.com>
> > > To: Borislav Petkov <bp@...en8.de>
> > > To: Dave Hansen <dave.hansen@...ux.intel.com>
> > > To: x86@...nel.org
> > > To: H. Peter Anvin <hpa@...or.com>
> > > To: Huacai Chen <chenhuacai@...nel.org>
> > > To: WANG Xuerui <kernel@...0n.name>
> > > To: Russell King <linux@...linux.org.uk>
> > > To: Thomas Bogendoerfer <tsbogend@...ha.franken.de>
> > > To: James E.J. Bottomley <James.Bottomley@...senPartnership.com>
> > > To: Helge Deller <deller@....de>
> > > To: Alexander Gordeev <agordeev@...ux.ibm.com>
> > > To: Gerald Schaefer <gerald.schaefer@...ux.ibm.com>
> > > To: Heiko Carstens <hca@...ux.ibm.com>
> > > To: Vasily Gorbik <gor@...ux.ibm.com>
> > > To: Christian Borntraeger <borntraeger@...ux.ibm.com>
> > > To: Sven Schnelle <svens@...ux.ibm.com>
> > > To: Yoshinori Sato <ysato@...rs.sourceforge.jp>
> > > To: Rich Felker <dalias@...c.org>
> > > To: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>
> > > To: David S. Miller <davem@...emloft.net>
> > > To: Andreas Larsson <andreas@...sler.com>
> > > To: Shuah Khan <shuah@...nel.org>
> > > To: Alexandre Ghiti <alexghiti@...osinc.com>
> > > Cc: linux-arch@...r.kernel.org
> > > Cc: linux-kernel@...r.kernel.org
> > > Cc: Palmer Dabbelt <palmer@...osinc.com>
> > > Cc: linux-riscv@...ts.infradead.org
> > > Cc: linux-arm-kernel@...ts.infradead.org
> > > Cc: linuxppc-dev@...ts.ozlabs.org
> > > Cc: linux-mm@...ck.org
> > > Cc: loongarch@...ts.linux.dev
> > > Cc: linux-mips@...r.kernel.org
> > > Cc: linux-parisc@...r.kernel.org
> > > Cc: linux-s390@...r.kernel.org
> > > Cc: linux-sh@...r.kernel.org
> > > Cc: sparclinux@...r.kernel.org
> > > Cc: linux-kselftest@...r.kernel.org
> > > Signed-off-by: Charlie Jenkins <charlie@...osinc.com>
> > > 
> > > ---
> > > Charlie Jenkins (16):
> > >       mm: Add MAP_BELOW_HINT
> > >       riscv: mm: Do not restrict mmap address based on hint
> > >       mm: Add flag and len param to arch_get_mmap_base()
> > >       mm: Add generic MAP_BELOW_HINT
> > >       riscv: mm: Support MAP_BELOW_HINT
> > >       arm64: mm: Support MAP_BELOW_HINT
> > >       powerpc: mm: Support MAP_BELOW_HINT
> > >       x86: mm: Support MAP_BELOW_HINT
> > >       loongarch: mm: Support MAP_BELOW_HINT
> > >       arm: mm: Support MAP_BELOW_HINT
> > >       mips: mm: Support MAP_BELOW_HINT
> > >       parisc: mm: Support MAP_BELOW_HINT
> > >       s390: mm: Support MAP_BELOW_HINT
> > >       sh: mm: Support MAP_BELOW_HINT
> > >       sparc: mm: Support MAP_BELOW_HINT
> > >       selftests/mm: Create MAP_BELOW_HINT test
> > > 
> > >  arch/arm/mm/mmap.c                           | 10 ++++++++
> > >  arch/arm64/include/asm/processor.h           | 34 ++++++++++++++++++++++----
> > >  arch/loongarch/mm/mmap.c                     | 11 +++++++++
> > >  arch/mips/mm/mmap.c                          |  9 +++++++
> > >  arch/parisc/include/uapi/asm/mman.h          |  1 +
> > >  arch/parisc/kernel/sys_parisc.c              |  9 +++++++
> > >  arch/powerpc/include/asm/task_size_64.h      | 36 +++++++++++++++++++++++-----
> > >  arch/riscv/include/asm/processor.h           | 32 -------------------------
> > >  arch/s390/mm/mmap.c                          | 10 ++++++++
> > >  arch/sh/mm/mmap.c                            | 10 ++++++++
> > >  arch/sparc/kernel/sys_sparc_64.c             |  8 +++++++
> > >  arch/x86/kernel/sys_x86_64.c                 | 25 ++++++++++++++++---
> > >  fs/hugetlbfs/inode.c                         |  2 +-
> > >  include/linux/sched/mm.h                     | 34 ++++++++++++++++++++++++--
> > >  include/uapi/asm-generic/mman-common.h       |  1 +
> > >  mm/mmap.c                                    |  2 +-
> > >  tools/arch/parisc/include/uapi/asm/mman.h    |  1 +
> > >  tools/include/uapi/asm-generic/mman-common.h |  1 +
> > >  tools/testing/selftests/mm/Makefile          |  1 +
> > >  tools/testing/selftests/mm/map_below_hint.c  | 29 ++++++++++++++++++++++
> > >  20 files changed, 216 insertions(+), 50 deletions(-)
> > > ---
> > > base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
> > > change-id: 20240827-patches-below_hint_mmap-b13d79ae1c55
> > > -- 
> > > - Charlie
> > > 
