Message-ID: <2025100920-riverbank-congress-c7ee@gregkh>
Date: Thu, 9 Oct 2025 07:00:46 +0200
From: Greg KH <gregkh@...uxfoundation.org>
To: Vivian Wang <wangruikang@...as.ac.cn>
Cc: stable@...r.kernel.org, Paul Walmsley <pjw@...nel.org>,
Palmer Dabbelt <palmer@...belt.com>,
Albert Ou <aou@...s.berkeley.edu>,
Paul Walmsley <paul.walmsley@...ive.com>,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
Guo Ren <guoren@...nel.org>, Charlie Jenkins <charlie@...osinc.com>,
Yangyu Chen <cyy@...self.name>, Han Gao <rabenda.cn@...il.com>,
Icenowy Zheng <uwu@...nowy.me>, Inochi Amaoto <inochiama@...il.com>,
Yao Zi <ziyao@...root.org>, Palmer Dabbelt <palmer@...osinc.com>,
Meng Zhuo <mengzhuo@...as.ac.cn>
Subject: Re: [PATCH 6.6.y 0/2] riscv: mm: Backport of mmap hint address fixes
On Thu, Oct 09, 2025 at 12:19:46PM +0800, Vivian Wang wrote:
>
> On 10/8/25 18:20, Greg KH wrote:
> > On Wed, Oct 08, 2025 at 03:50:15PM +0800, Vivian Wang wrote:
> >> Backport of the two riscv mmap patches from master. In effect, these two
> patches remove arch_get_mmap_{base,end} for riscv.
> > Why is this needed? What bug does this fix?
>
> The mmap hint address behavior in current 6.6.y is broken when more than
> 39 bits of virtual address are available (i.e. Sv48 or Sv57, having 48 and
> 57 bits of VA available, respectively). The man-pages mmap(2) page
> states, for the hint address [1]:
>
> If addr is NULL, then the kernel chooses the (page-aligned)
> address at which to create the mapping; this is the most portable
> method of creating a new mapping. If addr is not NULL, then the
> kernel takes it as a hint about where to place the mapping; on
> Linux, the kernel will pick a nearby page boundary (but always
> above or equal to the value specified by
> /proc/sys/vm/mmap_min_addr) and attempt to create the mapping
> there. If another mapping already exists there, the kernel picks
> a new address that may or may not depend on the hint. The address
> of the new mapping is returned as the result of the call.
>
> Therefore, if a userspace program specifies a large hint address of e.g.
> 1<<50, and both the kernel and the hardware support it, it should be
> used even if MAP_FIXED is not specified. This is also the behavior
> implemented in x86_64, arm64, and, on a recent enough (> 6.10) kernel,
> riscv64.
>
> However, current 6.6.y for riscv64 implements a bizarre behavior, where
> the hint address is treated as an upper bound instead. Therefore,
> passing 1<<50 would actually return a VA in 48-bit space.
>
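To make the two behaviors concrete, here is a toy C model of them as described in this message. It is not the kernel's arch_get_mmap_{base,end} code. The helper names old_search_limit() and hint_is_honored() are hypothetical. The model ignores page alignment, mmap_min_addr and existing mappings, except for what the range_is_free flag expresses. The USER_TOP_* constants are the tops of the user half of each paging mode (2^38, 2^47 and 2^56 bytes).

```c
#include <stdint.h>

/* Top of the user half of each RISC-V paging mode: 2^(VA bits - 1). */
#define USER_TOP_SV39 (UINT64_C(1) << 38)
#define USER_TOP_SV48 (UINT64_C(1) << 47)
#define USER_TOP_SV57 (UINT64_C(1) << 56)

/* Toy model (hypothetical, not kernel code) of the unpatched 6.6.y behavior
 * as described: a non-zero hint acts as an upper bound on the search area.
 * 'user_top' is the largest user address the active MMU mode supports. */
uint64_t old_search_limit(uint64_t hint, uint64_t user_top)
{
    if (hint == 0)
        return user_top;              /* no hint: use the whole address space */
    /* Cap the search at the largest paging-mode boundary not above the hint,
     * falling back to Sv39 for small hints. */
    if (hint >= USER_TOP_SV57 && user_top >= USER_TOP_SV57)
        return USER_TOP_SV57;
    if (hint >= USER_TOP_SV48 && user_top >= USER_TOP_SV48)
        return USER_TOP_SV48;
    return USER_TOP_SV39;
}

/* Toy model of the man-page semantics: a non-zero hint whose range fits in
 * the user address space and is free gets used as is. */
int hint_is_honored(uint64_t hint, uint64_t len, uint64_t user_top,
                    int range_is_free)
{
    return hint != 0 && hint <= user_top && len <= user_top - hint &&
           range_is_free;
}
```

Under this model, a 1<<50 hint on Sv57 caps the search at 1<<47, and a 1<<40 hint on Sv48 caps it at 1<<38. Both match the failure cases described in this message.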
> To reproduce, call mmap with arguments like:
>
> mmap(hint, 4096, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
>
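A minimal, self-contained version of that reproducer might look like the following. map_near_hint() and report_hint() are illustrative helpers and are not part of the original report. The sketch assumes a 64-bit Linux userspace and the same one-page, read-only anonymous mapping as the call above.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS under strict -std modes */
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/* Map 'len' bytes of anonymous read-only memory near 'hint' (no MAP_FIXED)
 * and set *honored to 1 only if the kernel placed it exactly at the hint.
 * Returns MAP_FAILED on error; otherwise the caller must munmap() it. */
void *map_near_hint(void *hint, size_t len, int *honored)
{
    void *p = mmap(hint, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    *honored = (p != MAP_FAILED && p == hint);
    return p;
}

/* Print where a one-page mapping with the given hint ends up. */
void report_hint(void *hint)
{
    int honored;
    void *p = map_near_hint(hint, 4096, &honored);
    if (p == MAP_FAILED) {
        perror("mmap");
        return;
    }
    printf("hint=%p -> %p (%s)\n", hint, p,
           honored ? "hint used" : "hint ignored");
    munmap(p, 4096);
}
```

Going by the comparison in this message, calling report_hint((void *)((uintptr_t)1 << 50)) on Sv57 should report "hint used" only with the patches applied. On unpatched 6.6.y it would report an address below 1<<47.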
> Comparison:
>
> hint = 0x4000000000000 i.e. 1 << 50
>
> mode    6.6.106           6.6.106 + patch
> sv48    0x7fff90223000    0x7fff93b4e000
> sv57    0x7fffb7d49000    0x4000000000000
>
> When the hint is not used, the exact address is of course random, which
> is expected. However, since the address 1<<50 is supported under Sv57,
> it should be usable by mmap, but with current 6.6.y behavior it is not
> used, and some other address from 48-bit space is used instead.
>
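One quick way to read the table is to count how many significant bits each returned address needs. The hypothetical helper va_bits_used() below does that. The unpatched Sv57 result 0x7fffb7d49000 needs 47 bits, so it lies below 1<<47, inside the Sv48-sized user range. The patched result 0x4000000000000 is exactly 1<<50 and needs 51 bits.

```c
#include <stdint.h>

/* Number of significant bits in a virtual address (0 for address 0). */
unsigned va_bits_used(uint64_t addr)
{
    unsigned bits = 0;
    while (addr) {
        bits++;
        addr >>= 1;
    }
    return bits;
}
```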
> There's not yet real riscv64 hardware with Sv57, but an analogous
> problem arises on Sv48 with an address like 1<<40.

As this issue has been fixed for many years now, why is it just showing
up now? Shouldn't you be using 6.12.y for new hardware?

> One real userspace program that runs into this is the Go programming
> language runtime with TSAN enabled. Excerpt from a test log [2], which
> was run on an Eswin EIC7700x, which supports Sv48:
>
> fatal error: too many address space collisions for -race mode
> runtime stack:
> runtime.throw({0x257eaa?, 0x4000000?})
> /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/panic.go:1246 +0x38 fp=0x7ffff84af758 sp=0x7ffff84af730 pc=0xc9310
> runtime.(*mheap).sysAlloc(0x3e3c20, 0x81cc8?, 0x3f3e28, 0x3f3e50)
> /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/malloc.go:799 +0x56c fp=0x7ffff84af7f8 sp=0x7ffff84af758 pc=0x67944
> runtime.(*mheap).grow(0x3e3c20, 0x7fffb69fee00?)
> /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/mheap.go:1568 +0x9c fp=0x7ffff84af870 sp=0x7ffff84af7f8 pc=0x824c4
> runtime.(*mheap).allocSpan(0x3e3c20, 0x1, 0x0, 0x10)
> [...]
> FAIL runtime/race 0.285s
>
> With TSAN enabled, the Go runtime allocates a lot of virtual address
> space. As the message suggests, if the return value of mmap is not equal
> to a non-zero hint, the runtime assumes that mmap is failing to allocate
> the address because some other mapping is already there (in other words,
> it assumes the man-pages documented behavior), and unmaps it and tries a
> different address, until it tries too many times and gives up. This
> means Go with TSAN fails to initialize on Sv48 and current 6.6.y.
>
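The retry logic described above can be sketched in C as follows. This is a hypothetical illustration of a hint-and-verify reservation loop, not the Go runtime's actual code. reserve_at_hint(), the stride between candidate hints, and the use of PROT_NONE with MAP_NORESERVE are all assumptions made for the example. Under the behavior described for unpatched 6.6.y, a hint is never returned as is once the search is capped below it. A loop like this then exhausts its tries and returns NULL.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS / MAP_NORESERVE */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical hint-and-verify loop: try up to 'max_tries' hints spaced
 * 'stride' bytes apart, keep the first mapping that lands exactly on its
 * hint, and return NULL if none does. */
void *reserve_at_hint(uintptr_t first_hint, uintptr_t stride, size_t len,
                      int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        void *hint = (void *)(first_hint + (uintptr_t)i * stride);
        void *p = mmap(hint, len, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED)
            continue;             /* treat a hard failure like a collision */
        if (p == hint)
            return p;             /* hint honored: keep this reservation */
        munmap(p, len);           /* assume a collision: undo, try next hint */
    }
    return NULL;                  /* "too many address space collisions" */
}
```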
> (cc Meng Zhuo, in case of any questions about the Go runtime here.)
>
> Patch 1 here addresses the above issue, but introduced regressions (see
> replies in "Link"). Patch 2 addresses those regressions.

Ok, that makes a bit more sense, but again, why is this just showing up
now? What changed to cause this to be noticed, and to need fixing, at
this moment in time and not before?

thanks,
greg k-h