Message-ID: <187fe5a3-99b9-49b6-be49-3d4f6f1fb16b@iscas.ac.cn>
Date: Thu, 9 Oct 2025 12:19:46 +0800
From: Vivian Wang <wangruikang@...as.ac.cn>
To: Greg KH <gregkh@...uxfoundation.org>
Cc: stable@...r.kernel.org, Paul Walmsley <pjw@...nel.org>,
 Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
 Paul Walmsley <paul.walmsley@...ive.com>, linux-riscv@...ts.infradead.org,
 linux-kernel@...r.kernel.org, Guo Ren <guoren@...nel.org>,
 Charlie Jenkins <charlie@...osinc.com>, Yangyu Chen <cyy@...self.name>,
 Han Gao <rabenda.cn@...il.com>, Icenowy Zheng <uwu@...nowy.me>,
 Inochi Amaoto <inochiama@...il.com>, Yao Zi <ziyao@...root.org>,
 Palmer Dabbelt <palmer@...osinc.com>, Meng Zhuo <mengzhuo@...as.ac.cn>
Subject: Re: [PATCH 6.6.y 0/2] riscv: mm: Backport of mmap hint address fixes


On 10/8/25 18:20, Greg KH wrote:
> On Wed, Oct 08, 2025 at 03:50:15PM +0800, Vivian Wang wrote:
>> Backport of the two riscv mmap patches from master. In effect, these two
>> patches removes arch_get_mmap_{base,end} for riscv.
> Why is this needed?  What bug does this fix?

The handling of the mmap hint address in current 6.6.y is broken when
more than 39 bits of virtual address space are available (i.e. Sv48 or
Sv57, which provide 48 and 57 bits of VA, respectively). The man-pages
mmap(2) page states, for the hint address [1]:

       If addr is NULL, then the kernel chooses the (page-aligned)
       address at which to create the mapping; this is the most portable
       method of creating a new mapping.  If addr is not NULL, then the
       kernel takes it as a hint about where to place the mapping; on
       Linux, the kernel will pick a nearby page boundary (but always
       above or equal to the value specified by
       /proc/sys/vm/mmap_min_addr) and attempt to create the mapping
       there.  If another mapping already exists there, the kernel picks
       a new address that may or may not depend on the hint.  The address
       of the new mapping is returned as the result of the call.

Therefore, if a userspace program specifies a large hint address of e.g.
1<<50, and both the kernel and the hardware support it, it should be
used even if MAP_FIXED is not specified. This is also the behavior
implemented in x86_64, arm64, and, on a recent enough (> 6.10) kernel,
riscv64.

However, current 6.6.y for riscv64 implements a bizarre behavior, where
the hint address is treated as an upper bound instead. Therefore,
passing 1<<50 would actually return a VA in 48-bit space.

To reproduce, call mmap with arguments like:

       mmap(hint, 4096, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

Comparison:

        hint = 0x4000000000000 i.e. 1 << 50

                    6.6.106             6.6.106 + patch
            sv48    0x7fff90223000      0x7fff93b4e000
            sv57    0x7fffb7d49000      0x4000000000000

When the hint is not used, the exact address is of course random, which
is expected. However, since the address 1<<50 is supported under Sv57,
it should be usable by mmap. With current 6.6.y behavior it is not
used, and some other address from 48-bit space is used instead.

There's not yet real riscv64 hardware with Sv57, but an analogous
problem arises on Sv48 with an address like 1<<40.

One real userspace program that runs into this is the Go programming
language runtime with TSAN enabled. Excerpt from a test log [2], which
was run on an Eswin EIC7700x, which supports Sv48:

fatal error: too many address space collisions for -race mode
runtime stack:
runtime.throw({0x257eaa?, 0x4000000?})
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/panic.go:1246 +0x38 fp=0x7ffff84af758 sp=0x7ffff84af730 pc=0xc9310
runtime.(*mheap).sysAlloc(0x3e3c20, 0x81cc8?, 0x3f3e28, 0x3f3e50)
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/malloc.go:799 +0x56c fp=0x7ffff84af7f8 sp=0x7ffff84af758 pc=0x67944
runtime.(*mheap).grow(0x3e3c20, 0x7fffb69fee00?)
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/mheap.go:1568 +0x9c fp=0x7ffff84af870 sp=0x7ffff84af7f8 pc=0x824c4
runtime.(*mheap).allocSpan(0x3e3c20, 0x1, 0x0, 0x10)
[...]
FAIL    runtime/race    0.285s

With TSAN enabled, the Go runtime allocates a lot of virtual address
space. As the message suggests, if the return value of mmap is not equal
to a non-zero hint, the runtime assumes that mmap failed to allocate
at that address because some other mapping is already there (in other
words, it assumes the man-pages documented behavior). It then unmaps the
returned mapping and tries a different address, until it has tried too
many times and gives up. This means Go with TSAN fails to initialize on
Sv48 with current 6.6.y.

(cc Meng Zhuo, in case of any questions about the Go runtime here.)

Patch 1 here addresses the above issue, but introduces regressions (see
replies in "Link"). Patch 2 addresses those regressions.

Thanks,
Vivian "dramforever" Wang

[1]: https://man7.org/linux/man-pages/man2/mmap.2.html
[2]: https://logs.chromium.org/logs/golang/buildbucket/cr-buildbucket/8708301310656989281/+/u/step/22/log/2

