Date:   Thu, 2 Mar 2023 16:32:44 +0100
From:   Snild Dolkow <snild@...y.com>
To:     "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>
Cc:     "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        "maple-tree@...ts.infradead.org" <maple-tree@...ts.infradead.org>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: [Regression] mmap with MAP_32BIT randomly fails since 6.1

After upgrading a machine from 5.17.4 to 6.1.12 a couple of weeks ago, I 
started getting (inconsistent) failures when building Android:

> dex2oatd F 02-28 11:49:44 40098 40098 mem_map_arena_pool.cc:65] Check failed: map.IsValid() Failed anonymous mmap((nil), 131072, 0x3, 0x22, -1, 0): Cannot allocate memory. See process maps in the log.

While it claims to be using 0x22 (MAP_PRIVATE | MAP_ANONYMOUS) for the 
flags, it really uses 0x40 (MAP_32BIT) as well, as shown by strace:

> mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x40720000
> mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x4124e000
> mmap(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = -1 ENOMEM (Cannot allocate memory)
> dex2oatd F 03-01 10:32:33 74063 74063 mem_map_arena_pool.cc:65] Check failed: map.IsValid() Failed anonymous mmap((nil), 131072, 0x3, 0x22, -1, 0): Cannot allocate memory. See process maps in the log.

Here's a simple reproducer, which (if my math is correct) tries to mmap 
a total of ~600MiB in increasing chunk sizes:

#define _GNU_SOURCE   /* for MAP_ANONYMOUS and MAP_32BIT */
#include <sys/mman.h>
#include <stdio.h>
#include <errno.h>

int main(void) {
     /* Bytes mapped so far; the mappings are never unmapped. */
     size_t total_leaks = 0;
     /* Chunk sizes from 4 KiB (1<<12) to 64 KiB (1<<16). */
     for (int shift=12; shift<=16; shift++) {
         size_t size = ((size_t)1)<<shift;
         /* 5000 mappings per chunk size. */
         for (int i=0; i<5000; ++i) {
             void* m = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
             if (m == MAP_FAILED || m == NULL) {
                 /* size_t values are printed with %zu. */
                 printf(
                     "Failed. m=%p size=%zu (1<<%d) i=%d "
                     " errno=%d total_leaks=%zu (%zu MiB)\n",
                     m, size, shift, i, errno,
                     total_leaks, total_leaks / 1024 / 1024);
                 return 1;
             }
             total_leaks += size;
         }
     }
     printf("Success.\n");
     return 0;
}

Older kernels fail very consistently at almost exactly 1GiB total_leaks,
if you change the test program to go that far. On 6.1.12, it fails much
earlier, after an arbitrary number of successful mmap() calls:

> $ ./mmap-test 
> Failed. m=0xffffffffffffffff size=4096 (1<<12) i=1500  errno=12 total_leaks=6144000 (5 MiB)
> $ ./mmap-test 
> Failed. m=0xffffffffffffffff size=4096 (1<<12) i=620  errno=12 total_leaks=2539520 (2 MiB)
> $ ./mmap-test 
> Failed. m=0xffffffffffffffff size=4096 (1<<12) i=2408  errno=12 total_leaks=9863168 (9 MiB)
> $ ./mmap-test 
> Failed. m=0xffffffffffffffff size=4096 (1<<12) i=774  errno=12 total_leaks=3170304 (3 MiB)
> $ ./mmap-test 
> Failed. m=0xffffffffffffffff size=4096 (1<<12) i=1648  errno=12 total_leaks=6750208 (6 MiB)
> $ ./mmap-test 


I have checked a more recent master commit (ee3f96b1, from March 1st), 
and the problem is still there. Bisecting shows that e15e06a8 is the 
last good commit, and that 524e00b3 is the first one failing in this 
way. The 10 or so commits in between run into a page fault BUG down in 
vma_merge() instead.

This range of commits is about the same as mentioned in 
https://lore.kernel.org/lkml/0b9f5425-08d4-8013-aa4c-e620c3b10bb2@leemhuis.info/, 
so I assume that my problem, too, was introduced with the Maple Tree 
changes. Sending this to the same people and lists.

//Snild
