linux-kernel - Re: Regression bisected to fa3354e4ea39 (mm: free_area_init: use maximal zone PFNs rather than zone sizes)

Message-ID: <YP8Vxt0xuV1m5EPS@linux.ibm.com>
Date:   Mon, 26 Jul 2021 23:06:30 +0300
From:   Mike Rapoport <rppt@...ux.ibm.com>
To:     Matt Turner <mattst88@...il.com>
Cc:     Michael Cree <mcree@...on.net.nz>, linux-mm@...ck.org,
        linux-alpha@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Regression bisected to fa3354e4ea39 (mm: free_area_init: use
 maximal zone PFNs rather than zone sizes)

Hi Matt,

On Mon, Jul 26, 2021 at 12:27:50PM -0700, Matt Turner wrote:
> Hi Mike!
> 
> Since commit fa3354e4ea39 (mm: free_area_init: use maximal zone PFNs rather
> than zone sizes), I get the following BUG on Alpha (an AlphaServer ES47 Marvel)
> and loading userspace leads to a segfault:
> 
> (I didn't notice this for a long time because of other unrelated regressions,
> the pandemic, changing jobs, ...)
 
I suspect there will be more surprises down the road :)

> BUG: Bad page state in process swapper  pfn:2ffc53
> page:fffffc000ecf14c0 refcount:0 mapcount:1 mapping:0000000000000000 index:0x0
> flags: 0x0()
> raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> page dumped because: nonzero mapcount
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-03841-gfa3354e4ea39-dirty #26
>        fffffc0001b5bd68 fffffc0001b5be80 fffffc00011cd148 fffffc000ecf14c0
>        fffffc00019803df fffffc0001b5be80 fffffc00011ce340 fffffc000ecf14c0
>        0000000000000000 fffffc0001b5be80 fffffc0001b482c0 fffffc00027d6618
>        fffffc00027da7d0 00000000002ff97a 0000000000000000 fffffc0001b5be80
>        fffffc00011d1abc fffffc000ecf14c0 fffffc0002d00000 fffffc0001b5be80
>        fffffc0001b2350c 0000000000300000 fffffc0001b48298 fffffc0001b482c0
> Trace:
> [<fffffc00011cd148>] bad_page+0x168/0x1b0
> [<fffffc00011ce340>] free_pcp_prepare+0x1e0/0x290
> [<fffffc00011d1abc>] free_unref_page+0x2c/0xa0
> [<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30
> [<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30
> [<fffffc000101001c>] _stext+0x1c/0x20
> 
> I haven't tried reproducing this on other machines or QEMU, but I'd be glad to
> if that helps.

If it's reproducible on QEMU I can debug it locally.
 
> Any ideas?

It seems like the memory map is not properly initialized. Can you enable
CONFIG_DEBUG_MEMORY_INIT and add mminit_loglevel=4 to the command line? The
interesting part of the log would be before the "Memory: xK/yK available ..."
line.

Hopefully it'll give some clues.
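Why an uninitialized memory map is a plausible culprit can be read off the report itself: every "raw:" word of the struct page is zero, yet it claims "mapcount:1". In the kernel, page->_mapcount is biased so that an unmapped page stores -1, and the map count printed in the report is that value plus one. A struct page that was merely zero-filled and never initialized therefore shows exactly "refcount:0 mapcount:1" and trips the "nonzero mapcount" check when it is freed. The C sketch below models only that arithmetic; struct page_model and its helper functions are hypothetical illustrations, not kernel code, and only two of the kernel's free-time checks are modelled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct page: only the two
 * counters printed in the "Bad page state" report are modelled. */
struct page_model {
	int refcount;     /* plays the role of page->_refcount */
	int mapcount_raw; /* plays the role of page->_mapcount: -1 = no mappings */
};

/* Roughly what memory map initialization does for each page; the real
 * kernel resets many more fields, including _mapcount to -1. */
static void init_page_model(struct page_model *p)
{
	p->refcount = 0;
	p->mapcount_raw = -1;
}

/* A struct page that was never initialized: every byte is zero, like the
 * all-zero "raw:" lines in the report. */
static void zero_page_model(struct page_model *p)
{
	memset(p, 0, sizeof(*p));
}

/* The "mapcount:" value shown in the report is the raw field plus one. */
static int reported_mapcount(const struct page_model *p)
{
	return p->mapcount_raw + 1;
}

/* Simplified free-time sanity check: returns why the page is "bad",
 * or NULL if it passes the two modelled checks. */
static const char *bad_page_reason(const struct page_model *p)
{
	if (p->mapcount_raw != -1)
		return "nonzero mapcount";
	if (p->refcount != 0)
		return "nonzero _refcount";
	return NULL;
}
```

A page set up with init_page_model() reports mapcount 0 and passes the check, while one only cleared with zero_page_model() reports mapcount 1 and is rejected with "nonzero mapcount", matching the BUG above.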

-- 
Sincerely yours,
Mike.