linux-kernel - Unhandled page fault in vmemmap_populate on x86

Message-Id: <20220509090637.24152-1-ken@codelabs.ch>
Date:   Mon,  9 May 2022 11:06:36 +0200
From:   Adrian-Ken Rueegsegger <ken@...elabs.ch>
To:     dave.hansen@...ux.intel.com, osalvador@...e.de
Cc:     david@...hat.com, luto@...nel.org, peterz@...radead.org,
        tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, hpa@...or.com,
        x86@...nel.org, linux-kernel@...r.kernel.org,
        Adrian-Ken Rueegsegger <ken@...elabs.ch>
Subject: Unhandled page fault in vmemmap_populate on x86_64

Hello,

While running Linux 5.15.32/x86_64 (with some out-of-tree patches) on top of
Muen [1], I came across a BUG/page fault triggered in vmemmap_populate:

[    0.000000] BUG: unable to handle page fault for address: ffffea0001e00000
[    0.000000] #PF: supervisor write access in kernel mode
[    0.000000] #PF: error_code(0x0002) - not-present page
[    0.000000] PGD 1003a067 P4D 1003a067 PUD 10039067 PMD 0 
[    0.000000] Oops: 0002 [#1] SMP PTI
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.32-muen #1
[    0.000000] RIP: 0010:vmemmap_populate+0x181/0x218
[    0.000000] Code: 00 a9 ff ff 1f 00 0f 84 a1 00 00 00 e8 91 f7 ff ff b9 0e 00 00 00 31 c0 48 89 ef f3 ab 48 85 f6 74 0a b0 fd 48 89 ef 48 89 f1 <f3> aa 4d 85 c0 74 7c 48 89 1d 2e e2 05 00 eb 73 48 83 3c 24 00 0f
[    0.000000] RSP: 0000:ffffffff82003e00 EFLAGS: 00010006
[    0.000000] RAX: 00000000000000fd RBX: ffffea0001e00000 RCX: 0000000000180000
[    0.000000] RDX: ffffea0000540000 RSI: 00000000001c0000 RDI: ffffea0001e00000
[    0.000000] RBP: ffffea0001dc0000 R08: 0000000000000000 R09: 0000000088000000
[    0.000000] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    0.000000] R13: ffffea0001f80000 R14: ffffea0001dc0000 R15: ffff888010039070
[    0.000000] FS:  0000000000000000(0000) GS:ffffffff823ea000(0000) knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: ffffea0001e00000 CR3: 000000000200a000 CR4: 00000000000406b0
[    0.000000] DR0: 000000000000003a DR1: 0000000000000003 DR2: 0000000000000000
[    0.000000] DR3: ffffea0001e00000 DR6: 000000000200a000 DR7: ffffffff82003d58
[    0.000000] Call Trace:
[    0.000000]  <TASK>
[    0.000000]  ? __populate_section_memmap+0x3a/0x47
[    0.000000]  ? sparse_init_nid+0xc9/0x174
[    0.000000]  ? sparse_init+0x1c1/0x1d2
[    0.000000]  ? paging_init+0x5/0xa
[    0.000000]  ? setup_arch+0x740/0x810
[    0.000000]  ? start_kernel+0x43/0x5bb
[    0.000000]  ? secondary_startup_64_no_verify+0xb0/0xbb
[    0.000000]  </TASK>
[    0.000000] Modules linked in:
[    0.000000] CR2: ffffea0001e00000
[    0.000000] random: get_random_bytes called from init_oops_id+0x1d/0x2c with crng_init=0
[    0.000000] ---[ end trace 44fe402cfef775de ]---

Announcing an available RAM region at 0x88000000 to Linux (via e820) triggered
the issue, while placing it at 0x70000000 did not hit the bug. Since the problem
had not been observed with 5.10, I did a bisect, which pointed to commit
8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges") as the culprit.
Further debugging showed that the #PF originates from vmemmap_use_new_sub_pmd
in arch/x86/mm/init_64.c. In the error case the condition
!IS_ALIGNED(start, PMD_SIZE) evaluates to true and the page fault is caused by
the memset marking the preceding region as unused:

    if (!IS_ALIGNED(start, PMD_SIZE))
        memset((void *)start, PAGE_UNUSED,
               start - ALIGN_DOWN(start, PMD_SIZE));

If I am not mistaken, the start variable is the wrong address to use here,
since it points to the beginning of the range that is to be *used*. Instead the
"start" of the PMD should be used, i.e. ALIGN_DOWN(start, PMD_SIZE). The s390
implementation of vmemmap_use_new_sub_pmd in arch/s390/mm/vmem.c seems to
confirm this. Is the above analysis correct, or did I misread the code?
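
To make the arithmetic concrete, here is a small user-space sketch (not kernel
code). It assumes PMD_SIZE is 2 MiB and PAGE_UNUSED is 0xFD, and redefines
ALIGN_DOWN/IS_ALIGNED locally; the helper names unused_head_len,
buggy_memset_end and mark_head_unused are made up for illustration. The sample
address 0xffffea0001dc0000 is my reading of RBP in the oops above: with it the
length comes out as 0x1c0000, which appears to match RSI, and a memset starting
at start instead of at the PMD base would run past 0xffffea0001e00000, the
faulting address in CR2.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values for this sketch: 2 MiB PMD, 0xFD marker byte. */
#define PMD_SIZE         (UINT64_C(1) << 21)
#define PAGE_UNUSED      0xFD
#define ALIGN_DOWN(x, a) ((uint64_t)(x) & ~((uint64_t)(a) - 1))
#define IS_ALIGNED(x, a) (((uint64_t)(x) & ((uint64_t)(a) - 1)) == 0)

/* Size of the part of the PMD that precedes start and must be marked unused. */
static uint64_t unused_head_len(uint64_t start)
{
    return start - ALIGN_DOWN(start, PMD_SIZE);
}

/*
 * End address (exclusive) of the memset as quoted above, which begins at
 * start. It overwrites the range that is about to be used and, when the
 * offset into the PMD exceeds PMD_SIZE / 2, spills into the next PMD.
 */
static uint64_t buggy_memset_end(uint64_t start)
{
    return start + unused_head_len(start);
}

/* Proposed behaviour: fill from the PMD base up to, but excluding, start. */
static void mark_head_unused(unsigned char *start)
{
    uint64_t addr = (uint64_t)(uintptr_t)start;

    if (!IS_ALIGNED(addr, PMD_SIZE)) {
        uint64_t base = ALIGN_DOWN(addr, PMD_SIZE);

        memset((void *)(uintptr_t)base, PAGE_UNUSED, addr - base);
    }
}

/* Exercise mark_head_unused on a PMD-sized, PMD-aligned heap buffer. */
static int run_demo(void)
{
    const size_t off = 0x1c0000;    /* same offset as in the oops */
    unsigned char *pmd = aligned_alloc(PMD_SIZE, PMD_SIZE);
    int ok;

    if (!pmd)
        return -1;
    memset(pmd, 0, PMD_SIZE);
    mark_head_unused(pmd + off);

    /* The head is marked, the range to be used is left untouched. */
    ok = pmd[0] == PAGE_UNUSED && pmd[off - 1] == PAGE_UNUSED &&
         pmd[off] == 0 && pmd[PMD_SIZE - 1] == 0;
    free(pmd);
    return ok ? 0 : 1;
}
```

Calling run_demo() from a main() should return 0 if the head of the buffer is
filled with the marker byte and everything from start onwards is untouched.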

The attached patch fixes the observed issue for me.

Regards,
Adrian

PS: When replying, please include my address in To/Cc since I am not subscribed
to LKML, thanks!

[1] - https://muen.sk