linux-kernel - Re: [PATCH] mm: init_mlocked_on_free

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <4da9da2f-73e4-45fd-b62f-a8a513314057@redhat.com>
Date: Mon, 3 Jun 2024 10:37:44 +0200
From: David Hildenbrand <david@...hat.com>
To: Lance Yang <ioworker0@...il.com>, akpm@...ux-foundation.org,
 yjnworkstation@...il.com
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, willy@...radead.org,
 00107082@....com, libang.li@...group.com
Subject: Re: [PATCH] mm: init_mlocked_on_free_v3

On 01.06.24 16:09, Lance Yang wrote:
> Completely agree with David's point[1]. I'm also not convinced that this is the
> right approach :)
> 
> It seems like this patch won't handle all cases, as David mentioned[1] before.
> folio_remove_rmap_ptes() will immediately munlock a large folio (as large folios
> are not allowed to be batch-added to the LRU list) via munlock_vma_folio() when
> it is fully unmapped, so this patch does not work in this case. Even worse, if
> we encounter a COW mlocked folio, we would run into trouble (data corruption).

Adding to what Lance said regarding COW, here is a simple reproducer:


[root@...alhost ~]# cat mlock.c
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>

int main(void)
{
         size_t size = getpagesize();
         volatile char *mem;
         int rc;

         mem = mmap(0, size, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
         if (mem == MAP_FAILED) {
                 fprintf(stderr, "mmap() failed\n");
                 return -1;
         }

         rc = mlock((char *) mem, size);
         if (rc) {
                 fprintf(stderr, "mlock() failed\n");
                 return -1;
         }

         memset((char *) mem, 1, size);

         rc = fork();
         if (rc < 0) {
                 fprintf(stderr, "fork() failed\n");
                 return -1;
         } else if (!rc) {
                 return 0;
         }

         waitpid(rc, NULL, 0);
         sleep(2);

         if (mem[0] != 1) {
                 fprintf(stderr, "[FAIL] Memory content changed!\n");
                 return -1;
         }
         fprintf(stderr, "[PASS] Memory content did not change!\n");

         return 0;
}

[root@...alhost ~]# ./mlock
[FAIL] Memory content changed!

In contrast, when it's not enabled on the cmdline:

[root@...alhost ~]# ./mlock
[PASS] Memory content did not change!


Further, rebooting with this enabled gives me (retried once to make sure enabling/disabling
this feature makes a difference):

[  105.057230] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
[  105.059491] CPU: 29 PID: 1 Comm: systemd-shutdow Not tainted 6.10.0-rc1+ #16
[  105.061545] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[  105.063884] Call Trace:
[  105.064548]  <TASK>
[  105.065088]  dump_stack_lvl+0x5d/0x80
[  105.066036]  panic+0x118/0x2c8
[  105.066837]  do_exit.cold+0x18/0x18
[  105.067759]  do_group_exit+0x36/0xa0
[  105.069177]  get_signal+0x9bc/0x9e0
[  105.070191]  arch_do_signal_or_restart+0x3b/0x240
[  105.071538]  irqentry_exit_to_user_mode+0x1c2/0x230
[  105.072913]  asm_exc_page_fault+0x26/0x30
[  105.074057] RIP: 0033:0x0
[  105.074803] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  105.076647] RSP: 002b:00007fff0d9a41a8 EFLAGS: 00010206
[  105.078090] RAX: 0000000000000011 RBX: 0000000006d8cef9 RCX: 0000000000000005
[  105.080046] RDX: 00007fff0d9a4250 RSI: 00000000000008c7 RDI: 0000000000000001
[  105.082139] RBP: 00007fff0d9a4310 R08: 0019fa9fcd800000 R09: 00000029e0b9a6f0
[  105.084038] R10: 0000000000000008 R11: 0000000000000202 R12: 00007fff0d9a4250
[  105.085852] R13: 00007fff0d9a41d0 R14: 00000000000008c7 R15: 0000000000000000
[  105.087653]  </TASK>
[  105.090686] Kernel Offset: 0x1e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  105.093359] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b ]---


This needs to be reverted.

@Andrew, to you want us to send an official revert patch?

-- 
Cheers,

David / dhildenb