[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4da9da2f-73e4-45fd-b62f-a8a513314057@redhat.com>
Date: Mon, 3 Jun 2024 10:37:44 +0200
From: David Hildenbrand <david@...hat.com>
To: Lance Yang <ioworker0@...il.com>, akpm@...ux-foundation.org,
yjnworkstation@...il.com
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, willy@...radead.org,
00107082@....com, libang.li@...group.com
Subject: Re: [PATCH] mm: init_mlocked_on_free_v3
On 01.06.24 16:09, Lance Yang wrote:
> Completely agree with David's point[1]. I'm also not convinced that this is the
> right approach :)
>
> It seems like this patch won't handle all cases, as David mentioned[1] before.
> folio_remove_rmap_ptes() will immediately munlock a large folio (as large folios
> are not allowed to be batch-added to the LRU list) via munlock_vma_folio() when
> it is fully unmapped, so this patch does not work in this case. Even worse, if
> we encounter a COW mlocked folio, we would run into trouble (data corruption).
Adding to what Lance said regarding COW, here is a simple reproducer:
[root@...alhost ~]# cat mlock.c
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
int main(void)
{
size_t size = getpagesize();
volatile char *mem;
int rc;
mem = mmap(0, size, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
if (mem == MAP_FAILED) {
fprintf(stderr, "mmap() failed\n");
return -1;
}
rc = mlock((char *) mem, size);
if (rc) {
fprintf(stderr, "mlock() failed\n");
return -1;
}
memset((char *) mem, 1, size);
rc = fork();
if (rc < 0) {
fprintf(stderr, "fork() failed\n");
return -1;
} else if (!rc) {
return 0;
}
waitpid(rc, NULL, 0);
sleep(2);
if (mem[0] != 1) {
fprintf(stderr, "[FAIL] Memory content changed!\n");
return -1;
}
fprintf(stderr, "[PASS] Memory content did not change!\n");
return 0;
}
[root@...alhost ~]# ./mlock
[FAIL] Memory content changed!
In contrast, when it's not enabled on the cmdline:
[root@...alhost ~]# ./mlock
[PASS] Memory content did not change!
Further, rebooting with this enabled gives me (retried once to make sure enabling/disabling
this feature makes a difference):
[ 105.057230] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
[ 105.059491] CPU: 29 PID: 1 Comm: systemd-shutdow Not tainted 6.10.0-rc1+ #16
[ 105.061545] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[ 105.063884] Call Trace:
[ 105.064548] <TASK>
[ 105.065088] dump_stack_lvl+0x5d/0x80
[ 105.066036] panic+0x118/0x2c8
[ 105.066837] do_exit.cold+0x18/0x18
[ 105.067759] do_group_exit+0x36/0xa0
[ 105.069177] get_signal+0x9bc/0x9e0
[ 105.070191] arch_do_signal_or_restart+0x3b/0x240
[ 105.071538] irqentry_exit_to_user_mode+0x1c2/0x230
[ 105.072913] asm_exc_page_fault+0x26/0x30
[ 105.074057] RIP: 0033:0x0
[ 105.074803] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[ 105.076647] RSP: 002b:00007fff0d9a41a8 EFLAGS: 00010206
[ 105.078090] RAX: 0000000000000011 RBX: 0000000006d8cef9 RCX: 0000000000000005
[ 105.080046] RDX: 00007fff0d9a4250 RSI: 00000000000008c7 RDI: 0000000000000001
[ 105.082139] RBP: 00007fff0d9a4310 R08: 0019fa9fcd800000 R09: 00000029e0b9a6f0
[ 105.084038] R10: 0000000000000008 R11: 0000000000000202 R12: 00007fff0d9a4250
[ 105.085852] R13: 00007fff0d9a41d0 R14: 00000000000008c7 R15: 0000000000000000
[ 105.087653] </TASK>
[ 105.090686] Kernel Offset: 0x1e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 105.093359] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b ]---
This needs to be reverted.
@Andrew, to you want us to send an official revert patch?
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists