[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250407025455.67nhchndedotn57g@master>
Date: Mon, 7 Apr 2025 02:54:56 +0000
From: Wei Yang <richard.weiyang@...il.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Suren Baghdasaryan <surenb@...gle.com>,
David Hildenbrand <david@...hat.com>,
Matthew Wilcox <willy@...radead.org>,
Rik van Riel <riel@...riel.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] tools/testing/selftests: assert that anon merge
cases behave as expected
On Mon, Mar 17, 2025 at 09:15:05PM +0000, Lorenzo Stoakes wrote:
>Prior to the recently applied commit that permits this merge,
>mprotect()'ing a faulted VMA, adjacent to an unfaulted VMA, such that the
>two share characteristics would fail to merge due to what appear to be
>unintended consequences of commit 965f55dea0e3 ("mmap: avoid merging cloned
>VMAs").
>
>Now we have fixed this bug, assert that we can indeed merge anonymous VMAs
>this way.
>
>Also assert that forked source/target VMAs are equally
>rejected. Previously, all empty target anon merges with one VMA faulted and
>the other unfaulted would be rejected incorrectly, now we ensure that
>unforked merge, but forked do not.
>
>Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
>---
> tools/testing/selftests/mm/.gitignore | 1 +
> tools/testing/selftests/mm/Makefile | 1 +
> tools/testing/selftests/mm/merge.c | 454 ++++++++++++++++++++++
> tools/testing/selftests/mm/run_vmtests.sh | 2 +
> 4 files changed, 458 insertions(+)
> create mode 100644 tools/testing/selftests/mm/merge.c
>
>diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
>index c5241b193db8..91db34941a14 100644
>--- a/tools/testing/selftests/mm/.gitignore
>+++ b/tools/testing/selftests/mm/.gitignore
>@@ -58,3 +58,4 @@ hugetlb_dio
> pkey_sighandler_tests_32
> pkey_sighandler_tests_64
> guard-regions
>+merge
>diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
>index 8270895039d1..ad4d6043a60f 100644
>--- a/tools/testing/selftests/mm/Makefile
>+++ b/tools/testing/selftests/mm/Makefile
>@@ -98,6 +98,7 @@ TEST_GEN_FILES += hugetlb_madv_vs_map
> TEST_GEN_FILES += hugetlb_dio
> TEST_GEN_FILES += droppable
> TEST_GEN_FILES += guard-regions
>+TEST_GEN_FILES += merge
>
> ifneq ($(ARCH),arm64)
> TEST_GEN_FILES += soft-dirty
>diff --git a/tools/testing/selftests/mm/merge.c b/tools/testing/selftests/mm/merge.c
>new file mode 100644
>index 000000000000..9cc61bdbfba8
>--- /dev/null
>+++ b/tools/testing/selftests/mm/merge.c
>@@ -0,0 +1,454 @@
>+// SPDX-License-Identifier: GPL-2.0-or-later
>+
>+#define _GNU_SOURCE
>+#include "../kselftest_harness.h"
>+#include <stdio.h>
>+#include <stdlib.h>
>+#include <unistd.h>
>+#include <sys/mman.h>
>+#include <sys/wait.h>
>+#include "vm_util.h"
>+
>+FIXTURE(merge)
>+{
>+ unsigned int page_size;
>+ char *carveout;
>+ struct procmap_fd procmap;
>+};
>+
>+FIXTURE_SETUP(merge)
>+{
>+ self->page_size = psize();
>+ /* Carve out PROT_NONE region to map over. */
>+ self->carveout = mmap(NULL, 12 * self->page_size, PROT_NONE,
>+ MAP_ANON | MAP_PRIVATE, -1, 0);
>+ ASSERT_NE(self->carveout, MAP_FAILED);
>+ /* Setup PROCMAP_QUERY interface. */
>+ ASSERT_EQ(open_self_procmap(&self->procmap), 0);
>+}
>+
>+FIXTURE_TEARDOWN(merge)
>+{
>+ ASSERT_EQ(munmap(self->carveout, 12 * self->page_size), 0);
>+ ASSERT_EQ(close_procmap(&self->procmap), 0);
>+}
>+
>+TEST_F(merge, mprotect_unfaulted_left)
>+{
>+ unsigned int page_size = self->page_size;
>+ char *carveout = self->carveout;
>+ struct procmap_fd *procmap = &self->procmap;
>+ char *ptr;
>+
>+ /*
>+ * Map 10 pages of R/W memory within. MAP_NORESERVE so we don't hit
>+ * merge failure due to lack of VM_ACCOUNT flag by mistake.
>+ *
>+ * |-----------------------|
>+ * | unfaulted |
>+ * |-----------------------|
>+ */
>+ ptr = mmap(&carveout[page_size], 10 * page_size, PROT_READ | PROT_WRITE,
>+ MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
>+ ASSERT_NE(ptr, MAP_FAILED);
>+ /*
>+ * Now make the first 5 pages read-only, splitting the VMA:
>+ *
>+ * RO RW
>+ * |-----------|-----------|
>+ * | unfaulted | unfaulted |
>+ * |-----------|-----------|
>+ */
>+ ASSERT_EQ(mprotect(ptr, 5 * page_size, PROT_READ), 0);
>+ /*
>+ * Fault in the first of the last 5 pages so it gets an anon_vma and
>+ * thus the whole VMA becomes 'faulted':
>+ *
>+ * RO RW
>+ * |-----------|-----------|
>+ * | unfaulted | faulted |
>+ * |-----------|-----------|
>+ */
>+ ptr[5 * page_size] = 'x';
>+ /*
>+ * Now mprotect() the RW region read-only, we should merge (though for
>+ * ~15 years we did not! :):
>+ *
>+ * RO
>+ * |-----------------------|
>+ * | faulted |
>+ * |-----------------------|
>+ */
>+ ASSERT_EQ(mprotect(&ptr[5 * page_size], 5 * page_size, PROT_READ), 0);
>+
>+ /* Assert that the merge succeeded using PROCMAP_QUERY. */
>+ ASSERT_TRUE(find_vma_procmap(procmap, ptr));
>+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
>+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 10 * page_size);
>+}
>+
>+TEST_F(merge, mprotect_unfaulted_right)
>+{
>+ unsigned int page_size = self->page_size;
>+ char *carveout = self->carveout;
>+ struct procmap_fd *procmap = &self->procmap;
>+ char *ptr;
>+
>+ /*
>+ * |-----------------------|
>+ * | unfaulted |
>+ * |-----------------------|
>+ */
>+ ptr = mmap(&carveout[page_size], 10 * page_size, PROT_READ | PROT_WRITE,
>+ MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
>+ ASSERT_NE(ptr, MAP_FAILED);
>+ /*
>+ * Now make the last 5 pages read-only, splitting the VMA:
>+ *
>+ * RW RO
>+ * |-----------|-----------|
>+ * | unfaulted | unfaulted |
>+ * |-----------|-----------|
>+ */
>+ ASSERT_EQ(mprotect(&ptr[5 * page_size], 5 * page_size, PROT_READ), 0);
>+ /*
>+ * Fault in the first of the first 5 pages so it gets an anon_vma and
>+ * thus the whole VMA becomes 'faulted':
>+ *
>+ * RW RO
>+ * |-----------|-----------|
>+ * | faulted | unfaulted |
>+ * |-----------|-----------|
>+ */
>+ ptr[0] = 'x';
>+ /*
>+ * Now mprotect() the RW region read-only, we should merge:
>+ *
>+ * RO
>+ * |-----------------------|
>+ * | faulted |
>+ * |-----------------------|
>+ */
>+ ASSERT_EQ(mprotect(ptr, 5 * page_size, PROT_READ), 0);
>+
>+ /* Assert that the merge succeeded using PROCMAP_QUERY. */
>+ ASSERT_TRUE(find_vma_procmap(procmap, ptr));
>+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
>+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 10 * page_size);
>+}
>+
>+TEST_F(merge, mprotect_unfaulted_both)
>+{
>+ unsigned int page_size = self->page_size;
>+ char *carveout = self->carveout;
>+ struct procmap_fd *procmap = &self->procmap;
>+ char *ptr;
>+
>+ /*
>+ * |-----------------------|
>+ * | unfaulted |
>+ * |-----------------------|
>+ */
>+ ptr = mmap(&carveout[2 * page_size], 9 * page_size, PROT_READ | PROT_WRITE,
>+ MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
>+ ASSERT_NE(ptr, MAP_FAILED);
>+ /*
>+ * Now make the first and last 3 pages read-only, splitting the VMA:
>+ *
>+ * RO RW RO
>+ * |-----------|-----------|-----------|
>+ * | unfaulted | unfaulted | unfaulted |
>+ * |-----------|-----------|-----------|
>+ */
>+ ASSERT_EQ(mprotect(ptr, 3 * page_size, PROT_READ), 0);
>+ ASSERT_EQ(mprotect(&ptr[6 * page_size], 3 * page_size, PROT_READ), 0);
>+ /*
>+ * Fault in the first of the middle 3 pages so it gets an anon_vma and
>+ * thus the whole VMA becomes 'faulted':
>+ *
>+ * RO RW RO
>+ * |-----------|-----------|-----------|
>+ * | unfaulted | faulted | unfaulted |
>+ * |-----------|-----------|-----------|
>+ */
>+ ptr[3 * page_size] = 'x';
>+ /*
>+ * Now mprotect() the RW region read-only, we should merge:
>+ *
>+ * RO
>+ * |-----------------------|
>+ * | faulted |
>+ * |-----------------------|
>+ */
>+ ASSERT_EQ(mprotect(&ptr[3 * page_size], 3 * page_size, PROT_READ), 0);
>+
>+ /* Assert that the merge succeeded using PROCMAP_QUERY. */
>+ ASSERT_TRUE(find_vma_procmap(procmap, ptr));
>+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
>+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 9 * page_size);
>+}
>+
>+TEST_F(merge, mprotect_faulted_left_unfaulted_right)
>+{
>+ unsigned int page_size = self->page_size;
>+ char *carveout = self->carveout;
>+ struct procmap_fd *procmap = &self->procmap;
>+ char *ptr;
>+
>+ /*
>+ * |-----------------------|
>+ * | unfaulted |
>+ * |-----------------------|
>+ */
>+ ptr = mmap(&carveout[2 * page_size], 9 * page_size, PROT_READ | PROT_WRITE,
>+ MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
>+ ASSERT_NE(ptr, MAP_FAILED);
>+ /*
>+ * Now make the last 3 pages read-only, splitting the VMA:
>+ *
>+ * RW RO
>+ * |-----------------------|-----------|
>+ * | unfaulted | unfaulted |
>+ * |-----------------------|-----------|
>+ */
>+ ASSERT_EQ(mprotect(&ptr[6 * page_size], 3 * page_size, PROT_READ), 0);
>+ /*
>+ * Fault in the first of the first 6 pages so it gets an anon_vma and
>+ * thus the whole VMA becomes 'faulted':
>+ *
>+ * RW RO
>+ * |-----------------------|-----------|
>+ * | unfaulted | unfaulted |
^^^
According to your previous comment convention, the comment here describe the
result after ptr[0] = 'x'.
faulted is the correct description here?
>+ * |-----------------------|-----------|
>+ */
>+ ptr[0] = 'x';
>+ /*
>+ * Now make the first 3 pages read-only, splitting the VMA:
>+ *
>+ * RO RW RO
>+ * |-----------|-----------|-----------|
>+ * | faulted | faulted | unfaulted |
>+ * |-----------|-----------|-----------|
>+ */
>+ ASSERT_EQ(mprotect(ptr, 3 * page_size, PROT_READ), 0);
>+ /*
>+ * Now mprotect() the RW region read-only, we should merge:
>+ *
>+ * RO
>+ * |-----------------------|
>+ * | faulted |
>+ * |-----------------------|
>+ */
>+ ASSERT_EQ(mprotect(&ptr[3 * page_size], 3 * page_size, PROT_READ), 0);
>+
>+ /* Assert that the merge succeeded using PROCMAP_QUERY. */
>+ ASSERT_TRUE(find_vma_procmap(procmap, ptr));
>+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
>+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 9 * page_size);
>+}
>+
>+TEST_F(merge, mprotect_unfaulted_left_faulted_right)
>+{
>+ unsigned int page_size = self->page_size;
>+ char *carveout = self->carveout;
>+ struct procmap_fd *procmap = &self->procmap;
>+ char *ptr;
>+
>+ /*
>+ * |-----------------------|
>+ * | unfaulted |
>+ * |-----------------------|
>+ */
>+ ptr = mmap(&carveout[2 * page_size], 9 * page_size, PROT_READ | PROT_WRITE,
>+ MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
>+ ASSERT_NE(ptr, MAP_FAILED);
>+ /*
>+ * Now make the first 3 pages read-only, splitting the VMA:
>+ *
>+ * RO RW
>+ * |-----------|-----------------------|
>+ * | unfaulted | unfaulted |
>+ * |-----------|-----------------------|
>+ */
>+ ASSERT_EQ(mprotect(ptr, 3 * page_size, PROT_READ), 0);
>+ /*
>+ * FAult in the first of the last 6 pages so it gets an anon_vma and
^
s/A/a/
>+ * thus the whole VMA becomes 'faulted':
>+ *
>+ * RO RW
>+ * |-----------|-----------------------|
>+ * | unfaulted | faulted |
>+ * |-----------|-----------------------|
>+ */
>+ ptr[3 * page_size] = 'x';
>+ /*
>+ * Now make the last 3 pages read-only, splitting the VMA:
>+ *
>+ * RO RW RO
>+ * |-----------|-----------|-----------|
>+ * | unfaulted | faulted | faulted |
>+ * |-----------|-----------|-----------|
>+ */
>+ ASSERT_EQ(mprotect(&ptr[6 * page_size], 3 * page_size, PROT_READ), 0);
>+ /*
>+ * Now mprotect() the RW region read-only, we should merge:
>+ *
>+ * RO
>+ * |-----------------------|
>+ * | faulted |
>+ * |-----------------------|
>+ */
>+ ASSERT_EQ(mprotect(&ptr[3 * page_size], 3 * page_size, PROT_READ), 0);
>+
>+ /* Assert that the merge succeeded using PROCMAP_QUERY. */
>+ ASSERT_TRUE(find_vma_procmap(procmap, ptr));
>+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
>+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 9 * page_size);
>+}
>+
>+TEST_F(merge, forked_target_vma)
>+{
>+ unsigned int page_size = self->page_size;
>+ char *carveout = self->carveout;
>+ struct procmap_fd *procmap = &self->procmap;
>+ pid_t pid;
>+ char *ptr, *ptr2;
>+ int i;
>+
>+ /*
>+ * |-----------|
>+ * | unfaulted |
>+ * |-----------|
>+ */
>+ ptr = mmap(&carveout[page_size], 5 * page_size, PROT_READ | PROT_WRITE,
>+ MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
>+ ASSERT_NE(ptr, MAP_FAILED);
>+
>+ /*
>+ * Fault in process.
>+ *
>+ * |-----------|
>+ * | faulted |
>+ * |-----------|
>+ */
>+ ptr[0] = 'x';
>+
>+ pid = fork();
>+ ASSERT_NE(pid, -1);
>+
>+ if (pid != 0) {
>+ wait(NULL);
>+ return;
>+ }
>+
>+ /* Child process below: */
>+
>+ /* Reopen for child. */
>+ ASSERT_EQ(close_procmap(&self->procmap), 0);
>+ ASSERT_EQ(open_self_procmap(&self->procmap), 0);
>+
>+ /* unCOWing everything does not cause the AVC to go away. */
^^^
Before ptr[i] = 'x', we have unCOWed pages in vma. What we are doing here is
COWing, right?
>+ for (i = 0; i < 5 * page_size; i += page_size)
>+ ptr[i] = 'x';
>+
>+ /*
>+ * Map in adjacent VMA in child.
>+ *
>+ * forked
>+ * |-----------|-----------|
>+ * | faulted | unfaulted |
>+ * |-----------|-----------|
>+ * ptr ptr2
>+ */
>+ ptr2 = mmap(&ptr[5 * page_size], 5 * page_size, PROT_READ | PROT_WRITE,
>+ MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
>+ ASSERT_NE(ptr2, MAP_FAILED);
>+
>+ /* Make sure not merged. */
>+ ASSERT_TRUE(find_vma_procmap(procmap, ptr));
>+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
>+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 5 * page_size);
>+}
>+
>+TEST_F(merge, forked_source_vma)
>+{
>+ unsigned int page_size = self->page_size;
>+ char *carveout = self->carveout;
>+ struct procmap_fd *procmap = &self->procmap;
>+ pid_t pid;
>+ char *ptr, *ptr2;
>+ int i;
>+
>+ /*
>+ * |............|-----------|
>+ * | <unmapped> | unfaulted |
>+ * |............|-----------|
I am not sure "unmapped" is correct here. The range has already been mapped by
FIXTURE_SETUP(merge).
>+ */
>+ ptr = mmap(&carveout[page_size], 5 * page_size, PROT_READ | PROT_WRITE,
>+ MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
>+ ASSERT_NE(ptr, MAP_FAILED);
>+
>+ /*
>+ * Fault in process.
>+ *
>+ * |............||-----------|
>+ * | <unmapped> || faulted |
>+ * |............||-----------|
^
Extra line here?
>+ */
>+ ptr[0] = 'x';
>+
>+ pid = fork();
>+ ASSERT_NE(pid, -1);
>+
>+ if (pid != 0) {
>+ wait(NULL);
>+ return;
>+ }
>+
>+ /* Child process below: */
>+
>+ /* Reopen for child. */
>+ ASSERT_EQ(close_procmap(&self->procmap), 0);
>+ ASSERT_EQ(open_self_procmap(&self->procmap), 0);
>+
>+ /* unCOWing everything does not cause the AVC to go away. */
Same as above.
>+ for (i = 0; i < 5 * page_size; i += page_size)
>+ ptr[i] = 'x';
>+
>+ /*
>+ * Map in adjacent VMA in child, ptr2 before ptr, but incompatible.
>+ *
>+ * RWX forked RW
>+ * |-----------|-----------|
>+ * | unfaulted | faulted |
>+ * |-----------|-----------|
>+ * ptr2 ptr
>+ */
>+ ptr2 = mmap(&carveout[6 * page_size], 5 * page_size, PROT_READ | PROT_WRITE | PROT_EXEC,
>+ MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
>+ ASSERT_NE(ptr2, MAP_FAILED);
>+
I think pt2 is after ptr. Do I miss something?
>+
>+ /* Make sure not merged. */
>+ ASSERT_TRUE(find_vma_procmap(procmap, ptr2));
>+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr2);
>+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr2 + 5 * page_size);
>+
>+ /*
>+ * Now mprotect forked region to RWX so it becomes the source for the
>+ * merge to unfaulted region:
>+ *
>+ * RWX forked RWX
>+ * |-----------|-----------|
>+ * | unfaulted | faulted |
>+ * |-----------|-----------|
>+ * ptr2 ptr
>+ */
>+ ASSERT_EQ(mprotect(ptr, 5 * page_size, PROT_READ | PROT_WRITE | PROT_EXEC), 0);
>+ /* Again, make sure not merged. */
>+ ASSERT_TRUE(find_vma_procmap(procmap, ptr2));
>+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr2);
>+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr2 + 5 * page_size);
>+}
>+
>+TEST_HARNESS_MAIN
>diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
>index 9aff33b10999..0b2f8bb91433 100755
>--- a/tools/testing/selftests/mm/run_vmtests.sh
>+++ b/tools/testing/selftests/mm/run_vmtests.sh
>@@ -421,6 +421,8 @@ CATEGORY="madv_guard" run_test ./guard-regions
> # MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
> CATEGORY="madv_populate" run_test ./madv_populate
>
>+CATEGORY="vma_merge" run_test ./merge
>+
./run_vmtests.sh -h would show a list of categories.
How about add the new category "vma_merge" into the list.
> if [ -x ./memfd_secret ]
> then
> (echo 0 > /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
>--
>2.48.1
>
--
Wei Yang
Help you, Help me
Powered by blists - more mailing lists