[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4efa9948-a523-4597-baa4-c36d18a658b0@redhat.com>
Date: Fri, 9 May 2025 12:18:51 +0200
From: David Hildenbrand <david@...hat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-kselftest@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
Shuah Khan <shuah@...nel.org>, Ingo Molnar <mingo@...hat.com>,
Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH v1] selftests/mm: add simple VM_PFNMAP tests based on
mmap'ing /dev/mem
On 09.05.25 11:49, Lorenzo Stoakes wrote:
> On Fri, May 09, 2025 at 12:20:41AM +0200, David Hildenbrand wrote:
>> Let's test some basic functionality using /dev/mem. These tests will
>> implicitly cover some PAT (Page Attribute Handling) handling on x86.
>
> Ah this is really nice thanks for this!
Thanks for your review!
>
>> ok 14 mprotect(PROT_NONE)
>> ok 15 SIGSEGV expected
>> ok 16 mprotect(PROT_READ)
>> ok 17 SIGSEGV not expected
>> ok 18 fork()
>> ok 19 SIGSEGV in child not expected
>> # Totals: pass:19 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> It'd be good to assert that merging doesn't work for VM_PFNMAP, though hm
> one could argue that's not hugely useful as it's trivially implemented.
>
> But I guess anything like that should live in merge.c.
I assume we'd need is_range_mapped() from mremap_tests.c.
Something for another day :)
[...]
>> +static void signal_handler(int sig)
>> +{
>> + if (sig == SIGSEGV)
>> + siglongjmp(env, 1);
>> + siglongjmp(env, 2);
>> +}
>
> Hm, wouldn't it be better to only catch these only if you specifically
> meant to catch a signal?
I had that, but got tired about the repeated register + unregister,
after all I really don't want to spend a lot more time on this.
> You can see what I did in guard-regions.c for an example (sorry, I'm sure
> you know exactly how the thing works, just I mean for an easy reminder :P)
>
Again, time is the limit. But let me see if I can get something done in
a reasonable timeframe.
>> +
>> +static void sense_support(void)
>> +{
>
> See below comment about the kselftest_harness, but with that you can
> literally declare fixture setups/teardowns very nicely :) You can also
> mmap() these 2 pages and munmap() them afterwards straightforwardly.
>
>> + char *addr, tmp;
>> + int ret;
>> +
>> + dev_mem_fd = open("/dev/mem", O_RDONLY);
>> + if (dev_mem_fd < 0)
>> + ksft_exit_skip("Cannot open '/dev/mem': %s\n", strerror(errno));
>
> Hm skip, or failure? Skip implies it's expected right? I suppose it's
> possible a system might be setup without this...
Try as non-root or on a lockdowned system :)
>
>> +
>> + /* We'll require the first two pages throughout our tests ... */
>> + addr = mmap(0, pagesize * 2, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
>> + if (addr == MAP_FAILED)
>> + ksft_exit_skip("Cannot mmap '/dev/mem'");
>> +
>> + /* ... and want to be able to read from them. */
>> + ret = sigsetjmp(env, 1);
>> + if (!ret) {
>> + tmp = *addr + *(addr + pagesize);
>> + asm volatile("" : "+r" (tmp));
>
> Is this not pretty much equivalent to a volatile read where you're forcing
> the compiler to not optimise this unused thing away? In guard-regions I set:
>
> #define FORCE_READ(x) (*(volatile typeof(x) *)x)
>
> For this purpose, which would make this:
>
> FORCE_READ(addr);
> FORCE_READ(&addr[pagesize]);
Hmmm, a compiler might be allowed to optimize out a volatile read.
>
>> + }
>> + if (ret)
>> + ksft_exit_skip("Cannot read-access mmap'ed '/dev/mem'");
>
> Why are we returning 1 or 2 if we don't differentiate it here?
Copy-and-paste. As we are not registering for SIGBUS, we can just return 1.
>
>> +
>> + munmap(addr, pagesize * 2);
>> +}
>> +
>> +static void test_madvise(void)
>> +{
>> +#define INIT_ADVICE(nr) { nr, #nr}
>> + const struct {
>> + int nr;
>> + const char *name;
>> + } advices[] = {
>> + INIT_ADVICE(MADV_DONTNEED),
>> + INIT_ADVICE(MADV_DONTNEED_LOCKED),
>> + INIT_ADVICE(MADV_FREE),
>> + INIT_ADVICE(MADV_WIPEONFORK),
>> + INIT_ADVICE(MADV_COLD),
>> + INIT_ADVICE(MADV_PAGEOUT),
>> + INIT_ADVICE(MADV_POPULATE_READ),
>> + INIT_ADVICE(MADV_POPULATE_WRITE),
>> + };
>> + char *addr;
>> + int ret, i;
>> +
>> + addr = mmap(0, pagesize, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
>
> Nit (same for all mmap() calls) shouldn't this first parameter be NULL, by
> convention? I mean not a big deal obviously :)
Yes.
>
>> + if (addr == MAP_FAILED)
>> + ksft_exit_fail_msg("mmap() failed: %s\n", strerror(errno));
>> +
>> + /* All these advices must be rejected. */
>> + for (i = 0; i < ARRAY_SIZE(advices); i++) {
>> + ret = madvise(addr, pagesize, advices[i].nr);
>> + ksft_test_result(ret && errno == EINVAL,
>> + "madvise(%s) should be disallowed\n",
>> + advices[i].name);
>> + }
>> +
>> + munmap(addr, pagesize);
>> +}
>> +
>> +static void test_munmap_splitting(void)
>> +{
>> + char *addr1, *addr2;
>> + int ret;
>> +
>> + addr1 = mmap(0, pagesize * 2, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
>> + if (addr1 == MAP_FAILED)
>> + ksft_exit_fail_msg("mmap() failed: %s\n", strerror(errno));
>> +
>> + /* Unmap the first pages. */
>
> NIT: pages -> page.
Ack.
>
>> + ret = munmap(addr1, pagesize);
>> + ksft_test_result(!ret, "munmap() splitting\n");
>> +
>> + /* Remap the first page while the second page is still mapped. */
>> + addr2 = mmap(0, pagesize, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
>> + ksft_test_result(addr2 != MAP_FAILED, "mmap() after splitting\n");
>
> Hm not sure what the assertion is here per se, that we can munmap() partial
> bits of the VMA? It'd be pretty weird if we couldn't though?
> > If it's that we don't get a merge when we remap, we're not really
checking
> that, but you actually can, as I added an API to vm_util for this using
> PROCMAP_QUERY (very handy tool actually - binary version of /proc/smaps).
I don't care about merging tests (I'll leave that to you :P ).
This is a PAT test for upcoming changes where partial unmap can leave
the original region reserved. Making sure that re-mapping with the
pending reservation still works.
>> +
>> + if (addr2 != MAP_FAILED)
>> + munmap(addr2, pagesize);
>> + if (!ret)
>> + munmap(addr1 + pagesize, pagesize);
>> + else
>> + munmap(addr1, pagesize * 2);
>
> There's no need for this dance, you can just munmap() away, it tolerates
> gaps and multiple VMAs.
Yeah, I know. I was not sure if the ksft_test_result() in between might
allocate memory and consume that area.
>
>> +}
>> +
>> +static void test_mremap_fixed(void)
>> +{
>> + char *addr, *new_addr, *ret;
>> +
>> + addr = mmap(0, pagesize * 2, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
>> + if (addr == MAP_FAILED)
>> + ksft_exit_fail_msg("mmap() failed: %s\n", strerror(errno));
>> +
>> + /* Reserve a destination area. */
>> + new_addr = mmap(0, pagesize * 2, PROT_READ, MAP_ANON | MAP_PRIVATE, -1, 0);
>> + if (new_addr == MAP_FAILED)
>> + ksft_exit_fail_msg("mmap() failed: %s\n", strerror(errno));
>> +
>> + /* mremap() over our destination. */
>> + ret = mremap(addr, pagesize * 2, pagesize * 2,
>> + MREMAP_FIXED | MREMAP_MAYMOVE, new_addr);
>> + ksft_test_result(ret == new_addr, "mremap(MREMAP_FIXED)\n");
>> + if (ret != new_addr)
>> + munmap(new_addr, pagesize * 2);
>
> This could only be an error code, and this will fail right?
>
> MREMAP_FIXED is 'do or die' at the new address, not hinting. If there's
> anything already mapped there it goes a bye bye.
>
> So again, we could just have a standard munmap(), and this lends itself
> well to a FIXTURE_SETUP()/FIXTURE_TEARDOWN() :P
I'm afraid I cannot spend much more time on these tests :P But let me
try for a couple of minutes.
>
>> + munmap(addr, pagesize * 2);
>> +}
>> +
>> +static void test_mremap_shrinking(void)
>> +{
>> + char *addr, *ret;
>> +
>> + addr = mmap(0, pagesize * 2, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
>> + if (addr == MAP_FAILED)
>> + ksft_exit_fail_msg("mmap() failed: %s\n", strerror(errno));
>> +
>> + /* Shrinking is expected to work. */
>> + ret = mremap(addr, pagesize * 2, pagesize, 0);
>> + ksft_test_result(ret == addr, "mremap() shrinking\n");
>> + if (ret != addr)
>> + munmap(addr, pagesize * 2);
>> + else
>> + munmap(addr, pagesize);
>
> I think we're safe to just munmap() as usual here :) (it's nitty but I'm
> trying to make the case for teardown again of course :P)
Same reasoning as above regarding ksft_test_result().
>
>> +}
>> +
>> +static void test_mremap_growing(void)
>> +{
>> + char *addr, *ret;
>> +
>> + addr = mmap(0, pagesize, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
>> + if (addr == MAP_FAILED)
>> + ksft_exit_fail_msg("mmap() failed: %s\n", strerror(errno));
>> +
>> + /* Growing is not expected to work. */
>
> God imagine if we did allow it... what hell would it be to figure out how
> to do this correctly in all cases :P
:)
>
>> + ret = mremap(addr, pagesize, pagesize * 2, MREMAP_MAYMOVE);
>> + ksft_test_result(ret == MAP_FAILED,
>> + "mremap() growing should be disallowed\n");
>> + if (ret == MAP_FAILED)
>> + munmap(addr, pagesize);
>> + else
>> + munmap(ret, pagesize * 2);
>
> This is a bit cautious, for a world where we do lose our minds and allow
> this? :)
Yeah, went back and forth with this error cleanup shit.
>
>> +}
>> +
>> +static void test_mprotect(void)
>> +{
>> + char *addr, tmp;
>> + int ret;
>> +
>> + addr = mmap(0, pagesize, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
>> + if (addr == MAP_FAILED)
>> + ksft_exit_fail_msg("mmap() failed: %s\n", strerror(errno));
>> +
>> + /* With PROT_NONE, read access must result in SIGSEGV. */
>> + ret = mprotect(addr, pagesize, PROT_NONE);
>> + ksft_test_result(!ret, "mprotect(PROT_NONE)\n");
>> +
>> + ret = sigsetjmp(env, 1);
>> + if (!ret) {
>> + tmp = *addr;
>> + asm volatile("" : "+r" (tmp));
>> + }
>
> This code is duplicated, we definitely want to abstract it.
Probably yes.
>
>> + ksft_test_result(ret == 1, "SIGSEGV expected\n");
>
> Hmm, what exactly are we testing here though? I mean PROT_NONE will be a
> failed access for _any_ kind of memory? Is this really worthwhile? Maybe
> better to mprotect() as PROT_NONE to start then mprotect() to PROT_READ.
> > But I'm not sure what that really tests? Is it a PAT-specific thing? It
> seems if this is broken then the mapping code is more generally broken
> beyond just VM_PFNMAP mappings right?
Rationale was to test the !vm_normal_folio() code paths that are not
covered by "ordinary" mprotect (except the shared zeropage). But there
should indeed only be such a check on the prot_numa code path, so I can
just drop this test.
[...]
>> +int main(int argc, char **argv)
>> +{
>> + int err;
>> +
>> + ksft_print_header();
>> + ksft_set_plan(19);
>
> I know it's kind of nitpicky, but I really hate this sort of magic number
> and so on. You don't actually need any of this, the kselftest_harness.h is
> _really_ powerful, and makes for much much more readable and standardised
> test code.
>
> You can look at guard-regions.c in the test code (though there's some
> complexity there because I use 'variants') or the merge.c test code
> (simpler) for straight-forward examples.
>
> I won't block this change on this however, I don't want to be a pain and
> you're adding very important tests here, but it'd be really nice if you did
> use that :>)
Yeah, let me explore that real quick, thanks!
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists