linux-kernel - Re: [PATCH v1] selftests/mm: add simple VM_PFNMAP tests based on mmap'ing /dev/mem

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4efa9948-a523-4597-baa4-c36d18a658b0@redhat.com>
Date: Fri, 9 May 2025 12:18:51 +0200
From: David Hildenbrand <david@...hat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 linux-kselftest@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
 Shuah Khan <shuah@...nel.org>, Ingo Molnar <mingo@...hat.com>,
 Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH v1] selftests/mm: add simple VM_PFNMAP tests based on
 mmap'ing /dev/mem

On 09.05.25 11:49, Lorenzo Stoakes wrote:
> On Fri, May 09, 2025 at 12:20:41AM +0200, David Hildenbrand wrote:
>> Let's test some basic functionality using /dev/mem. These tests will
>> implicitly cover some PAT (Page Attribute Handling) handling on x86.
> 
> Ah this is really nice thanks for this!

Thanks for your review!

> 
>> 	ok 14 mprotect(PROT_NONE)
>> 	ok 15 SIGSEGV expected
>> 	ok 16 mprotect(PROT_READ)
>> 	ok 17 SIGSEGV not expected
>> 	ok 18 fork()
>> 	ok 19 SIGSEGV in child not expected
>> 	# Totals: pass:19 fail:0 xfail:0 xpass:0 skip:0 error:0
> 
> It'd be good to assert that merging doesn't work for VM_PFNMAP, though hm
> one could argue that's not hugely useful as it's trivially implemented.
> 
> But I guess anything like that should live in merge.c.

I assume we'd need is_range_mapped() from mremap_tests.c.

Something for another day :)

[...]

>> +static void signal_handler(int sig)
>> +{
>> +	if (sig == SIGSEGV)
>> +		siglongjmp(env, 1);
>> +	siglongjmp(env, 2);
>> +}
> 
> Hm, wouldn't it be better to only catch these only if you specifically
> meant to catch a signal?

I had that, but got tired about the repeated register + unregister, 
after all I really don't want to spend a lot more time on this.

> You can see what I did in guard-regions.c for an example (sorry, I'm sure
> you know exactly how the thing works, just I mean for an easy reminder :P)
> 

Again, time is the limit. But let me see if I can get something done in 
a reasonable timeframe.

>> +
>> +static void sense_support(void)
>> +{
> 
> See below comment about the kselftest_harness, but with that you can
> literally declare fixture setups/teardowns very nicely :) You can also
> mmap() these 2 pages and munmap() them afterwards straightforwardly.
> 
>> +	char *addr, tmp;
>> +	int ret;
>> +
>> +	dev_mem_fd = open("/dev/mem", O_RDONLY);
>> +	if (dev_mem_fd < 0)
>> +		ksft_exit_skip("Cannot open '/dev/mem': %s\n", strerror(errno));
> 
> Hm skip, or failure? Skip implies it's expected right? I suppose it's
> possible a system might be setup without this...

Try as non-root or on a lockdowned system :)

> 
>> +
>> +	/* We'll require the first two pages throughout our tests ... */
>> +	addr = mmap(0, pagesize * 2, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
>> +	if (addr == MAP_FAILED)
>> +		ksft_exit_skip("Cannot mmap '/dev/mem'");
>> +
>> +	/* ... and want to be able to read from them. */
>> +	ret = sigsetjmp(env, 1);
>> +	if (!ret) {
>> +		tmp = *addr + *(addr + pagesize);
>> +		asm volatile("" : "+r" (tmp));
> 
> Is this not pretty much equivalent to a volatile read where you're forcing
> the compiler to not optimise this unused thing away? In guard-regions I set:
> 
> #define FORCE_READ(x) (*(volatile typeof(x) *)x)
> 
> For this purpose, which would make this:
> 
> FORCE_READ(addr);
> FORCE_READ(&addr[pagesize]);

Hmmm, a compiler might be allowed to optimize out a volatile read.

> 
>> +	}
>> +	if (ret)
>> +		ksft_exit_skip("Cannot read-access mmap'ed '/dev/mem'");
> 
> Why are we returning 1 or 2 if we don't differentiate it here?

Copy-and-paste. As we are not registering for SIGBUS, we can just return 1.

> 
>> +
>> +	munmap(addr, pagesize * 2);
>> +}
>> +
>> +static void test_madvise(void)
>> +{
>> +#define INIT_ADVICE(nr) { nr, #nr}
>> +	const struct {
>> +		int nr;
>> +		const char *name;
>> +	} advices[] = {
>> +		INIT_ADVICE(MADV_DONTNEED),
>> +		INIT_ADVICE(MADV_DONTNEED_LOCKED),
>> +		INIT_ADVICE(MADV_FREE),
>> +		INIT_ADVICE(MADV_WIPEONFORK),
>> +		INIT_ADVICE(MADV_COLD),
>> +		INIT_ADVICE(MADV_PAGEOUT),
>> +		INIT_ADVICE(MADV_POPULATE_READ),
>> +		INIT_ADVICE(MADV_POPULATE_WRITE),
>> +	};
>> +	char *addr;
>> +	int ret, i;
>> +
>> +	addr = mmap(0, pagesize, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
> 
> Nit (same for all mmap() calls) shouldn't this first parameter be NULL, by
> convention? I mean not a big deal obviously :)

Yes.

> 
>> +	if (addr == MAP_FAILED)
>> +		ksft_exit_fail_msg("mmap() failed: %s\n", strerror(errno));
>> +
>> +	/* All these advices must be rejected. */
>> +	for (i = 0; i < ARRAY_SIZE(advices); i++) {
>> +		ret = madvise(addr, pagesize, advices[i].nr);
>> +		ksft_test_result(ret && errno == EINVAL,
>> +				 "madvise(%s) should be disallowed\n",
>> +				 advices[i].name);
>> +	}
>> +
>> +	munmap(addr, pagesize);
>> +}
>> +
>> +static void test_munmap_splitting(void)
>> +{
>> +	char *addr1, *addr2;
>> +	int ret;
>> +
>> +	addr1 = mmap(0, pagesize * 2, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
>> +	if (addr1 == MAP_FAILED)
>> +		ksft_exit_fail_msg("mmap() failed: %s\n", strerror(errno));
>> +
>> +	/* Unmap the first pages. */
> 
> NIT: pages -> page.

Ack.

> 
>> +	ret = munmap(addr1, pagesize);
>> +	ksft_test_result(!ret, "munmap() splitting\n");
>> +
>> +	/* Remap the first page while the second page is still mapped. */
>> +	addr2 = mmap(0, pagesize, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
>> +	ksft_test_result(addr2 != MAP_FAILED, "mmap() after splitting\n");
> 
> Hm not sure what the assertion is here per se, that we can munmap() partial
> bits of the VMA? It'd be pretty weird if we couldn't though?
 > > If it's that we don't get a merge when we remap, we're not really 
checking
> that, but you actually can, as I added an API to vm_util for this using
> PROCMAP_QUERY (very handy tool actually - binary version of /proc/smaps).

I don't care about merging tests (I'll leave that to you :P ).

This is a PAT test for upcoming changes where partial unmap can leave 
the original region reserved. Making sure that re-mapping with the 
pending reservation still works.

>> +
>> +	if (addr2 != MAP_FAILED)
>> +		munmap(addr2, pagesize);
>> +	if (!ret)
>> +		munmap(addr1 + pagesize, pagesize);
>> +	else
>> +		munmap(addr1, pagesize * 2);
> 
> There's no need for this dance, you can just munmap() away, it tolerates
> gaps and multiple VMAs.

Yeah, I know. I was not sure if the ksft_test_result() in between might 
allocate memory and consume that area.

> 
>> +}
>> +
>> +static void test_mremap_fixed(void)
>> +{
>> +	char *addr, *new_addr, *ret;
>> +
>> +	addr = mmap(0, pagesize * 2, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
>> +	if (addr == MAP_FAILED)
>> +		ksft_exit_fail_msg("mmap() failed: %s\n", strerror(errno));
>> +
>> +	/* Reserve a destination area. */
>> +	new_addr = mmap(0, pagesize * 2, PROT_READ, MAP_ANON | MAP_PRIVATE, -1, 0);
>> +	if (new_addr == MAP_FAILED)
>> +		ksft_exit_fail_msg("mmap() failed: %s\n", strerror(errno));
>> +
>> +	/* mremap() over our destination. */
>> +	ret = mremap(addr, pagesize * 2, pagesize * 2,
>> +		     MREMAP_FIXED | MREMAP_MAYMOVE, new_addr);
>> +	ksft_test_result(ret == new_addr, "mremap(MREMAP_FIXED)\n");
>> +	if (ret != new_addr)
>> +		munmap(new_addr, pagesize * 2);
> 
> This could only be an error code, and this will fail right?
> 
> MREMAP_FIXED is 'do or die' at the new address, not hinting. If there's
> anything already mapped there it goes a bye bye.
> 
> So again, we could just have a standard munmap(), and this lends itself
> well to a FIXTURE_SETUP()/FIXTURE_TEARDOWN() :P

I'm afraid I cannot spend much more time on these tests :P But let me 
try for a couple of minutes.

> 
>> +	munmap(addr, pagesize * 2);
>> +}
>> +
>> +static void test_mremap_shrinking(void)
>> +{
>> +	char *addr, *ret;
>> +
>> +	addr = mmap(0, pagesize * 2, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
>> +	if (addr == MAP_FAILED)
>> +		ksft_exit_fail_msg("mmap() failed: %s\n", strerror(errno));
>> +
>> +	/* Shrinking is expected to work. */
>> +	ret = mremap(addr, pagesize * 2, pagesize, 0);
>> +	ksft_test_result(ret == addr, "mremap() shrinking\n");
>> +	if (ret != addr)
>> +		munmap(addr, pagesize * 2);
>> +	else
>> +		munmap(addr, pagesize);
> 
> I think we're safe to just munmap() as usual here :) (it's nitty but I'm
> trying to make the case for teardown again of course :P)

Same reasoning as above regarding ksft_test_result().

> 
>> +}
>> +
>> +static void test_mremap_growing(void)
>> +{
>> +	char *addr, *ret;
>> +
>> +	addr = mmap(0, pagesize, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
>> +	if (addr == MAP_FAILED)
>> +		ksft_exit_fail_msg("mmap() failed: %s\n", strerror(errno));
>> +
>> +	/* Growing is not expected to work. */
> 
> God imagine if we did allow it... what hell would it be to figure out how
> to do this correctly in all cases :P

:)

> 
>> +	ret = mremap(addr, pagesize, pagesize * 2, MREMAP_MAYMOVE);
>> +	ksft_test_result(ret == MAP_FAILED,
>> +			 "mremap() growing should be disallowed\n");
>> +	if (ret == MAP_FAILED)
>> +		munmap(addr, pagesize);
>> +	else
>> +		munmap(ret, pagesize * 2);
> 
> This is a bit cautious, for a world where we do lose our minds and allow
> this? :)

Yeah, went back and forth with this error cleanup shit.

> 
>> +}
>> +
>> +static void test_mprotect(void)
>> +{
>> +	char *addr, tmp;
>> +	int ret;
>> +
>> +	addr = mmap(0, pagesize, PROT_READ, MAP_SHARED, dev_mem_fd, 0);
>> +	if (addr == MAP_FAILED)
>> +		ksft_exit_fail_msg("mmap() failed: %s\n", strerror(errno));
>> +
>> +	/* With PROT_NONE, read access must result in SIGSEGV. */
>> +	ret = mprotect(addr, pagesize, PROT_NONE);
>> +	ksft_test_result(!ret, "mprotect(PROT_NONE)\n");
>> +
>> +	ret = sigsetjmp(env, 1);
>> +	if (!ret) {
>> +		tmp = *addr;
>> +		asm volatile("" : "+r" (tmp));
>> +	}
> 
> This code is duplicated, we definitely want to abstract it.

Probably yes.

> 
>> +	ksft_test_result(ret == 1, "SIGSEGV expected\n");
> 
> Hmm, what exactly are we testing here though? I mean PROT_NONE will be a
> failed access for _any_ kind of memory? Is this really worthwhile? Maybe
> better to mprotect() as PROT_NONE to start then mprotect() to PROT_READ.
 > > But I'm not sure what that really tests? Is it a PAT-specific thing? It
> seems if this is broken then the mapping code is more generally broken
> beyond just VM_PFNMAP mappings right?

Rationale was to test the !vm_normal_folio() code paths that are not 
covered by "ordinary" mprotect (except the shared zeropage). But there 
should indeed only be such a check on the prot_numa code path, so I can 
just drop this test.

[...]

>> +int main(int argc, char **argv)
>> +{
>> +	int err;
>> +
>> +	ksft_print_header();
>> +	ksft_set_plan(19);
> 
> I know it's kind of nitpicky, but I really hate this sort of magic number
> and so on. You don't actually need any of this, the kselftest_harness.h is
> _really_ powerful, and makes for much much more readable and standardised
> test code.
> 
> You can look at guard-regions.c in the test code (though there's some
> complexity there because I use 'variants') or the merge.c test code
> (simpler) for straight-forward examples.
> 
> I won't block this change on this however, I don't want to be a pain and
> you're adding very important tests here, but it'd be really nice if you did
> use that :>)

Yeah, let me explore that real quick, thanks!

-- 
Cheers,

David / dhildenb