Message-ID: <f44fdda0-80da-49a6-a9dd-75b9a46e1f76@lucifer.local>
Date: Sun, 18 Jan 2026 12:58:01 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Dev Jain <dev.jain@....com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...nel.org>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        linux-kselftest@...r.kernel.org, Mark Brown <broonie@...nel.org>
Subject: Re: [PATCH] selftests/mm: remove virtual_address_range test

One note - Dev please wrap lines to 75 chars as per standard kernel
practice. It's super hard to read your mail with unwrapped lines, thanks.

On Sun, Jan 18, 2026 at 01:25:25PM +0530, Dev Jain wrote:
>
> On 16/01/26 6:50 pm, Lorenzo Stoakes wrote:
> > This self test is asserting internal implementation details and is highly
> > vulnerable to internal kernel changes as a result.
> >
> > It is currently failing locally from at least v6.17, and it seems that it
> > may have been failing for longer in many configurations/hardware as it
> > skips if e.g. CONFIG_ANON_VMA_NAME is not specified.
>
> True, the test gets skipped for me since the mark_range function was added.

This underlines the issue with this test.

>
> >
> > With these skips and the fact that run_vmtests.sh won't run the tests in
> > certain configurations it is likely we have simply missed this test being
> > broken in CI for a long while.
> >
> > I have tried multiple versions of these tests and am unable to find a
> > working bisect as previous versions of the test fail also.
>
> Does the test fail for you even at commit 13e860961fd4 ("selftests/mm:
> virtual_address_range: Switch to ksft_exit_fail_msg")? I have never
> observed a failure at this commit.

It fails consistently with everything as I said, I gave up on trying to
bisect it at v6.17.

It's been broken for a whole bunch of commits all over the place so is also
an active bisection hazard.

>
>
> >
> > The tests are essentially mmap()'ing a series of mappings with no hint and
> > asserting what the get_unmapped_area*() functions will come up with, with
> > seemingly few checks for what other mappings may already be in place.
> >
> > It then appears to be mmap()'ing with a hint, and making a series of
> > similar assertions about the internal implementation details of the hinting
> > logic.
>
> The revelation of internal detail starts at 010409649885 ("selftests/mm:
> confirm VA exhaustion without reliance on correctness of mmap()"). All
> that does is check whether mmap failure actually means exhaustion. This
> can reveal bugs in the maple tree, if it cannot find a 1G chunk in it
> even when the gap is present. This is an internal detail which is not
> expected to change - no one reported any breakage (AFAIK, please correct
> me if I am wrong) until 10 months later, at

'Internal detail that is not expected to change' is incorrect - you can have no
expectations about internal implementation details.

Another thing I didn't mention is this test takes a LONG time to run, and we are
already having timeout issues with test runs.

> commit a005145b9c96 ("selftests/mm: virtual_address_range: mmap()
> without PROT_WRITE"), and even then not in the gap assertion code - the
> breakage happens in the while (start_addr + hop < end_addr) chunk of
> code. In retrospect I should not have added this chunk - the purpose was
> to check whether the VMAs being advertised in procfs are actually
> usable, testing something which, if it breaks, is extremely easy to
> figure out and fix without putting this functionality in the test. Also,
> I had no knowledge at the time that this would cause pagetable
> allocation and touch physical memory. So commits b2a79f62133a and
> 3bd6137220bb could simply have been avoided by removing the bit of code
> I mentioned.

OK thanks for the explanation, but I don't think this changes anything.

>
> >
> > Commit 0ef3783d7558 ("selftests/mm: add support to test 4PB VA on PPC64"),
> > commit 3bd6137220bb ("selftests/mm: virtual_address_range: avoid reading
> > from VM_IO mappings"), and especially commit a005145b9c96 ("selftests/mm:
> > virtual_address_range: mmap() without PROT_WRITE") are good examples of the
> > whack-a-mole nature of maintaining this test.
> >
> > The last commit there being particularly pertinent as it was accounting for
> > an internal implementation detail change that really should have no bearing
> > on self-tests, that is commit e93d2521b27f ("x86/vdso: Split virtual clock
> > pages into dedicated mapping").
> >
> > The purpose of the mm self-tests is to assert attributes about the API
> > exposed to users, and to ensure that expectations are met.
> >
> > This test is emphatically not doing this, rather making a series of
> > assumptions about internal implementation details and asserting them.
> >
> > It therefore, sadly, seems that the best course is to remove this test
> > altogether.
>
> The objective of the test is to exhaust VA space and find bugs in mmap(). It has

Well no, you're asserting gap lengths repeatedly, you are making assertions
about get_unmapped_area() behaviour that are totally inappropriate in a
self-test.

I would suggest that actually writing unit tests for the
get_unmapped_area() functions, using kunit or similar, would be the
correct approach.

But again I'm not sure that it's appropriate to just have a test assert
that functions do what they're implemented to do.

> been useful in discovering a bug at [1].
>
> [1] https://lore.kernel.org/all/20240123171420.3970220-1-ryan.roberts@arm.com/

I mean that proves my point that this test is _actually_ a wrongly-abstracted
get_unmapped_area() unit test...

I'm glad it was useful there, but it's just at the wrong level of abstraction.

The test has been consistently broken. Right now it's broken and nobody
noticed because it got skipped (!). It simply does not work on my
threadripper in any configuration, nor in virtme-ng. I've looked at the CI
and it seems it's not been running there either. It's adding maintenance
burden and making test runs slow even if you have CONFIG_ANON_VMA_NAME set
up.

Every time somebody changes an internal implementation detail about mmap()
layout, this will fail even though nothing has broken. This alone renders the
test inappropriate.
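
To illustrate the distinction, here is a minimal sketch (not code from the
tree, and check_mmap_contract() is a hypothetical name) of a check that
only asserts what mmap(2) documents for an unhinted anonymous mapping:
page alignment, zero fill, usability and a successful munmap(). It makes
no assumption about where the kernel places the mapping, so a layout
change should not break it:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the kernel tree: map nr_pages
 * anonymous pages with no hint and check only user-visible behaviour.
 * Returns 0 on success, -1 if any expectation is violated.
 */
static int check_mmap_contract(size_t nr_pages)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	size_t len;
	int ret = -1;

	if (page_size <= 0 || nr_pages == 0)
		return -1;
	len = nr_pages * (size_t)page_size;

	/* No hint: the kernel is free to place the mapping anywhere. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* The returned address must be page aligned. */
	if ((uintptr_t)p % (uintptr_t)page_size)
		goto out;

	/* Anonymous memory must start out zero-filled. */
	for (size_t i = 0; i < len; i++)
		if (p[i])
			goto out;

	/* The whole range must be writable and readable as requested. */
	memset(p, 0xaa, len);
	if (p[0] != 0xaa || p[len - 1] != 0xaa)
		goto out;

	ret = 0;
out:
	/* Unmapping a valid mapping must succeed. */
	if (munmap(p, len))
		ret = -1;
	return ret;
}
```

Nothing here compares the returned address against a neighbouring
mapping or an expected gap, which is the kind of assertion the existing
test makes.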

It reminds me a little of CRIU, which is tooling that makes a bunch of internal
kernel impl detail assumptions to work - we are not obliged to keep these kinds
of things working.

When I first saw this test I felt it was asserting internal impl. details and
thus not suitable as a self-test but let it go as relatively harmless.

Now it's actively harming my workflow (I run mm selftests locally a
lot). I think on the basis of all the above it's appropriate to remove it.

Thanks, Lorenzo
