Message-ID: <20250108165052-c03470bd-6ff7-44c9-87b9-9145456bdea8@linutronix.de>
Date: Wed, 8 Jan 2025 17:13:03 +0100
From: Thomas Weißschuh <thomas.weissschuh@...utronix.de>
To: David Hildenbrand <david@...hat.com>
Cc: Dev Jain <dev.jain@....com>, Andrew Morton <akpm@...ux-foundation.org>,
Shuah Khan <shuah@...nel.org>, Thomas Gleixner <tglx@...utronix.de>, linux-mm@...ck.org,
linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org, stable@...r.kernel.org,
Ryan Roberts <ryan.roberts@....com>
Subject: Re: [PATCH 1/3] selftests/mm: virtual_address_range: Fix error when
CommitLimit < 1GiB
On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote:
> On 08.01.25 09:05, Thomas Weißschuh wrote:
> > On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
> > >
> > > On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
> > > > If not enough physical memory is available the kernel may fail mmap();
> > > > see __vm_enough_memory() and vm_commit_limit().
> > > > In that case the logic in validate_complete_va_space() does not make
> > > > sense and will even incorrectly fail.
> > > > Instead skip the test if no mmap() succeeded.
> > > >
> > > > Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
> > > > Cc: stable@...r.kernel.org
>
> CC stable on tests is ... odd.
I thought it was fairly common, but it isn't.
Will drop it.
> > > > Signed-off-by: Thomas Weißschuh <thomas.weissschuh@...utronix.de>
> > > >
> > > > ---
> > > > The logic in __vm_enough_memory() seems weird.
> > > > It describes itself as "Check that a process has enough memory to
> > > > allocate a new virtual mapping", however it never checks the current
> > > > memory usage of the process.
> > > > So it only rejects single large mappings. Many small mappings that add up
> > > > to the same amount of memory are allowed, and are then even merged
> > > > automatically into one big mapping.
> > > > ---
> > > > tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
> > > > 1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > > > index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
> > > > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > > > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > > > @@ -178,6 +178,12 @@ int main(int argc, char *argv[])
> > > >  		validate_addr(ptr[i], 0);
> > > >  	}
> > > >  	lchunks = i;
> > > > +
> > > > +	if (!lchunks) {
> > > > +		ksft_test_result_skip("Not enough memory for a single chunk\n");
> > > > +		ksft_finished();
> > > > +	}
> > > > +
> > > >  	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
> > > >  	if (hptr == NULL) {
> > > >  		ksft_test_result_skip("Memory constraint not fulfilled\n");
> > > >
> > >
> > > I do not know about __vm_enough_memory(), but I am going by your description:
> > > You say that the kernel may fail mmap() when enough physical memory is not
> > > there, but it may happen that we have already done 100 mmap() calls, and then
> > > the kernel fails mmap(), so if (!lchunks) won't be able to handle this case.
> > > Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().
> >
> > __vm_enough_memory() only checks the size of each single mmap() on its
> > own. It does not actually check the current memory or address space
> > usage of the process.
> > This seems a bit weird, as indicated in my after-the-fold explanation.
> >
> > > The basic assumption of the test is that any process should be able to exhaust
> > > its virtual address space; running the test under memory pressure, where the
> > > kernel violates this behaviour, defeats the point of the test, I think?
> >
> > The assumption is correct, as soon as one mapping succeeds the others
> > will also succeed, until the actual address space is exhausted.
> >
> > Looking at it again, __vm_enough_memory() is only called for writable
> > mappings, so it would be possible to use only readable mappings in the
> > test. The test will still fail with OOM, as the many PTEs need more than
> > 1GiB of physical memory anyway, but at least that produces a usable
> > error message.
> > However I'm not sure if this would violate other test assumptions.
> >
>
> Note that with MAP_NORESERVE, most setups we care about will allow mapping as
> much as you want, but on access OOM will fire.
Thanks for the hint.
> So one could require that /proc/sys/vm/overcommit_memory is set up properly
> and use MAP_NORESERVE.
Isn't the check for lchunks == 0 essentially exactly this?
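For comparison, an explicit precondition check could look like the minimal sketch below. overcommit_mode() and noreserve_usable() are hypothetical helpers; the sketch relies on MAP_NORESERVE being ignored in mode 2 ("never overcommit"), as documented in proc(5). A test could then call ksft_test_result_skip() up front when noreserve_usable() returns false, instead of inferring the situation from lchunks == 0.

```c
#include <stdio.h>

/* Hypothetical helper: return the current vm.overcommit_memory mode
 * (0, 1 or 2), or -1 if the sysctl file cannot be read or parsed. */
static int overcommit_mode(void)
{
	FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
	int mode = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &mode) != 1)
		mode = -1;
	fclose(f);
	return mode;
}

/* Hypothetical helper: MAP_NORESERVE only takes effect when the kernel
 * is allowed to overcommit, i.e. in modes 0 (heuristic) and 1 (always). */
static int noreserve_usable(void)
{
	int mode = overcommit_mode();

	return mode == 0 || mode == 1;
}
```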
> Reading from anonymous memory will populate the shared zeropage. To mitigate
> OOM from "too many page tables", one could simply unmap the pieces as they
> are verified (or MAP_FIXED over them, to free page tables).
The code has to figure out if a verified region was created by mmap(),
otherwise an munmap() could crash the process.
As the entries from /proc/self/maps may have been merged and (I assume)
the ordering of mappings is not guaranteed, some bespoke logic to establish
the link will be needed.
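One possible shape for that bespoke logic, as a minimal sketch: record every range returned by a successful mmap() and treat a /proc/self/maps entry as ours only if it is completely covered by recorded ranges. This also accepts entries formed by merging several adjacent mmap() results. struct va_range, parse_maps_range() and range_is_ours() are hypothetical names; the sketch assumes the usual "start-end perms ..." layout of /proc/self/maps with hexadecimal addresses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record of one range returned by a successful mmap(). */
struct va_range {
	unsigned long start;
	unsigned long end;	/* exclusive */
};

/* Parse the "start-end" prefix of a /proc/self/maps line. */
static bool parse_maps_range(const char *line, unsigned long *start,
			     unsigned long *end)
{
	return sscanf(line, "%lx-%lx", start, end) == 2;
}

/* Return true if [start, end) is fully covered by the recorded ranges.
 * A merged entry may span several adjacent recorded ranges, so a cursor
 * is advanced to the end of whichever recorded range contains it. */
static bool range_is_ours(unsigned long start, unsigned long end,
			  const struct va_range *ours, size_t n)
{
	unsigned long cur = start;

	while (cur < end) {
		bool advanced = false;
		size_t i;

		for (i = 0; i < n; i++) {
			if (ours[i].start <= cur && cur < ours[i].end) {
				cur = ours[i].end;
				advanced = true;
				break;
			}
		}
		if (!advanced)
			return false;	/* gap: part of the entry is not ours */
	}
	return true;
}
```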
Is it fine to rely on CONFIG_ANON_VMA_NAME?
That would make it much easier to implement.
Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even
in very low physical memory conditions.
Thomas