Message-ID: <0d89d71e-66cc-40da-8115-18124bcddb5c@arm.com>
Date: Wed, 25 Jun 2025 16:15:38 +0530
From: Dev Jain <dev.jain@....com>
To: Donet Tom <donettom@...ux.ibm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Aboorva Devarajan <aboorvad@...ux.ibm.com>, akpm@...ux-foundation.org,
Liam.Howlett@...cle.com, shuah@...nel.org, pfalcato@...e.de,
david@...hat.com, ziy@...dia.com, baolin.wang@...ux.alibaba.com,
npache@...hat.com, ryan.roberts@....com, baohua@...nel.org,
linux-mm@...ck.org, linux-kselftest@...r.kernel.org,
linux-kernel@...r.kernel.org, ritesh.list@...il.com
Subject: Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.
On 25/06/25 3:06 pm, Donet Tom wrote:
> On Tue, Jun 24, 2025 at 11:45:09AM +0530, Dev Jain wrote:
>> On 23/06/25 11:02 pm, Donet Tom wrote:
>>> On Mon, Jun 23, 2025 at 10:23:02AM +0530, Dev Jain wrote:
>>>> On 21/06/25 11:25 pm, Donet Tom wrote:
>>>>> On Fri, Jun 20, 2025 at 08:15:25PM +0530, Dev Jain wrote:
>>>>>> On 19/06/25 1:53 pm, Donet Tom wrote:
>>>>>>> On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
>>>>>>>> On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
>>>>>>>>> On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
>>>>>>>>>> On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
>>>>>>>>>>> On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
>>>>>>>>>>>> On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
>>>>>>>>>>>>> On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
>>>>>>>>>>>>> Are you accounting for vm.max_map_count? If not, then you'll be hitting that
>>>>>>>>>>>>> first.
>>>>>>>>>>>> run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
>>>>>>>>>>> Umm, what? You mean overcommit all mode, and that has no bearing on the max
>>>>>>>>>>> mapping count check.
>>>>>>>>>>>
>>>>>>>>>>> In do_mmap():
>>>>>>>>>>>
>>>>>>>>>>> /* Too many mappings? */
>>>>>>>>>>> if (mm->map_count > sysctl_max_map_count)
>>>>>>>>>>> return -ENOMEM;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> As well as numerous other checks in mm/vma.c.
>>>>>>>>>> Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
>>>>>>>>>> this.
>>>>>>>>> No problem! It's hard to be aware of everything in mm :)
>>>>>>>>>
>>>>>>>>>>> I'm not sure why an overcommit toggle is even necessary when you could use
>>>>>>>>>>> MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
>>>>>>>>>>>
>>>>>>>>>>> I'm pretty confused as to what this test is really achieving honestly. This
>>>>>>>>>>> isn't a useful way of asserting mmap() behaviour as far as I can tell.
>>>>>>>>>> Well, seems like a useful way to me at least : ) Not sure if you are in the mood
>>>>>>>>>> to discuss that but if you'd like me to explain from start to end what the test
>>>>>>>>>> is doing, I can do that : )
>>>>>>>>>>
>>>>>>>>> I just don't have time right now, I guess I'll have to come back to it
>>>>>>>>> later... it's not the end of the world for it to be iffy in my view as long as
>>>>>>>>> it passes, but it might just not be of great value.
>>>>>>>>>
>>>>>>>>> Philosophically I'd rather we didn't assert internal implementation details like
>>>>>>>>> where we place mappings in userland memory. At no point do we promise to not
>>>>>>>>> leave larger gaps if we feel like it :)
>>>>>>>> You have a fair point. Anyhow a debate for another day.
>>>>>>>>
>>>>>>>>> I'm guessing, reading more, the _real_ test here is some mathematical assertion
>>>>>>>>> about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
>>>>>>>>>
>>>>>>>>> But again I'm not sure that achieves much and again also is asserting internal
>>>>>>>>> implementation details.
>>>>>>>>>
>>>>>>>>> Correct behaviour of this kind of thing probably better belongs to tests in the
>>>>>>>>> userland VMA testing I'd say.
>>>>>>>>>
>>>>>>>>> Sorry I don't mean to do down work you've done before, just giving an honest
>>>>>>>>> technical appraisal!
>>>>>>>> Nah, it will be rather hilarious to see it all go down the drain xD
>>>>>>>>
>>>>>>>>> Anyway don't let this block work to fix the test if it's failing. We can revisit
>>>>>>>>> this later.
>>>>>>>> Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
>>>>>>>> the gap check at the crossing boundary. What do you think?
>>>>>>>>
>>>>>>> One problem I am seeing with this approach is that, since the hint address
>>>>>>> is generated randomly, the VMAs are also created at random locations based on
>>>>>>> the hint address. So, for the VMAs created at high addresses, we cannot guarantee
>>>>>>> that the gaps between them will be aligned to MAP_CHUNK_SIZE.
>>>>>>>
>>>>>>> High address VMAs
>>>>>>> -----------------
>>>>>>> 1000000000000-1000040000000 r--p 00000000 00:00 0
>>>>>>> 2000000000000-2000040000000 r--p 00000000 00:00 0
>>>>>>> 4000000000000-4000040000000 r--p 00000000 00:00 0
>>>>>>> 8000000000000-8000040000000 r--p 00000000 00:00 0
>>>>>>> e80009d260000-fffff9d260000 r--p 00000000 00:00 0
>>>>>>>
>>>>>>> I have a different approach to solve this issue.
>>>>>> It is really weird that such a large amount of VA space
>>>>>> is left between the two VMAs yet mmap is failing.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Can you please do the following:
>>>>>> set /proc/sys/vm/max_map_count to the highest value possible.
>>>>>> If running without run_vmtests.sh, set /proc/sys/vm/overcommit_memory to 1.
>>>>>> In validate_complete_va_space:
>>>>>>
>>>>>> if (start_addr >= HIGH_ADDR_MARK && found == false) {
>>>>>> found = true;
>>>>>> continue;
>>>>>> }
>>>>> Thanks Dev for the suggestion. I set max_map_count, set overcommit_memory
>>>>> to 1, added this code change as well, and then reran the test. The test
>>>>> still fails.
>>>>>
>>>>>> where found is initialized to false. This will skip the check
>>>>>> for the boundary.
>>>>>>
>>>>>> After this can you tell whether the test is still failing.
>>>>>>
>>>>>> Also can you give me the complete output of /proc/pid/maps
>>>>>> after putting a sleep at the end of the test.
>>>>>>
>>>>> On powerpc, DEFAULT_MAP_WINDOW is 128TB and the total address space
>>>>> size is 4PB. With a hint, mmap() can map up to 4PB. Since the hint
>>>>> address is random in this test, VMAs are created at random high
>>>>> addresses. IIUC this is expected.
>>>>>
>>>>>
>>>>> 10000000-10010000 r-xp 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>>>> 10010000-10020000 r--p 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>>>> 10020000-10030000 rw-p 00010000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>>>> 30000000-10030000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>>>> 10030770000-100307a0000 rw-p 00000000 00:00 0 [heap]
>>>>> 1004f000000-7fff8f000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>>>> 7fff8faf0000-7fff8fe00000 rw-p 00000000 00:00 0
>>>>> 7fff8fe00000-7fff90030000 r-xp 00000000 fd:00 792355 /usr/lib64/libc.so.6
>>>>> 7fff90030000-7fff90040000 r--p 00230000 fd:00 792355 /usr/lib64/libc.so.6
>>>>> 7fff90040000-7fff90050000 rw-p 00240000 fd:00 792355 /usr/lib64/libc.so.6
>>>>> 7fff90050000-7fff90130000 r-xp 00000000 fd:00 792358 /usr/lib64/libm.so.6
>>>>> 7fff90130000-7fff90140000 r--p 000d0000 fd:00 792358 /usr/lib64/libm.so.6
>>>>> 7fff90140000-7fff90150000 rw-p 000e0000 fd:00 792358 /usr/lib64/libm.so.6
>>>>> 7fff90160000-7fff901a0000 r--p 00000000 00:00 0 [vvar]
>>>>> 7fff901a0000-7fff901b0000 r-xp 00000000 00:00 0 [vdso]
>>>>> 7fff901b0000-7fff90200000 r-xp 00000000 fd:00 792351 /usr/lib64/ld64.so.2
>>>>> 7fff90200000-7fff90210000 r--p 00040000 fd:00 792351 /usr/lib64/ld64.so.2
>>>>> 7fff90210000-7fff90220000 rw-p 00050000 fd:00 792351 /usr/lib64/ld64.so.2
>>>>> 7fffc9770000-7fffc9880000 rw-p 00000000 00:00 0 [stack]
>>>>> 1000000000000-1000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>>>> 2000000000000-2000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>>>> 4000000000000-4000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>>>> 8000000000000-8000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>>>> eb95410220000-fffff90220000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> If I give the hint addresses serially from 128TB, then the address
>>>>> space is contiguous, the gap is also MAP_SIZE, and the test passes.
>>>>>
>>>>> 10000000-10010000 r-xp 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>>>> 10010000-10020000 r--p 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>>>> 10020000-10030000 rw-p 00010000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
>>>>> 33000000-10033000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>>>> 10033380000-100333b0000 rw-p 00000000 00:00 0 [heap]
>>>>> 1006f0f0000-10071000000 rw-p 00000000 00:00 0
>>>>> 10071000000-7fffb1000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>>>> 7fffb15d0000-7fffb1800000 r-xp 00000000 fd:00 792355 /usr/lib64/libc.so.6
>>>>> 7fffb1800000-7fffb1810000 r--p 00230000 fd:00 792355 /usr/lib64/libc.so.6
>>>>> 7fffb1810000-7fffb1820000 rw-p 00240000 fd:00 792355 /usr/lib64/libc.so.6
>>>>> 7fffb1820000-7fffb1900000 r-xp 00000000 fd:00 792358 /usr/lib64/libm.so.6
>>>>> 7fffb1900000-7fffb1910000 r--p 000d0000 fd:00 792358 /usr/lib64/libm.so.6
>>>>> 7fffb1910000-7fffb1920000 rw-p 000e0000 fd:00 792358 /usr/lib64/libm.so.6
>>>>> 7fffb1930000-7fffb1970000 r--p 00000000 00:00 0 [vvar]
>>>>> 7fffb1970000-7fffb1980000 r-xp 00000000 00:00 0 [vdso]
>>>>> 7fffb1980000-7fffb19d0000 r-xp 00000000 fd:00 792351 /usr/lib64/ld64.so.2
>>>>> 7fffb19d0000-7fffb19e0000 r--p 00040000 fd:00 792351 /usr/lib64/ld64.so.2
>>>>> 7fffb19e0000-7fffb19f0000 rw-p 00050000 fd:00 792351 /usr/lib64/ld64.so.2
>>>>> 7fffc5470000-7fffc5580000 rw-p 00000000 00:00 0 [stack]
>>>>> 800000000000-2aab000000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
>>>>>
>>>>>
>>>> Thank you for this output. I can't wrap my head around why this behaviour changes
>>>> when you generate the hint sequentially. The mmap() syscall is supposed to do the
>>>> following (irrespective of high VA space or not) - if the allocation at the hint
>>> Yes, it is working as expected. On PowerPC, the DEFAULT_MAP_WINDOW is
>>> 128TB, and the system can map up to 4PB.
>>>
>>> In the test, the first mmap call maps memory up to 128TB without any
>>> hint, so the VMAs are created below the 128TB boundary.
>>>
>>> In the second mmap call, we provide a hint starting from 256TB, and
>>> the hint address is generated randomly above 256TB. The mappings are
>>> correctly created at these hint addresses. Since the hint addresses
>>> are random, the resulting VMAs are also created at random locations.
>>>
>>> So, what I tried is: mapping from 0 to 128TB without any hint, and
>>> then for the second mmap, instead of starting the hint from 256TB, I
>>> started from 128TB. Instead of using random hint addresses, I used
>>> sequential hint addresses from 128TB up to 512TB. With this change,
>>> the VMAs are created in order, and the test passes.
>>>
>>> 800000000000-2aab000000000 r--p 00000000 00:00 0 128TB to 512TB VMA
>>>
>>> I think we will see the same behaviour on x86 with X86_FEATURE_LA57.
>>>
>>> I will send the updated patch in V2.
>> Since you say it fails on both radix and hash, it means that the generic
>> code path is failing. I see that on my system, when I run the test with
>> LPA2 config, write() fails with errno set to ENOMEM. Can you apply
>> the following diff and check whether the test still fails? Doing this
>> fixed it for arm64.
>>
>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>> index b380e102b22f..3032902d01f2 100644
>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>> @@ -173,10 +173,6 @@ static int validate_complete_va_space(void)
>> */
>> hop = 0;
>> while (start_addr + hop < end_addr) {
>> - if (write(fd, (void *)(start_addr + hop), 1) != 1)
>> - return 1;
>> - lseek(fd, 0, SEEK_SET);
>> -
>> if (is_marked_vma(vma_name))
>> munmap((char *)(start_addr + hop), MAP_CHUNK_SIZE);
>>
> Even with this change, the test is still failing. In this case,
> we are allocating physical memory and writing into it, but our
> issue seems to be with the gap between VMAs, so I believe this
> might not be directly related.
>
> I will send the next revision where the test passes and no
> issues are observed
But we are not solving the real problem - can you give me the diff
of the modified test, the sequential hinting you were talking
about?
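
For reference, a minimal sketch of the two hinting schemes as they appear in the diff quoted further down, not the actual V2 patch. The function names are hypothetical, `r` stands in for the `rand()` result, and MAP_CHUNK_SIZE is assumed to be 1 GiB (inferred from the 0x40000000-byte VMAs in the quoted maps output).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 1 GiB chunks, inferred from the quoted /proc/pid/maps output. */
#define MAP_CHUNK_SIZE (UINT64_C(1) << 30)

/*
 * Random scheme: the hint is a single power of two, 1 << bits, with bits
 * drawn from [shift, 62]. r stands in for the value returned by rand().
 */
static uint64_t random_hint(int shift, unsigned int r)
{
	assert(shift >= 0 && shift < 63);
	return UINT64_C(1) << (shift + (int)(r % (unsigned int)(63 - shift)));
}

/* Sequential scheme: chunk i is hinted i chunks above 1 << shift. */
static uint64_t sequential_hint(int shift, uint64_t i)
{
	assert(shift >= 0 && shift < 63);
	return (UINT64_C(1) << shift) + i * MAP_CHUNK_SIZE;
}

/* Count the distinct power-of-two hints that start below 1 << limit_bits. */
static int pow2_hints_below(int shift, int limit_bits)
{
	int bits, count = 0;

	for (bits = shift; bits <= 62; bits++)
		if (bits < limit_bits)
			count++;
	return count;
}
```

With a shift of 48 and a 4PB (2^52) address space, pow2_hints_below(48, 52) gives 4. That would be consistent with the four 1 GiB VMAs at 2^48 through 2^51 in the powerpc output, though this is an inference from the quoted listing and not something I have traced.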
>
> Just curious — with LPA2, is the second mmap() call successful?
> And are the VMAs being created at the hint address as expected?
mmap() is working as expected on LPA2: the first three mmap() calls
correctly land at the hint addresses, after which mmap() hands out
addresses in a top-down fashion, and the test passes once the gap
check at the boundary is elided.
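
A minimal sketch of what eliding the gap check at the boundary means, assuming the check is "the gap between consecutive VMAs must not exceed MAP_CHUNK_SIZE". The struct and function names are hypothetical; this is not the code in validate_complete_va_space(), only the same idea applied to a sorted array of ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one parsed /proc/self/maps entry. */
struct vma_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return 1 if every gap between consecutive VMAs is at most max_gap,
 * except that the first VMA at or above high_mark is not checked
 * (the elided boundary crossing). Return 0 otherwise.
 * Assumes v[] is sorted by address and the ranges do not overlap.
 */
static int gaps_ok(const struct vma_range *v, size_t n,
		   uint64_t high_mark, uint64_t max_gap)
{
	uint64_t prev_end = 0;
	int crossed = 0;
	size_t i;

	assert(v != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* First VMA past the mark: remember it, skip its gap check. */
		int at_boundary = v[i].start >= high_mark && !crossed;

		if (at_boundary)
			crossed = 1;

		if (i > 0 && !at_boundary && v[i].start - prev_end > max_gap)
			return 0;

		/* Sketch's choice: the skipped VMA still updates prev_end. */
		prev_end = v[i].end;
	}
	return 1;
}
```

Only the single crossing from the low range into the high range is exempt. Widely spaced VMAs above the mark, like the power-of-two ones in the powerpc output, would still fail this check.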
>
>>>> addr succeeds, then all is well, otherwise, do a top-down search for a large
>>>> enough gap. I am not aware of the nuances in powerpc but I really am suspecting
>>>> a bug in powerpc mmap code. Can you try to do some tracing - which function
>>>> eventually fails to find the empty gap?
>>>>
>>>> Through my limited code tracing - we should end up in slice_find_area_topdown,
>>>> then we ask the generic code to find the gap using vm_unmapped_area. So I
>>>> suspect something is happening between this, probably slice_scan_available().
>>>>
>>>>>>> From 0 to 128TB, we map memory directly without using any hint. For the range above
>>>>>>> 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
>>>>>>> we use random hint addresses, but I have modified it to generate hint addresses linearly
>>>>>>> starting from 128TB.
>>>>>>>
>>>>>>> With this change:
>>>>>>>
>>>>>>> The 0–128TB range is mapped without hints and verified accordingly.
>>>>>>>
>>>>>>> The 128TB–512TB range is mapped using linear hint addresses and then verified.
>>>>>>>
>>>>>>> Below are the VMAs obtained with this approach:
>>>>>>>
>>>>>>> 10000000-10010000 r-xp 00000000 fd:05 135019531
>>>>>>> 10010000-10020000 r--p 00000000 fd:05 135019531
>>>>>>> 10020000-10030000 rw-p 00010000 fd:05 135019531
>>>>>>> 20000000-10020000000 r--p 00000000 00:00 0
>>>>>>> 10020800000-10020830000 rw-p 00000000 00:00 0
>>>>>>> 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
>>>>>>> 1004c000000-7fff8c000000 r--p 00000000 00:00 0
>>>>>>> 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
>>>>>>> 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
>>>>>>> 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
>>>>>>> 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
>>>>>>> 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
>>>>>>> 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
>>>>>>> 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
>>>>>>> 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
>>>>>>> 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
>>>>>>> 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
>>>>>>> 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
>>>>>>> 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
>>>>>>> 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
>>>>>>> 800000000000-2000000000000 r--p 00000000 00:00 0 -> High Address (128TB to 512TB)
>>>>>>>
>>>>>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>>>>>> index 4c4c35eac15e..0be008cba4b0 100644
>>>>>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>>>>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>>>>>> @@ -56,21 +56,21 @@
>>>>>>> #ifdef __aarch64__
>>>>>>> #define HIGH_ADDR_MARK ADDR_MARK_256TB
>>>>>>> -#define HIGH_ADDR_SHIFT 49
>>>>>>> +#define HIGH_ADDR_SHIFT 48
>>>>>>> #define NR_CHUNKS_LOW NR_CHUNKS_256TB
>>>>>>> #define NR_CHUNKS_HIGH NR_CHUNKS_3840TB
>>>>>>> #else
>>>>>>> #define HIGH_ADDR_MARK ADDR_MARK_128TB
>>>>>>> -#define HIGH_ADDR_SHIFT 48
>>>>>>> +#define HIGH_ADDR_SHIFT 47
>>>>>>> #define NR_CHUNKS_LOW NR_CHUNKS_128TB
>>>>>>> #define NR_CHUNKS_HIGH NR_CHUNKS_384TB
>>>>>>> #endif
>>>>>>> -static char *hint_addr(void)
>>>>>>> +static char *hint_addr(int hint)
>>>>>>> {
>>>>>>> - int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
>>>>>>> + unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
>>>>>>> - return (char *) (1UL << bits);
>>>>>>> + return (char *) (addr);
>>>>>>> }
>>>>>>> static void validate_addr(char *ptr, int high_addr)
>>>>>>> @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
>>>>>>> }
>>>>>>> for (i = 0; i < NR_CHUNKS_HIGH; i++) {
>>>>>>> - hint = hint_addr();
>>>>>>> + hint = hint_addr(i);
>>>>>>> hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
>>>>>>> MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Can we fix it this way?
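
To reproduce the gap figures from the /proc/pid/maps listings quoted in this thread, the address range at the start of each line can be pulled out with a small helper. This is a sketch with a hypothetical function name; it only reads the leading "start-end" pair and ignores permissions, offsets and names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse the leading "start-end" hex pair of a /proc/<pid>/maps line.
 * Returns 0 on success, -1 if the line does not begin with a valid,
 * non-empty range.
 */
static int parse_maps_range(const char *line, uint64_t *start, uint64_t *end)
{
	assert(line != NULL && start != NULL && end != NULL);

	if (sscanf(line, "%" SCNx64 "-%" SCNx64, start, end) != 2)
		return -1;

	return (*end > *start) ? 0 : -1;
}
```

Feeding consecutive lines through this and subtracting the previous end from the next start gives the gaps that the test compares against MAP_CHUNK_SIZE.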