Message-ID: <49043787-853a-75d5-6da3-90529377bb30@oracle.com>
Date: Fri, 20 Aug 2021 14:00:53 -0700
From: Mike Kravetz <mike.kravetz@...cle.com>
To: Mina Almasry <almasrymina@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Ken Chen <kenchen@...gle.com>,
Chris Kennelly <ckennelly@...gle.com>,
Michal Hocko <mhocko@...e.com>,
Vlastimil Babka <vbabka@...e.cz>,
Kirill Shutemov <kirill@...temov.name>
Subject: Re: [PATCH v2] mm, hugepages: add mremap() support for hugepage backed vma

On 8/18/21 4:35 PM, Mina Almasry wrote:
> On Fri, Aug 13, 2021 at 4:40 PM Mike Kravetz <mike.kravetz@...cle.com> wrote:
>> Earlier in the mremap code, the following lines exist:
>>
>> old_len = PAGE_ALIGN(old_len);
>> new_len = PAGE_ALIGN(new_len);
>>
>> So, the passed length values are page aligned. This allows 'sloppy'
>> values to be passed by users.
>>
>> Should we do the same for hugetlb mappings? In mmap we have different
>> requirements for hugetlb mappings:
>>
>> " Huge page (Huge TLB) mappings
>> For mappings that employ huge pages, the requirements for the arguments
>> of mmap() and munmap() differ somewhat from the requirements for
>> mappings that use the native system page size.
>>
>> For mmap(), offset must be a multiple of the underlying huge page size.
>> The system automatically aligns length to be a multiple of the
>> underlying huge page size.
>>
>> For munmap(), addr and length must both be a multiple of the underlying
>> huge page size.
>> "
>>
>> I actually wish arguments for hugetlb mappings would be treated the same
>> as for base page size mappings. We can not change mmap as legacy code
>> may depend on the different requirements. Since mremap for hugetlb is
>> new, should we treat arguments for hugetlb mappings the same as for base
>> pages (align to huge page boundary)? My vote is yes, but it would be
>> good to get other opinions.
>>
>> If we do not align for hugetlb mappings as we do for base page mappings,
>> then this will also need to be documented.
>>
>> Another question,
>> Should we possibly check addr and new_addr alignment here as well?
>> addr has been previously checked for PAGE alignment and new_addr is
>> checked for PAGE alignment at the beginning of mremap_to().
>>
>
> I'll yield to whatever you decide here because I reckon you have much
> more experience and better judgement here. But my thoughts:
>
> 'Sane' usage of mremap() is something like:
> 1. mmap() a hugetlbfs vma.
> 2. Pass the vma received from step (1) to mremap() to remap it to a
> different location.
>
> I don't know if there is another usage pattern I need to worry about
> but given the above, old_addr and old_len will be hugepage aligned
> already since they are values returned by the previous mmap() call
> which aligns them, no? So, I think aligning old_addr and old_len to
> the hugepage boundary is fine.
>
> With this support we don't allow mremap() expansion. In my use case
> old_len==new_len actually. I think it's fine to also align new_len to
> the hugepage boundary.
>
> I already have this code that errors out if the lengths are not aligned:
>
> 	if (old_len & ~huge_page_mask(h) || new_len & ~huge_page_mask(h))
> 		goto out;
>
> I think aligning new_addr breaks my use case though. In my use case
> new_addr is the start of the text segment in the ELF executable, and I
> don't think that's guaranteed to be anything but page aligned.
> Aligning new_addr seems like it would break my use case.

That is interesting. I assumed there was hugetlb code written under the
assumption that vmas/mappings were always huge page aligned. I thought the
code would fall over quite quickly if a vma was not huge page aligned.
Your use case/statement above surprised me.

So, I took your provided test case (V3 patch) and tried to make the
destination address non-huge page aligned: just page aligned. In every
case, mremap would fail. The routine hugetlb_get_unmapped_area() requires
huge page alignment. Not sure how this works for you?

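To make the distinction concrete, here is a minimal user-space sketch of
the alignment arithmetic under discussion. It is not the kernel code: the
4 KiB and 2 MiB sizes are assumptions (x86-64 defaults), and the helper
names is_aligned(), align_up() and hugetlb_mremap_args_ok() are
hypothetical, chosen only for illustration. It shows how an address can be
page aligned yet fail a huge page alignment check, and what rounding a
'sloppy' value up would look like.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes, for illustration only (x86-64 defaults): 4 KiB base
 * pages and 2 MiB huge pages.  Kernel code would use the hstate. */
#define BASE_PAGE_SIZE	(UINT64_C(1) << 12)
#define HUGE_PAGE_SIZE	(UINT64_C(1) << 21)

/* True if value is a multiple of size; size must be a power of two. */
static bool is_aligned(uint64_t value, uint64_t size)
{
	return (value & (size - 1)) == 0;
}

/* Round value up to the next multiple of size (a power of two).  This
 * is the lenient policy: accept a 'sloppy' value and align it. */
static uint64_t align_up(uint64_t value, uint64_t size)
{
	return (value + size - 1) & ~(size - 1);
}

/* Hypothetical strict policy: accept the request only if both lengths
 * and the destination address are already huge page aligned. */
static bool hugetlb_mremap_args_ok(uint64_t old_len, uint64_t new_len,
				   uint64_t new_addr)
{
	return is_aligned(old_len, HUGE_PAGE_SIZE) &&
	       is_aligned(new_len, HUGE_PAGE_SIZE) &&
	       is_aligned(new_addr, HUGE_PAGE_SIZE);
}
```

With these assumed sizes, a destination such as 0x401000 passes the base
page check but fails the huge page check, which matches the failures seen
when the test case used a destination that was only page aligned.
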
> Aligning new_addr seems like it would break my use case. If you insist
> though, I'm happy aligning new_addr in the upstream kernel and not
> doing that in our kernel. But if I'm not particularly happy with the
> hugepage alignment, I'd say it is likely that future users of hugetlb
> mremap() also won't like the hugepage alignment. I yield to you here.

I am now a bit confused and do not see how this works for your use case.
--
Mike Kravetz