Message-ID: <ea9a9ef0-05da-835f-8d85-491cc133d4d7@oracle.com>
Date:   Wed, 18 May 2022 20:36:46 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Hugh Dickins <hughd@...gle.com>
Cc:     "linux-mm@...ck.org" <linux-mm@...ck.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Michal Hocko <mhocko@...e.com>,
        Oscar Salvador <osalvador@...e.de>,
        David Hildenbrand <david@...hat.com>,
        Naoya Horiguchi <naoya.horiguchi@...ux.dev>,
        Peter Xu <peterx@...hat.com>, Nick Piggin <npiggin@...il.com>,
        Andi Kleen <ak@...ux.intel.com>
Subject: Re: vma_needs_copy always true for VM_HUGETLB ?

On 5/18/22 18:30, Hugh Dickins wrote:
> On Wed, 18 May 2022, Mike Kravetz wrote:
> 
>> For most non-anonymous vmas, we do not copy page tables at fork time, but
>> rather lazily populate the tables after fork via faults.  The routine
>> vma_needs_copy() is used to make this decision. For VM_HUGETLB vmas, it always
>> returns true.
> 
> "vma_needs_copy()" is *very* recent coinage, not reached Linus yet.
> 
>>
>> Anyone know/remember why?  The code was added more than 15 years ago and
>> my search for why hugetlb vmas were excluded came up empty.
>>
>> I do not see a reason why VM_HUGETLB is in this list.  Initial testing did
>> not reveal any problems when I removed the VM_HUGETLB check.
>>
>> FYI - I am looking at the performance of fork and exec (unmap) of processes
>> with very large hugetlb mappings.  Skipping the copy at fork time would
>> certainly speed things up.  Of course, there could be some users who would
>> notice if hugetlb page tables are not copied at fork time.  However, this
>> is the behavior for 'normal' mappings.  I am inclined to make hugetlb be
>> 'more normal'.
> 
> Good question, not obvious to me either: but I've found the answer.

Thank you Hugh!  You went above and beyond as usual.

> The commit was of course Nick's d992895ba2b2 ("[PATCH] Lazy page table
> copies in fork()") in 2.6.14; but it doesn't explain why VM_HUGETLB is
> there in the test, and goes on to be copied.
> 
> I haven't re-read through the whole mail thread which led to that
> commit, but I think you'll find the crucial observation comes from
> Andi in https://lore.kernel.org/lkml/200508251756.07849.ak@suse.de/#t

Sorry that I did not find the entire thread.  There were only a couple of
pieces on linux-mm, and that is the only place I looked.

> 
> "Actually I disabled it for hugetlbfs (... !is_huge...vma). The reason 
> is that lazy faulting for huge pages is still not in mainline."
> 
> and indeed, look at the 2.6.13 or 2.6.14 mm/hugetlb.c and you find
> /*
>  * We cannot handle pagefaults against hugetlb pages at all.  They cause
>  * handle_mm_fault() to try to instantiate regular-sized pages in the
>  * hugegpage VMA.  do_page_fault() is supposed to trap this, so BUG is we get
>  * this far.
>  */
> static struct page *hugetlb_nopage(struct vm_area_struct *vma,
> 				unsigned long address, int *unused)
> {
> 	BUG();
> 	return NULL;
> }
> 
> Oh, and that pretty much still exists to this day, to cover that path
> to a fault; but 2.6.16 implemented hugetlb_no_page(), which is what
> then actually got used to satisfy a hugetlb fault.
> 
> So the reason for fork copying VM_HUGETLB appears to have gone away
> in 2.6.16.

Yes, that is the likely reason.  Functionality was not originally
supported, and when it was added this 'optimization' was not enabled.
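
To make the decision being discussed concrete, here is a minimal userspace
sketch in C.  It is not the kernel's code: the struct, the flag bits, and the
names (model_vma, MODEL_VM_HUGETLB, MODEL_VM_OTHER_SPECIAL, model_needs_copy)
are hypothetical stand-ins, and the hugetlb_always_copies switch only models
keeping or removing the VM_HUGETLB check.  The anon_vma handling is an
assumption based on the discussion in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model; not the kernel's real values. */
#define MODEL_VM_HUGETLB       (1u << 0)
#define MODEL_VM_OTHER_SPECIAL (1u << 1) /* stands in for other flags in the list */

/* Hypothetical, heavily reduced stand-in for a vma. */
struct model_vma {
	unsigned int flags;
	bool has_anon_vma;	/* models a non-NULL anon_vma */
};

/*
 * Return true if page tables would be copied at fork time, false if they
 * would be left for faults in the child to populate lazily.
 *
 * hugetlb_always_copies == true models the behavior described above;
 * false models the VM_HUGETLB check being removed.
 */
static bool model_needs_copy(const struct model_vma *vma,
			     bool hugetlb_always_copies)
{
	unsigned int always_copy = MODEL_VM_OTHER_SPECIAL;

	assert(vma != NULL);

	if (hugetlb_always_copies)
		always_copy |= MODEL_VM_HUGETLB;

	if (vma->flags & always_copy)
		return true;

	/* Assumption: a vma with anonymous pages attached is still copied. */
	return vma->has_anon_vma;
}
```

In this model, a hugetlb vma with no anon_vma is copied while the check is
in place and is left to lazy faulting once the check is removed; a hugetlb vma
with an anon_vma is copied either way.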

> (I haven't a clue on private hugetlb mappings and reservations and
> whether anon_vma means the same on hugetlb, but you know all that.)

Yes, I believe anon_vma means the same on hugetlb for this purpose.
Although, I do need to look closer just to make sure there are not
hidden surprises.

Thanks again,
-- 
Mike Kravetz
