Message-ID: <20230829034701.GG3290@monkey>
Date:   Mon, 28 Aug 2023 20:47:01 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Muchun Song <muchun.song@...ux.dev>
Cc:     Usama Arif <usama.arif@...edance.com>,
        Linux-MM <linux-mm@...ck.org>, Mike Rapoport <rppt@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Muchun Song <songmuchun@...edance.com>,
        fam.zheng@...edance.com, liangma@...ngbit.com,
        punit.agrawal@...edance.com
Subject: Re: [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail
 struct pages if freed by HVO

On 08/29/23 11:33, Muchun Song wrote:
> 
> 
> > On Aug 29, 2023, at 05:04, Mike Kravetz <mike.kravetz@...cle.com> wrote:
> > 
> > On 08/28/23 19:33, Muchun Song wrote:
> >> 
> >> 
> >>> On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@...edance.com> wrote:
> >>> 
> >>> The new boot flow when it comes to initialization of gigantic pages
> >>> is as follows:
> >>> - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
> >>> the region after the first struct page is marked as noinit.
> > >>> - This results in only the first struct page being
> > >>> initialized in reserve_bootmem_region. As the tail struct pages are
> >>> not initialized at this point, there can be a significant saving
> >>> in boot time if HVO succeeds later on.
> > >>> - Later on in the boot, HVO is attempted. If it's successful, only the first
> >>> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
> >>> after the head struct page are initialized. If it is not successful,
> >>> then all of the tail struct pages are initialized.
> >>> 
> >>> Signed-off-by: Usama Arif <usama.arif@...edance.com>
> >> 
> > >> This version is simpler than ever before; thanks for your work.
> >> 
> > >> Your optimization rests on the premise that other subsystems do not
> > >> access the vmemmap pages associated with HugeTLB pages allocated from
> > >> bootmem before those vmemmap pages are initialized. However, IIUC, the
> > >> compaction path could access an arbitrary struct page when memory fails
> > >> to be allocated via the buddy allocator. So we should make sure that
> > >> those struct pages are not referenced in this routine. And I know that
> > >> if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, it will encounter
> > >> the same issue, but I can't find any code that prevents this from
> > >> happening. I need more time to confirm this; if someone already knows,
> > >> please let me know, thanks. So I think HugeTLB should adopt a similar
> > >> approach to prevent this.
> > 
> > In this patch, the call to hugetlb_vmemmap_optimize() is moved BEFORE
> > __prep_new_hugetlb_folio or prep_new_hugetlb_folio in all code paths.
> > The prep_new_hugetlb_folio routine(s) are what set the destructor (soon
> > to be a flag) that identifies the set of pages as a hugetlb page.  So,
> > there is now a window where a set of pages not identified as hugetlb
> > will not have vmemmap pages.
> 
> Thanks for pointing it out.
> 
> It seems this issue is not related to this change? hugetlb_vmemmap_optimize()
> has been called before the destructor is set ever since the initial commit
> f41f2ed43ca5. Right?
> 

Thanks Muchun!

Yes, this issue exists today.  It was the further separation of the calls in
this patch which pointed out the issue to me.

I overlooked the fact that the issue already exists. :(

> > 
> > Recently, I closed the same window in the hugetlb freeing code paths with
> > commit 32c877191e02 'hugetlb: do not clear hugetlb dtor until allocating'.
> 
> Yes, I saw it. 
> 
> > This patch needs to be reworked so that this window is not opened in the
> > allocation paths.
> 
> So I think the fix should be a separate series.
> 

Right.  I can fix that up separately.
-- 
Mike Kravetz
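
For scale, the tail-page arithmetic from the quoted patch description can be
sketched as follows. This is only back-of-the-envelope Python, not kernel
code. The 4 KiB base page, the 64-byte struct page, the 1 GiB gigantic page,
HUGETLB_VMEMMAP_RESERVE_SIZE being equal to one base page, and all function
names are assumptions made for illustration.

```python
# Illustrative arithmetic only; every constant below is an assumption.
PAGE_SIZE = 4096                          # assumed 4 KiB base page
STRUCT_PAGE_SIZE = 64                     # assumed sizeof(struct page)
HUGETLB_VMEMMAP_RESERVE_SIZE = PAGE_SIZE  # assumed: one vmemmap page is kept
GIGANTIC_PAGE_SIZE = 1 << 30              # assumed 1 GiB gigantic page


def struct_pages_per_hugepage(hugepage_size=GIGANTIC_PAGE_SIZE,
                              page_size=PAGE_SIZE):
    """Number of struct pages (head + tails) describing one huge page."""
    return hugepage_size // page_size


def tails_initialized_after_hvo(reserve_size=HUGETLB_VMEMMAP_RESERVE_SIZE,
                                struct_page_size=STRUCT_PAGE_SIZE):
    """Tail struct pages initialized when HVO succeeds:
    HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1."""
    return reserve_size // struct_page_size - 1


def tails_skipped_when_hvo_succeeds():
    """Tail struct pages whose initialization is skipped entirely."""
    total_tails = struct_pages_per_hugepage() - 1  # exclude the head page
    return total_tails - tails_initialized_after_hvo()


if __name__ == "__main__":
    print("struct pages per gigantic page:", struct_pages_per_hugepage())
    print("tails initialized after HVO:   ", tails_initialized_after_hvo())
    print("tails skipped:                 ", tails_skipped_when_hvo_succeeds())
```

Under these assumptions, only 63 of the 262143 tail struct pages are
initialized when HVO succeeds, which is where the boot-time saving comes
from. If HVO fails, all of the tail struct pages are initialized, as the
patch description says.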
