linux-kernel - Re: [PATCH v7 0/5] Free the 2nd vmemmap page associated with each HugeTLB page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAMZfGtXPUNWheky0PLCOryDBMxGA7vghkzVFC8xahX7+_sUQ8w@mail.gmail.com>
Date:   Thu, 10 Feb 2022 15:45:06 +0800
From:   Muchun Song <songmuchun@...edance.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Oscar Salvador <osalvador@...e.de>,
        David Hildenbrand <david@...hat.com>,
        Michal Hocko <mhocko@...e.com>,
        Matthew Wilcox <willy@...radead.org>,
        Jonathan Corbet <corbet@....net>,
        Xiongchun duan <duanxiongchun@...edance.com>,
        Fam Zheng <fam.zheng@...edance.com>,
        Muchun Song <smuchun@...il.com>,
        Qi Zheng <zhengqi.arch@...edance.com>,
        Linux Doc Mailing List <linux-doc@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>,
        Barry Song <21cnbao@...il.com>,
        "Bodeddula, Balasubramaniam" <bodeddub@...zon.com>,
        Jue Wang <juew@...gle.com>
Subject: Re: [PATCH v7 0/5] Free the 2nd vmemmap page associated with each
 HugeTLB page

On Thu, Feb 10, 2022 at 6:49 AM Mike Kravetz <mike.kravetz@...cle.com> wrote:
>
> On 2/8/22 23:44, Muchun Song wrote:
> > On Wed, Jan 26, 2022 at 4:04 PM Muchun Song <songmuchun@...edance.com> wrote:
> >>
> >> On Wed, Nov 24, 2021 at 11:09 AM Andrew Morton
> >> <akpm@...ux-foundation.org> wrote:
> >>>
> >>> On Mon, 22 Nov 2021 12:21:32 +0800 Muchun Song <songmuchun@...edance.com> wrote:
> >>>
> >>>> On Wed, Nov 10, 2021 at 2:18 PM Muchun Song <songmuchun@...edance.com> wrote:
> >>>>>
> >>>>> On Tue, Nov 9, 2021 at 3:33 AM Mike Kravetz <mike.kravetz@...cle.com> wrote:
> >>>>>>
> >>>>>> On 11/8/21 12:16 AM, Muchun Song wrote:
> >>>>>>> On Mon, Nov 1, 2021 at 11:22 AM Muchun Song <songmuchun@...edance.com> wrote:
> >>>>>>>>
> >>>>>>>> This series can minimize the overhead of struct page for 2MB HugeTLB pages
> >>>>>>>> significantly. It further reduces the overhead of struct page by 12.5% for
> >>>>>>>> a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB
> >>>>>>>> HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Ping guys. Does anyone have any comments or suggestions
> >>>>>>> on this series?
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>
> >>>>>> I did look over the series earlier.  I have no issue with the hugetlb and
> >>>>>> vmemmap modifications as they are enhancements to the existing
> >>>>>> optimizations.  My primary concern is the (small) increased overhead
> >>>>>> for the helpers as outlined in your cover letter.  Since these helpers
> >>>>>> are not limited to hugetlb and used throughout the kernel, I would
> >>>>>> really like to get comments from others with a better understanding of
> >>>>>> the potential impact.
> >>>>>
> >>>>> Thanks Mike. I'd like to hear others' comments about this as well.
> >>>>> From my point of view, maybe the (small) overhead is acceptable
> >>>>> since it only affects the head page, however Matthew Wilcox's folio
> >>>>> series could reduce this situation as well.
> >>>
> >>> I think Mike was inviting you to run some tests to quantify the
> >>> overhead ;)
> >>
> >> Hi Andrew,
> >>
> >> Sorry for the late reply.
> >>
> >> Specific overhead figures are already in the cover letter. Also,
> >> I did some other tests, e.g. kernel compilation, sysbench. I didn't
> >> see any regressions.
> >
> > The overhead is introduced by page_fixed_fake_head() which
> > has an "if" statement and an access to a possible cold cache line.
> > I think the main overhead is from the latter. However, probabilistically,
> > only 1/64 of the pages need to do the latter.  And
> > page_fixed_fake_head() is already simple (I mean the overhead
> > is small enough) and many performance bottlenecks in mm are
> > not in compound_head().  This also matches the tests I did.
> > I didn't see any regressions after enabling this feature.
> >
> > I knew Mike's concern is the increased overhead to use cases
> > beyond HugeTLB. If we really want to avoid the access to
> > a possible cold cache line, we can introduce a new page
> > flag like PG_hugetlb and test if it is set in the page->flags,
> > if so, then return the read head page struct. Then
> > page_fixed_fake_head() looks like below.
> >
> > static __always_inline const struct page *page_fixed_fake_head(const
> > struct page *page)
> > {
> >         if (!hugetlb_free_vmemmap_enabled())
> >                 return page;
> >
> >         if (test_bit(PG_hugetlb, &page->flags)) {
> >                 unsigned long head = READ_ONCE(page[1].compound_head);
> >
> >                 if (likely(head & 1))
> >                         return (const struct page *)(head - 1);
> >         }
> >         return page;
> > }
> >
> > But I don't think it's worth doing this.
> >
> > Hi Mike and Andrew,
> >
> > Since these helpers are not limited to hugetlb and used throughout the
> > kernel, I would really like to get comments from others with a better
> > understanding of the potential impact. Do you have any appropriate
> > reviewers to invite?
> >
>
> I think the appropriate people are already on Cc as they provided input on
> the original vmemmap optimization series.
>
> The question that needs to be answered is simple enough:  Is the savings of
> one vmemmap page per hugetlb page worth the extra minimal overhead in
> compound_head()?  Like most things, this depends on workload.
>
> One thing to note is that compound_page() overhead is only introduced if
> hugetlb vmemmap freeing is enabled.  Correct?

Definitely correct.

> During the original vmemmap
> optimization discussions, people thought it important that this be 'opt in'.  I do not know if distos will enable this by default.  But, perhaps the
> potential overhead can be thought of as just part of 'opting in' for
> vmemmap optimizations.

I agree. Does anyone else have a different opinion?

Thanks.