Message-ID: <CADrL8HV92DaNm5bUwcOxsG8Lg4yLT6F13KWSbjkySPNAsgCfpA@mail.gmail.com>
Date: Thu, 26 Jan 2023 08:58:51 -0800
From: James Houghton <jthoughton@...gle.com>
To: Mike Kravetz <mike.kravetz@...cle.com>
Cc: Peter Xu <peterx@...hat.com>, David Hildenbrand <david@...hat.com>,
Muchun Song <songmuchun@...edance.com>,
David Rientjes <rientjes@...gle.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Mina Almasry <almasrymina@...gle.com>,
"Zach O'Keefe" <zokeefe@...gle.com>,
Manish Mishra <manish.mishra@...anix.com>,
Naoya Horiguchi <naoya.horiguchi@....com>,
"Dr . David Alan Gilbert" <dgilbert@...hat.com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Vlastimil Babka <vbabka@...e.cz>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
Miaohe Lin <linmiaohe@...wei.com>,
Yang Shi <shy828301@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 21/46] hugetlb: use struct hugetlb_pte for walk_hugetlb_range
On Thu, Jan 19, 2023 at 11:42 AM James Houghton <jthoughton@...gle.com> wrote:
>
> On Thu, Jan 19, 2023 at 9:32 AM Mike Kravetz <mike.kravetz@...cle.com> wrote:
> >
> > On 01/19/23 08:57, James Houghton wrote:
> > > FWIW, what makes the most sense to me right now is to implement the
> > > THP-like scheme and mark HGM as mutually exclusive with the vmemmap
> > > optimization. We can later come up with a scheme that lets us retain
> > > compatibility. (Is that what you mean by "this can be done somewhat
> > > independently", Mike?)
> >
> > Sort of, I was only saying that getting the ref/map counting right seems
> > like a task that can be independently worked. Using the THP-like scheme
> > is good.
>
> Ok! So if you're ok with the intermediate mapping sizes, it sounds
> like I should go ahead and implement the THP-like scheme.
It turns out that the THP-like scheme significantly slows down
MADV_COLLAPSE: decrementing the mapcounts for the 4K subpages accounts
for the vast majority of the time spent in MADV_COLLAPSE when
collapsing 1G mappings. A 1G page has 262,144 4K subpages, so that's
262,144 atomic decrements per collapse, which explains the cost.
This is only really a problem because the decrements happen between
mmu_notifier_invalidate_range_start() and
mmu_notifier_invalidate_range_end(), so KVM won't allow vCPUs to
access any of the 1G page while we're doing this (and it takes about
1 second per 1G region, at least on the x86 server I was testing on).
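Roughly, the expensive path looks like the sketch below. This is not
the actual patch, just an illustration of where the time goes; the
function name is made up, and the helper signatures shown
(mmu_notifier_range_init(), page_remove_rmap()) are the ~6.2-era ones.

#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/rmap.h>

/*
 * Illustrative sketch only: collapsing one 1G hugetlb mapping under the
 * THP-like scheme. Every 4K subpage gets its own atomic mapcount
 * decrement, and all of it happens inside the MMU notifier invalidation
 * window, so KVM can't let vCPUs fault the region back in until the
 * loop finishes.
 */
static void collapse_1g_sketch(struct mm_struct *mm,
			       struct vm_area_struct *vma,
			       unsigned long start, struct page *head)
{
	struct mmu_notifier_range range;
	unsigned long i;

	/* mmu_notifier_range_init() arguments as of ~6.2. */
	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
				start, start + PUD_SIZE);
	mmu_notifier_invalidate_range_start(&range);

	/*
	 * Tear down the high-granularity PTEs (elided), then drop one
	 * mapcount per 4K subpage: PUD_SIZE / PAGE_SIZE = 262,144 atomic
	 * decrements for a 1G region.
	 */
	for (i = 0; i < PUD_SIZE / PAGE_SIZE; i++)
		page_remove_rmap(&head[i], vma, false);

	/* ... install the single PUD-sized mapping of the 1G page ... */

	mmu_notifier_invalidate_range_end(&range);
}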
- James