Message-ID: <Y9Li93O6Ffwcr+vn@x1n>
Date: Thu, 26 Jan 2023 15:30:47 -0500
From: Peter Xu <peterx@...hat.com>
To: James Houghton <jthoughton@...gle.com>
Cc: Mike Kravetz <mike.kravetz@...cle.com>,
David Hildenbrand <david@...hat.com>,
Muchun Song <songmuchun@...edance.com>,
David Rientjes <rientjes@...gle.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Mina Almasry <almasrymina@...gle.com>,
Zach O'Keefe <zokeefe@...gle.com>,
Manish Mishra <manish.mishra@...anix.com>,
Naoya Horiguchi <naoya.horiguchi@....com>,
"Dr . David Alan Gilbert" <dgilbert@...hat.com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Vlastimil Babka <vbabka@...e.cz>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
Miaohe Lin <linmiaohe@...wei.com>,
Yang Shi <shy828301@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 21/46] hugetlb: use struct hugetlb_pte for
walk_hugetlb_range
James,
On Thu, Jan 26, 2023 at 08:58:51AM -0800, James Houghton wrote:
> It turns out that the THP-like scheme significantly slows down
> MADV_COLLAPSE: decrementing the mapcounts for the 4K subpages becomes
> the vast majority of the time spent in MADV_COLLAPSE when collapsing
> 1G mappings. It is doing 262k atomic decrements, so this makes sense.
>
> This is only really a problem because this is done between
> mmu_notifier_invalidate_range_start() and
> mmu_notifier_invalidate_range_end(), so KVM won't allow vCPUs to
> access any of the 1G page while we're doing this (and it can take like
> ~1 second for each 1G, at least on the x86 server I was testing on).
Did you measure the time directly, or is this a quick observation from perf?
IIRC I measured some atomic ops before, and the cost was not as drastic as
I had expected.  But maybe it depends on many things.
I'm curious how the ~1 second is split between the steps.  E.g., I would
expect mmu_notifier_invalidate_range_start() to take some time as well,
since it has to walk the EPT pgtables, which are mapped at small granularity.
Since we'll still keep the intermediate levels around, one other remedy
from the application's POV is to shrink the size of each COLLAPSE further,
so that for a very large page we can start by building the 2M layers.
But then collapse will need to run for at least two rounds.
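To put rough numbers on the above, here is a back-of-envelope sketch.  It
assumes x86-64 (4K base pages, 512 entries per page-table level) and that
each smaller COLLAPSE call gets its own invalidate_range_start()/end()
pair.  The helper names are made up for illustration and are not part of
the series.

```python
# Back-of-envelope numbers only.  Assumes x86-64: 4K base pages and
# 512 entries per page-table level.  Helper names are hypothetical.
SZ_4K = 4 << 10
SZ_2M = 2 << 20
SZ_1G = 1 << 30


def subpage_decrements(mapping_size, base_size=SZ_4K):
    """Mapcount decrements needed if every base page is tracked separately."""
    return mapping_size // base_size


def two_round_plan(huge_size=SZ_1G, mid_size=SZ_2M, base_size=SZ_4K):
    """Hypothetical two-round collapse: 4K -> 2M first, then 2M -> 1G."""
    round1_calls = huge_size // mid_size                 # one COLLAPSE per 2M range
    per_call = subpage_decrements(mid_size, base_size)   # decrements per round-1 call
    round2_entries = huge_size // mid_size               # 2M entries folded into one 1G entry
    return round1_calls, per_call, round2_entries


one_shot = subpage_decrements(SZ_1G)
print(one_shot)          # 262144, the "262k" quoted above
print(two_round_plan())  # (512, 512, 512)

# Upper bound on the cost of one decrement, if the whole ~1 second
# were spent on the decrements alone:
print(1.0 / one_shot * 1e6)  # about 3.8 (microseconds)
```

Under these assumptions the total number of decrements in round 1 is
unchanged (512 * 512 = 262144); only the amount of work done per call, and
so per notifier window, shrinks.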
--
Peter Xu