Message-ID: <941f0f8f-a2c2-0021-0773-6cfaa81aabd7@redhat.com>
Date: Wed, 18 Jan 2023 19:21:43 +0100
From: David Hildenbrand <david@...hat.com>
To: James Houghton <jthoughton@...gle.com>,
Peter Xu <peterx@...hat.com>
Cc: Mike Kravetz <mike.kravetz@...cle.com>,
Muchun Song <songmuchun@...edance.com>,
David Rientjes <rientjes@...gle.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Mina Almasry <almasrymina@...gle.com>,
Zach O'Keefe <zokeefe@...gle.com>,
Manish Mishra <manish.mishra@...anix.com>,
Naoya Horiguchi <naoya.horiguchi@....com>,
"Dr . David Alan Gilbert" <dgilbert@...hat.com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Vlastimil Babka <vbabka@...e.cz>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
Miaohe Lin <linmiaohe@...wei.com>,
Yang Shi <shy828301@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 21/46] hugetlb: use struct hugetlb_pte for
walk_hugetlb_range
>>> Once the last piece is unmapped (or simpler: once the complete subtree of
>>> page tables is gone), we decrement refcount+mapcount. Might require some
>>> brain power to do this tracking, but I wouldn't call it impossible right
>>> from the start.
>>>
>>> Would such a design violate other design aspects that are important?
>
> This is actually how mapcount was treated in HGM RFC v1 (though not
> refcount); it is doable for both [2].
>
> One caveat here: if a page is unmapped in small pieces, it is
> difficult to know if the page is legitimately completely unmapped (we
> would have to check all the PTEs in the page table). In RFC v1, I
> sidestepped this caveat by saying that "page_mapcount() is incremented
> if the hstate-level PTE is present". A single unmap on the whole
> hugepage will clear the hstate-level PTE, thus decrementing the
> mapcount.
>
> On a related note, there still exists an (albeit minor) API difference
> vs. THPs: a piece of a page that is legitimately unmapped can still
> have a positive page_mapcount().
>
> Given that this approach allows us to retain the hugetlb vmemmap
> optimization (and it wouldn't require a horrible amount of
> complexity), I prefer this approach over the THP-like approach.

If we can store (directly or indirectly) metadata in the highest pgtable
that HGM-maps a hugetlb page, I guess the following would be reasonable:

* hugetlb page pointer
* mapped size

Whenever mapping/unmapping sub-parts, we'd have to update that information.

Once "mapped size" drops to 0, we know that the hugetlb page was
completely unmapped and we can drop the refcount+mapcount, clear the
metadata (including the hugetlb page pointer) [+ remove the page tables?].

Similarly, once "mapped size" corresponds to the hugetlb size, we can
immediately spot that everything is mapped.
Again, just a high-level idea.
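
To make the bookkeeping above a bit more concrete, here is a rough
userspace sketch in C. It is not kernel code: struct fake_hugepage,
struct hgm_meta and the hgm_*() helpers are hypothetical names made up
for illustration, and taking refcount+mapcount when the first piece gets
mapped is an assumption (the text above only spells out the drop once
"mapped size" reaches 0). Locking and the page table teardown are left
out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hugetlb page; not the kernel's struct page. */
struct fake_hugepage {
	int refcount;
	int mapcount;
	size_t size;		/* hugetlb page size in bytes */
};

/*
 * Hypothetical metadata stored (directly/indirectly) with the highest
 * page table that HGM-maps the hugetlb page.
 */
struct hgm_meta {
	struct fake_hugepage *page;	/* NULL while nothing is mapped */
	size_t mapped_size;		/* bytes currently mapped */
};

/* Account for mapping a sub-part of 'len' bytes. */
static void hgm_map_part(struct hgm_meta *meta, struct fake_hugepage *page,
			 size_t len)
{
	assert(len > 0);
	if (meta->mapped_size == 0) {
		/* Assumption: the first mapped piece takes refcount+mapcount once. */
		meta->page = page;
		page->refcount++;
		page->mapcount++;
	}
	assert(meta->page == page);
	assert(len <= page->size - meta->mapped_size);
	meta->mapped_size += len;
}

/*
 * Account for unmapping a sub-part of 'len' bytes. Returns true once the
 * hugetlb page is completely unmapped, i.e. the caller could now remove
 * the page tables as well.
 */
static bool hgm_unmap_part(struct hgm_meta *meta, size_t len)
{
	assert(meta->page != NULL);
	assert(len > 0 && len <= meta->mapped_size);
	meta->mapped_size -= len;
	if (meta->mapped_size != 0)
		return false;

	/* Last piece is gone: drop refcount+mapcount and clear the metadata. */
	meta->page->refcount--;
	meta->page->mapcount--;
	meta->page = NULL;
	return true;
}

/* "mapped size" == hugetlb size: everything is mapped. */
static bool hgm_fully_mapped(const struct hgm_meta *meta)
{
	return meta->page != NULL && meta->mapped_size == meta->page->size;
}
```

With this, neither "is it completely unmapped?" nor "is everything
mapped?" requires walking all the PTEs; both become a comparison against
"mapped size".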
--
Thanks,
David / dhildenb