Message-ID: <73de5556-e574-4ed7-a7fb-c4648e46206b@nvidia.com>
Date: Wed, 24 Apr 2024 21:00:50 -0700
From: John Hubbard <jhubbard@...dia.com>
To: David Hildenbrand <david@...hat.com>, <linux-kernel@...r.kernel.org>
CC: <linux-mm@...ck.org>, <linux-doc@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jonathan Corbet <corbet@....net>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Zi Yan <ziy@...dia.com>, Yang Shi <yang.shi@...ux.alibaba.com>,
	Ryan Roberts <ryan.roberts@....com>
Subject: Re: [PATCH v1] mm/khugepaged: replace page_mapcount() check by
folio_likely_mapped_shared()
On 4/24/24 5:26 AM, David Hildenbrand wrote:
Hi David,

Overall, I think this looks good; just a few questions below, and of
course some silly documentation nits.
> We want to limit the use of page_mapcount() to places where absolutely
> required, to prepare for kernel configs where we won't keep track of
> per-page mapcounts in large folios.
Just curious: can you elaborate on the motivation? I probably missed
the discussions explaining why page_mapcount() on large folios is
undesirable. Are we getting rid of a field in struct page/folio? Or is
there some other reason?
..
> To summarize, in the common case, this change is not expected to matter
> much. The more common application of khugepaged operates on
Based on the diffs (and some quick hacks for testing that I ran), I agree.
..
>
> This really needs the folio_likely_mapped_shared() optimization [1] that
> resides in mm-unstable, I think, to reduce "false negatives".
>
> The khugepage MM selftests keep working as expected, including:
>
> Run test: collapse_max_ptes_shared (khugepaged:anon)
> Allocate huge page... OK
> Share huge page over fork()... OK
> Trigger CoW on page 255 of 512... OK
> Maybe collapse with max_ptes_shared exceeded.... OK
> Trigger CoW on page 256 of 512... OK
> Collapse with max_ptes_shared PTEs shared.... OK
> Check if parent still has huge page... OK
Well, a word of caution! These tests do not (yet) cover either of
the interesting new cases that folio_likely_mapped_shared() introduces:
KSM and hugetlbfs interactions. In other words, the false positives.
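To make that concrete, here is my rough mental model of the heuristic,
as a simplified, untested sketch (the _sketch name is made up, and the
real mm-unstable code makes a further guess in the in-between range):

static inline bool folio_likely_mapped_shared_sketch(struct folio *folio)
{
	int mapcount = folio_mapcount(folio);

	/* Small folios and hugetlb folios have an exact total mapcount. */
	if (!folio_test_large(folio) || folio_test_hugetlb(folio))
		return mapcount > 1;

	/* Mapped at most once: certainly exclusive. */
	if (mapcount <= 1)
		return false;

	/*
	 * More mappings than pages: at least one page must be mapped
	 * more than once, so treat the whole folio as shared.
	 */
	return mapcount > folio_nr_pages(folio);
}

The "mapcount > 1" cases are exactly where KSM and hugetlb get
interesting: a page mapped twice into the *same* MM (KSM dedup within
one process, or a hugetlb page mapped through two VMAs) reports
"shared" even though only a single process maps it.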
>
> Where we check that collapsing in the parent behaves as expected after
> COWing a lot of pages in the parent: a sane scenario that is essentially
> unchanged and which does not depend on any action in the child process
> (compared to the cases discussed in (B) above).
>
> [1] https://lkml.kernel.org/r/20240409192301.907377-6-david@redhat.com
>
> ---
> Documentation/admin-guide/mm/transhuge.rst | 3 ++-
> mm/khugepaged.c | 22 +++++++++++++++-------
> 2 files changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index f82300b9193fe..076443cc10a6c 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -278,7 +278,8 @@ collapsed, resulting fewer pages being collapsed into
> THPs, and lower memory access performance.
>
> ``max_ptes_shared`` specifies how many pages can be shared across multiple
> -processes. Exceeding the number would block the collapse::
> +processes. khugepaged might treat pages of THPs as shared if any page of
> +that THP is shared. Exceeding the number would block the collapse::
>
> /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 2f73d2aa9ae84..cf518fc440982 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -583,7 +583,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> folio = page_folio(page);
> VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
>
> - if (page_mapcount(page) > 1) {
> + /* See hpage_collapse_scan_pmd(). */
Why? Because it contains an identical code snippet?

I was going to suggest factoring that out, so that the policy stays
identical in both places. Thoughts?
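Something like this, perhaps (hypothetical name, untested sketch):

static bool collapse_exceeds_max_ptes_shared(struct collapse_control *cc,
					     struct folio *folio, int *shared)
{
	/* Only folios that appear mapped-shared count against the limit. */
	if (!folio_likely_mapped_shared(folio))
		return false;
	(*shared)++;
	return cc->is_khugepaged && *shared > khugepaged_max_ptes_shared;
}

That way, both callers are guaranteed to apply the same policy.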
> + if (folio_likely_mapped_shared(folio)) {
> ++shared;
> if (cc->is_khugepaged &&
> shared > khugepaged_max_ptes_shared) {
> @@ -1317,8 +1318,20 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> result = SCAN_PAGE_NULL;
> goto out_unmap;
> }
> + folio = page_folio(page);
>
> - if (page_mapcount(page) > 1) {
> + if (!folio_test_anon(folio)) {
> + result = SCAN_PAGE_ANON;
> + goto out_unmap;
> + }
> +
> + /*
> + * We treat a single page as shared if any part of the THP
> + * is shared. "False negatives" from
> + * folio_likely_mapped_shared() are not expected to matter
> + * much in practice.
Maybe delete that second sentence? It is not really pulling its
weight here. :)
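In other words, keep just:

	/*
	 * We treat a single page as shared if any part of the THP
	 * is shared.
	 */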
thanks,
--
John Hubbard
NVIDIA