Message-ID: <db37bakzupqagevhjvngsu7vzcqugp6coy635bvhoy6cdrzk53@mrldbtuep3gk>
Date: Fri, 16 May 2025 13:12:01 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Nico Pache <npache@...hat.com>
Cc: linux-mm@...ck.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
david@...hat.com, ziy@...dia.com, baolin.wang@...ux.alibaba.com,
lorenzo.stoakes@...cle.com, ryan.roberts@....com, dev.jain@....com,
corbet@....net, rostedt@...dmis.org, mhiramat@...nel.org,
mathieu.desnoyers@...icios.com, akpm@...ux-foundation.org,
baohua@...nel.org, willy@...radead.org, peterx@...hat.com,
wangkefeng.wang@...wei.com, usamaarif642@...il.com,
sunnanyong@...wei.com, vishal.moola@...il.com,
thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com,
kirill.shutemov@...ux.intel.com, aarcange@...hat.com,
raquini@...hat.com, anshuman.khandual@....com, catalin.marinas@....com,
tiwai@...e.de, will@...nel.org, dave.hansen@...ux.intel.com,
jack@...e.cz, cl@...two.org, jglisse@...gle.com, surenb@...gle.com,
zokeefe@...gle.com, hannes@...xchg.org, rientjes@...gle.com,
mhocko@...e.com, rdunlap@...radead.org
Subject: Re: [PATCH v7 02/12] introduce khugepaged_collapse_single_pmd to
unify khugepaged and madvise_collapse
* Nico Pache <npache@...hat.com> [250514 23:23]:
> The khugepaged daemon and madvise_collapse have two different
> implementations that do almost the same thing.
>
> Create khugepaged_collapse_single_pmd to increase code
> reuse and create an entry point for future khugepaged changes.
>
> Refactor madvise_collapse and khugepaged_scan_mm_slot to use
> the new khugepaged_collapse_single_pmd function.
>
> Reviewed-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
> Signed-off-by: Nico Pache <npache@...hat.com>
> ---
> mm/khugepaged.c | 96 +++++++++++++++++++++++++------------------------
> 1 file changed, 49 insertions(+), 47 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 806bcd8c5185..5457571d505a 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2353,6 +2353,48 @@ static int khugepaged_scan_file(struct mm_struct *mm, unsigned long addr,
> return result;
> }
>
> +/*
> + * Try to collapse a single PMD starting at a PMD aligned addr, and return
> + * the results.
> + */
> +static int khugepaged_collapse_single_pmd(unsigned long addr,
> + struct vm_area_struct *vma, bool *mmap_locked,
> + struct collapse_control *cc)
> +{
> + int result = SCAN_FAIL;
> + struct mm_struct *mm = vma->vm_mm;
> +
> + if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
Why IS_ENABLED(CONFIG_SHMEM) here? It seems new.
> + struct file *file = get_file(vma->vm_file);
> + pgoff_t pgoff = linear_page_index(vma, addr);
> +
> + mmap_read_unlock(mm);
> + *mmap_locked = false;
> + result = khugepaged_scan_file(mm, addr, file, pgoff, cc);
> + fput(file);
> + if (result == SCAN_PTE_MAPPED_HUGEPAGE) {
> + mmap_read_lock(mm);
> + *mmap_locked = true;
> + if (khugepaged_test_exit_or_disable(mm)) {
> + result = SCAN_ANY_PROCESS;
> + goto end;
> + }
> + result = collapse_pte_mapped_thp(mm, addr,
> + !cc->is_khugepaged);
> + if (result == SCAN_PMD_MAPPED)
> + result = SCAN_SUCCEED;
> + mmap_read_unlock(mm);
> + *mmap_locked = false;
> + }
> + } else {
> + result = khugepaged_scan_pmd(mm, vma, addr, mmap_locked, cc);
> + }
> + if (cc->is_khugepaged && result == SCAN_SUCCEED)
> + ++khugepaged_pages_collapsed;
> +end:
> + return result;
This function can return with the mmap read lock held or unlocked..
> +}
> +
> static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> struct collapse_control *cc)
> __releases(&khugepaged_mm_lock)
> @@ -2427,34 +2469,12 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> VM_BUG_ON(khugepaged_scan.address < hstart ||
> khugepaged_scan.address + HPAGE_PMD_SIZE >
> hend);
> - if (!vma_is_anonymous(vma)) {
> - struct file *file = get_file(vma->vm_file);
> - pgoff_t pgoff = linear_page_index(vma,
> - khugepaged_scan.address);
> -
> - mmap_read_unlock(mm);
> - mmap_locked = false;
> - *result = hpage_collapse_scan_file(mm,
> - khugepaged_scan.address, file, pgoff, cc);
> - fput(file);
> - if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
> - mmap_read_lock(mm);
> - if (hpage_collapse_test_exit_or_disable(mm))
> - goto breakouterloop;
> - *result = collapse_pte_mapped_thp(mm,
> - khugepaged_scan.address, false);
> - if (*result == SCAN_PMD_MAPPED)
> - *result = SCAN_SUCCEED;
> - mmap_read_unlock(mm);
> - }
> - } else {
> - *result = hpage_collapse_scan_pmd(mm, vma,
> - khugepaged_scan.address, &mmap_locked, cc);
> - }
> -
> - if (*result == SCAN_SUCCEED)
> - ++khugepaged_pages_collapsed;
>
> + *result = khugepaged_collapse_single_pmd(khugepaged_scan.address,
> + vma, &mmap_locked, cc);
> + /* If we return SCAN_ANY_PROCESS we are holding the mmap_lock */
But this comment makes it obvious that you know that..
> + if (*result == SCAN_ANY_PROCESS)
> + goto breakouterloop;
But later..
breakouterloop:
mmap_read_unlock(mm); /* exit_mmap will destroy ptes after this */
breakouterloop_mmap_lock:
So if you return with SCAN_ANY_PROCESS, we are holding the lock only to
go and immediately drop it. This seems unnecessarily complicated and
takes the lock for no reason.
That would leave just the khugepaged_scan_pmd() path with the
unfortunate locking mess - which is a static function and called in one
location..
Looking at what happens after the return seems to indicate we could
clean that up as well, sometime later.
> /* move to next address */
> khugepaged_scan.address += HPAGE_PMD_SIZE;
> progress += HPAGE_PMD_NR;
> @@ -2773,36 +2793,18 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
> mmap_assert_locked(mm);
> memset(cc->node_load, 0, sizeof(cc->node_load));
> nodes_clear(cc->alloc_nmask);
> - if (!vma_is_anonymous(vma)) {
> - struct file *file = get_file(vma->vm_file);
> - pgoff_t pgoff = linear_page_index(vma, addr);
>
> - mmap_read_unlock(mm);
> - mmap_locked = false;
> - result = hpage_collapse_scan_file(mm, addr, file, pgoff,
> - cc);
> - fput(file);
> - } else {
> - result = hpage_collapse_scan_pmd(mm, vma, addr,
> - &mmap_locked, cc);
> - }
> + result = khugepaged_collapse_single_pmd(addr, vma, &mmap_locked, cc);
> +
> if (!mmap_locked)
> *prev = NULL; /* Tell caller we dropped mmap_lock */
>
> -handle_result:
> switch (result) {
> case SCAN_SUCCEED:
> case SCAN_PMD_MAPPED:
> ++thps;
> break;
> case SCAN_PTE_MAPPED_HUGEPAGE:
> - BUG_ON(mmap_locked);
> - BUG_ON(*prev);
> - mmap_read_lock(mm);
> - result = collapse_pte_mapped_thp(mm, addr, true);
> - mmap_read_unlock(mm);
> - goto handle_result;
All of the above should probably be replaced with a BUG_ON(1) since it's
not expected now? Or at least WARN_ON_ONCE(), but it should be safe to
continue if that's the case.
It looks like the mmap_locked boolean is used to ensure that *prev is
safe, but we are now dropping the lock and re-acquiring it (and
potentially returning here) with it set to true, so *prev will not be
set to NULL like it should.
I think you can handle this by ensuring that
khugepaged_collapse_single_pmd() returns with mmap_locked false in the
SCAN_ANY_PROCESS case.
> - /* Whitelisted set of results where continuing OK */
This seems worth keeping?
> case SCAN_PMD_NULL:
> case SCAN_PTE_NON_PRESENT:
> case SCAN_PTE_UFFD_WP:
I guess SCAN_ANY_PROCESS should be handled by the default case
statement? It should probably be added to the switch?
That is to say, before your change the SCAN_PTE_MAPPED_HUGEPAGE result
could only come from hpage_collapse_scan_file(), and would then lead to
collapse_pte_mapped_thp() above.
Now, you can have khugepaged_test_exit_or_disable() happen to return
SCAN_ANY_PROCESS and it will fall through to the default in this switch
statement, which seems like new behaviour?
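If it is expected, the switch could make both outcomes explicit,
e.g. something like this (untested, just to show what I mean):

```c
	case SCAN_PTE_MAPPED_HUGEPAGE:
		/* Now handled inside khugepaged_collapse_single_pmd() */
		WARN_ON_ONCE(1);
		break;
	case SCAN_ANY_PROCESS:
		/* Raced with exit/disable; nothing left to do here */
		break;
```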
At the very least, this information should be added to the git log on
what this patch does - if it's expected?
Thanks,
Liam