linux-kernel - Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge page

Message-ID: <3a441789-b3e4-236e-2e44-e7a1c7258a94@redhat.com>
Date:   Thu, 28 Apr 2022 17:17:07 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Bibo Mao <maobibo@...ngson.cn>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Yang Shi <shy828301@...il.com>
Subject: Re: [PATCH v3] mm/khugepaged: sched to numa node when collapse huge
 page

On 17.03.22 07:50, Bibo Mao wrote:
> collapse huge page will copy huge page from general small pages,
> dest node is calculated from most one of source pages, however
> THP daemon is not scheduled on dest node. The performance may be
> poor since huge page copying across nodes, also cache is not used
> for target node. With this patch, khugepaged daemon switches to
> the same numa node with huge page. It saves copying time and makes
> use of local cache better.
> 
> With this patch, specint 2006 base performance is improved with 6%
> on Loongson 3C5000L platform with 32 cores and 8 numa nodes.

If it helps, that's nice as long as it doesn't hurt other cases.

> 
> Signed-off-by: Bibo Mao <maobibo@...ngson.cn>
> ---
> changelog:
> V2: remove node record for thp daemon
> V3: remove unlikely statement
> ---
>  mm/khugepaged.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 131492fd1148..b3cf0885f5a2 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1066,6 +1066,7 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	struct vm_area_struct *vma;
>  	struct mmu_notifier_range range;
>  	gfp_t gfp;
> +	const struct cpumask *cpumask;
>  
>  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>  
> @@ -1079,6 +1080,13 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	 * that. We will recheck the vma after taking it again in write mode.
>  	 */
>  	mmap_read_unlock(mm);
> +
> +	/* sched to specified node before huage page memory copy */

huage? I assume "huge"

> +	if (task_node(current) != node) {
> +		cpumask = cpumask_of_node(node);
> +		if (!cpumask_empty(cpumask))
> +			set_cpus_allowed_ptr(current, cpumask);
> +	}

I wonder if that will always be optimized out without NUMA and if we
want to check for IS_ENABLED(CONFIG_NUMA).


Regarding comments from others, I agree: I think what we'd actually want
is something like "try to reschedule to one of these CPUs immediately.
If they are all busy, just stay here".
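
To make the "best effort" idea concrete, here is a minimal userspace
sketch of the selection policy only. It is not kernel code and not part
of the patch: cpu_bits, pick_cpu_best_effort() and the notion of a
"busy" mask are hypothetical names made up for illustration, and a real
implementation would have to ask the scheduler instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpu_bits;

/*
 * Return the lowest-numbered idle CPU in node_cpus. If every CPU of
 * the node is busy, or the node has no CPUs at all, return
 * current_cpu, i.e. "just stay here".
 */
static int pick_cpu_best_effort(int current_cpu, cpu_bits node_cpus,
				cpu_bits busy_cpus)
{
	cpu_bits idle = node_cpus & ~busy_cpus;
	int cpu;

	assert(current_cpu >= 0 && current_cpu < 64);

	for (cpu = 0; cpu < 64; cpu++) {
		if (idle & ((cpu_bits)1 << cpu))
			return cpu;
	}
	return current_cpu;	/* nothing idle on the node: do not move */
}
```

The difference from the hunk above is that nothing is pinned: the
caller moves only if that is free right now, and otherwise keeps
running where it is.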


Also, I do wonder if there could already be scenarios where someone
wants to let khugepaged run only on selected housekeeping CPUs (e.g.,
when pinning VCPUs in a VM to physical CPUs). It might even degrade the
VM performance in that case if we schedule something unrelated on these
CPUs. (I don't know which interfaces we might already have to configure
housekeeping CPUs for kthreads).

I can spot this in kernel/kthread.c:kthread():

set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));

Hmmmmm ...
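
One conceivable way to keep both constraints, sketched in userspace C
under loud assumptions: restrict the node's CPUs to the housekeeping
set and leave the affinity alone if nothing remains. Nobody in this
thread proposed this; cpu_bits and next_allowed() are hypothetical
names, and the masks are plain integers rather than struct cpumask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpu_bits;

/*
 * Compute the affinity to use for the collapse: the CPUs of the target
 * node that are also housekeeping CPUs. If that set is empty, keep the
 * current affinity so the thread never lands on an isolated CPU.
 */
static cpu_bits next_allowed(cpu_bits current_allowed, cpu_bits node_cpus,
			     cpu_bits housekeeping_cpus)
{
	cpu_bits candidates = node_cpus & housekeeping_cpus;

	/* A system without any housekeeping CPU would be misconfigured. */
	assert(housekeeping_cpus != 0);

	return candidates ? candidates : current_allowed;
}
```

With node CPUs 4-7 and housekeeping CPUs 0-5, this yields CPUs 4-5; with
housekeeping CPUs 0-3 only, the affinity stays unchanged.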


-- 
Thanks,

David / dhildenb