Message-ID: <6ef38c6e-e47d-66c2-216a-76ab4a59feb1@amd.com>
Date:   Mon, 23 Oct 2023 10:55:55 +0530
From:   Raghavendra K T <raghavendra.kt@....com>
To:     linux-kernel@...r.kernel.org, linux-mm@...ck.org
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>, rppt@...nel.org,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Bharata B Rao <bharata@....com>,
        Aithal Srikanth <sraithal@....com>,
        kernel test robot <oliver.sang@...el.com>,
        Sapkal Swapnil <Swapnil.Sapkal@....com>,
        K Prateek Nayak <kprateek.nayak@....com>
Subject: Re: [PATCH V1 0/1] sched/numa: Fix mm numa_scan_seq based
 unconditional scan

On 10/20/2023 9:27 PM, Raghavendra K T wrote:
> The NUMA balancing code, which updates PTEs by allowing an unconditional
> scan based on the value of the process's mm numa_scan_seq, does not handle
> a corner case.
>
> More details are in patch 1.
>
> I used the patch below to identify the corner case.
> 
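To make the corner case concrete, here is a simplified userspace model (not kernel code). It assumes that the unconditional-scan window is two mm scan sequences, and that the fix measures this window from the sequence recorded in vma->numab_state->start_scan_seq (the field assigned in the debug patch context at the end of this mail) instead of from zero. The helper names and UNCOND_SCAN_WINDOW are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* ASSUMPTION: unconditional scans are allowed for two mm scan sequences. */
#define UNCOND_SCAN_WINDOW 2

/* Modelled old check: looks only at the mm-wide sequence number. */
static bool uncond_scan_old(int mm_scan_seq)
{
	return mm_scan_seq < UNCOND_SCAN_WINDOW;
}

/*
 * Modelled fixed check: the window starts at the sequence recorded when
 * the VMA's numab_state was set up, so VMAs created late still get it.
 */
static bool uncond_scan_new(int mm_scan_seq, int start_scan_seq)
{
	assert(mm_scan_seq >= start_scan_seq);
	return mm_scan_seq - start_scan_seq < UNCOND_SCAN_WINDOW;
}
```

With the absolute check, a VMA first seen at numa_scan_seq 5 is never scanned unconditionally (uncond_scan_old(5) is false), whereas the relative check still grants it sequences 5 and 6.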
> Detailed results (only part of these results is included in patch 1 to
> save space in the commit log):
> 
> SUT: AMD EPYC Milan with 2 NUMA nodes, 256 CPUs.
> 
> Base kernel: upstream 6.6-rc6 (dd72f9c7e512) with Mel's patch series
> from tip/sched/core [1] applied.
> 
> Summary: Some benchmarks improve. System time increases in some cases
> due to the additional scanning, but elapsed time generally improves.
> 
> However, some overhead is also seen for benchmarks such as NUMA01.
> 
> kernbench
> =========
>                           base                patched
> Amean     user-128    13799.58 (   0.00%)    13789.86 *   0.07%*
> Amean     syst-128     3280.80 (   0.00%)     3249.67 *   0.95%*
> Amean     elsp-128      165.09 (   0.00%)      164.78 *   0.19%*
> 
> Duration User       41404.28    41375.08
> Duration System      9862.22     9768.48
> Duration Elapsed      519.87      518.72
> 
> Ops NUMA PTE updates                 1041416.00      831536.00
> Ops NUMA hint faults                  263296.00      220966.00
> Ops NUMA pages migrated               258021.00      212769.00
> Ops AutoNUMA cost                       1328.67        1114.69
> 
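The percentage column in these tables appears to be the relative gain of patched over base for lower-is-better metrics, i.e. (base - patched) / base * 100, so negative values are regressions. This is inferred from the numbers rather than from the report tool's documentation; a minimal helper under that assumption:

```c
#include <assert.h>

/*
 * Relative gain of "patched" over "base" for a lower-is-better metric
 * (user/system/elapsed time). Positive: patched took less time.
 * ASSUMPTION: inferred from the published numbers.
 */
static double pct_gain(double base, double patched)
{
	assert(base != 0.0);
	return (base - patched) / base * 100.0;
}
```

For example, pct_gain(3280.80, 3249.67) is about 0.95 (kernbench syst-128) and pct_gain(143.16, 150.81) is about -5.34 (NUMA01 elsp). Recomputing from the displayed, rounded means can drift a little: NUMA02 elsp gives about 4.26 from 3.99 and 3.82, against 4.40% in the table, presumably because the table is computed from unrounded means.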
> autonumabench
> 
> NUMA01_THREADLOCAL
> ==================
> Amean     syst-NUMA01_THREADLOCAL       10.65 (   0.00%)       26.47 *-148.59%*
> Amean     elsp-NUMA01_THREADLOCAL       81.79 (   0.00%)       67.74 *  17.18%*
> 
> Duration User       54832.73    47379.67
> Duration System        75.00      185.75
> Duration Elapsed      576.72      476.09
> 
> Ops NUMA PTE updates                  394429.00    11121044.00
> Ops NUMA hint faults                    1001.00     8906404.00
> Ops NUMA pages migrated                  288.00     2998694.00
> Ops AutoNUMA cost                          7.77       44666.84
> 
> NUMA01
> ======
> Amean     syst-NUMA01       31.97 (   0.00%)       52.95 * -65.62%*
> Amean     elsp-NUMA01      143.16 (   0.00%)      150.81 *  -5.34%*
> 
> Duration User       84839.49    91342.19
> Duration System       224.26      371.12
> Duration Elapsed     1005.64     1059.01
> 
> Ops NUMA PTE updates                33929508.00    50116313.00
> Ops NUMA hint faults                34993820.00    52895783.00
> Ops NUMA pages migrated              5456115.00     7441228.00
> Ops AutoNUMA cost                     175310.27      264971.11
> 
> NUMA02
> ======
> Amean     syst-NUMA02        0.86 (   0.00%)        0.86 *  -0.50%*
> Amean     elsp-NUMA02        3.99 (   0.00%)        3.82 *   4.40%*
> 
> Duration User        1186.06     1092.07
> Duration System         6.44        6.47
> Duration Elapsed       31.28       30.30
> 
> Ops NUMA PTE updates                     776.00         731.00
> Ops NUMA hint faults                     527.00         490.00
> Ops NUMA pages migrated                  183.00         153.00
> Ops AutoNUMA cost                          2.64           2.46
> 
> [1] https://lore.kernel.org/linux-mm/ZSXF3AFZgIld1meX@gmail.com/T/
> 

I forgot to add the skip_vma_count trace results (number of skipped VMAs,
per skip reason):

autonumabench: NUMA01_THREADLOCAL, 3 iterations

reason          base   patched
inaccessible   13133      4727
pid_inactive   15807      5119
scan_delay       471       455
seq_completed     50         7
shared_ro       6983      2551
unsuitable      3917      5402

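These are per-reason counts. Assuming they were gathered by tallying the reason=<name> field of the VMA-skip trace events (presumably the sched_skip_vma_numa tracepoint from Mel's series; the exact trace format is not shown in this mail), a minimal counting helper could look like this. The function name and input lines are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the trace lines whose "reason=" field is exactly `reason`.
 * ASSUMPTION: each VMA-skip event line contains "reason=<name>",
 * terminated by a space, a newline or the end of the string.
 */
static size_t count_skip_reason(const char *lines[], size_t n,
				const char *reason)
{
	static const char key[] = "reason=";
	size_t len = strlen(reason);
	size_t count = 0;

	assert(len > 0);
	for (size_t i = 0; i < n; i++) {
		const char *p = strstr(lines[i], key);

		if (!p)
			continue;
		p += sizeof(key) - 1;
		/* Compare the whole token so "seq" does not match "seq_completed". */
		if (strncmp(p, reason, len) == 0 &&
		    (p[len] == '\0' || p[len] == ' ' || p[len] == '\n'))
			count++;
	}
	return count;
}
```

Matching the whole token keeps a reason that is a prefix of another (for example a hypothetical "seq" versus "seq_completed") from being over-counted.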


> Raghavendra K T (1):
>    sched/numa: Fix mm numa_scan_seq based unconditional scan
> 
>   include/linux/mm_types.h | 3 +++
>   kernel/sched/fair.c      | 4 +++-
>   2 files changed, 6 insertions(+), 1 deletion(-)
> 
> ---8<---
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index 010ba1b7cb0e..a4870b01c8a1 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -10,6 +10,30 @@
>   #include <linux/tracepoint.h>
>   #include <linux/binfmts.h>
>   
> +TRACE_EVENT(sched_vma_start_seq,
> +
> +	TP_PROTO(struct task_struct *t, struct vm_area_struct *vma, int start_seq),
> +
> +	TP_ARGS(t, vma, start_seq),
> +
> +	TP_STRUCT__entry(
> +		__array(	char,	comm,	TASK_COMM_LEN	)
> +		__field(	pid_t,	pid			)
> +		__field(	void *,	vma			)
> +		__field(	int,	start_seq		)
> +	),
> +
> +	TP_fast_assign(
> +		memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
> +		__entry->pid	= t->pid;
> +		__entry->vma	= vma;
> +		__entry->start_seq	= start_seq;
> +	),
> +
> +	TP_printk("comm=%s pid=%d vma=%px start_seq=%d",
> +		  __entry->comm, __entry->pid, __entry->vma, __entry->start_seq)
> +);
> +
>   /*
>    * Tracepoint for calling kthread_stop, performed to end a kthread:
>    */
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c8af3a7ccba7..e0c16ea8470b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3335,6 +3335,7 @@ static void task_numa_work(struct callback_head *work)
>   				continue;
>   
>   			vma->numab_state->start_scan_seq = mm->numa_scan_seq;
> +			trace_sched_vma_start_seq(p, vma, mm->numa_scan_seq);
>   
>   			vma->numab_state->next_scan = now +
>   				msecs_to_jiffies(sysctl_numa_balancing_scan_delay);
> 
> 

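The debug patch above emits one sched_vma_start_seq event per VMA when its start_scan_seq is recorded. A sketch of how that output could be post-processed: extract the start_seq= value (field name taken from the TP_printk in the patch) and build a histogram, so that VMAs which began scanning at mm sequence 2 or later stand out; assuming a two-sequence window, those are the VMAs an absolute numa_scan_seq check would never scan unconditionally. MAX_SEQ, the helper names and the input lines are hypothetical, and the parser does not depend on the spacing around the vma field.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound on the start_seq values kept in the histogram. */
#define MAX_SEQ 64

/*
 * Extract the value of "start_seq=" from one trace line.
 * Returns -1 if the field is missing or not a non-negative number.
 */
static long parse_start_seq(const char *line)
{
	static const char key[] = "start_seq=";
	const char *p = strstr(line, key);
	char *end;
	long v;

	if (!p)
		return -1;
	p += sizeof(key) - 1;
	v = strtol(p, &end, 10);
	if (end == p || v < 0)
		return -1;
	return v;
}

/*
 * Count how many VMAs recorded each start_seq value. Unparsable lines
 * and values at or above MAX_SEQ are ignored.
 */
static void start_seq_histogram(const char *lines[], size_t n,
				unsigned long hist[MAX_SEQ])
{
	assert(hist != NULL);
	memset(hist, 0, MAX_SEQ * sizeof(hist[0]));
	for (size_t i = 0; i < n; i++) {
		long s = parse_start_seq(lines[i]);

		if (s >= 0 && s < MAX_SEQ)
			hist[s]++;
	}
}
```

Any non-zero bucket at index 2 or above then corresponds to VMAs affected by the corner case described in patch 1.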