Message-ID: <697988e9-20bc-8cc9-c3ee-403f58a0f823@amd.com>
Date: Wed, 13 Sep 2023 11:51:53 +0530
From: Raghavendra K T <raghavendra.kt@....com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com,
Aithal Srikanth <sraithal@....com>,
Mel Gorman <mgorman@...hsingularity.net>,
linux-kernel@...r.kernel.org, ying.huang@...el.com,
feng.tang@...el.com, fengwei.yin@...el.com,
aubrey.li@...ux.intel.com, yu.c.chen@...el.com, linux-mm@...ck.org,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Mel Gorman <mgorman@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>, rppt@...nel.org,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Bharata B Rao <bharata@....com>,
Sapkal Swapnil <Swapnil.Sapkal@....com>,
K Prateek Nayak <kprateek.nayak@....com>
Subject: Re: [RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional
scan logic
On 9/12/2023 1:20 PM, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a -11.9% improvement of autonuma-benchmark.numa01_THREAD_ALLOC.seconds on:
>
>
> commit: 1ef5cbb92bdb320c5eb9fdee1a811d22ee9e19fe ("[RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic")
> url: https://github.com/intel-lab-lkp/linux/commits/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007
> base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2f88c8e802c8b128a155976631f4eb2ce4f3c805
> patch link: https://lore.kernel.org/all/87e3c08bd1770dd3e6eee099c01e595f14c76fc3.1693287931.git.raghavendra.kt@amd.com/
> patch subject: [RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic
>
> testcase: autonuma-benchmark
> test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> parameters:
>
> iterations: 4x
> test: numa01_THREAD_ALLOC
> cpufreq_governor: performance
>
>
> Hi Raghu,
>
> The reason there is a separate report for this commit, besides
> https://lore.kernel.org/all/202309102311.84b42068-oliver.sang@intel.com/
> is the nature of bisection: so far, one auto-bisect can only capture
> one commit for a performance change.
>
> This auto-bisect ran on another test machine (Sapphire Rapids), and it
> happened to choose autonuma-benchmark.numa01_THREAD_ALLOC.seconds as the
> indicator for the bisect. It finally captured
> "[RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional"
>
> And from
> https://lore.kernel.org/all/acf254e9-0207-7030-131f-8a3f520c657b@amd.com/
> I noticed you care more about the performance impact of the whole patch set,
> so let me give a summary table below.
>
> First, let me show again how we applied your patches:
>
> 68cfe9439a1ba (linux-review/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007) sched/numa: Allow scanning of shared VMAs
> af46f3c9ca2d1 sched/numa: Allow recently accessed VMAs to be scanned
> 167773d1ddb5f sched/numa: Increase tasks' access history
> fc769221b2306 sched/numa: Remove unconditional scan logic using mm numa_scan_seq
> 1ef5cbb92bdb3 sched/numa: Add disjoint vma unconditional scan logic
> 2a806eab1c2e1 sched/numa: Move up the access pid reset logic
> 2f88c8e802c8b (tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well
>
>
> We have the data below on this test machine
> (the full table is very big; if you want it, please let me know):
>
> =========================================================================================
> compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
> gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark
>
> commit:
> 2f88c8e802 ("(tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well")
> 2a806eab1c ("sched/numa: Move up the access pid reset logic")
> 1ef5cbb92b ("sched/numa: Add disjoint vma unconditional scan logic")
> 68cfe9439a ("sched/numa: Allow scanning of shared VMAs")
>
>
> 2f88c8e802c8b128 2a806eab1c2e1c9f0ae39dc0307 1ef5cbb92bdb320c5eb9fdee1a8 68cfe9439a1baa642e05883fa64
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev    %change          %stddev    %change          %stddev    %change          %stddev
>              \          |                \          |                \          |                \
>     271.01            +0.8%     273.24            -0.7%     269.00           -26.4%     199.49 ±  3%  autonuma-benchmark.numa01.seconds
>      76.28            +0.2%      76.44           -11.7%      67.36 ±  6%     -46.9%      40.49 ±  5%  autonuma-benchmark.numa01_THREAD_ALLOC.seconds
>       8.11            -0.9%       8.04            -0.7%       8.05            -0.1%       8.10        autonuma-benchmark.numa02.seconds
>       1425            +0.7%       1434            -3.1%       1381           -30.1%     996.02 ±  2%  autonuma-benchmark.time.elapsed_time
>
>
Thanks for this summary too.
I think the slight additional time overhead from the first patch
(+0.8% on numa01.seconds, +0.7% on elapsed time) comes from the
additional logic that gets executed before we return from the
is_vma_accessed() check, as expected.
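
For reference, the %change columns in the quoted table can be reproduced from
the mean run times. The following is a minimal sketch, assuming each %change
is computed against the base commit 2f88c8e802; the helper name pct_change
and the dictionary layout are illustrative, not part of the lkp tooling.

```python
# Mean run times in seconds, copied from the quoted summary table.
BASE = "2f88c8e802"  # tip/sched/core base commit
seconds = {
    "numa01": {
        "2f88c8e802": 271.01, "2a806eab1c": 273.24,
        "1ef5cbb92b": 269.00, "68cfe9439a": 199.49,
    },
    "numa01_THREAD_ALLOC": {
        "2f88c8e802": 76.28, "2a806eab1c": 76.44,
        "1ef5cbb92b": 67.36, "68cfe9439a": 40.49,
    },
}


def pct_change(new, old):
    """Relative change of `new` versus `old`, in percent (hypothetical helper)."""
    return (new - old) / old * 100.0


# Print each commit's change relative to the base (assumed convention).
for test, runs in seconds.items():
    for commit, value in runs.items():
        if commit != BASE:
            change = pct_change(value, runs[BASE])
            print(f"{test:<20} {commit} {change:+.1f}%")
```

Under the same arithmetic, comparing 1ef5cbb92b against its parent 2a806eab1c
(67.36 vs 76.44) gives about -11.9%, which matches the figure in the report
headline. That reading is an inference from the numbers, not something the
report states.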
Regards
- Raghu