linux-kernel - Re: [RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic

Message-ID: <697988e9-20bc-8cc9-c3ee-403f58a0f823@amd.com>
Date:   Wed, 13 Sep 2023 11:51:53 +0530
From:   Raghavendra K T <raghavendra.kt@....com>
To:     kernel test robot <oliver.sang@...el.com>
Cc:     oe-lkp@...ts.linux.dev, lkp@...el.com,
        Aithal Srikanth <sraithal@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        linux-kernel@...r.kernel.org, ying.huang@...el.com,
        feng.tang@...el.com, fengwei.yin@...el.com,
        aubrey.li@...ux.intel.com, yu.c.chen@...el.com, linux-mm@...ck.org,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>, rppt@...nel.org,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Bharata B Rao <bharata@....com>,
        Sapkal Swapnil <Swapnil.Sapkal@....com>,
        K Prateek Nayak <kprateek.nayak@....com>
Subject: Re: [RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional
 scan logic

On 9/12/2023 1:20 PM, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed a -11.9% improvement of autonuma-benchmark.numa01_THREAD_ALLOC.seconds on:
> 
> 
> commit: 1ef5cbb92bdb320c5eb9fdee1a811d22ee9e19fe ("[RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic")
> url: https://github.com/intel-lab-lkp/linux/commits/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007
> base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2f88c8e802c8b128a155976631f4eb2ce4f3c805
> patch link: https://lore.kernel.org/all/87e3c08bd1770dd3e6eee099c01e595f14c76fc3.1693287931.git.raghavendra.kt@amd.com/
> patch subject: [RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic
> 
> testcase: autonuma-benchmark
> test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> parameters:
> 
> 	iterations: 4x
> 	test: numa01_THREAD_ALLOC
> 	cpufreq_governor: performance
> 
> 
> hi, Raghu,
> 
> the reason there is a separate report for this commit besides
> https://lore.kernel.org/all/202309102311.84b42068-oliver.sang@intel.com/
> is the nature of bisection: so far, a single auto-bisect can only capture
> one commit for a performance change.
> 
> this auto-bisect ran on another test machine (Sapphire Rapids), and it
> happened to choose autonuma-benchmark.numa01_THREAD_ALLOC.seconds as the
> indicator for the bisect. it finally captured
> "[RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional"
> 
> and from
> https://lore.kernel.org/all/acf254e9-0207-7030-131f-8a3f520c657b@amd.com/
> I noticed you care more about the performance impact of the whole patch set,
> so let me give a summary table below.
> 
> first, here again is how we applied your patches:
> 
> 68cfe9439a1ba (linux-review/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007) sched/numa: Allow scanning of shared VMAs
> af46f3c9ca2d1 sched/numa: Allow recently accessed VMAs to be scanned
> 167773d1ddb5f sched/numa: Increase tasks' access history
> fc769221b2306 sched/numa: Remove unconditional scan logic using mm numa_scan_seq
> 1ef5cbb92bdb3 sched/numa: Add disjoint vma unconditional scan logic
> 2a806eab1c2e1 sched/numa: Move up the access pid reset logic
> 2f88c8e802c8b (tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well
> 
> 
> we have the data below on this test machine
> (the full table is very big; if you want it, please let me know):
> 
> =========================================================================================
> compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
>    gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark
> 
> commit:
>    2f88c8e802 ("(tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well")
>    2a806eab1c ("sched/numa: Move up the access pid reset logic")
>    1ef5cbb92b ("sched/numa: Add disjoint vma unconditional scan logic")
>    68cfe9439a ("sched/numa: Allow scanning of shared VMAs")
> 
> 
> 2f88c8e802c8b128 2a806eab1c2e1c9f0ae39dc0307 1ef5cbb92bdb320c5eb9fdee1a8 68cfe9439a1baa642e05883fa64
> ---------------- --------------------------- --------------------------- ---------------------------
>           %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>               \          |                \          |                \          |                \
>      271.01            +0.8%     273.24            -0.7%     269.00           -26.4%     199.49 ±  3%  autonuma-benchmark.numa01.seconds
>       76.28            +0.2%      76.44           -11.7%      67.36 ±  6%     -46.9%      40.49 ±  5%  autonuma-benchmark.numa01_THREAD_ALLOC.seconds
>        8.11            -0.9%       8.04            -0.7%       8.05            -0.1%       8.10        autonuma-benchmark.numa02.seconds
>        1425            +0.7%       1434            -3.1%       1381           -30.1%     996.02 ±  2%  autonuma-benchmark.time.elapsed_time
> 
> 

Thanks for this summary too.

I think the slight additional time overhead from the first patch comes
from the additional logic that gets executed before we return from the
is_vma_accessed() check, as expected.
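
For reference, the deltas in the quoted table can be rechecked with a few
lines of Python. This is only a sketch: it assumes each %change column is
computed against the base commit (2f88c8e802), which is what the numbers
suggest, and the dictionary and function names are purely illustrative.

```python
# Seconds copied from the quoted summary table (lower is better).
base = {"numa01": 271.01, "numa01_THREAD_ALLOC": 76.28}         # 2f88c8e802
patch1 = {"numa01": 273.24, "numa01_THREAD_ALLOC": 76.44}       # 2a806eab1c
full_series = {"numa01": 199.49, "numa01_THREAD_ALLOC": 40.49}  # 68cfe9439a


def pct_change(old, new):
    """Percent change of `new` relative to `old` (assumed to be the base commit)."""
    return (new - old) / old * 100.0


def deltas(result):
    """Per-test percent change against the base commit, rounded to one decimal."""
    return {name: round(pct_change(base[name], result[name]), 1) for name in base}


first_patch_delta = deltas(patch1)
full_series_delta = deltas(full_series)

print("first patch:", first_patch_delta)
print("full series:", full_series_delta)
```

Under that assumption the first patch comes out at +0.8% and +0.2%, and the
full series at -26.4% and -46.9%, matching the table.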

Regards
- Raghu