Message-ID: <ZSUX9NLa+DDjFLnZ@gmail.com>
Date: Tue, 10 Oct 2023 11:23:00 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: Peter Zijlstra <peterz@...radead.org>,
Raghavendra K T <raghavendra.kt@....com>,
K Prateek Nayak <kprateek.nayak@....com>,
Bharata B Rao <bharata@....com>,
Ingo Molnar <mingo@...hat.com>,
LKML <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>
Subject: Re: [PATCH 6/6] sched/numa: Complete scanning of inactive VMAs when there is no alternative

* Mel Gorman <mgorman@...hsingularity.net> wrote:

> On a 2-socket Cascade Lake test machine, the time to complete the
> workload is as follows;
>
>                                          6.6.0-rc2              6.6.0-rc2
>                                sched-numabtrace-v1 sched-numabselective-v1
> Min       elsp-NUMA01_THREADLOCAL      174.22 (   0.00%)      117.64 (  32.48%)
> Amean     elsp-NUMA01_THREADLOCAL      175.68 (   0.00%)      123.34 *  29.79%*
> Stddev    elsp-NUMA01_THREADLOCAL        1.20 (   0.00%)        4.06 (-238.20%)
> CoeffVar  elsp-NUMA01_THREADLOCAL        0.68 (   0.00%)        3.29 (-381.70%)
> Max       elsp-NUMA01_THREADLOCAL      177.18 (   0.00%)      128.03 (  27.74%)
>
> The time to complete the workload is reduced by almost 30%.
>
>                              6.6.0-rc2              6.6.0-rc2
>                    sched-numabtrace-v1 sched-numabselective-v1
> Duration User         91201.80               63506.64
> Duration System         2015.53                1819.78
> Duration Elapsed        1234.77                 868.37
>
> In this specific case, system CPU time was not increased, but that is not
> universally true.
>
> From vmstat, the NUMA scanning and fault activity is as follows;
>
>                                            6.6.0-rc2              6.6.0-rc2
>                                  sched-numabtrace-v1 sched-numabselective-v1
> Ops NUMA base-page range updates        64272.00            26374386.00
> Ops NUMA PTE updates                    36624.00               55538.00
> Ops NUMA PMD updates                       54.00               51404.00
> Ops NUMA hint faults                    15504.00               75786.00
> Ops NUMA hint local faults %            14860.00               56763.00
> Ops NUMA hint local percent                95.85                  74.90
> Ops NUMA pages migrated                  1629.00             6469222.00
>
> Both the number of PTE updates and hint faults are dramatically
> increased. While this is superficially unfortunate, it represents
> ranges that were simply skipped without the patch. As a result
> of the scanning and hinting faults, many more pages were also
> migrated but, as the time to completion is reduced, the overhead
> is offset by the gain.

Nice! I've applied your series to tip:sched/core with a few non-functional
edits to comment/changelog formatting/clarity.

Btw., was any previous analysis done on the size of the pids_active[] hash
and the hash collision rate?

64 bits (BITS_PER_LONG) feels a bit small, especially on larger machines
running threaded workloads, and the kmalloc() of numab_state likely
allocates a full cacheline anyway, so we could double the per-window hash
size from 8 bytes (2x1 longs in total today) to 16 bytes (2x2 longs, 32
bytes in total) with very little real cost, and still have a long field
left to spare? A rough sketch of the layout in question is below.
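
For illustration only, a minimal userspace sketch of that layout. This is
not the kernel implementation: the struct/function names, the hash and the
WINDOW_LONGS knob are made up here, loosely modelled on the pids_active[]
handling and simplified. WINDOW_LONGS=1 corresponds to today's 64-bit
per-window filter, WINDOW_LONGS=2 to the doubled variant:

#include <stdbool.h>
#include <stdio.h>

#define BITS_PER_LONG	(8 * sizeof(unsigned long))
#define WINDOW_LONGS	2			/* 1 today, 2 in the proposal */
#define WINDOW_BITS	(WINDOW_LONGS * BITS_PER_LONG)

struct pid_filter {
	/* two scan windows (previous + current), WINDOW_LONGS each */
	unsigned long pids_active[2][WINDOW_LONGS];
};

/* multiplicative hash, loosely in the spirit of the kernel's hash_32() */
static unsigned int pid_hash(unsigned int pid)
{
	return (unsigned int)((pid * 0x61C88647u) % WINDOW_BITS);
}

/* record that 'pid' accessed this VMA, in the current window */
static void filter_mark(struct pid_filter *f, unsigned int pid)
{
	unsigned int bit = pid_hash(pid);

	f->pids_active[1][bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

/* did 'pid' (or a colliding pid) access this VMA in either window? */
static bool filter_test(const struct pid_filter *f, unsigned int pid)
{
	unsigned int bit = pid_hash(pid);
	unsigned long word = f->pids_active[0][bit / BITS_PER_LONG] |
			     f->pids_active[1][bit / BITS_PER_LONG];

	return word & (1UL << (bit % BITS_PER_LONG));
}

int main(void)
{
	struct pid_filter f = { { { 0 } } };

	filter_mark(&f, 1234);
	printf("pid 1234 seen: %d\n", filter_test(&f, 1234));	/* 1 */
	printf("pid 5678 seen: %d\n", filter_test(&f, 5678));	/* 0 unless the hashes collide */
	return 0;
}

With WINDOW_LONGS raised from 1 to 2, sizeof(struct pid_filter) grows from
16 to 32 bytes, i.e. the 2x2-longs variant above, and the chance that a new
thread collides with an already-set bit roughly halves for a given number
of distinct accessing threads.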

Thanks,

	Ingo