Message-ID: <4c40bf22-292c-4a3a-bd32-4461c2d4f7d9@amd.com>
Date: Tue, 25 Mar 2025 13:30:05 +0530
From: Raghavendra K T <raghavendra.kt@....com>
To: Nikhil Dhama <nikhil.dhama@....com>, akpm@...ux-foundation.org,
ying.huang@...ux.alibaba.com
Cc: Ying Huang <huang.ying.caritas@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Bharata B Rao <bharata@....com>,
Raghavendra <raghavendra.kodsarathimmappa@....com>
Subject: Re: [PATCH -V2] mm: pcp: scale batch to reduce number of high order
pcp flushes on deallocation
On 3/19/2025 1:44 PM, Nikhil Dhama wrote:
[...]
>> And, do you run network related workloads on one machine? If so, please
>> try to run them on two machines instead, with clients and servers run on
>> different machines. At least, please use different sockets for clients
>> and servers. Because larger pcp->free_count will make it easier to
>> trigger free_high heuristics. If that is the case, please try to
>> optimize free_high heuristics directly too.
>
> I agree with Ying Huang, the above change is not the best possible fix for
> the issue. On further analysis I found that the root cause of the issue is
> frequent pcp high-order flushes. During a 20-second iperf3 run I observed
> on average 5 pcp high-order flushes in kernel v6.6, whereas in v6.7 I
> observed about 170.
> Tracing pcp->free_count, I found that with patch v1 (the patch I suggested
> earlier) free_count goes negative, which reduces the number of times the
> free_high heuristic is triggered and hence reduces the high-order
> flushes.
>
> As Ying Huang suggested, increasing the batch size for the free_high
> heuristic helps performance. I tried different scaling factors to find
> the best batch value for the free_high heuristic:
>
>
>                    score   # free_high
>                    -----   -----------
> v6.6 (base)          100             4
> v6.12 (batch*1)       69           170
> batch*2               69           150
> batch*4               74           101
> batch*5              100            53
> batch*6              100            36
> batch*8              100             3
>
> Scaling the batch for the free_high heuristic by a factor of 5 restores
> the performance.
Hello Nikhil,

Thanks for looking into this further. But from a design standpoint, it is
not clear why a batch scale of 5 helps here (Andrew's original question).

In any case, can you post the patch-set in a new email, so that the patch
below does not get lost in the discussion thread?
>
> On an AMD 2-node machine, scores for other benchmarks with patch v2
> are as follows:
>
>                       iperf3   lmbench3    netperf             kbuild
>                              (AF_UNIX)   (SCTP_STREAM_MANY)
>                       ------  ---------  ------------------   ------
> v6.6 (base)              100        100        100              100
> v6.12                     69        113         98.5             98.8
> v6.12 with patch v2      100        112.5      100.1             99.6
>
> For the network workloads, clients and servers run on different
> machines connected via a Mellanox ConnectX-7 NIC.
>
> Number of free_high flushes:
>
>                       iperf3   lmbench3    netperf             kbuild
>                              (AF_UNIX)   (SCTP_STREAM_MANY)
>                       ------  ---------  ------------------   ------
> v6.6 (base)                5         12          6                2
> v6.12                    170         11         92                2
> v6.12 with patch v2       58         11         34                2
>
>
> Signed-off-by: Nikhil Dhama <nikhil.dhama@....com>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Ying Huang <huang.ying.caritas@...il.com>
> Cc: linux-mm@...ck.org
> Cc: linux-kernel@...r.kernel.org
> Cc: Bharata B Rao <bharata@....com>
> Cc: Raghavendra <raghavendra.kodsarathimmappa@....com>
> ---
> mm/page_alloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b6958333054d..326d5fbae353 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2617,7 +2617,7 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
> * stops will be drained from vmstat refresh context.
> */
> if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
> - free_high = (pcp->free_count >= batch &&
> + free_high = (pcp->free_count >= (batch*5) &&
> (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) &&
> (!(pcp->flags & PCPF_FREE_HIGH_BATCH) ||
> pcp->count >= READ_ONCE(batch)));