Message-ID: <4c40bf22-292c-4a3a-bd32-4461c2d4f7d9@amd.com>
Date: Tue, 25 Mar 2025 13:30:05 +0530
From: Raghavendra K T <raghavendra.kt@....com>
To: Nikhil Dhama <nikhil.dhama@....com>, akpm@...ux-foundation.org,
ying.huang@...ux.alibaba.com
Cc: Ying Huang <huang.ying.caritas@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Bharata B Rao <bharata@....com>,
Raghavendra <raghavendra.kodsarathimmappa@....com>
Subject: Re: [PATCH -V2] mm: pcp: scale batch to reduce number of high order
pcp flushes on deallocation
On 3/19/2025 1:44 PM, Nikhil Dhama wrote:
[...]
>> And, do you run network related workloads on one machine? If so, please
>> try to run them on two machines instead, with clients and servers run on
>> different machines. At least, please use different sockets for clients
>> and servers. Because larger pcp->free_count will make it easier to
>> trigger free_high heuristics. If that is the case, please try to
>> optimize free_high heuristics directly too.
>
> I agree with Ying Huang, the above change is not the best possible fix for
> the issue. On further analysis I found that the root cause of the issue is
> frequent pcp high-order flushes. During a 20-second iperf3 run I observed
> on average 5 pcp high-order flushes in kernel v6.6, whereas in v6.7 I
> observed about 170.
> Tracing pcp->free_count, I found that with patch v1 (the patch I suggested
> earlier) free_count goes negative, which reduces the number of times the
> free_high heuristic is triggered and hence reduces the high-order
> flushes.
>
> As Ying Huang suggested, increasing the batch size for the free_high
> heuristic helps performance. I tried different scaling factors to find
> the best batch value for the free_high heuristic:
>
>
>                    score   # free_high
>                    -----   -----------
> v6.6 (base)          100             4
> v6.12 (batch*1)       69           170
> batch*2               69           150
> batch*4               74           101
> batch*5              100            53
> batch*6              100            36
> batch*8              100             3
>
> Scaling the batch for the free_high heuristic by a factor of 5 restores
> the performance.
Hello Nikhil,

Thanks for looking into this further. But from a design standpoint, it is
not clear why a batch scale of 5 helps here (Andrew's original question).

In any case, can you post the patch-set in a new email, so that the patch
below does not get lost in the discussion thread?
>
> On an AMD 2-node machine, scores for other benchmarks with patch v2
> are as follows:
>
>                       iperf3   lmbench3    netperf             kbuild
>                              (AF_UNIX)   (SCTP_STREAM_MANY)
>                       ------  ---------  ------------------   ------
> v6.6 (base)              100        100        100              100
> v6.12                     69        113         98.5             98.8
> v6.12 with patch v2      100        112.5      100.1             99.6
>
> For the network workloads, clients and servers run on different
> machines connected via a Mellanox ConnectX-7 NIC.
>
> Number of free_high flushes:
>
>                       iperf3   lmbench3    netperf             kbuild
>                              (AF_UNIX)   (SCTP_STREAM_MANY)
>                       ------  ---------  ------------------   ------
> v6.6 (base)                5         12          6                2
> v6.12                    170         11         92                2
> v6.12 with patch v2       58         11         34                2
>
>
> Signed-off-by: Nikhil Dhama <nikhil.dhama@....com>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Ying Huang <huang.ying.caritas@...il.com>
> Cc: linux-mm@...ck.org
> Cc: linux-kernel@...r.kernel.org
> Cc: Bharata B Rao <bharata@....com>
> Cc: Raghavendra <raghavendra.kodsarathimmappa@....com>
> ---
> mm/page_alloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b6958333054d..326d5fbae353 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2617,7 +2617,7 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
> * stops will be drained from vmstat refresh context.
> */
> if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
> - free_high = (pcp->free_count >= batch &&
> + free_high = (pcp->free_count >= (batch*5) &&
> (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) &&
> (!(pcp->flags & PCPF_FREE_HIGH_BATCH) ||
> pcp->count >= READ_ONCE(batch)));