Message-ID: <cbb812ea-4585-4777-aaae-3dcdcd0bd8d9@gmail.com>
Date: Tue, 18 Feb 2025 14:16:20 +0000
From: "Colin King (gmail)" <colin.i.king@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Matthew Wilcox <willy@...radead.org>, linux-mm@...ck.org,
 kernel-janitors@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH][next] mm/mincore: improve performance by adding an
 unlikely hint

On 18/02/2025 03:13, Andrew Morton wrote:
> On Mon, 17 Feb 2025 18:00:22 +0000 "Colin King (gmail)" <colin.i.king@...il.com> wrote:
> 
>> On 17/02/2025 17:58, Matthew Wilcox wrote:
>>> On Mon, Feb 17, 2025 at 05:09:34PM +0000, Colin Ian King wrote:
>>>> Adding an unlikely() hint on the masked start comparison error
>>>> return path improves run-time performance of the mincore system call.
>>>>
>>>> Benchmarking on an i9-12900 shows an improvement of 7ns on mincore calls
>>>> on a 256KB mmap'd region where 50% of the pages were resident.
>>>>
>>>> Results based on running 20 tests with turbo disabled (to reduce
>>>> clock freq turbo changes), with 10 second run per test and comparing
>>>> the number of mincore calls per second. The % standard deviation of
>>>> the 20 tests was ~0.10%, so results are reliable.
>>>
>>> I think you've elided _just_ enough information here that nobody can
>>> judge whether your stats skills are any good ;-)  You've told us 7ns
>>> (per call, presumably) and you've told us 0.10% standard deviation,
>>> but you haven't told us how long the syscall takes, so nobody can tell
>>> whether 7ns is within 0.10% or not ;-)
>>
>> Ugh, my bad.
>>
>> Improvement was from ~970 down to 963 ns, so a small ~0.7% improvement.
>>
> 
> It actually doesn't change the generated code:

I've compared the generated x86 object code using gcc 14.2.1 20240912
(Fedora 41), 14.2.0 (Debian 14.2.0-17) and 14.2.1 20250211 (Clear Linux),
and I get differences in the generated object code between old and new.
The improvement on Clear Linux is also more significant because it uses
-O3. So I'm confident the change is generating improved object code.
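
As a rough illustration of the hint under discussion, here is a minimal
user-space sketch. In the kernel, unlikely() is built on the compiler's
__builtin_expect(); everything else here (PAGE_SIZE_ASSUMED,
PAGE_MASK_ASSUMED, check_start() and the 4096-byte page size) is a
hypothetical stand-in for illustration and is not taken from the patch.
Whether the hint changes the emitted code depends on the compiler
version and optimisation level.

```c
#include <errno.h>

/* Stand-in for the kernel's unlikely(): tells the compiler the
 * condition is expected to be false, so the error path can be laid
 * out off the hot path. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Assumption for this sketch: 4096-byte pages. */
#define PAGE_SIZE_ASSUMED 4096UL
#define PAGE_MASK_ASSUMED (~(PAGE_SIZE_ASSUMED - 1))

/* Hypothetical helper mirroring the start-address check:
 * returns 0 if start is page-aligned, -EINVAL otherwise. */
static int check_start(unsigned long start)
{
	if (unlikely(start & ~PAGE_MASK_ASSUMED))
		return -EINVAL;
	return 0;
}
```

Building this once with and once without the unlikely() wrapper and
diffing the disassembly is one way to see whether a given toolchain
generates different code for the check.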


> 
> hp2:/usr/src/25> diff -u mm/mincore.lst.old mm/mincore.lst
> --- mm/mincore.lst.old	2025-02-17 19:11:34.093727411 -0800
> +++ mm/mincore.lst	2025-02-17 19:12:59.797009056 -0800
> @@ -1563,7 +1563,7 @@
>   	start = untagged_addr(start);
>   
>   	/* Check the start address: needs to be page-aligned.. */
> -	if (start & ~PAGE_MASK)
> +	if (unlikely(start & ~PAGE_MASK))
>        b27:	31 ff                	xor    %edi,%edi
>   	asm (ALTERNATIVE("",
>        b29:	90                   	nop
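
For the timing figures quoted earlier in the thread (~970 ns down to
963 ns per call, ~0.10% standard deviation across runs), the arithmetic
can be sketched as below. The function names are hypothetical, and
dividing by the standard deviation is only a rough sanity check of
scale, not a proper significance test.

```c
/* Relative improvement, in percent, when the time per call drops
 * from old_ns to new_ns. */
static double improvement_pct(double old_ns, double new_ns)
{
	return (old_ns - new_ns) / old_ns * 100.0;
}

/* Size of the improvement expressed in units of the run-to-run
 * standard deviation, where stddev_pct is given as a percentage
 * of the mean. */
static double improvement_in_stddevs(double old_ns, double new_ns,
				     double stddev_pct)
{
	return improvement_pct(old_ns, new_ns) / stddev_pct;
}
```

With old_ns = 970, new_ns = 963 and stddev_pct = 0.10, the improvement
comes out at roughly 0.72%, or about 7 times the quoted standard
deviation.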

