Message-Id: <20240118165549.1935000-1-l.stach@pengutronix.de>
Date: Thu, 18 Jan 2024 17:55:49 +0100
From: Lucas Stach <l.stach@...gutronix.de>
To: Russell King <linux@...linux.org.uk>
Cc: Ard Biesheuvel <ardb@...nel.org>,
	Linus Walleij <linus.walleij@...aro.org>,
	linux-arm-kernel@...ts.infradead.org,
	linux-kernel@...r.kernel.org,
	kernel@...gutronix.de,
	patchwork-lst@...gutronix.de
Subject: [PATCH RFC] ARM: VDSO: don't drop clock_gettime when architected timer isn't available

Dropping the clock_gettime entry points when the architected timer is not
available is done to gain some efficiency, as it allows libc to fall back
to the syscall without dispatching through the vDSO.

The difference on an i.MX6 system using the vdsotest utility [1]
looks like this:
$ vdsotest clock-gettime-monotonic bench -d 10

           w/o vDSO entrypoint        with vDSO entrypoint
syscall:   987 nsec/call              974 nsec/call
libc:      1095 nsec/call             1148 nsec/call
vdso:      not available              not available

Going through libc adds a ~100ns penalty compared to calling the syscall
directly. Dispatching through the vDSO adds another ~50ns, which isn't
negligible, but also not huge.

The downside of dropping the entry points is that the COARSE variants
of the clocks then also have to go through the syscall, even though they
can be accelerated through the vDSO without the architected timer when
the entry points are kept.

$ vdsotest clock-gettime-monotonic-coarse bench -d 10

           w/o vDSO entrypoint        with vDSO entrypoint
syscall:   659 nsec/call              662 nsec/call
libc:      772 nsec/call              137 nsec/call
vdso:      not available              63 nsec/call

This is quite a nice speedup, but arguably coarse clocks are also not
as widely used as the high-res versions. Still, this patch proposes to
take the hit on high-res clocks by dispatching through the vDSO to gain
the ability to accelerate coarse clocks.

[1] https://github.com/nlynch-mentor/vdsotest

Signed-off-by: Lucas Stach <l.stach@...gutronix.de>
---
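For anyone without vdsotest at hand, the libc vs. syscall comparison from
the commit message can be roughly approximated with the sketch below. This
is not how vdsotest measures. The helper names (via_libc, via_syscall,
bench_ns_per_call) are made up for illustration, and it assumes a libc
whose clock_gettime() uses the vDSO entry point when the kernel exports
one, plus a SYS_clock_gettime whose timespec fields are two longs (true
for 32-bit ARM and for common 64-bit ABIs).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef CLOCK_MONOTONIC_COARSE
#define CLOCK_MONOTONIC_COARSE 6	/* value from the Linux UAPI headers */
#endif

/* Hypothetical helper type: any way of reading a clock. */
typedef int (*gettime_fn)(clockid_t, struct timespec *);

/* Convert a timespec to nanoseconds. */
static uint64_t ts_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * 1000000000ull + (uint64_t)ts->tv_nsec;
}

/* Go through libc, which may dispatch to the vDSO if an entry point exists. */
static int via_libc(clockid_t id, struct timespec *ts)
{
	return clock_gettime(id, ts);
}

/*
 * Force a real system call, bypassing the vDSO.
 * Assumption: the kernel writes two longs for SYS_clock_gettime, so a
 * long[2] buffer is used instead of the libc struct timespec layout.
 */
static int via_syscall(clockid_t id, struct timespec *ts)
{
	long raw[2] = { 0, 0 };
	int ret = (int)syscall(SYS_clock_gettime, id, raw);

	ts->tv_sec = raw[0];
	ts->tv_nsec = raw[1];
	return ret;
}

/* Average cost of one call to fn in nanoseconds over iters iterations. */
static double bench_ns_per_call(gettime_fn fn, clockid_t id,
				unsigned long iters)
{
	struct timespec start, end, tmp;
	unsigned long i;

	assert(iters > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iters; i++)
		fn(id, &tmp);
	clock_gettime(CLOCK_MONOTONIC, &end);

	return (double)(ts_to_ns(&end) - ts_to_ns(&start)) / (double)iters;
}
```

Calling bench_ns_per_call(via_libc, CLOCK_MONOTONIC_COARSE, n) and
bench_ns_per_call(via_syscall, CLOCK_MONOTONIC_COARSE, n) from a small
main() on kernels with and without this patch should show the same trend
as the libc and syscall rows above, though absolute numbers will differ
by system and by loop overhead.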
 arch/arm/kernel/vdso.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/arm/kernel/vdso.c b/arch/arm/kernel/vdso.c
index f297d66a8a76..947f3d8144fc 100644
--- a/arch/arm/kernel/vdso.c
+++ b/arch/arm/kernel/vdso.c
@@ -172,11 +172,8 @@ static void __init patch_vdso(void *ehdr)
 	 * want programs to incur the slight additional overhead of
 	 * dispatching through the VDSO only to fall back to syscalls.
 	 */
-	if (!cntvct_ok) {
+	if (!cntvct_ok)
 		vdso_nullpatch_one(&einfo, "__vdso_gettimeofday");
-		vdso_nullpatch_one(&einfo, "__vdso_clock_gettime");
-		vdso_nullpatch_one(&einfo, "__vdso_clock_gettime64");
-	}
 }
 
 static int __init vdso_init(void)
-- 
2.43.0

