Message-ID: <20230601203127.GY4253@hirez.programming.kicks-ass.net>
Date:   Thu, 1 Jun 2023 22:31:27 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Steven Noonan <steven@...inklabs.net>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Muhammad Usama Anjum <usama.anjum@...labora.com>,
        Jonathan Corbet <corbet@....net>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        "Guilherme G. Piccoli" <gpiccoli@...lia.com>, kernel@...labora.com
Subject: Re: Direct rdtsc call side-effect

On Thu, Jun 01, 2023 at 07:07:38PM +0000, Steven Noonan wrote:
> One issue is how much overhead it has. This is an instruction that
> normally executes in roughly 50 clock cycles (RDTSC) to 100 clock
> cycles (RDTSCP) on Zen 3. Based on a proof-of-concept I wrote, the
> overhead of trapping and emulating with a signal handler is roughly
> 100x. On my Zen 3 system, it goes up to around 10000 clock cycles per
> trapped read of RDTSCP.

What about kernel based emulation? You could tie it into user_dispatch
and have a user_dispatch tsc offset.

So regular kernel emulation simply returns the native value (keeps the
VDSO working for one), but then from a user_dispatch range, it returns
+offset.
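
A minimal userspace sketch of that selection rule, assuming a hypothetical per-task structure (`struct tsc_dispatch`, its fields, and both helper names are invented for illustration and are not existing kernel interfaces): reads from outside the range see the native counter, reads from inside it see the native counter plus the configured offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task settings; not an existing kernel structure. */
struct tsc_dispatch {
	uint64_t start;   /* first address of the range that gets the offset */
	uint64_t len;     /* length of that range in bytes */
	uint64_t offset;  /* value added to the native TSC inside the range */
};

/* True when ip lies in [start, start + len). */
static bool ip_in_range(const struct tsc_dispatch *d, uint64_t ip)
{
	/* Unsigned subtraction wraps for ip < start, so one compare suffices. */
	return ip - d->start < d->len;
}

/* Value the emulation would hand back for a trapped RDTSC at address ip. */
static uint64_t emulated_tsc(const struct tsc_dispatch *d, uint64_t ip,
			     uint64_t native)
{
	return ip_in_range(d, ip) ? native + d->offset : native;
}
```

Keeping the native value as the default path is what lets code outside the range, such as the VDSO, keep working unchanged.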

That is, how slow is the below?

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 58b1f208eff5..18175b45db1f 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -645,6 +645,25 @@ static bool fixup_iopl_exception(struct pt_regs *regs)
 	return true;
 }
 
+static bool fixup_rdtsc_exception(struct pt_regs *regs)
+{
+	unsigned short bytes;
+	u32 eax, edx;
+
+	if (get_user(bytes, (const unsigned short __user *)regs->ip))
+		return false;
+
+	if (bytes != 0x310f)	/* opcode 0f 31, loaded as a little-endian u16 */
+		return false;
+
+	asm volatile ("rdtsc" : "=a" (eax), "=d" (edx));
+	regs->ax = eax;
+	regs->dx = edx;
+
+	regs->ip += 2;
+	return true;
+}
+
 /*
  * The unprivileged ENQCMD instruction generates #GPs if the
  * IA32_PASID MSR has not been populated.  If possible, populate
@@ -752,6 +771,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 		if (fixup_iopl_exception(regs))
 			goto exit;
 
+		if (fixup_rdtsc_exception(regs))
+			goto exit;
+
 		if (fixup_vdso_exception(regs, X86_TRAP_GP, error_code, 0))
 			goto exit;
 

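The decode-and-fixup step in the patch can be exercised outside the kernel with a small stand-in. This is only a sketch under stated assumptions: `struct fake_regs` and `emulate_rdtsc()` are hypothetical names, the counter value is passed in rather than read with RDTSC, and the opcode is compared byte by byte so the check does not depend on host endianness.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the three pt_regs fields the fixup touches (hypothetical). */
struct fake_regs {
	uint64_t ax, dx, ip;
};

/*
 * Emulate one RDTSC at 'code': on a match, place the low and high halves
 * of 'tsc' where the instruction would and step over it. Returns false
 * and leaves 'regs' untouched if the bytes are not RDTSC.
 */
static bool emulate_rdtsc(struct fake_regs *regs, const uint8_t *code,
			  uint64_t tsc)
{
	/* RDTSC encodes as the byte sequence 0f 31. */
	if (code[0] != 0x0f || code[1] != 0x31)
		return false;

	regs->ax = (uint32_t)tsc;         /* low 32 bits, as left in EAX */
	regs->dx = (uint32_t)(tsc >> 32); /* high 32 bits, as left in EDX */
	regs->ip += 2;                    /* skip the two-byte instruction */
	return true;
}
```

RDTSCP (0f 01 f9) is deliberately not handled here, matching the patch, which only recognizes the two-byte RDTSC encoding.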