Date:   Mon, 17 Jul 2023 10:35:20 +0800
From:   Wang Jianchao <jianchwa@...look.com>
To:     seanjc@...gle.com, tglx@...utronix.de, mingo@...hat.com,
        bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org,
        hpa@...or.com, kvm@...r.kernel.org
Cc:     arkinjob@...look.com, zhi.wang.linux@...il.com,
        xiaoyao.li@...el.com, linux-kernel@...r.kernel.org
Subject: [RFC V3 3/6] x86/apic: switch set_next_event to lazy tscdeadline version

This is the guest-side code of lazy tscdeadline. If CPUID tells us
that lazy tscdeadline is enabled, switch .set_next_event to the lazy
tscdeadline version. The core idea is explained below.

Every time the guest starts or modifies an hrtimer, it needs to write
the tsc deadline msr; a vm-exit occurs and the host arms a hv or sw
timer for it. However, in workloads that set up timers frequently,
the tscdeadline msr is usually overwritten many times before the
timer expires.

w: write msr    x: vm-exit    t: hv or sw timer

1. write to msr with t1
Guest
         w1
---------------------------------------->  Time
Host     x1             t1
...

n. write to msr with tn
Guest
                    wn
------------------------------------------>  Time
Host                xn         tn-1 -> tn

What this patch wants to do is eliminate the vm-exits x2 ... xn.

First, as with other pv features, two fields are shared between guest
and host:
 - armed, the tscdeadline value that has a timer on the host side,
   only updated by the HOST side
 - pending, the next tscdeadline value, only updated by the GUEST
   side

1. write to msr with t1
     armed : t1     pending : t1
Guest
         w1
---------------------------------------->  Time
Host     x1             t1

A vm-exit occurs and the host arms a timer for t1.

2. write to msr with t2
    armed : t1      pending : t2
Guest
             w2
------------------------------------------>  Time
Host                     t1

The tsc deadline value that has been armed, namely t1, is smaller
than t2, so there is no need to write the msr; just update pending
to t2.
...
n.  write to msr with tn
    armed : t1      pending : tn
Guest
                      wn
------------------------------------------>  Time
Host                       t1

Similar to step 2, just update the pending field with tn; no vm-exit occurs.

n+1.  t1 expires, arm tn
    armed : tn     pending : tn
Guest

------------------------------------------>  Time
Host                       t1  ------> tn

When we try to update the tscdeadline, if the 'armed' field is set
and is not larger than the new deadline, we know a timer is already
pending on the host side, so there is no need to write the msr.

Signed-off-by: Li Shujin <arkinjob@...look.com>
Signed-off-by: Wang Jianchao <jianchwa@...look.com>
---
 arch/x86/kernel/apic/apic.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index af49e24..5aea74f 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -62,6 +62,9 @@
 #include <asm/intel-family.h>
 #include <asm/irq_regs.h>
 #include <asm/cpu.h>
+#include <linux/kvm_para.h>
+
+DECLARE_PER_CPU_DECRYPTED(struct kvm_lazy_tscdeadline, kvm_lazy_tscdeadline);
 
 unsigned int num_processors;
 
@@ -495,6 +498,26 @@ static int lapic_next_deadline(unsigned long delta,
 	return 0;
 }
 
+static int kvm_lapic_next_deadline(unsigned long delta,
+			       struct clock_event_device *evt)
+{
+	struct kvm_lazy_tscdeadline *lazy_tscddl = this_cpu_ptr(&kvm_lazy_tscdeadline);
+	u64 tsc;
+
+	tsc = rdtsc() + (((u64) delta) * TSC_DIVISOR);
+	lazy_tscddl->pending = tsc;
+	/*
+	 * The fence serves two purposes:
+	 *  - prevent the wrmsrl from being reordered
+	 *  - prevent reordering of the write to pending and the read of armed
+	 */
+	weak_wrmsr_fence();
+	if (!lazy_tscddl->armed || tsc < lazy_tscddl->armed)
+		wrmsrl(MSR_IA32_TSC_DEADLINE, tsc);
+
+	return 0;
+}
+
 static int lapic_timer_shutdown(struct clock_event_device *evt)
 {
 	unsigned int v;
@@ -639,7 +662,12 @@ static void setup_APIC_timer(void)
 		levt->name = "lapic-deadline";
 		levt->features &= ~(CLOCK_EVT_FEAT_PERIODIC |
 				    CLOCK_EVT_FEAT_DUMMY);
-		levt->set_next_event = lapic_next_deadline;
+		if (kvm_para_available() &&
+		    kvm_para_has_feature(KVM_FEATURE_LAZY_TSCDEADLINE)) {
+			levt->set_next_event = kvm_lapic_next_deadline;
+		} else {
+			levt->set_next_event = lapic_next_deadline;
+		}
 		clockevents_config_and_register(levt,
 						tsc_khz * (1000 / TSC_DIVISOR),
 						0xF, ~0UL);
-- 
2.7.4
