lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4bac07dc91de1c45720f36ed6b797d3f215f131a.camel@infradead.org>
Date: Thu, 01 Aug 2024 16:22:35 +0100
From: David Woodhouse <dwmw2@...radead.org>
To: mikelley@...rosoft.com, kvm <kvm@...r.kernel.org>
Cc: bp@...en8.de, dave.hansen@...ux.intel.com, decui@...rosoft.com, 
	haiyangz@...rosoft.com, kys@...rosoft.com, linux-hyperv@...r.kernel.org, 
	linux-kernel@...r.kernel.org, lirongqing@...du.com, mingo@...hat.com, 
	seanjc@...gle.com, tglx@...utronix.de, wei.liu@...nel.org, x86@...nel.org
Subject: Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in
 shutdown

On Thu, 2024-08-01 at 10:00 +0100, David Woodhouse wrote:
> > 
> > 
> > 
> > On 2023-02-08 at 01:04:19 +0000, "Michael Kelley (LINUX)" <mikelley@...rosoft.com> said:
> > > > From: lirongqing@...du.com <lirongqing@...du.com> Sent: Monday, February 6, 2023 5:15 PM
> > > > > > 
> > > > > > Zeroing the counter register in pit_shutdown() isn't actually supposed to
> > > > > > stop it from counting,  will causes the PIT to start running again,
> > > > > > From the spec:
> > > > > > 
> > > > > >   The largest possible initial count is 0; this is equivalent to 216 for
> > > > > >   binary counting and 104 for BCD counting.
> > > > > > 
> > > > > >   The Counter does not stop when it reaches zero. In Modes 0, 1, 4, and 5 the
> > > > > >   Counter "wraps around" to the highest count, either FFFF hex for binary
> > > > > >   count- ing or 9999 for BCD counting, and continues counting.
> > > > > > 
> > > > > >   Mode 0 is typically used for event counting. After the Control Word is
> > > > > >   written, OUT is initially low, and will remain low until the Counter
> > > > > >   reaches zero. OUT then goes high and remains high until a new count or a
> > > > > >   new Mode 0 Control Word is written into the Counter.
> > > > > > 
> > > > > > Hyper-V and KVM follow the spec, the issue that 35b69a42 "(clockevents/drivers/
> > > > > > i8253: Add support for PIT shutdown quirk") fixed is in i8253 drivers, not Hyper-v,
> > > > > > so delete the zero timer counter register in shutdown, and delete PIT shutdown
> > > > > > quirk for Hyper-v
> > > > 
> > > > From the standpoint of Hyper-V, I'm good with this change.  But there's a
> > > > risk that old hardware might not be compliant with the spec, and needs the
> > > > zero'ing for some reason. The experts in the x86 space will be in the best
> > > > position to assess the risk.  At the time, the quirk approach was taken so
> > > > the change applied only to Hyper-V, and any such risk was avoided.
> > > > 
> > > > For Hyper-V,
> > > > Reviewed-by: Michael Kelley <mikelley@...rosoft.com>
> > 
> > It's not entirely clear why we zero it at all. What was it supposed to
> > achieve?

Answering my own question here, it seems that *some* implementations
don't apply the mode change until the counter is subsequently written.

I think QEMU applies it immediately. But the KVM in-kernel one (and the
AWS Nitro Hypervisor) do not. So the interrupt doesn't shut up until
you write the counter.

I don't see any justification for writing the counter causing any more
than *one* further interrupt though, in single-shot mode. If that's
what Hyper-V does, that seems like a bug (which is worked around
already).

I do see the VMM wasting a bunch of time emulating a PIT IRQ that goes
nowhere, but that seems to be because the bootloader or BIOS turns it
on, and nothing in Linux ever turns it back *off* again unless it's
registered at boot and then pit_shutdown() is called when we switch to
something better.

I'll look at an optimisation in the VMM which can stop firing the timer
if the PIT IRQ is already masked or pending in the PIC and IOAPIC;
that's just a waste of steal time.

But the guest kernel should probably do something like this...

diff --git a/arch/x86/kernel/i8253.c b/arch/x86/kernel/i8253.c
index 2b7999a1a50a..d13fd793254e 100644
--- a/arch/x86/kernel/i8253.c
+++ b/arch/x86/kernel/i8253.c
@@ -8,6 +8,7 @@
 #include <linux/timex.h>
 #include <linux/i8253.h>
 
+#include <asm/hypervisor.h>
 #include <asm/apic.h>
 #include <asm/hpet.h>
 #include <asm/time.h>
@@ -39,8 +40,29 @@ static bool __init use_pit(void)
 
 bool __init pit_timer_init(void)
 {
-	if (!use_pit())
+	if (!use_pit()) {
+		if (!hypervisor_is_type(X86_HYPER_NATIVE)) {
+			/*
+			 * Don't just ignore the PIT. Ensure it's stopped,
+			 * because VMMs otherwise steal CPU time just to
+			 * pointlessly waggle the (masked) IRQ.
+			 */
+			raw_spin_lock(&i8253_lock);
+			outb_p(0x30, PIT_MODE);
+
+			/*
+			 * It's not entirely clear from the datasheet, but some
+			 * virtual implementations don't stop until the counter
+			 * is actually written.
+			 */
+			if (i8253_clear_counter_on_shutdown) {
+				outb_p(0, PIT_CH0);
+				outb_p(0, PIT_CH0);
+			}
+			raw_spin_unlock(&i8253_lock);
+		}
 		return false;
+	}
 
 	clockevent_i8253_init(true);
 	global_clock_event = &i8253_clockevent;
> 



Download attachment "smime.p7s" of type "application/pkcs7-signature" (5965 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ