[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4bac07dc91de1c45720f36ed6b797d3f215f131a.camel@infradead.org>
Date: Thu, 01 Aug 2024 16:22:35 +0100
From: David Woodhouse <dwmw2@...radead.org>
To: mikelley@...rosoft.com, kvm <kvm@...r.kernel.org>
Cc: bp@...en8.de, dave.hansen@...ux.intel.com, decui@...rosoft.com,
haiyangz@...rosoft.com, kys@...rosoft.com, linux-hyperv@...r.kernel.org,
linux-kernel@...r.kernel.org, lirongqing@...du.com, mingo@...hat.com,
seanjc@...gle.com, tglx@...utronix.de, wei.liu@...nel.org, x86@...nel.org
Subject: Re: [PATCH] clockevents/drivers/i8253: Do not zero timer counter in
shutdown
On Thu, 2024-08-01 at 10:00 +0100, David Woodhouse wrote:
> >
> >
> >
> > On 2023-02-08 at 01:04:19 +0000, "Michael Kelley (LINUX)" <mikelley@...rosoft.com> said:
> > > > From: lirongqing@...du.com <lirongqing@...du.com> Sent: Monday, February 6, 2023 5:15 PM
> > > > > >
> > > > > > Zeroing the counter register in pit_shutdown() isn't actually supposed to
> > > > > > stop it from counting, will causes the PIT to start running again,
> > > > > > From the spec:
> > > > > >
> > > > > > The largest possible initial count is 0; this is equivalent to 216 for
> > > > > > binary counting and 104 for BCD counting.
> > > > > >
> > > > > > The Counter does not stop when it reaches zero. In Modes 0, 1, 4, and 5 the
> > > > > > Counter "wraps around" to the highest count, either FFFF hex for binary
> > > > > > count- ing or 9999 for BCD counting, and continues counting.
> > > > > >
> > > > > > Mode 0 is typically used for event counting. After the Control Word is
> > > > > > written, OUT is initially low, and will remain low until the Counter
> > > > > > reaches zero. OUT then goes high and remains high until a new count or a
> > > > > > new Mode 0 Control Word is written into the Counter.
> > > > > >
> > > > > > Hyper-V and KVM follow the spec, the issue that 35b69a42 "(clockevents/drivers/
> > > > > > i8253: Add support for PIT shutdown quirk") fixed is in i8253 drivers, not Hyper-v,
> > > > > > so delete the zero timer counter register in shutdown, and delete PIT shutdown
> > > > > > quirk for Hyper-v
> > > >
> > > > From the standpoint of Hyper-V, I'm good with this change. But there's a
> > > > risk that old hardware might not be compliant with the spec, and needs the
> > > > zero'ing for some reason. The experts in the x86 space will be in the best
> > > > position to assess the risk. At the time, the quirk approach was taken so
> > > > the change applied only to Hyper-V, and any such risk was avoided.
> > > >
> > > > For Hyper-V,
> > > > Reviewed-by: Michael Kelley <mikelley@...rosoft.com>
> >
> > It's not entirely clear why we zero it at all. What was it supposed to
> > achieve?
Answering my own question here, it seems that *some* implementations
don't apply the mode change until the counter is subsequently written.
I think QEMU applies it immediately. But the KVM in-kernel one (and the
AWS Nitro Hypervisor) do not. So the interrupt doesn't shut up until
you write the counter.
I don't see any justification for writing the counter causing any more
than *one* further interrupt though, in single-shot mode. If that's
what Hyper-V does, that seems like a bug (which is worked around
already).
I do see the VMM wasting a bunch of time emulating a PIT IRQ that goes
nowhere, but that seems to be because the bootloader or BIOS turns it
on, and nothing in Linux ever turns it back *off* again unless it's
registered at boot and then pit_shutdown() is called when we switch to
something better.
I'll look at an optimisation in the VMM which can stop firing the timer
if the PIT IRQ is already masked or pending in the PIC and IOAPIC;
that's just a waste of steal time.
But the guest kernel should probably do something like this...
diff --git a/arch/x86/kernel/i8253.c b/arch/x86/kernel/i8253.c
index 2b7999a1a50a..d13fd793254e 100644
--- a/arch/x86/kernel/i8253.c
+++ b/arch/x86/kernel/i8253.c
@@ -8,6 +8,7 @@
#include <linux/timex.h>
#include <linux/i8253.h>
+#include <asm/hypervisor.h>
#include <asm/apic.h>
#include <asm/hpet.h>
#include <asm/time.h>
@@ -39,8 +40,29 @@ static bool __init use_pit(void)
bool __init pit_timer_init(void)
{
- if (!use_pit())
+ if (!use_pit()) {
+ if (!hypervisor_is_type(X86_HYPER_NATIVE)) {
+ /*
+ * Don't just ignore the PIT. Ensure it's stopped,
+ * because VMMs otherwise steal CPU time just to
+ * pointlessly waggle the (masked) IRQ.
+ */
+ raw_spin_lock(&i8253_lock);
+ outb_p(0x30, PIT_MODE);
+
+ /*
+ * It's not entirely clear from the datasheet, but some
+ * virtual implementations don't stop until the counter
+ * is actually written.
+ */
+ if (i8253_clear_counter_on_shutdown) {
+ outb_p(0, PIT_CH0);
+ outb_p(0, PIT_CH0);
+ }
+ raw_spin_unlock(&i8253_lock);
+ }
return false;
+ }
clockevent_i8253_init(true);
global_clock_event = &i8253_clockevent;
>
Download attachment "smime.p7s" of type "application/pkcs7-signature" (5965 bytes)
Powered by blists - more mailing lists