Message-ID: <769f538d-dd42-4d36-a4c5-7e6e48b209f6@amazon.com>
Date: Sat, 24 Jan 2026 22:26:47 +0100
From: Alexander Graf <graf@...zon.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: <kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>, <hpa@...or.com>,
<x86@...nel.org>, Paolo Bonzini <pbonzini@...hat.com>, Vitaly Kuznetsov
<vkuznets@...hat.com>, <nh-open-source@...zon.com>, <gurugubs@...zon.com>,
<jalliste@...zon.co.uk>, Michael Kelley <mhklinux@...look.com>, John Starks
<jostarks@...rosoft.com>
Subject: Re: [PATCH] kvm: hyper-v: Delay firing of expired stimers
On 23.01.26 19:21, Sean Christopherson wrote:
> On Thu, Jan 15, 2026, Alexander Graf wrote:
>> During Windows Server 2025 hibernation, I have seen Windows' calculation
>> of interrupt target time get skewed over the hypervisor view of the same.
>> This can cause Windows to emit timer events in the past for events that
>> do not fire yet according to the real time source. This then leads to
>> interrupt storms in the guest which slow down execution to a point where
>> watchdogs trigger. Those manifest as bugchecks 0x9f and 0xa0 during
>> hibernation, typically in the resume path.
>>
>> To work around this problem, we can delay timers that get created with a
>> target time in the past by a tiny bit (10µs) to give the guest CPU time
>> to process real work and make forward progress, hopefully recovering its
>> interrupt logic in the process. While this small delay can marginally
>> reduce accuracy of guest timers, 10µs are within the noise of VM
>> entry/exit overhead (~1-2 µs) so I do not expect to see real world impact.
> There is a lot of hope piled into this. And *always* padding the count makes me
> more than a bit uncomfortable. If the skew is really due to a guest bug and not
> something on the host's side, i.e. if this isn't just a symptom of a real bug that
> can be fixed and the _only_ option is to chuck in a workaround, then I would
> strongly prefer to be as conservative as possible. E.g. is it possible to
> precisely detect this scenario and only add the delay when the guest appears to
> be stuck?
This patch only pads when a timer is in the past, which I have not seen
happen much on real systems. Usually you're trying to configure a timer
for the future :).
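To illustrate what "only pads when a timer is in the past" means, here
is a minimal standalone sketch. It is not the code from the patch: the
helper name, the constant name, and the choice to treat a target equal
to "now" as not being in the past are all assumptions made for this
example. Times are in 100 ns units, the granularity Hyper-V reference
time uses, so 10µs is 100 units.

```c
#include <stdint.h>

/* Hypothetical constant: the 10 us delay expressed in 100 ns units. */
#define STIMER_PAST_DELAY_100NS 100ULL

/*
 * Hypothetical helper (illustration only, not the kernel's code).
 * Returns the time at which the timer should actually fire.
 */
static uint64_t stimer_effective_expiry(uint64_t now_100ns,
                                        uint64_t target_100ns)
{
    /* Target already in the past: push expiry slightly past "now". */
    if (target_100ns < now_100ns)
        return now_100ns + STIMER_PAST_DELAY_100NS;

    /* Target is now or in the future: leave it untouched (assumption
     * for the equality case). */
    return target_100ns;
}
```

In this sketch, a timer armed for the future keeps its exact target, so
only the already-expired case pays the extra 10µs.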
That said, I have kept digging since I posted this patch and I'm still
trying to pin down the exact conditions under which any of this actually
happens. Let's put this patch on hold until I have a more reliable
reproducer.
Alex
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christof Hellmis, Andreas Stieger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597