linux-kernel - Re: [PATCH] kvm: hyper-v: Delay firing of expired stimers

Message-ID: <87bjigaeyx.fsf@redhat.com>
Date: Mon, 26 Jan 2026 10:41:10 +0100
From: Vitaly Kuznetsov <vkuznets@...hat.com>
To: Alexander Graf <graf@...zon.com>, Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org, hpa@...or.com,
 x86@...nel.org, Paolo Bonzini <pbonzini@...hat.com>,
 nh-open-source@...zon.com, gurugubs@...zon.com, jalliste@...zon.co.uk,
 Michael Kelley <mhklinux@...look.com>, John Starks
 <jostarks@...rosoft.com>
Subject: Re: [PATCH] kvm: hyper-v: Delay firing of expired stimers

Alexander Graf <graf@...zon.com> writes:

> On 23.01.26 19:21, Sean Christopherson wrote:
>> On Thu, Jan 15, 2026, Alexander Graf wrote:
>>> During Windows Server 2025 hibernation, I have seen Windows' calculation
>>> of interrupt target time get skewed relative to the hypervisor's view of it.
>>> This can cause Windows to program timers in the past for events that
>>> are not yet due according to the real time source. This then leads to
>>> interrupt storms in the guest which slow down execution to a point where
>>> watchdogs trigger. Those manifest as bugchecks 0x9f and 0xa0 during
>>> hibernation, typically in the resume path.
>>>
>>> To work around this problem, we can delay timers that get created with a
>>> target time in the past by a tiny bit (10µs) to give the guest CPU time
>>> to process real work and make forward progress, hopefully recovering its
>>> interrupt logic in the process. While this small delay can marginally
>>> reduce accuracy of guest timers, 10µs is within the noise of VM
>>> entry/exit overhead (~1-2 µs) so I do not expect to see real-world impact.
>> There is a lot of hope piled into this.  And *always* padding the count makes me
>> more than a bit uncomfortable.  If the skew is really due to a guest bug and not
>> something on the host's side, i.e. if this isn't just a symptom of a real bug that
>> can be fixed and the _only_ option is to chuck in a workaround, then I would
>> strongly prefer to be as conservative as possible.  E.g. is it possible to
>> precisely detect this scenario and only add the delay when the guest appears to
>> be stuck?
>
>
> This patch only pads when a timer is in the past, which I have not seen 
> happen much on real systems. Usually you're trying to configure a timer 
> for the future :).
>
> That said, I have continued digging deeper since I posted this patch and 
> I'm still trying to wrap my head around under which exact conditions any 
> of this really does happen. Let's put this patch on hold until I have a 
> more reliable reproducer.
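
The two policies under discussion can be sketched as follows: the patch's unconditional 10µs pad, and the narrower variant Sean asks about, which only pads once the guest appears stuck. This is a userspace sketch, not KVM's hyperv.c. It assumes times are in Hyper-V reference-time units (100 ns ticks, so 10µs is 100 ticks), and `pad_always`, `pad_if_stuck`, `struct stimer_sketch` and the 8-arm threshold are hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: times are Hyper-V reference time, 100 ns per tick,
 * so the patch's 10 us pad is 100 ticks. */
#define STIMER_PAST_PAD_TICKS 100ULL
static_assert(STIMER_PAST_PAD_TICKS * 100 == 10000, "pad must equal 10 us (10000 ns)");

/* Hypothetical threshold, not from the patch: number of consecutive
 * in-the-past arms tolerated before the guest is treated as stuck. */
#define STUCK_THRESHOLD 8

/* Hypothetical per-timer state for the conservative variant. */
struct stimer_sketch {
	unsigned int past_due_streak; /* consecutive arms with expiry < now */
};

/* Unconditional variant, as described in the patch: an expiry that is
 * strictly in the past is moved 10 us into the future. */
static uint64_t pad_always(uint64_t expiry, uint64_t now)
{
	return expiry < now ? now + STIMER_PAST_PAD_TICKS : expiry;
}

/* Conservative variant: keep firing past-due timers immediately and only
 * pad once the guest has armed STUCK_THRESHOLD of them in a row. Any arm
 * that is not in the past resets the streak. */
static uint64_t pad_if_stuck(struct stimer_sketch *t, uint64_t expiry,
			     uint64_t now)
{
	if (expiry >= now) {
		t->past_due_streak = 0;
		return expiry;
	}
	if (++t->past_due_streak < STUCK_THRESHOLD)
		return expiry; /* fires immediately, behaviour unchanged */
	return now + STIMER_PAST_PAD_TICKS;
}
```

With `now = 1000000`, `pad_always(999950, now)` returns 1000100, while `pad_if_stuck` hands back the original past expiry for the first seven consecutive past-due arms and only starts padding on the eighth. The gated variant leaves today's fire-immediately behaviour alone unless the guest keeps re-arming in the past; the threshold is a guess, not a measured value.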

My bet is on the clocksource switch, e.g. the guest disables (or just
stops using, good luck detecting that :-) ) the TSC page and uses raw TSC
for some period, or something similar.
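
That hypothesis hinges on the guest and host disagreeing about the current reference time. With the TSC page, the TLFS defines reference time (in 100 ns units) as ((VirtualTsc × TscScale) >> 64) + TscOffset. The sketch below applies that formula and models the disagreement in one hypothetical way: the guest computes a stimer deadline with a stale TscOffset while the host compares it against the current value. Raw-TSC use would diverge differently; the helper names, the 2.5 GHz frequency and the offsets are illustrative assumptions, not KVM or Windows code.

```c
#include <assert.h>
#include <stdint.h>

/* Needs the GCC/Clang unsigned __int128 extension for the 64x64->128 multiply. */
static_assert(sizeof(unsigned __int128) == 16, "128-bit integers required");

/* TLFS reference TSC page formula, result in 100 ns units:
 *   ReferenceTime = ((VirtualTsc * TscScale) >> 64) + TscOffset */
static uint64_t ref_time_from_tsc(uint64_t tsc, uint64_t scale, int64_t offset)
{
	unsigned __int128 prod = (unsigned __int128)tsc * scale;

	return (uint64_t)(prod >> 64) + (uint64_t)offset;
}

/* Hypothetical helper: scale that makes a tsc_hz TSC drive the reference
 * clock at 10 MHz, i.e. 2^64 * 10^7 / tsc_hz. */
static uint64_t scale_for_tsc_hz(uint64_t tsc_hz)
{
	return (uint64_t)(((unsigned __int128)10000000 << 64) / tsc_hz);
}

/* Hypothetical: the guest derives a deadline from its (stale) offset while
 * the host compares it against the current one. Returns how many 100 ns
 * ticks in the past the host considers the deadline (0 if still ahead). */
static uint64_t host_sees_past_by(uint64_t tsc, uint64_t scale,
				  int64_t guest_offset, int64_t host_offset,
				  uint64_t delta)
{
	uint64_t deadline = ref_time_from_tsc(tsc, scale, guest_offset) + delta;
	uint64_t host_now = ref_time_from_tsc(tsc, scale, host_offset);

	return deadline < host_now ? host_now - deadline : 0;
}
```

With a 2.5 GHz TSC, a guest offset that is 1 ms (10000 ticks) stale and a 100 µs (1000 tick) timeout, the host sees the deadline 900 µs (9000 ticks) in the past and fires the timer immediately; if the guest keeps re-arming the same way, that could turn into the kind of interrupt storm described in the patch.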

I remember we had to add some fairly ugly hacks where we also "piled a
lot of hope", e.g.:

commit 0469f2f7ab4c6a6cae4b74c4f981c4da6d909411
Author: Vitaly Kuznetsov <vkuznets@...hat.com>
Date:   Tue Mar 16 15:37:36 2021 +0100

    KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment

Also, AFAIR we don't currently implement "Synthetic Time-Unhalted Timer"
from TLFS and who knows, maybe Windows' behavior is going to change when
we do...

-- 
Vitaly