Message-ID:
<SN6PR02MB4157B6A9C8BEFA312F0D9D68D499A@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Thu, 5 Feb 2026 18:55:17 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Jan Kiszka <jan.kiszka@...mens.com>, "K. Y. Srinivasan"
<kys@...rosoft.com>, Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu
<wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>, Long Li
<longli@...rosoft.com>, Thomas Gleixner <tglx@...nel.org>, Ingo Molnar
<mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, Dave Hansen
<dave.hansen@...ux.intel.com>, "x86@...nel.org" <x86@...nel.org>
CC: "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Florian
Bezdeka <florian.bezdeka@...mens.com>, RT <linux-rt-users@...r.kernel.org>,
Mitchell Levy <levymitchell0@...il.com>, "skinsburskii@...ux.microsoft.com"
<skinsburskii@...ux.microsoft.com>, "mrathor@...ux.microsoft.com"
<mrathor@...ux.microsoft.com>, "anirudh@...rudhrb.com"
<anirudh@...rudhrb.com>, "schakrabarti@...ux.microsoft.com"
<schakrabarti@...ux.microsoft.com>, "ssengar@...ux.microsoft.com"
<ssengar@...ux.microsoft.com>
Subject: RE: [PATCH] x86: mshyperv: Use kthread for vmbus interrupts on
PREEMPT_RT
From: Jan Kiszka <jan.kiszka@...mens.com> Sent: Tuesday, February 3, 2026 8:02 AM
>
> Resolves the following lockdep report when booting PREEMPT_RT on Hyper-V
> with related guest support enabled:
>
> [ 1.127941] hv_vmbus: registering driver hyperv_drm
>
> [ 1.132518] =============================
> [ 1.132519] [ BUG: Invalid wait context ]
> [ 1.132521] 6.19.0-rc8+ #9 Not tainted
> [ 1.132524] -----------------------------
> [ 1.132525] swapper/0/0 is trying to lock:
> [ 1.132526] ffff8b9381bb3c90 (&channel->sched_lock){....}-{3:3}, at: vmbus_chan_sched+0xc4/0x2b0
> [ 1.132543] other info that might help us debug this:
> [ 1.132544] context-{2:2}
> [ 1.132545] 1 lock held by swapper/0/0:
> [ 1.132547] #0: ffffffffa010c4c0 (rcu_read_lock){....}-{1:3}, at: vmbus_chan_sched+0x31/0x2b0
> [ 1.132557] stack backtrace:
> [ 1.132560] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.19.0-rc8+ #9 PREEMPT_{RT,(lazy)}
> [ 1.132565] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
> [ 1.132567] Call Trace:
> [ 1.132570] <IRQ>
> [ 1.132573] dump_stack_lvl+0x6e/0xa0
> [ 1.132581] __lock_acquire+0xee0/0x21b0
> [ 1.132592] lock_acquire+0xd5/0x2d0
> [ 1.132598] ? vmbus_chan_sched+0xc4/0x2b0
> [ 1.132606] ? lock_acquire+0xd5/0x2d0
> [ 1.132613] ? vmbus_chan_sched+0x31/0x2b0
> [ 1.132619] rt_spin_lock+0x3f/0x1f0
> [ 1.132623] ? vmbus_chan_sched+0xc4/0x2b0
> [ 1.132629] ? vmbus_chan_sched+0x31/0x2b0
> [ 1.132634] vmbus_chan_sched+0xc4/0x2b0
> [ 1.132641] vmbus_isr+0x2c/0x150
> [ 1.132648] __sysvec_hyperv_callback+0x5f/0xa0
> [ 1.132654] sysvec_hyperv_callback+0x88/0xb0
> [ 1.132658] </IRQ>
> [ 1.132659] <TASK>
> [ 1.132660] asm_sysvec_hyperv_callback+0x1a/0x20
>
> As code paths that handle vmbus IRQs use sleepy locks under PREEMPT_RT,
> the complete vmbus_handler execution needs to be moved into thread
> context. Open-coding this allows to skip the IPI that irq_work would
> additionally bring and which we do not need, being an IRQ, never an NMI.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@...mens.com>
> ---
>
> This should resolve what was once brought forward via [1]. If it
> actually resolves all remaining compatibility issues of the hyperv
> support with RT is not yet clear, though. So far, lockdep is happy when
> using this plus [2].
>
> [1] https://lore.kernel.org/all/20230809-b4-rt_preempt-fix-v1-0-7283bbdc8b14@gmail.com/
> [2] https://lore.kernel.org/lkml/0c7fb5cd-fb21-4760-8593-e04bade84744@siemens.com/
>
> arch/x86/kernel/cpu/mshyperv.c | 52 ++++++++++++++++++++++++++++++++--

You've added this code under arch/x86. But isn't it architecture independent? I
think it should also work on arm64. If that's the case, the code should probably
be added to drivers/hv/vmbus_drv.c instead.

> 1 file changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 579fb2c64cfd..1194ca452c52 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -17,6 +17,7 @@
> #include <linux/irq.h>
> #include <linux/kexec.h>
> #include <linux/random.h>
> +#include <linux/smpboot.h>
> #include <asm/processor.h>
> #include <asm/hypervisor.h>
> #include <hyperv/hvhdk.h>
> @@ -150,6 +151,43 @@ static void (*hv_stimer0_handler)(void);
> static void (*hv_kexec_handler)(void);
> static void (*hv_crash_handler)(struct pt_regs *regs);
>
> +static DEFINE_PER_CPU(bool, vmbus_irq_pending);
> +static DEFINE_PER_CPU(struct task_struct *, vmbus_irqd);
> +
> +static void vmbus_irqd_wake(void)
> +{
> +	struct task_struct *tsk = __this_cpu_read(vmbus_irqd);
> +
> +	__this_cpu_write(vmbus_irq_pending, true);
> +	wake_up_process(tsk);
> +}
> +
> +static void vmbus_irqd_setup(unsigned int cpu)
> +{
> +	sched_set_fifo(current);
> +}
> +
> +static int vmbus_irqd_should_run(unsigned int cpu)
> +{
> +	return __this_cpu_read(vmbus_irq_pending);
> +}
> +
> +static void run_vmbus_irqd(unsigned int cpu)
> +{
> +	vmbus_handler();
> +	__this_cpu_write(vmbus_irq_pending, false);
> +}

The two statements in this function should be swapped. This function
runs with pre-emption enabled and interrupts enabled. If a VMBus
interrupt comes in as vmbus_handler() is finishing, vmbus_irqd_wake()
will run and set vmbus_irq_pending to "true". This function will then set
vmbus_irq_pending to "false", wiping out the "true" setting. The hotplug
thread will decide it doesn't need to run again, and whatever generated
the new interrupt doesn't get processed (at least until another interrupt
comes in).
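
To make the lost update concrete, here is a minimal userspace model of one
pass through the thread function. It is only a sketch, not kernel code: every
model_* name is hypothetical, the per-CPU flag is reduced to a plain bool, and
the "interrupt arriving as the handler finishes" is injected deterministically
rather than raced.

```c
#include <stdbool.h>

/* Userspace model with hypothetical names; not kernel code. */
static bool model_pending;   /* stands in for the per-CPU vmbus_irq_pending */

/* Interrupt path, mirroring vmbus_irqd_wake(): mark work as pending. */
static void model_irq(void)
{
	model_pending = true;
}

/* Stands in for vmbus_handler(); optionally an interrupt lands as it finishes. */
static void model_handler(bool irq_arrives_late)
{
	if (irq_arrives_late)
		model_irq();
}

/*
 * One pass of the thread function. Returns the flag value that the next
 * thread_should_run() check would observe.
 */
bool model_thread_pass(bool clear_first, bool irq_arrives_late)
{
	model_pending = true;            /* the wakeup that started this pass */

	if (clear_first) {
		model_pending = false;   /* swapped order: clear, then handle */
		model_handler(irq_arrives_late);
	} else {
		model_handler(irq_arrives_late);
		model_pending = false;   /* posted order: wipes a late interrupt */
	}
	return model_pending;
}
```

With a late interrupt, model_thread_pass(false, true) leaves the flag false
(the new work is forgotten), while model_thread_pass(true, true) leaves it
true, so the thread would run again.
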
This scenario could specifically happen because of the way VMBus messages
are processed. The vmbus_handler function calls vmbus_message_sched(),
which always processes a single message. When that message is handled,
Hyper-V sends the next message that may have been queued up, and
generates another interrupt to the guest VM. There's no looping in the Linux
code to process all messages, so Linux depends on getting a new interrupt for
each subsequent message in order to run vmbus_message_sched() again.
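
The one-message-per-interrupt dependency can be sketched the same way. The
model below is an assumption-laden illustration, not the Hyper-V or kernel
API: struct msg_model and the model_* functions are hypothetical, and it takes
the worst case where every follow-up interrupt lands while the handler is
still finishing.

```c
#include <stdbool.h>

/* Userspace model with hypothetical names; not the Hyper-V or kernel API. */
struct msg_model {
	int queued;     /* messages the host still has to deliver */
	int handled;    /* messages the guest has processed */
	bool pending;   /* stands in for vmbus_irq_pending */
};

/*
 * Handle exactly one message. Completing it lets the host deliver the next
 * queued message and raise a new interrupt, modelled here as setting the
 * pending flag before the handler returns (worst-case timing).
 */
static void model_handle_one(struct msg_model *m)
{
	m->handled++;
	m->queued--;
	if (m->queued > 0)
		m->pending = true;
}

/* Thread loop: keep running while the flag says there is work. */
int model_drain(int nmsgs, bool clear_first)
{
	struct msg_model m = { .queued = nmsgs, .handled = 0, .pending = nmsgs > 0 };

	while (m.pending) {                 /* thread_should_run() */
		if (clear_first)
			m.pending = false;  /* swapped order */
		model_handle_one(&m);       /* thread body: one message only */
		if (!clear_first)
			m.pending = false;  /* posted order: drops the new interrupt */
	}
	return m.handled;
}
```

Under these assumptions model_drain(3, true) handles all three messages,
whereas model_drain(3, false) stops after the first one and the remaining
messages wait for some unrelated interrupt.
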
There might be a similar situation with vmbus_chan_sched() and channel
interrupts. There are three interrupt handling modes across multiple VMBus
devices, and it would take some additional sleuthing to see if any of them
depend on a similar scheme of needing a new interrupt for each channel
event.

Please double-check my thinking. The likelihood of the problem occurring
is very low, because VMBus messages generally are used only when VMBus
devices are being added (or removed), which is usually during boot, and
the timing window must be hit just right. But the fix is simple, so it should
be done.

Michael

> +
> +static bool vmbus_irq_initialized;
> +
> +static struct smp_hotplug_thread vmbus_irq_threads = {
> +	.store = &vmbus_irqd,
> +	.setup = vmbus_irqd_setup,
> +	.thread_should_run = vmbus_irqd_should_run,
> +	.thread_fn = run_vmbus_irqd,
> +	.thread_comm = "vmbus_irq/%u",
> +};
> +
> DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_callback)
> {
> 	struct pt_regs *old_regs = set_irq_regs(regs);
> @@ -158,8 +196,12 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_callback)
> 	if (mshv_handler)
> 		mshv_handler();
>
> -	if (vmbus_handler)
> -		vmbus_handler();
> +	if (vmbus_handler) {
> +		if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +			vmbus_irqd_wake();
> +		else
> +			vmbus_handler();
> +	}
>
> 	if (ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED)
> 		apic_eoi();
> @@ -174,6 +216,10 @@ void hv_setup_mshv_handler(void (*handler)(void))
>
> void hv_setup_vmbus_handler(void (*handler)(void))
> {
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !vmbus_irq_initialized) {
> +		BUG_ON(smpboot_register_percpu_thread(&vmbus_irq_threads));
> +		vmbus_irq_initialized = true;
> +	}
> 	vmbus_handler = handler;
> }
>
> @@ -181,6 +227,8 @@ void hv_remove_vmbus_handler(void)
> {
> 	/* We have no way to deallocate the interrupt gate */
> 	vmbus_handler = NULL;
> +	smpboot_unregister_percpu_thread(&vmbus_irq_threads);
> +	vmbus_irq_initialized = false;
> }
>
> /*
> --
> 2.51.0