Message-ID:
<SN6PR02MB4157DB59F0F7BFBF56612651D465A@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Mon, 9 Feb 2026 18:25:59 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Florian Bezdeka <florian.bezdeka@...mens.com>, Jan Kiszka
<jan.kiszka@...mens.com>, "K. Y. Srinivasan" <kys@...rosoft.com>, Haiyang
Zhang <haiyangz@...rosoft.com>, Wei Liu <wei.liu@...nel.org>, Dexuan Cui
<decui@...rosoft.com>, Long Li <longli@...rosoft.com>, Thomas Gleixner
<tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov
<bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, "x86@...nel.org"
<x86@...nel.org>
CC: "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, RT
<linux-rt-users@...r.kernel.org>, Mitchell Levy <levymitchell0@...il.com>,
"skinsburskii@...ux.microsoft.com" <skinsburskii@...ux.microsoft.com>,
"mrathor@...ux.microsoft.com" <mrathor@...ux.microsoft.com>,
"anirudh@...rudhrb.com" <anirudh@...rudhrb.com>,
"schakrabarti@...ux.microsoft.com" <schakrabarti@...ux.microsoft.com>,
"ssengar@...ux.microsoft.com" <ssengar@...ux.microsoft.com>
Subject: RE: [PATCH] x86: mshyperv: Use kthread for vmbus interrupts on
PREEMPT_RT
From: Florian Bezdeka <florian.bezdeka@...mens.com> Sent: Monday, February 9, 2026 2:35 AM
>
> On Sat, 2026-02-07 at 01:30 +0000, Michael Kelley wrote:
>
> [snip]
> >
> > I've run your suggested experiment on an arm64 VM in the Azure cloud. My
> > kernel was linux-next 20260128. I set CONFIG_PREEMPT_RT=y and
> > CONFIG_PROVE_LOCKING=y, but did not add either of your two patches
> > (neither the storvsc driver patch nor the x86 VMBus interrupt handling patch).
> > The VM comes up and runs, but with this warning during boot:
> >
> > [ 3.075604] hv_utils: Registering HyperV Utility Driver
> > [ 3.075636] hv_vmbus: registering driver hv_utils
> > [ 3.085920] =============================
> > [ 3.088128] hv_vmbus: registering driver hv_netvsc
> > [ 3.091180] [ BUG: Invalid wait context ]
> > [ 3.093544] 6.19.0-rc7-next-20260128+ #3 Tainted: G E
> > [ 3.097582] -----------------------------
> > [ 3.099899] systemd-udevd/284 is trying to lock:
> > [ 3.102568] ffff000100e24490 (&channel->sched_lock){....}-{3:3}, at: vmbus_chan_sched+0x128/0x3b8 [hv_vmbus]
> > [ 3.108208] other info that might help us debug this:
> > [ 3.111454] context-{2:2}
> > [ 3.112987] 1 lock held by systemd-udevd/284:
> > [ 3.115626] #0: ffffd5cfc20bcc80 (rcu_read_lock){....}-{1:3}, at: vmbus_chan_sched+0xcc/0x3b8 [hv_vmbus]
> > [ 3.121224] stack backtrace:
> > [ 3.122897] CPU: 0 UID: 0 PID: 284 Comm: systemd-udevd Tainted: G E 6.19.0-rc7-next-20260128+ #3 PREEMPT_RT
> > [ 3.129631] Tainted: [E]=UNSIGNED_MODULE
> > [ 3.131946] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 06/10/2025
> > [ 3.138553] Call trace:
> > [ 3.140015] show_stack+0x20/0x38 (C)
> > [ 3.142137] dump_stack_lvl+0x9c/0x158
> > [ 3.144340] dump_stack+0x18/0x28
> > [ 3.146290] __lock_acquire+0x488/0x1e20
> > [ 3.148569] lock_acquire+0x11c/0x388
> > [ 3.150703] rt_spin_lock+0x54/0x230
> > [ 3.152785] vmbus_chan_sched+0x128/0x3b8 [hv_vmbus]
> > [ 3.155611] vmbus_isr+0x34/0x80 [hv_vmbus]
> > [ 3.158093] vmbus_percpu_isr+0x18/0x30 [hv_vmbus]
> > [ 3.160848] handle_percpu_devid_irq+0xdc/0x348
> > [ 3.163495] handle_irq_desc+0x48/0x68
> > [ 3.165851] generic_handle_domain_irq+0x20/0x38
> > [ 3.168664] gic_handle_irq+0x1dc/0x430
> > [ 3.170868] call_on_irq_stack+0x30/0x70
> > [ 3.173161] do_interrupt_handler+0x88/0xa0
> > [ 3.175724] el1_interrupt+0x4c/0xb0
> > [ 3.177855] el1h_64_irq_handler+0x18/0x28
> > [ 3.180332] el1h_64_irq+0x84/0x88
> > [ 3.182378] _raw_spin_unlock_irqrestore+0x4c/0xb0 (P)
> > [ 3.185493] rt_mutex_slowunlock+0x404/0x440
> > [ 3.187951] rt_spin_unlock+0xb8/0x178
> > [ 3.190394] kmem_cache_alloc_noprof+0xf0/0x4f8
> > [ 3.193100] alloc_empty_file+0x64/0x148
> > [ 3.195461] path_openat+0x58/0xaa0
> > [ 3.197658] do_file_open+0xa0/0x140
> > [ 3.199752] do_sys_openat2+0x190/0x278
> > [ 3.202124] do_sys_open+0x60/0xb8
> > [ 3.204047] __arm64_sys_openat+0x2c/0x48
> > [ 3.206433] invoke_syscall+0x6c/0xf8
> > [ 3.208519] el0_svc_common.constprop.0+0x48/0xf0
> > [ 3.211050] do_el0_svc+0x24/0x38
> > [ 3.212990] el0_svc+0x164/0x3c8
> > [ 3.214842] el0t_64_sync_handler+0xd0/0xe8
> > [ 3.217251] el0t_64_sync+0x1b0/0x1b8
> > [ 3.219450] hv_utils: Heartbeat IC version 3.0
> > [ 3.219471] hv_utils: Shutdown IC version 3.2
> > [ 3.219844] hv_utils: TimeSync IC version 4.0
>
> That matches with my expectation that the same problem exists on arm64.
> The patch from Jan addresses that issue for x86 (only, so far) as we do
> not have a working test environment for arm64 yet.
OK. I had understood Jan's earlier comments to mean that the VMBus
interrupt problem was implicitly solved on arm64 because of VMBus using
a standard Linux IRQ on arm64. But evidently that's not the case. So my
earlier comment stands: The code changes should go into the
architecture-independent portion of the VMBus driver, and not under
arch/x86. I can probably work with you to test on arm64 if need be.
>
> >
> > I don't see an indication that vmbus_isr() has been offloaded from
> > interrupt level onto a thread. The stack starting with el1h_64_irq()
> > and going forward is the stack for normal per-cpu interrupt handling.
> > Maybe arm64 with PREEMPT_RT does the offload to a thread only
> > for SPIs and LPIs, but not for PPIs? I haven't looked at the source code
> > for how PREEMPT_RT affects arm64 interrupt handling.
> >
> > Also, I had expected to see a problem with storvsc because I did
> > not apply your storvsc patch. But there was no such problem, even
> > with some disk I/O load (read only). arm64 VMs in Azure use exactly
> > the same virtual SCSI devices that are used with x86 VMs in Azure or
> > on local Hyper-V. I don't have an explanation. Will think about it.
> >
>
> Running the --iomix stressor provided by stress-ng was able to trigger
> the SCSI problem within 2 minutes. The result was a completely frozen
> system. For completeness the complete stress-ng command line:
>
> # stress-ng --cpu 2 --iomix 8 --vm 2 --vm-bytes 128M --fork 4
>
Thanks!

Yes, that command line reproduced the storvsc problem on arm64. And
then applying the storvsc patch made the problem go away. FWIW, on
arm64 Linux recovered and kept running after hitting the storvsc
problem.

Michael