linux-kernel - Re: Qemu KVM thread spins at 100% CPU usage on scsi hot-unplug (kernel 6.6.8 guest)

Message-ID: <5f4dfc03-bdfc-41d1-8c5a-1e767e472a96@crc.id.au>
Date: Fri, 29 Dec 2023 16:46:50 +1100
From: Steven Haigh <netwiz@....id.au>
To: Lukas Wunner <lukas@...ner.de>
Cc: linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
 f.ebner@...xmox.com
Subject: Re: Qemu KVM thread spins at 100% CPU usage on scsi hot-unplug
 (kernel 6.6.8 guest)

On 29/12/23 00:18, Lukas Wunner wrote:
> On Thu, Dec 28, 2023 at 01:03:10PM +1100, Steven Haigh wrote:
>> At some point in kernel 6.6.x, SCSI hotplug in qemu VMs broke. This was
>> mostly fixed in the following commit to release 6.6.8:
>> 	commit 5cc8d88a1b94b900fd74abda744c29ff5845430b
>> 	Author: Bjorn Helgaas <bhelgaas@...gle.com>
>> 	Date:   Thu Dec 14 09:08:56 2023 -0600
>> 	Revert "PCI: acpiphp: Reassign resources on bridge if necessary"
>>
>> After this commit, the SCSI block device is hotplugged correctly, and a device node such as /dev/sdX appears within the qemu VM.
>>
>> New problem:
>>
>> When the same SCSI block device is hot-unplugged, the QEMU KVM process will
>> spin at 100% CPU usage. The guest shows no CPU being used via top, but the
>> host will continue to spin in the KVM thread until the VM is rebooted.
> 
> Find out the PID of the qemu process on the host, then cat /proc/$PID/stack
> to see where the CPU time is spent.

Thanks for the tip - I'll certainly do that.

Annoyingly, since I originally posted this report, and then added a new report about it to the kernel.org lists, I have
been unable to reproduce the problem. I have successfully completed ~22 SCSI hotplug / remove cycles, and none of them
reproduced the issue.

The kernel versions are still the same on both the Proxmox host and the Fedora guest; however, I do see that the
qemu-kvm packages on the Proxmox host have been updated. The Proxmox host hasn't even been rebooted in this time.

I wonder whether the revert included in 6.6.8 fixed the main problem, and the later qemu-kvm package update on the
Proxmox host, followed by the most recent reboot of the VM with the new KVM package, sorted out the second issue.

Seeing as I can no longer reproduce this reliably, whereas it was 100% reproducible before, maybe I'm now chasing ghosts.

I'll still continue to monitor it. I normally do this SCSI hotplug ~3 times per week when doing backups to different
external HDDs, so if I do observe it again, I'll grab the stack and reply to this thread with what I find.
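
Lukas's suggestion of reading /proc/$PID/stack can be scripted. The following is a minimal Python sketch, not something posted to the list, and all function names in it are hypothetical. It assumes a Linux host with /proc mounted, root privileges (reading a thread's kernel stack normally requires them), and that the QEMU process's comm name is "kvm" (assumed for Proxmox; pass e.g. "qemu-system-x86" elsewhere). Because /proc/$PID/stack only reflects the main thread, while the spin described above is in a single KVM thread, the sketch samples per-thread CPU time from /proc/<pid>/task/<tid>/stat over one second and prints the kernel stacks of the busiest threads.

```python
import sys
import time
from pathlib import Path

PROC = Path("/proc")


def find_pids(name: str, proc: Path = PROC) -> list[int]:
    """Return the PIDs whose /proc/<pid>/comm equals `name`."""
    pids = []
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue  # skip /proc/self, /proc/meminfo, etc.
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # process exited or is not accessible
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)


def thread_cpu_ticks(pid: int, proc: Path = PROC) -> dict[int, int]:
    """Map each thread ID of `pid` to utime + stime (in clock ticks)."""
    try:
        tasks = list((proc / str(pid) / "task").iterdir())
    except OSError:
        return {}  # process gone or not accessible
    ticks = {}
    for task in tasks:
        try:
            stat = (task / "stat").read_text()
        except OSError:
            continue  # thread exited between listing and reading
        # The comm field may contain spaces or ')', so split after the last ')'.
        fields = stat[stat.rindex(")") + 2:].split()
        # fields[0] is stat field 3 (state); utime/stime are fields 14 and 15.
        ticks[int(task.name)] = int(fields[11]) + int(fields[12])
    return ticks


def busiest_threads(pid: int, interval: float = 1.0,
                    proc: Path = PROC) -> list[tuple[int, int]]:
    """Sample per-thread CPU ticks twice; return (tid, delta) pairs, busiest first."""
    before = thread_cpu_ticks(pid, proc)
    time.sleep(interval)
    after = thread_cpu_ticks(pid, proc)
    deltas = {tid: t - before.get(tid, 0) for tid, t in after.items()}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


def read_stack(pid: int, tid: int, proc: Path = PROC) -> str:
    """Return one thread's kernel stack (normally readable only by root)."""
    try:
        return (proc / str(pid) / "task" / str(tid) / "stack").read_text()
    except OSError as exc:
        return f"<unreadable: {exc}>"


if __name__ == "__main__":
    # Assumption: the QEMU process shows up as "kvm" (e.g. Proxmox's /usr/bin/kvm);
    # pass another comm name such as "qemu-system-x86" on other hosts.
    name = sys.argv[1] if len(sys.argv) > 1 else "kvm"
    for pid in find_pids(name):
        for tid, delta in busiest_threads(pid)[:3]:
            print(f"pid {pid} tid {tid}: {delta} ticks in the last second")
            print(read_stack(pid, tid))
```

Run it as root on the host while the spin is happening, passing the comm name as the only argument. Sampling twice rather than reading cumulative totals singles out the thread that is busy right now, not one that merely accumulated CPU time since the VM started.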

Until then, I don't want to waste other people's time chasing ghosts too :)

-- 
Steven Haigh

📧 netwiz@....id.au
💻 https://crc.id.au