Message-ID: <stYazPax2Mcu7VeTPJu8XXGkOiaVyF8LaZzfHDEG4izEwt4-Ztoo-cmNAv19O9nryHkybONeruyS8yNKOcV0CSicHcd6q_ptGOmADHgut2U=@ranguvar.io>
Date: Wed, 18 Dec 2024 06:21:10 +0000
From: Ranguvar <ranguvar@...guvar.io>
To: Juri Lelli <juri.lelli@...hat.com>, Sean Christopherson <seanjc@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...il.com>, "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>, "regressions@...mhuis.info" <regressions@...mhuis.info>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: Re: [REGRESSION][BISECTED] from bd9bbc96e835: cannot boot Win11 KVM guest
The bug is triggered specifically by running a Windows kernel as a KVM guest.
I cannot reproduce it with the Ubuntu 24.10 install ISO and the nouveau driver.
The Windows 11 23H2 install ISO reproduces it reliably.
Two more kernel logs are linked below [0][1].
Decoding only worked on the first; I spent too long trying to get it working for the second.
On Tuesday, December 17th, 2024 at 08:57, Juri Lelli <juri.lelli@...hat.com> wrote:
>
> On 16/12/24 20:40, Ranguvar wrote:
>
> > On Monday, December 16th, 2024 at 16:50, Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > > On Mon, Dec 16, 2024, Juri Lelli wrote:
> > >
> > > > On 14/12/24 19:52, Peter Zijlstra wrote:
> > > >
> > > > > On Sat, Dec 14, 2024 at 06:32:57AM +0000, Ranguvar wrote:
> > > > >
> > > > > > I have in kernel cmdline `iommu=pt isolcpus=1-7,17-23 rcu_nocbs=1-7,17-23 nohz_full=1-7,17-23`. Removing iommu=pt does not produce a change, and
> > > > > > dropping the core isolation freezes the host on VM startup.
> > >
> > > As in, dropping all of isolcpus, rcu_nocbs, and nohz_full? Or just dropping
> > > isolcpus?
> >
> > Thanks for looking.
> > I had dropped all three, but not altered the VM guest config, which is:
> >
> > <cputune>
> > <vcpupin vcpu='0' cpuset='2'/>
> > <vcpupin vcpu='1' cpuset='18'/>
> > ...
> > <vcpupin vcpu='11' cpuset='23'/>
> > <emulatorpin cpuset='1,17'/>
> > <iothreadpin iothread='1' cpuset='1,17'/>
> > <vcpusched vcpus='0' scheduler='fifo' priority='95'/>
> > ...
> > <iothreadsched iothreads='1' scheduler='fifo' priority='50'/>
>
>
> Are you disabling/enabling/configuring RT throttling (sched_rt_{runtime,
> period}_us) in your configuration?
>
I don't touch these.
[ranguvar@...fu ~]$ cat /proc/sys/kernel/sched_rt_period_us
1000000
[ranguvar@...fu ~]$ cat /proc/sys/kernel/sched_rt_runtime_us
950000
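These are the kernel defaults. RT tasks may run for at most sched_rt_runtime_us out of every sched_rt_period_us, so 950000/1000000 leaves 95% of each one-second period to SCHED_FIFO/SCHED_RR threads such as the vCPU threads configured with scheduler='fifo' above. A runtime of -1 disables RT throttling entirely. Below is a minimal sketch of that arithmetic. The helper name `rt_share_pct` is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a real tool): print the share of each period, in
# percent, that RT tasks may use, given sched_rt_runtime_us and
# sched_rt_period_us. A runtime of -1 means RT throttling is disabled.
rt_share_pct() {
    local runtime_us=$1 period_us=$2
    if [ "$runtime_us" -eq -1 ]; then
        echo "unlimited"
        return 0
    fi
    # Integer percentage, truncated toward zero.
    echo $(( runtime_us * 100 / period_us ))
}

# Usage on a live system:
# rt_share_pct "$(cat /proc/sys/kernel/sched_rt_runtime_us)" \
#              "$(cat /proc/sys/kernel/sched_rt_period_us)"
```

With the values above, this prints 95.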
I also removed myself from the realtime group (used by PipeWire), but the breakage is the same.
> > </cputune>
> >
> > CPU mode is host-passthrough, cache mode is passthrough.
> >
> > The 24GB VRAM did cause trouble when setting up resizeable BAR months ago as well. It necessitated a special qemu config:
> > <qemu:commandline>
> > <qemu:arg value='-fw_cfg'/>
> > <qemu:arg value='opt/ovmf/PciMmio64Mb,string=65536'/>
> > </qemu:commandline>
I removed this config block as it appears unnecessary now.
No impact on this issue.
I also tried manually resizing the BAR from 32GB to 256MB before starting the guest.
lspci before the resize:
Region 1: Memory at 7000000000 (64-bit, prefetchable) [size=32G]
Region 3: Memory at 7800000000 (64-bit, prefetchable) [size=32M]
After unbinding vfio_pci, writing '8' to resource1_resize, and rebinding:
Region 1: Memory at 1040000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at 1050000000 (64-bit, prefetchable) [size=32M]
No impact.
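The value written to resource1_resize is a bit position, not a size. The kernel's sysfs ABI documentation for resourceN_resize defines bit n as a BAR size of 2^(n+20) bytes (bit 0 = 1MB). Under that encoding, '8' selects 256MB, and the original 32GB corresponds to bit 15. Reading the file returns a hex bitmap of the sizes the device supports. Below is a minimal bash sketch of that arithmetic, assuming the encoding just described. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, assuming the documented resourceN_resize encoding:
# bit n <=> BAR size of 2^(n + 20) bytes, i.e. 2^n MB (bit 0 = 1MB).

# Convert a BAR size in MB to the bit position to write
# (rounds up to the next power of two).
rebar_bit_for_mb() {
    local mb=$1 bit=0
    while [ $(( 1 << bit )) -lt "$mb" ]; do
        bit=$(( bit + 1 ))
    done
    echo "$bit"
}

# List the supported sizes (in MB) encoded in a hex bitmap as read
# from resourceN_resize, e.g. "00000000000001c0".
rebar_sizes_mb() {
    local mask=$(( 16#$1 )) bit
    for (( bit = 0; bit < 63; bit++ )); do
        if (( (mask >> bit) & 1 )); then
            echo $(( 1 << bit ))
        fi
    done
}
```

For example, `rebar_bit_for_mb 256` prints 8 and `rebar_bit_for_mb 32768` prints 15. According to the same ABI documentation, all drivers must be unbound from the device before the write. That requirement is why vfio_pci was unbound first.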
[0]: https://ranguvar.io/pub/paste/linux-6.12-vm-regression/dmesg-6.11.0-rc1-1-git-00057-gbd9bbc96e835-20241216-decoded.log
[1]: https://ranguvar.io/pub/paste/linux-6.12-vm-regression/dmesg-6.11.0-rc1-1-git-00057-gbd9bbc96e835-20241217.log