linux-kernel - Re: [PATCH] clocksource: disable irq when holding watchdog

Message-ID: <d71643eb-9373-44ad-93ce-ecb42a62e074@I-love.SAKURA.ne.jp>
Date:   Sat, 28 Oct 2023 00:18:31 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     Thomas Gleixner <tglx@...utronix.de>, paulmck@...nel.org
Cc:     John Stultz <jstultz@...gle.com>, Stephen Boyd <sboyd@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        x86@...nel.org, joel@...lfernandes.org
Subject: Re: [PATCH] clocksource: disable irq when holding watchdog_lock.

On 2023/10/26 6:28, Thomas Gleixner wrote:
> I have no idea what the kernel, VirtualPox or Windoze are doing during
> that time. I fear you need to add some debug on your own or if
> VirtualPox has a monitor/debugger you might use that to inspect what the
> guest is doing.

Although VirtualBox has a debugger
( https://www.virtualbox.org/manual/ch12.html#ts_debugger ), I'm not familiar enough
with it to use it; I'd like to debug from the guest side.

I found a minimal kernel config.
Changing https://I-love.SAKURA.ne.jp/tmp/config-6.6-rc7-ok from CONFIG_HZ=250
to CONFIG_HZ=1000 likely reproduces this slowdown problem. This difference
suggests that the problem is timing-dependent; some unexpected event is
happening while bringing up secondary CPUs.

Fedora kernels have CONFIG_HZ=1000 and Ubuntu kernels have CONFIG_HZ=250.
I guess that changing the CONFIG_HZ value is nothing special from the point of
view of hypervisors and the host OS.

Can somebody reproduce this problem using a different hypervisor or host OS?
You can check whether booting e.g. Fedora-Everything-netinst-x86_64-Rawhide-20231018.n.0.iso ,
Fedora-Server-netinst-x86_64-37-1.7.iso , Fedora-Everything-netinst-x86_64-34-1.2.iso etc. with
the "nosmp" option added reaches the GUI installer screen much faster than booting these ISO images
without the "nosmp" option. Alternatively, you can build a vanilla kernel using
config-6.6-rc7-ok with CONFIG_HZ changed from 250 to 1000, and boot that kernel directly
with a kernel command line like the ones shown below.
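
For reference, the CONFIG_HZ change described above could be scripted roughly
as follows. This is only a sketch: the function name set_hz is made up for
illustration, and it assumes the .config contains the usual CONFIG_HZ_250 /
CONFIG_HZ_1000 choice symbols alongside the derived CONFIG_HZ value. Running
"make olddefconfig" afterwards lets Kconfig fix up anything the edit missed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel tree): switch the timer
# frequency choice in a kernel .config from OLD Hz to NEW Hz.
# Usage: set_hz CONFIG_FILE OLD NEW
set_hz() {
    cfg=$1
    old=$2
    new=$3
    tmp=$(mktemp) || return 1
    # Turn the old choice off, turn the new one on, and update the
    # derived CONFIG_HZ value. The result is written back in place.
    sed \
        -e "s/^CONFIG_HZ_${old}=y\$/# CONFIG_HZ_${old} is not set/" \
        -e "s/^# CONFIG_HZ_${new} is not set\$/CONFIG_HZ_${new}=y/" \
        -e "s/^CONFIG_HZ=${old}\$/CONFIG_HZ=${new}/" \
        "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Example (assumes the current directory is a kernel source tree):
#   cp config-6.6-rc7-ok .config
#   set_hz .config 250 1000
#   make olddefconfig && make -j"$(nproc)"
```

The kernel tree's own scripts/config tool can be used instead of sed; the
point is only that both the choice symbol and CONFIG_HZ itself change.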

Trying

  /usr/libexec/qemu-kvm -m 4096 -smp 8 -nographic -append 'console=ttyS0,115200n8 panic=1' -no-reboot -kernel /boot/vmlinuz-6.6.0-rc7+

using qemu-kvm 1.5.3-175.el7_9.6.x86_64 on kernel 3.10.0-1160.102.1.el7.x86_64
on a physical host PC does not reproduce this problem.

But trying

  /usr/bin/qemu-system-x86_64 -m 4096 -smp 8 -nographic -append 'console=ttyS0,115200n8 panic=1' -no-reboot -kernel /boot/vmlinuz-6.6.0-rc7+

using qemu-system-x86 1:6.2+dfsg-2ubuntu6.15 on kernel 5.15.0-87-generic
on VirtualBox on Windows 11 reproduces a similar slowdown (and the 5.15.0-87-generic
kernel sometimes emits soft lockup messages).

Thus, someone might be able to reproduce this problem in a nested virtualization
environment.