Date:   Sat, 28 Oct 2023 00:18:31 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     Thomas Gleixner <tglx@...utronix.de>, paulmck@...nel.org
Cc:     John Stultz <jstultz@...gle.com>, Stephen Boyd <sboyd@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        x86@...nel.org, joel@...lfernandes.org
Subject: Re: [PATCH] clocksource: disable irq when holding watchdog_lock.

On 2023/10/26 6:28, Thomas Gleixner wrote:
> I have no idea what the kernel, VirtualPox or Windoze are doing during
> that time. I fear you need to add some debug on your own or if
> VirtualPox has a monitor/debugger you might use that to inspect what the
> guest is doing.

Although VirtualBox has a debugger
( https://www.virtualbox.org/manual/ch12.html#ts_debugger ), I'm not familiar enough
with it to use it; I'd prefer to debug from the guest side.

I found a minimal kernel config: changing
https://I-love.SAKURA.ne.jp/tmp/config-6.6-rc7-ok from CONFIG_HZ=250 to
CONFIG_HZ=1000 likely reproduces this slowdown. That this single setting makes
the difference suggests that the problem is timing-dependent: some unexpected
event is happening while the secondary CPUs are being brought up.

Fedora kernels use CONFIG_HZ=1000, while Ubuntu kernels use CONFIG_HZ=250.
I guess that changing the CONFIG_HZ value is nothing special from the point of
view of the hypervisor and host OS.
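To check which tick rate a given distribution kernel was built with, its shipped config can be inspected; Fedora and Ubuntu typically install it as /boot/config-<release>. A minimal sketch (config_hz is a hypothetical helper name, not an existing tool):

```sh
#!/bin/sh
# Hypothetical helper: print the CONFIG_HZ value recorded in a kernel config file.
# Only the "CONFIG_HZ=<number>" line matches; "CONFIG_HZ_1000=y" etc. do not.
config_hz() {
  sed -n 's/^CONFIG_HZ=\([0-9][0-9]*\)$/\1/p' "$1"
}

# Example: the running kernel's config, assuming it is installed under /boot.
# config_hz "/boot/config-$(uname -r)"
```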

Can somebody reproduce this problem using different hypervisors and host OSes?
You can check whether booting e.g. Fedora-Everything-netinst-x86_64-Rawhide-20231018.n.0.iso,
Fedora-Server-netinst-x86_64-37-1.7.iso, Fedora-Everything-netinst-x86_64-34-1.2.iso etc. with
the "nosmp" option added reaches the GUI installer screen much faster than booting the same
ISO images without the "nosmp" option. Alternatively, you can build a vanilla kernel using
config-6.6-rc7-ok with CONFIG_HZ changed from 250 to 1000, and boot that kernel with
a bare kernel command line like the one shown below.
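For the vanilla-kernel route, the usual way inside a kernel tree is "./scripts/config --disable HZ_250 --enable HZ_1000" followed by "make olddefconfig". The sketch below performs the same edit on a standalone copy of the config with plain sed, assuming the file selects HZ_250 in the standard .config format shown in the comments; set_hz_1000 is a hypothetical helper name, and running "make olddefconfig" afterwards is still advisable.

```sh
#!/bin/sh
# Hypothetical helper: switch a kernel .config from the 250 Hz tick to the
# 1000 Hz tick. Assumes the config currently contains:
#   CONFIG_HZ_250=y
#   # CONFIG_HZ_1000 is not set
#   CONFIG_HZ=250
set_hz_1000() {
  tmp=$(mktemp) || return 1
  sed -e 's/^CONFIG_HZ_250=y$/# CONFIG_HZ_250 is not set/' \
      -e 's/^# CONFIG_HZ_1000 is not set$/CONFIG_HZ_1000=y/' \
      -e 's/^CONFIG_HZ=250$/CONFIG_HZ=1000/' \
      "$1" > "$tmp" &&
    cat "$tmp" > "$1"   # overwrite in place, keeping the file's permissions
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example (then run "make olddefconfig" in the kernel tree):
# cp config-6.6-rc7-ok .config && set_hz_1000 .config
```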

Running

  /usr/libexec/qemu-kvm -m 4096 -smp 8 -nographic -append 'console=ttyS0,115200n8 panic=1' -no-reboot -kernel /boot/vmlinuz-6.6.0-rc7+

with qemu-kvm 1.5.3-175.el7_9.6.x86_64 on kernel 3.10.0-1160.102.1.el7.x86_64
on a physical host PC does not reproduce this problem.

But running

  /usr/bin/qemu-system-x86_64 -m 4096 -smp 8 -nographic -append 'console=ttyS0,115200n8 panic=1' -no-reboot -kernel /boot/vmlinuz-6.6.0-rc7+

with qemu-system-x86 1:6.2+dfsg-2ubuntu6.15 on kernel 5.15.0-87-generic, itself
running on VirtualBox on Windows 11, reproduces a similar slowdown (and the
5.15.0-87-generic kernel sometimes emits soft lockup messages).
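To put a number on the slowdown when comparing runs (HZ=250 vs HZ=1000, or with and without "nosmp"), one option is to redirect the serial console output of the qemu command above to a file and measure the time between the kernel's "smp: Bringing up secondary CPUs ..." and "smp: Brought up ..." messages, on the assumption that the delay falls inside secondary CPU bring-up. A minimal sketch, assuming the log carries printk timestamps (CONFIG_PRINTK_TIME=y, or "printk.time=1" appended to the -append string); smp_bringup_time is a hypothetical helper name:

```sh
#!/bin/sh
# Hypothetical helper: print the seconds elapsed between the kernel's
# "smp: Bringing up secondary CPUs ..." and "smp: Brought up ..." lines
# in a console log whose lines start with printk timestamps "[    0.123456] ".
smp_bringup_time() {
  awk '
    # Return the number between the first "[" and the first "]" of a line.
    function ts(line,   lb, rb) {
      lb = index(line, "[")
      rb = index(line, "]")
      return substr(line, lb + 1, rb - lb - 1) + 0
    }
    /smp: Bringing up secondary CPUs/ { start = ts($0); have_start = 1 }
    /smp: Brought up/                 { stop = ts($0);  have_stop = 1 }
    END {
      if (!have_start || !have_stop) {
        print "smp bring-up messages not found" > "/dev/stderr"
        exit 1
      }
      printf "%.6f\n", stop - start
    }
  ' "$1"
}

# Example: smp_bringup_time hz1000.log
```

Comparing the printed value for a HZ=250 build against a HZ=1000 build on the same host would show whether the extra boot time is spent in this window or elsewhere.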

Thus, someone might be able to reproduce this problem in a nested virtualization
environment.
