Message-ID: <bc51bc4e-21e5-d6a9-22ee-7c1194deefc8@gmail.com>
Date:   Sat, 23 Nov 2019 17:51:19 -0500
From:   Woody Suwalski <terraluna977@...il.com>
To:     LKML <linux-kernel@...r.kernel.org>
Cc:     "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: kernel 5.2+: suspend freeze in VMware Player.

Rafael, Thomas, this is the same VMware Player 15.2 freeze-on-suspend issue
that I was discussing with you in August.

It surfaced after Thomas Gleixner's change in kernel 5.2:
dfe0cf8b  x86/ioapic: Implement irq_get_irqchip_state() callback

It is still with us in 5.4, 100% repeatable on a second suspend after a
reboot.

I have traced it down to the ioapic_irq_get_chip_state() function, where
rentry.irr is stuck high.

On the first suspend I can see that for IRQ9 the test exits with irr=0,
trigger=1, but on the second and subsequent suspends it returns
irr=1, trigger=1, so *state=1, and this results in a never-ending loop
in __synchronize_hardirq(), because inprogress is always 1.
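
To make the condition concrete, here is a minimal userspace model of the
test as I understand it. It is only a sketch: struct rte_bits and
model_chip_state_active() are made-up names for illustration, not the
kernel's own types or functions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the two redirection-entry bits discussed
 * above. This is NOT the kernel's struct, just a model of it.
 */
struct rte_bits {
	unsigned int irr;     /* remote IRR bit as read back from the IO-APIC */
	unsigned int trigger; /* 1 = level-triggered, 0 = edge-triggered */
};

/*
 * Model of the test: the line is reported as active only when it is
 * level-triggered and its remote IRR bit is still set.
 */
static bool model_chip_state_active(struct rte_bits e)
{
	assert(e.irr <= 1 && e.trigger <= 1); /* both are single-bit fields */
	return e.irr && e.trigger;
}
```

With irr=0, trigger=1 (first suspend) the model reports inactive; with
irr=1, trigger=1 (later suspends) it reports active on every read, so a
caller polling for inactive never gets it.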

I have been using a "fix" that times out of __synchronize_hardirq() after
64 iterations, and that seems to work OK (no side effects noticed),
but of course it does not address the underlying problem.
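
The idea of the workaround, as a standalone userspace sketch (this is not
the actual patch; model_bounded_sync(), the callback type and the two
test pollers are names invented here, and only the limit of 64 comes from
what I am running):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SYNC_MAX_TRIES 64 /* iteration limit used in my workaround */

/* Hypothetical poll callback: returns true while the IRQ looks in progress. */
typedef bool (*poll_fn)(void *ctx);

/*
 * Poll until the interrupt is no longer reported in progress, but give
 * up after MODEL_SYNC_MAX_TRIES polls instead of spinning forever.
 * Returns the number of polls performed, or -1 on timeout.
 */
static int model_bounded_sync(poll_fn in_progress, void *ctx)
{
	assert(in_progress != NULL);

	for (int tries = 1; tries <= MODEL_SYNC_MAX_TRIES; tries++) {
		if (!in_progress(ctx))
			return tries;
	}
	return -1; /* state never cleared: bail out rather than hang */
}

/* Models the bad case: the reported state is stuck at 1. */
static bool stuck_high(void *ctx)
{
	(void)ctx;
	return true;
}

/* Models a healthy line: reports in progress for *ctx more polls. */
static bool clears_after(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return true;
	}
	return false;
}
```

With stuck_high() the loop gives up after 64 polls; with a line that
clears, it returns as soon as the state drops. What the caller should do
after a timeout is exactly the open question.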

And the problem may be somewhere in the VMware emulation code, returning bad
data?

Would you have ideas as to what should be the right setting for
IRQ9 in a VM environment?  Edge or level?
And which part of the code is reading the "hardware" state from VMware?

OTOH, the current implementation is not really safe, as the wait loop should
have a timeout, or else it may get stuck. Should I provide my safety-exit patch?

Thanks, Woody
