lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100208093719.GA15428@rhlx01.hs-esslingen.de>
Date:	Mon, 8 Feb 2010 10:37:19 +0100
From:	Andreas Mohr <andi@...as.de>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Andreas Mohr <andi@...as.de>, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...hat.com>,
	John Stultz <johnstul@...ibm.com>
Subject: Re: clocksource mutex deadlock, cat current_clocksource
	(2.6.33-rc6/7)

On Mon, Feb 08, 2010 at 10:13:00AM +0100, Thomas Gleixner wrote:
> On Mon, 8 Feb 2010, Andreas Mohr wrote:
> > 
> > And then a cat current_clocksource managed to hang again.
> 
> Well, that's not surprising at all. If one task is stuck on clocksource_mutex,
> then the next one will be stuck as well.

I believe here you are pointing at the initial bootup acpi_pm lockup which NMI
watchdog detected. And not some thought that I somehow simply executed
cat current_clocksource twice, given my wording which might erroneously
hint at that.


So you'd think that we have a clocksource_mutex problem even before
the initial bootup switch to acpi_pm?

However I don't see how this could be the case, given that in some instances
boot does continue after acpi_pm selection, albeit after a delay.

> > (NOTE that the - now complete! - SysRq-T list does NOT show any backtraces
> > of kwatchdog any more, only many other processes)
> > Could it be that the (rather disruptive) NMI watchdog confuses the current state at
> > change_clocksource and causes that stuff to get left with
> > clocksource_mutex remaining taken?
> 
> Nope, the NMI watchdog is not involved. It merily tells us that the
> task is stuck.

OK.
And after that message debug_locks is zeroed and kwatchdog is gone
from the process list (probably during debug_locks change).



I still can't make much reason of this behaviour.
If we have a problem during acpi_pm selection on boot, then by all
accounts it should get stuck completely (plus yielding watchdog's lockup
message), not continue booting after some weird delay.
OK, this particular delay phenomenon could still be explained
by a pretty severe contention,
but then after successfully having gotten the mutex it certainly
shouldn't happen that then even after bootup the mutex remains blocked
(as witnessed by the cat current_clocksource issue).
After all all mutex use is fully symmetric, no surprises IMHO - unless
NMI / SMI or so are involved.



I'll explain what I think might be happening:
bootup switches to acpi_pm, timekeeping gets borked, NMI watchdog complains
due to timekeeping issues, brutally yanks the waiting acpi_pm switchover
(thereby NOT releasing clocksource_mutex),
and the result is that I have cat current_clocksource stuck in my userspace.

Andreas Mohr
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ