lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100207191949.GA23665@rhlx01.hs-esslingen.de>
Date:	Sun, 7 Feb 2010 20:19:49 +0100
From:	Andreas Mohr <andi@...as.de>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Andreas Mohr <andi@...as.de>, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...hat.com>,
	John Stultz <johnstul@...ibm.com>
Subject: Re: clocksource mutex deadlock, cat current_clocksource
	(2.6.33-rc6)

Hi,

On Sat, Feb 06, 2010 at 03:46:50PM +0100, Thomas Gleixner wrote:
> On Sat, 6 Feb 2010, Andreas Mohr wrote:
> > echo t > /proc/sysrq-trigger shows (full output on request):
> 
> At least the state of "kwatchdog" would be interesting.

Hrmm, scrolled out of dmesg buffer :(
(will now increase buffer size)

> > cat           D 00000000     0 19361   2959 0x00000000
> >  de215ef8 00000046 00000000 00000000 c13b0294 dd45c380 dd45c5fc df9c6ef0
> >  de215f28 de215f08 c13b0260 00000202 de215f28 c124fcae dd45c380 c13b0280
> >  c13b0280 c13b0280 dd45c380 de215f08 ffffffff d9de1000 c13b0338 df9c6ef0
> > Call Trace:
> >  [<c124fcae>] mutex_lock_nested+0x157/0x23d
> >  [<c104873e>] sysfs_show_current_clocksources+0x17/0x41
> > 
> > INFO: lockdep is turned off.
> 
> Why is lockdep turned off ?

Good question.

# grep -i lockdep /boot/config-2.6.33-rc6
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
# CONFIG_DEBUG_LOCKDEP is not set


BTW, things get even weirder:

On bootup, I get lockups during

Switching to clocksource acpi_pm

very frequently, but _only_ in case of connected AC adapter.
If I don't connect AC adapter, it always boots.
(I tried about 8 times, with 5 times AC and 3 times without,
and consistent behaviour)

The current boot was with AC connected, and this time it did manage to
boot, but with a slight hang at the
Switching to clocksource acpi_pm
line. And this time NMI watchdog detected things going haywire:

PM: Adding info for No Bus:input1
input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input1
ACPI: Power Button [PBTN]
driver: 'PNP0C0C:00': driver_bound: bound to device 'button'
bus: 'acpi': really_probe: bound device PNP0C0C:00 to driver button
bus: 'acpi': driver_probe_device: matched device PNP0C0E:00 with driver button
bus: 'acpi': really_probe: probing driver button with device PNP0C0E:00
device: 'input2': device_add
PM: Adding info for No Bus:input2
input: Sleep Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0E:00/input/input2
ACPI: Sleep Button [SBTN]
driver: 'PNP0C0E:00': driver_bound: bound to device 'button'
bus: 'acpi': really_probe: bound device PNP0C0E:00 to driver button
bus: 'acpi': add driver fan
bus: 'acpi': add driver processor
bus: 'acpi': driver_probe_device: matched device LNXCPU:00 with driver processor
bus: 'acpi': really_probe: probing driver processor with device LNXCPU:00
Marking TSC unstable due to TSC halts in idle
device: 'cooling_device0': device_add
PM: Adding info for No Bus:cooling_device0
Switching to clocksource acpi_pm
BUG: NMI Watchdog detected LOCKUP on CPU0, ip c1046694, registers:
Modules linked in:

Pid: 266, comm: kwatchdog Not tainted 2.6.33-rc6 #3 Inspiron 8000            /Inspiron 8000
EIP: 0060:[<c1046694>] EFLAGS: 00000082 CPU: 0
EIP is at timekeeping_forward_now+0xf0/0x11d
EAX: 342100fe EBX: f3dcdfc8 ECX: 63826cfb EDX: ffffffff
ESI: 003ffd7b EDI: 00000000 EBP: dfa31f50 ESP: dfa31f28
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process kwatchdog (pid: 266, ti=dfa31000 task=dfa4e0a0 task.ti=dfa31000)
Stack:
 005b9e7e 00000000 6184c590 00000fff 002cb338 4b6f0da7 00000000 c13d8524
<0> c13d8524 dfa31f48 dfa31f60 c10466d2 c13d8524 c13d8524 dfa31f6c c104725e
<0> c13b02c0 dfa31f84 c1048376 c13285ec c1328650 c13b03a4 c13d8528 dfa31fa4
Call Trace:
 [<c10466d2>] ? change_clocksource+0x11/0x3e
 [<c104725e>] ? timekeeping_notify+0x20/0x2e
 [<c1048376>] ? clocksource_select+0x9a/0xa3
 [<c10484ba>] ? clocksource_watchdog_kthread+0x13b/0x163
 [<c104837f>] ? clocksource_watchdog_kthread+0x0/0x163
 [<c103fa6e>] ? kthread+0x65/0x6a
 [<c103fa09>] ? kthread+0x0/0x6a
 [<c1002dba>] ? kernel_thread_helper+0x6/0x10
Code: 65 e8 8d 14 17 89 c6 89 d7 31 c0 0f ad d6 d3 ef f6 c1 20 0f 45 f7 0f 45 f8 a1 10 21 48 c1 89 f1 89 fb 99 01 c1 11 d3 31 c0 eb 0a <81> c1 00 36 65 c4 83 d3 ff 40 83 fb 00 77 f1 81 f9 ff c9 9a 3b
---[ end trace a7919e7f17c0a725 ]---
driver: 'LNXCPU:00': driver_bound: bound to device 'processor'
bus: 'acpi': really_probe: bound device LNXCPU:00 to driver processor
bus: 'acpi': add driver container
bus: 'acpi': add driver thermal




Kernel options I used are:

Kernel command line: BOOT_IMAGE=L2633rc6 root=802 ramdisk=0 video=vesa:mtrr ywrap devfs=nomount apm=off root=/dev/sda2 lapic apic=debug nmi_watchdog=2 usbcore.autosuspend=1



Since clocksource_mutex code invocation seems pretty symmetric, I cannot imagine any problem here,
except for mutex implementation issues, recursive clocksource code calls and NMI / interrupt issues.

Could it be that some system code (ACPI, SMI, ...?) calls clocksource stuff?
_Or_ there's some corruption going on, in the area near the clocksource_mutex storage.

I _think_ older kernels had some slight delays during clocksource switch, too, but I cannot remember any
outright hangs during bootup.

Hrmm, OTOH I changed quite some options after the last build:

$ diff -uN config-2.6.33-rc4 config-2.6.33-rc6|grep ^+
+++ config-2.6.33-rc6   2010-01-31 20:06:59.000000000 +0100
+# Linux kernel version: 2.6.33-rc6
+# Sun Jan 31 20:05:19 2010
+# CONFIG_COMPAT_BRK is not set
+# CONFIG_COMPAT_VDSO is not set
+CONFIG_PARPORT=m
+CONFIG_PARPORT_PC=m
+CONFIG_TRACE_IRQFLAGS=y
+CONFIG_LATENCYTOP=y
+CONFIG_HAVE_FTRACE_NMI_ENTER=y
+CONFIG_TRACER_MAX_TRACE=y
+CONFIG_FTRACE_NMI_ENTER=y
+CONFIG_FUNCTION_TRACER=y
+CONFIG_IRQSOFF_TRACER=y
+CONFIG_SCHED_TRACER=y
+CONFIG_FTRACE_SYSCALLS=y
+CONFIG_BLK_DEV_IO_TRACE=y
+CONFIG_DYNAMIC_FTRACE=y
+# CONFIG_FUNCTION_PROFILER is not set
+CONFIG_FTRACE_MCOUNT_RECORD=y
+CONFIG_4KSTACKS=y



Umm, CONFIG_FTRACE_NMI_ENTER, anyone?
That sounds like the most invasive candidate at least.

Anyway, behaviour does seem broken given that clocksource_mutex use is fully symmetric
and shouldn't cause issues on its own.
If I had a guess, then I'd say some of these options above do some mutex-related things
and corrupt their surroundings. But who knows...

BTW, it's not an lapic thing: I had hangs with and without enabling it.

I'm afraid I'll have to back out one half of the options above and retry...

Any ideas?

Thanks,

Andreas Mohr
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ