lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110530161014.GS25813@smp.if.uj.edu.pl>
Date:	Mon, 30 May 2011 18:10:14 +0200
From:	"Witold Baryluk" <baryluk@....if.uj.edu.pl>
To:	Andy Lutomirski <luto@....EDU>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Andi Kleen <andi@...stfloor.org>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Ingo Molnar <mingo@...hat.com>,
	John Stultz <johnstul@...ibm.com>,
	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org
Subject: Kernel OOPS in function_graph_tracer due to the
 44259b1abfaa8bb819d25d41d71e8e33e25dd36a commit (x86-64: Move vread_tsc
 into a new file with sensible options)

Hi,

I found yesterday a problem when booting system on Pentium-M, 32-bit.

I got approximetly this

[    0.000000] Fast TSC calibration using PIT
[    0.000000] Detected 3199.363 MHz processor.
[    0.020197] Calibrating delay loop (skipped), value calculated using timer frequency.. 6398.72 BogoMIPS (lpj=31993630)
[    0.020763] pid_max: default: 32768 minimum: 301
[    0.022972] Mount-cache hash table entries: 512
[    0.040440] CPU: Intel Pentium III (Katmai) stepping 03
[    0.042789] ftrace: allocating 6263 entries in 13 pages
[    0.062128] Performance Events: 
[    0.062505] no APIC, boot with the "lapic" boot parameter to force-enable it.
[    0.062818] no hardware sampling interrupt available.
[    0.063038] Broken PMU hardware detected, using software events only.
[    0.072671] Testing tracer nop: PASSED
[    0.073477] NMI watchdog disabled (cpu0): hardware events not enabled
[    0.077682] devtmpfs: initialized
[    0.090118] kworker/u:0 used greatest stack depth: 6840 bytes left
[    0.105247] atomic64 test passed for i586+ platform with CX8 and with SSE
[    0.112515] kworker/u:0 used greatest stack depth: 6700 bytes left
[    0.198527] bio: create slab <bio-0> at 0
[    0.212671] Switching to clocksource pit
[    0.882202] kworker/u:0 used greatest stack depth: 6636 bytes left
[    1.070749] platform rtc_cmos: registered platform RTC device (no PNP device found)
[    1.114000] Initializing RT-Tester: OK
[    1.119256] Testing tracer function: PASSED
[    1.268289] Testing dynamic ftrace: PASSED
[    1.517992] Testing dynamic ftrace ops #1: (1 0 1 1 0) (1 1 2 1 0) (2 1 3 1 35342) (2 2 4 1 35634) PASSED
[    1.591847] Testing dynamic ftrace ops #2: (1 0 1 35341 0) (1 1 2 35612 0) (2 1 3 1 8) (2 2 4 249 256) PASSED
[    1.707799] Testing tracer irqsoff: PASSED
[    1.733435] Testing tracer preemptoff: PASSED
[    1.752663] Testing tracer preemptirqsoff: PASSED
[    1.769764] Testing tracer wakeup: 
[    2.068581] Switching to clocksource tsc
[    2.108718] PASSED
[    2.108928] Testing tracer wakeup_rt: PASSED
[    2.459170] Testing tracer function_graph: 
[    2.466979] BUG: unable to handle kernel paging request at e421cc10
[    2.466979] IP: [<c10850bc>] ftrace_push_return_trace+0x6c/0xd0
[    2.466979] *pdpt = 00000000013b4001 *pde = 0000000000000000 
[    2.466979] Thread overran stack, or stack corrupted
[    2.466979] Oops: 0002 [#1] PREEMPT 
[    2.466979] 
[    2.466979] Pid: 1, comm: swapper Not tainted 2.6.39-t43+ #40 Bochs Bochs
[    2.466979] EIP: 0060:[<c10850bc>] EFLAGS: 00000006 CPU: 0
[    2.466979] EIP is at ftrace_push_return_trace+0x6c/0xd0
[    2.466979] EAX: c7833480 EBX: c7832020 ECX: c1070aac EDX: 1c9e9790
[    2.466979] ESI: 986b5466 EDI: 00000000 EBP: c78335ac ESP: c7833590
[    2.466979]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[    2.466979] Process swapper (pid: 1, ti=c7832000 task=c7832020 task.ti=c7834000)
[    2.466979] Stack:
[    2.466979]  c105a9fc c78335c0 c1007d26 c1070aac c78335f4 c1007d26 c1070aac c78335d0
[    2.466979]  c10150a2 c7833600 c7833600 c1007d26 00000000 c7832020 c1007d26 c7910800
[    2.466979]  c78335f0 c117e212 c1007d26 c7833638 c7832000 c1007d2b 00000046 c13551c0
[    2.466979] Call Trace:
[    2.466979]  [<c105a9fc>] ? trace_hardirqs_off+0xc/0x10
[    2.466979]  [<c1007d26>] ? native_sched_clock+0x6/0x70
[    2.466979]  [<c1070aac>] ? trace_clock_local+0x1c/0x50
[    2.466979]  [<c1007d26>] ? native_sched_clock+0x6/0x70
[    2.466979]  [<c1070aac>] ? trace_clock_local+0x1c/0x50
[    2.466979]  [<c10150a2>] prepare_ftrace_return+0x72/0xa0
[    2.466979]  [<c1007d26>] ? native_sched_clock+0x6/0x70
[    2.466979]  [<c1007d26>] ? native_sched_clock+0x6/0x70
[    2.466979]  [<c117e212>] ftrace_graph_caller+0x22/0x26
[    2.466979]  [<c1007d26>] ? native_sched_clock+0x6/0x70
[    2.466979]  [<c1007d2b>] ? native_sched_clock+0xb/0x70
[    2.466979]  [<c117e216>] ftrace_graph_caller+0x26/0x26
[    2.466979] BUG: unable to handle kernel NULL pointer dereference at 0000032d
[    2.466979] IP: [<c1005529>] print_context_stack+0xd9/0x110
[    2.466979] *pdpt = 0000000000000000 *pde = f000ff53f000ff53 
[    2.466979] Thread overran stack, or stack corrupted
[    2.466979] Oops: 0000 [#2] PREEMPT 
[    2.466979] 
[    2.466979] Pid: 1, comm: swapper Not tainted 2.6.39-t43+ #40 Bochs Bochs
[    2.466979] EIP: 0060:[<c1005529>] EFLAGS: 00000002 CPU: 0
[    2.466979] EIP is at print_context_stack+0xd9/0x110
[    2.466979] EAX: 0000001d EBX: c78335f4 ECX: 00000001 EDX: 00000310
[    2.466979] ESI: 00000000 EDI: c11806d4 EBP: c7833380 ESP: c783335c
[    2.466979]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[    2.466979] Process swapper (pid: 1, ti=c7832000 task=c7832020 task.ti=c7834000)
[    2.466979] Stack:
[    2.466979]  c117e216 c7833ffc c7833600 ffffe000 c7832000 c7833600 c7833590 c11806d4
[    2.466979]  c7832000 c78333ac c10040dd c11806d4 c11d6c66 00000000 c783339c 00000018
[    2.466979]  00000000 00000000 c11806d4 00000018 c78333d0 c10052ec c78335ac c11806d4
[    2.466979] Call Trace:
[    2.466979]  [<c117e216>] ? ftrace_graph_caller+0x26/0x26
[    2.466979] BUG: unable to handle kernel NULL pointer dereference at 0000032d
[    2.466979] IP: [<c1005529>] print_context_stack+0xd9/0x110
[    2.466979] *pdpt = 0000000000000000 *pde = f000ff53f000ff53 
[    2.466979] Thread overran stack, or stack corrupted
[    2.466979] Oops: 0000 [#3] PREEMPT 
[    2.466979] 
[    2.466979] Pid: 1, comm: swapper Not tainted 2.6.39-t43+ #40 Bochs Bochs
[    2.466979] EIP: 0060:[<c1005529>] EFLAGS: 00000002 CPU: 0
[    2.466979] EIP is at print_context_stack+0xd9/0x110
[    2.466979] EAX: 0000001d EBX: c783335c ECX: 00000001 EDX: 00000310
[    2.466979] ESI: 00000000 EDI: c11806d4 EBP: c783314c ESP: c7833128
[    2.466979]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[    2.466979] Process swapper (pid: 1, ti=c7832000 task=c7832020 task.ti=c7834000)
[    2.466979] Stack:
[    2.466979]  c117e216 c7833ffc c7833380 ffffe000 c7832000 c7833380 c783335c c11806d4
[    2.466979]  c7832000 c7833178 c10040dd c11806d4 c11d6c66 00000000 c7833168 00000018
[    2.466979]  00000000 00000000 c11806d4 00000018 c783319c c10052ec c7833380 c11806d4
[    2.466979] Call Trace:
[    2.466979]  [<c117e216>] ? ftrace_graph_caller+0x26/0x26
[    2.466979] BUG: unable to handle kernel NULL pointer dereference at 0000032d
[    2.466979] IP: [<c1005529>] print_context_stack+0xd9/0x110
[    2.466979] *pdpt = 0000000000000000 *pde = f000ff53f000ff53 
[    2.466979] Thread overran stack, or stack corrupted
[    2.466979] Oops: 0000 [#4] PREEMPT 
[    2.466979] 
[    2.466979] Pid: 1, comm: swapper Not tainted 2.6.39-t43+ #40 Bochs Bochs
[    2.466979] EIP: 0060:[<c1005529>] EFLAGS: 00000002 CPU: 0
[    2.466979] EIP is at print_context_stack+0xd9/0x110
[    2.466979] EAX: 0000001d EBX: c7833128 ECX: 00000001 EDX: 00000310
[    2.466979] ESI: 00000000 EDI: c11806d4 EBP: c7832f18 ESP: c7832ef4
[    2.466979]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[    2.466979] Process swapper (pid: 1, ti=c7832000 task=c7832020 task.ti=c7834000)
[    2.466979] Stack:
[    2.466979]  c117e216 c7833ffc c783314c ffffe000 c7832000 c783314c c7833128 c11806d4
[    2.466979]  c7832000 c7832f44 c10040dd c11806d4 c11d6c66 00000000 c7832f34 00000018


full kernel log in attachment.

By bisection I got 

commit 44259b1abfaa8bb819d25d41d71e8e33e25dd36a
Author: Andy Lutomirski <luto@....EDU>
Date:   Mon May 23 09:31:28 2011 -0400

    x86-64: Move vread_tsc into a new file with sensible options


I tested on i386, 32-bit architecture. Both gcc 4.4 and 4.6 result
in same problem. Also -O2 vs -O2 doesn't change anything.

Sometimes bug doesn't appear, or appear differently, for example

[    1.114660] Initializing RT-Tester: OK
[    1.118077] Testing tracer function: PASSED
[    1.272665] Testing dynamic ftrace: PASSED
[    1.524057] Testing dynamic ftrace ops #1: (1 0 1 1 0) (1 1 2 1 0) (2 1 3 1 44876) (2 2 4 1 45184) PASSED
[    1.600066] Testing dynamic ftrace ops #2: (1 0 1 44944 0) (1 1 2 45228 0) (2 1 3 1 8) (2 2 4 265 272) PASSED
[    1.717279] Testing tracer irqsoff: PASSED
[    1.737368] Testing tracer preemptoff: PASSED
[    1.759463] Testing tracer preemptirqsoff: 
[    1.773645] ------------[ cut here ]------------
[    1.774220] WARNING: at kernel/lockdep.c:3351 check_flags+0x126/0x157()
[    1.774723] Hardware name: Bochs
[    1.775020] Pid: 1, comm: swapper Not tainted 2.6.39-t43+ #42
[    1.775332] Call Trace:
[    1.775560]  [<c102213e>] warn_slowpath_common+0x77/0x8c
[    1.775862]  [<c104728d>] ? check_flags+0x126/0x157
[    1.776138]  [<c104728d>] ? check_flags+0x126/0x157
[    1.776411]  [<c1064f5a>] ? T.482+0x114/0x16e
[    1.776664]  [<c1022175>] warn_slowpath_null+0x22/0x24
[    1.776950]  [<c104728d>] check_flags+0x126/0x157
[    1.777236]  [<c104b0a6>] lock_acquire+0x34/0xe7
[    1.777541]  [<c1126672>] _raw_spin_lock_irqsave+0x4e/0x7e
[    1.777839]  [<c1064f5a>] ? T.482+0x114/0x16e
[    1.778092]  [<c1064f5a>] T.482+0x114/0x16e
[    1.778341]  [<c105d62c>] ? ring_buffer_unlock_commit+0x21/0x2e
[    1.778654]  [<c1026ff6>] ? __do_softirq+0x17e/0x191
[    1.778931]  [<c1026ff6>] ? __do_softirq+0x17e/0x191
[    1.779208]  [<c106502a>] stop_critical_timing+0x76/0x87
[    1.779497]  [<c1026d47>] ? __local_bh_enable+0x82/0x85
[    1.779784]  [<c106505a>] trace_preempt_on+0x1f/0x21
[    1.780061]  [<c101c17d>] sub_preempt_count+0x8b/0x98
[    1.780341]  [<c1026d47>] __local_bh_enable+0x82/0x85
[    1.780621]  [<c1026ff6>] __do_softirq+0x17e/0x191
[    1.780892]  [<c1026e78>] ? local_bh_enable+0x12/0x12
[    1.781160]  <IRQ>  [<c1026c06>] ? irq_exit+0x44/0x66
[    1.781536]  [<c1003589>] ? do_IRQ+0x75/0x88
[    1.781789]  [<c1127b6e>] ? common_interrupt+0x2e/0x34
[    1.782076]  [<c106151c>] ? trace_selftest_startup_preemptirqsoff+0xae/0x1a9
[    1.782435]  [<c1061914>] ? register_tracer+0x12d/0x1fe
[    1.782724]  [<c1310344>] ? setup_command_line+0x73/0x73
[    1.783066]  [<c131c1b4>] ? init_irqsoff_tracer+0x21/0x25
[    1.783390]  [<c13101fa>] ? do_one_initcall+0x86/0x138
[    1.783390]  [<c131c193>] ? init_function_trace+0x35/0x35
[    1.783390]  [<c1310344>] ? setup_command_line+0x73/0x73
[    1.783390]  [<c13103cb>] ? kernel_init+0x87/0x11b
[    1.783390]  [<c1127b7a>] ? kernel_thread_helper+0x6/0x10
[    1.783390] ---[ end trace 4eaa2a86a8e2da22 ]---
[    1.783390] irq event stamp: 133534
[    1.783390] hardirqs last  enabled at (133532): [<c1126cbd>] _raw_spin_unlock_irq+0x27/0x4a
[    1.783390] hardirqs last disabled at (133533): [<c1026fbf>] __do_softirq+0x147/0x191
[    1.783390] softirqs last  enabled at (133534): [<c1026ff6>] __do_softirq+0x17e/0x191
[    1.783390] softirqs last disabled at (133529): [<c10037b7>] do_softirq+0x55/0xa6
[    1.798767] PASSED
[    1.798767] Testing tracer wakeup: 
[    2.063646] Switching to clocksource tsc
[    2.133963] PASSED


On recent kernels it looks like

[    1.159875] Switching to clocksource tsc
[    1.439760] PASSED
[    1.440212] Testing tracer function_graph: 
[    1.449049] BUG: unable to handle kernel paging request at e42d01bc
[    1.449049] IP: [<c107586c>] ftrace_push_return_trace+0x6c/0xd0
[    1.449049] *pdpt = 0000000001221001 *pde = 0000000000000000 
[    1.449049] Thread overran stack, or stack corrupted
[    1.449049] Oops: 0002 [#1] PREEMPT 
[    1.449049] 
[    1.449049] Pid: -947754028, comm: 
[    1.449049] ������k������ Not tainted 3.0.0-rc1-t43 #45 Bochs Bochs
[    1.449049] EIP: 0060:[<c107586c>] EFLAGS: 00000016 CPU: 0
[    1.449049] EIP is at ftrace_push_return_trace+0x6c/0xd0
[    1.449049] EAX: c7826e50 EBX: c7826ae0 ECX: c106178c EDX: 1caa936c
[    1.449049] ESI: 5656002d EDI: 00000000 EBP: c7826f84 ESP: c7826f68
[    1.449049]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[    1.449049] Process � (pid: -947753988, ti=c7826000 task=c7826ae0 task.ti=c7828000)
[    1.449049] Stack:
[    1.449049]  c106178d c7826f98 c1007a26 c106178c c7826fcc c1007a26 c106178c c7826fa8
[    1.449049]  c1014982 c7826fd8 c7826fd8 c1007a26 00000000 c7826ae0 c1007a26 c7941800
[    1.449049]  c7826fc8 c1162192 c1007a26 c7827010 c7826000 c1007a2b c783e95c 00000021
[    1.449049] Call Trace:
[    1.449049]  [<c106178d>] ? trace_clock_local+0x1d/0x50
[    1.449049]  [<c1007a26>] ? native_sched_clock+0x6/0x70
[    1.449049]  [<c106178c>] ? trace_clock_local+0x1c/0x50
[    1.449049]  [<c1007a26>] ? native_sched_clock+0x6/0x70
[    1.449049]  [<c106178c>] ? trace_clock_local+0x1c/0x50
[    1.449049]  [<c1014982>] prepare_ftrace_return+0x72/0xa0
[    1.449049]  [<c1007a26>] ? native_sched_clock+0x6/0x70
[    1.449049]  [<c1007a26>] ? native_sched_clock+0x6/0x70
...

or

[    0.126297] Testing tracer function: PASSED
[    0.260366] Testing dynamic ftrace: PASSED
[    0.498381] Testing dynamic ftrace ops #1: (1 0 1 1 0) (1 1 2 1 0) (2 1 3 1 6015) (2 2 4 1 6190) PASSED
[    0.557775] Testing dynamic ftrace ops #2: (1 0 1 6069 0) (1 1 2 6234 0) (2 1 3 1 2) (2 2 4 152 153) PASSED
[    0.637461] Testing tracer irqsoff: PASSED
[    0.650073] Testing tracer wakeup: PASSED
[    0.979100] Testing tracer wakeup_rt: 
[    1.118393] Switching to clocksource tsc
[    1.308726] PASSED
[    1.309162] Testing tracer function_graph: 
[    1.317873] Tracing recursion: depth[576]:HC[1023]:SC[255]:NMI[67108864]
[    1.317873] ------------[ cut here ]------------
[    1.317873] WARNING: at kernel/trace/ring_buffer.c:2224 trace_recursive_fail+0x30/0x90()
[    1.317873] Hardware name: Bochs
[    1.317873] Pid: 1, comm:  Not tainted 3.0.0-rc1-t43 #46
[    1.317873] Call Trace:
[    1.317873]  [<c1022fbd>] warn_slowpath_common+0x6d/0xa0
[    1.317873]  [<c105b3c0>] ? trace_recursive_fail+0x30/0x90
[    1.317873]  [<c105b3c0>] ? trace_recursive_fail+0x30/0x90
[    1.317873]  [<c1023010>] warn_slowpath_null+0x20/0x30
[    1.317873]  [<c105b3c0>] trace_recursive_fail+0x30/0x90
[    1.317873]  [<c105ce8c>] ring_buffer_lock_reserve+0x8c/0xc0
[    1.317873]  [<c106177a>] trace_buffer_lock_reserve+0x1a/0x50
[    1.317873]  [<c1069357>] __trace_graph_return+0x37/0x90
[    1.317873]  <IRQ> 
[    1.317873] BUG: unable to handle kernel NULL pointer dereference at 00000040
[    1.317873] IP: [<c1004f95>] print_context_stack+0x55/0x110
[    1.317873] *pde = 00000000 
[    1.317873] BUG: unable to handle kernel NULL pointer dereference at 0000004a
[    1.317873] IP: [<c1015e0e>] no_context+0x6e/0x160
[    1.317873] *pde = 00000000 
[    1.317873] Thread overran stack, or stack corrupted
[    1.317873] Oops: 0000 [#1] 
[    1.317873] 
[    1.317873] Pid: -947741524, comm:  Not tainted 3.0.0-rc1-t43 #46 Bochs Bochs
[    1.317873] EIP: 0060:[<c1015e0e>] EFLAGS: 00000006 CPU: 0
[    1.317873] EIP is at no_context+0x6e/0x160
[    1.317873] EAX: 00000002 EBX: c7829e78 ECX: c1023e56 EDX: c7829bc0
[    1.317873] ESI: 00000040 EDI: 00000000 EBP: c7829da8 ESP: c7829d88
[    1.317873]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[    1.317873] Process  (pid: -947741524, ti=c7828000 task=c7829bc0 task.ti=c10019a0)
[    1.317873] Stack:
[    1.317873]  c117be6d 00000040 00000046 c7829bc0 c7829de8 00000000 00000040 c7829bc0
[    1.317873]  c7829de8 c1015f98 c1143805 c7829dd8 c1013c3d 00000000 0000004a 0000000e
[    1.317873]  00000000 c1408242 00000040 c140820e c7829e78 c1408242 00000040 c140820e
[    1.317873] Call Trace:
[    1.317873]  [<c1015f98>] __bad_area_nosemaphore+0x98/0x140
[    1.317873]  [<c1143805>] ? error_code+0x5d/0x64
[    1.317873]  [<c1013c3d>] ? prepare_ftrace_return+0x4d/0xa0
[    1.317873]  [<c1016057>] bad_area_nosemaphore+0x17/0x20
[    1.317873]  [<c10164ae>] do_page_fault+0x27e/0x410
[    1.317873]  [<c1023928>] ? console_unlock+0x1c8/0x1f0
[    1.317873]  [<c1016230>] ? mm_fault_error+0x130/0x130
[    1.317873]  [<c1016230>] ? mm_fault_error+0x130/0x130
...

Depending on the luck and boot time. All above was registered using qemu.
But I have this (or slightly different - depdnding on boot,
or kernel configuration - messages, just after trace function_graph testing
started) alsoon my laptop.

Reverting commit 44259b1abfaa8bb819d25d41d71e8e33e25dd36a on top of current
kernel make bug disapear.

Disabling CONFIG_FUNCTION_GRAPH_TRACER also make bug dispear.

Bug is preserved at multiple kernel configs.
I send one which is pretty minimized.


Regards,
Witek

-- 
Witold Baryluk

View attachment "config-minimized.txt" of type "text/plain" (20886 bytes)

Download attachment "oops-at-44259b1abfaa8bb819d25d41d71e8e33e25dd36a.txt.gz" of type "application/octet-stream" (5364 bytes)

Download attachment "signature.asc" of type "application/pgp-signature" (199 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ