Message-ID: <46CB0BE8.8010903@windriver.com>
Date: Tue, 21 Aug 2007 11:59:36 -0400
From: taoyue <yue.tao@...driver.com>
To: linux-kernel@...r.kernel.org, oleg@...sign.ru
CC: Mark Zhan <rongkai.zhan@...driver.com>,
Bruce Ashfield <bruce.ashfield@...driver.com>
Subject: [BUG]: posix timer: slab error 'double free'
Hi everyone:
I have found a race condition in the POSIX timer code in the current kernel
source tree. Jeremy has in fact reported the same problem.
I wrote a simple stress test for the POSIX timer subsystem that reproduces
the problem on the latest mainline kernel.
My test program creates 200 threads, and each thread runs the following loop:
while (1) {
    timer_create()
    timer_settime()
    sleep a while
    timer_delete()
}
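For readers without the attachment, the loop above might be sketched roughly as follows. This is an illustrative reconstruction, not the attached posix_timer_test.c: the names run_stress, timer_worker, worker_arg and MAX_THREADS are hypothetical, the loop is bounded instead of infinite, and the use of SIGRTMIN with a non-blocking sigtimedwait() is an assumption based only on sys_rt_sigtimedwait appearing in the traces.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

#define MAX_THREADS 200   /* the report uses 200 threads */

/* Hypothetical per-thread state; not taken from the attachment. */
struct worker_arg {
    int iterations;   /* bounded here; the original loops forever */
    int failures;     /* number of timer calls that returned an error */
};

static void *timer_worker(void *p)
{
    struct worker_arg *arg = p;
    struct timespec zero = { 0, 0 };
    struct timespec nap = { 0, 2 * 1000 * 1000 };   /* "sleep a while": 2 ms */
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);

    for (int i = 0; i < arg->iterations; i++) {
        struct sigevent sev;
        struct itimerspec its;
        timer_t id;

        memset(&sev, 0, sizeof(sev));
        sev.sigev_notify = SIGEV_SIGNAL;
        sev.sigev_signo = SIGRTMIN;          /* assumed signal choice */
        if (timer_create(CLOCK_REALTIME, &sev, &id) != 0) {
            arg->failures++;
            continue;
        }

        memset(&its, 0, sizeof(its));
        its.it_value.tv_nsec = 1000 * 1000;      /* first expiry after 1 ms */
        its.it_interval.tv_nsec = 1000 * 1000;   /* then every 1 ms */
        if (timer_settime(id, 0, &its, NULL) != 0)
            arg->failures++;

        nanosleep(&nap, NULL);

        /* Dequeue whatever timer signals are pending, without blocking. */
        while (sigtimedwait(&set, NULL, &zero) > 0)
            ;

        if (timer_delete(id) != 0)
            arg->failures++;
    }
    return NULL;
}

/* Runs the loop in nthreads threads; returns the total number of failures. */
int run_stress(int nthreads, int iterations)
{
    pthread_t tids[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    sigset_t set;
    int started = 0, failures = 0;

    if (nthreads < 0)
        nthreads = 0;
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    /* Block the timer signal; threads created below inherit this mask.
     * It is left blocked on return so a late signal cannot kill the process. */
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    for (int i = 0; i < nthreads; i++) {
        args[i].iterations = iterations;
        args[i].failures = 0;
        if (pthread_create(&tids[i], NULL, timer_worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        failures += args[i].failures;
    }
    return failures + (nthreads - started);
}
```

Calling run_stress(200, n) from main() with a large n would approximate the load described here. Whether this sketch hits the same race as the real test program is not verified.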
Please see my test program in the attachment "posix_timer_test.c". You
can compile it with the following command line:
gcc -static -o posix_timer_test posix_timer_test.c -lrt -lpthread
My test environment is described in the three attached files:
"dmesg.txt", "cpuinfo.txt", "config.txt"
On pristine Linux 2.6.23-rc3, we get the following oops messages:
slab error in cache_alloc_debugcheck_after(): cache `sigqueue': double
free, or memory outside object was overwritten
[<c0103941>] show_trace_log_lvl+0x1a/0x30
[<c0104593>] show_trace+0x12/0x14
[<c01045ab>] dump_stack+0x16/0x18
[<c015637b>] __slab_error+0x26/0x28
[<c0156852>] cache_alloc_debugcheck_after+0x134/0x204
[<c0157e12>] kmem_cache_alloc+0x5a/0xac
[<c0123394>] __sigqueue_alloc+0x25/0x62
[<c0124ad5>] sigqueue_alloc+0x15/0x1f
[<c012aca4>] sys_timer_create+0x3d/0x2d8
[<c01029e6>] syscall_call+0x7/0xb
=======================
dcdcd000: redzone 1:0xd84156c5635688c0, redzone 2:0xd84156c5635688c0
slab error in verify_redzone_free(): cache `sigqueue': double free detected
[<c0103941>] show_trace_log_lvl+0x1a/0x30
[<c0104593>] show_trace+0x12/0x14
[<c01045ab>] dump_stack+0x16/0x18
[<c015637b>] __slab_error+0x26/0x28
[<c0156bbf>] cache_free_debugcheck+0x1d9/0x298
[<c0156eaa>] kmem_cache_free+0x66/0xb5
[<c0122f4d>] __sigqueue_free+0x2f/0x32
[<c0123147>] __dequeue_signal+0xdc/0x174
[<c0124960>] dequeue_signal+0xbb/0x149
[<c012569e>] sys_rt_sigtimedwait+0x7f/0x240
[<c01029e6>] syscall_call+0x7/0xb
=======================
dd7f7000: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
BUG: unable to handle kernel paging request at virtual address dd7f7f6c
printing eip:
c012abdb
*pde = 00075067
*pte = 1d7f7000
Oops: 0002 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 1
EIP: 0060:[<c012abdb>] Not tainted VLI
EFLAGS: 00010086 (2.6.23-rc3-g2a677896 #1)
EIP is at posix_timer_event+0x14/0xa0
eax: 00000000 ebx: dcc1e900 ecx: 00000020 edx: 00000003
esi: dcc1e938 edi: dd7f7f6c ebp: dff5fe50 esp: dff5fe48
ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
Process swapper (pid: 0, ti=dff5e000 task=dff4eac0 task.ti=dff5e000)
Stack: c012b1a7 dcc1e900 dff5fe7c c012b1e0 00000001 c140ef50 dcc1e938 00000002
       00000217 dcc1e908 c012b1a7 dcc1e938 c140ef50 dff5febc c012e3a5 00000000
       dcc1e95c 00000000 e32a8edd ef84ee03 2e028890 000000f1 c140ef20 00000001
Call Trace:
[<c0103941>] show_trace_log_lvl+0x1a/0x30
[<c01039fc>] show_stack_log_lvl+0xa5/0xca
[<c0103c3c>] show_registers+0x21b/0x391
[<c0103ed3>] die+0x121/0x25e
[<c011114d>] do_page_fault+0x354/0x627
[<c02ec4ea>] error_code+0x72/0x78
[<c012b1e0>] posix_timer_fn+0x39/0x94
[<c012e3a5>] hrtimer_run_queues+0x150/0x181
[<c012225c>] run_timer_softirq+0x1d/0x1a9
[<c011f132>] __do_softirq+0x71/0xe0
[<c011f1e0>] do_softirq+0x3f/0x41
[<c011f335>] irq_exit+0x48/0x4a
[<c010d259>] smp_apic_timer_interrupt+0x5d/0x89
[<c010344c>] apic_timer_interrupt+0x28/0x30
[<c0100c0e>] cpu_idle+0x67/0x90
[<c03a5778>] start_secondary+0x157/0x15e
[<00000000>] _stext+0x3fefff50/0x19
=======================
Code: 89 44 24 04 c7 04 24 bc 53 33 c0 e8 85 05 ff ff 83 c4 08 5e 5f 5d
c3 55 89 e5 57 53 89 c3 31 c0 b9 20 00 00 00 8b 7b 34 83 c7 0c <f3> ab
8b 43 34 89 50 24 8b 53 34 8b 43 28 89 42 0c 8b 43 34 c7
EIP: [<c012abdb>] posix_timer_event+0x14/0xa0 SS:ESP 0068:dff5fe48
Kernel panic - not syncing: Fatal exception in interrupt
I also applied the four patches that Oleg Nesterov posted to lkml:
http://lkm.org/lkml/2007/8/12/193
http://lkm.org/lkml/2007/8/12/194
http://lkm.org/lkml/2007/8/12/195
http://lkm.org/lkml/2007/8/12/196
After about ten hours, the kernel still panicked. Here is the oops message:
slab error in verify_redzone_free(): cache `sigqueue': double free detected
[<c0103941>] show_trace_log_lvl+0x1a/0x30
[<c0104593>] show_trace+0x12/0x14
[<c01045ab>] dump_stack+0x16/0x18
[<c015638b>] __slab_error+0x26/0x28
[<c0156bcf>] cache_free_debugcheck+0x1d9/0x298
[<c0156eba>] kmem_cache_free+0x66/0xb5
[<c0122f4d>] __sigqueue_free+0x2f/0x32
[<c0123147>] __dequeue_signal+0xdc/0x174
[<c01248ca>] dequeue_signal+0x25/0x156
[<c01256a4>] sys_rt_sigtimedwait+0x7f/0x240
[<c01029e6>] syscall_call+0x7/0xb
=======================
df839000: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
BUG: unable to handle kernel paging request at virtual address df839f68
printing eip:
c0124af3
*pde = 0007e067
*pte = 1f839000
Oops: 0000 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 1
EIP: 0060:[<c0124af3>] Not tainted VLI
EFLAGS: 00010246 (2.6.23-rc3-g2a677896-dirty #2)
EIP is at sigqueue_free+0x7/0x6f
eax: df839f60 ebx: df839f60 ecx: df3b0000 edx: 00000000
esi: dd3c7120 edi: 00000213 ebp: df3b1e24 esp: df3b1e1c
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process posix_timer_tes (pid: 1110, ti=df3b0000 task=dd77dac0 task.ti=df3b0000)
Stack: 00000213 dd3c7120 df3b1e34 c012a8c2 dd3c7120 dd3c7128 df3b1e50 c012a9ce
       df755e48 df755e88 dd77df9c dd77dac0 00000000 df3b1e9c c011dab0 00000000
       00000000 00000000 dd77df5c df3b1e8c c012307d 00000008 00000002 dd77df44
Call Trace:
[<c0103941>] show_trace_log_lvl+0x1a/0x30
[<c01039fc>] show_stack_log_lvl+0xa5/0xca
[<c0103c3c>] show_registers+0x21b/0x391
[<c0103ed3>] die+0x121/0x25e
[<c011114d>] do_page_fault+0x354/0x627
[<c02ec52a>] error_code+0x72/0x78
[<c012a8c2>] release_posix_timer+0x13/0x6c
[<c012a9ce>] exit_itimers+0xb3/0xe7
[<c011dab0>] do_exit+0x579/0x7d4
[<c011dd34>] do_group_exit+0x29/0x70
[<c0124e7c>] get_signal_to_deliver+0x282/0x43e
[<c0101fd2>] do_notify_resume+0x8b/0x767
[<c0102a82>] work_notifysig+0x13/0x19
=======================
Code: 1c 00 5b 5d c3 55 89 e5 64 a1 00 a0 3c c0 31 c9 ba d0 00 00 00 e8
8d e8 ff ff 85 c0 74 04 83 48 08 01 5d c3 55 89 e5 56 53 89 c3 <f6> 40
08 01 74 13 3b 00 75 13 83 63 08 fe 89 d8 e8 16 e4 ff ff
EIP: [<c0124af3>] sigqueue_free+0x7/0x6f SS:ESP 0068:df3b1e1c
Fixing recursive fault but reboot is needed!
Any help is appreciated.
Best regards
yue.tao
View attachment "config.txt" of type "text/plain" (25229 bytes)
View attachment "cpuinfo.txt" of type "text/plain" (2544 bytes)
View attachment "posix_timer_test.c" of type "text/x-csrc" (3208 bytes)
View attachment "dmesg.txt" of type "text/plain" (8579 bytes)