linux-kernel - [BUG]: posix timer: slab error 'double free'

Message-ID: <46CB0BE8.8010903@windriver.com>
Date:	Tue, 21 Aug 2007 11:59:36 -0400
From:	taoyue <yue.tao@...driver.com>
To:	linux-kernel@...r.kernel.org, oleg@...sign.ru
CC:	Mark Zhan <rongkai.zhan@...driver.com>,
	Bruce Ashfield <bruce.ashfield@...driver.com>
Subject: [BUG]: posix timer: slab error 'double free'

Hi everyone:

There is a race condition in the POSIX timer code in the current kernel
source tree. Jeremy has in fact already reported the same problem.

I wrote a simple stress test program for the POSIX timer subsystem to
reproduce the problem on the latest mainline kernel.
My test program creates 200 threads, and each thread does the following:

while (1) {
    timer_create()

    timer_settime()

    sleep a while

    timer_delete()
}
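
For readers without the attachment, the loop above can be sketched roughly as follows. This is NOT the attached posix_timer_test.c, only a bounded approximation of it. The names run_timer_stress(), timer_worker() and struct worker_arg are made up for illustration, and the choice of SIGRTMIN, the 1 ms timer period and the use of sigtimedwait() as the "sleep a while" step are assumptions (the last one suggested only by sys_rt_sigtimedwait appearing in the traces).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical per-thread state; not taken from the original test. */
struct worker_arg {
    int iterations;   /* how many create/settime/wait/delete cycles to run */
    int completed;    /* how many cycles finished without an error */
};

static void *timer_worker(void *p)
{
    struct worker_arg *arg = p;
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);

    for (int i = 0; i < arg->iterations; i++) {
        struct sigevent sev;
        struct itimerspec its;
        struct timespec timeout = { .tv_sec = 0, .tv_nsec = 2000000 };
        timer_t timerid;

        /* timer_create(): deliver SIGRTMIN on expiry (assumed signal). */
        memset(&sev, 0, sizeof(sev));
        sev.sigev_notify = SIGEV_SIGNAL;
        sev.sigev_signo = SIGRTMIN;
        if (timer_create(CLOCK_REALTIME, &sev, &timerid) != 0)
            break;

        /* timer_settime(): periodic 1 ms timer (assumed period). */
        memset(&its, 0, sizeof(its));
        its.it_value.tv_nsec = 1000000;
        its.it_interval.tv_nsec = 1000000;
        if (timer_settime(timerid, 0, &its, NULL) != 0) {
            timer_delete(timerid);
            break;
        }

        /* "sleep a while": dequeue one expiry signal or time out. */
        sigtimedwait(&set, NULL, &timeout);

        /* timer_delete() while the timer may still be firing. */
        if (timer_delete(timerid) != 0)
            break;
        arg->completed++;
    }
    return NULL;
}

/* Runs nthreads workers; returns the total number of completed cycles,
 * or -1 on allocation failure. SIGRTMIN is left blocked in the caller. */
int run_timer_stress(int nthreads, int iterations)
{
    sigset_t set;
    pthread_t *tids;
    struct worker_arg *args;
    int started = 0, total = 0;

    /* Block the signal before creating threads so they inherit the mask. */
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    tids = calloc((size_t)nthreads, sizeof(*tids));
    args = calloc((size_t)nthreads, sizeof(*args));
    if (!tids || !args) {
        free(tids);
        free(args);
        return -1;
    }

    for (int i = 0; i < nthreads; i++) {
        args[i].iterations = iterations;
        if (pthread_create(&tids[i], NULL, timer_worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].completed;
    }

    free(tids);
    free(args);
    return total;
}
```

Calling run_timer_stress(200, n) from main() with a large n would approximate the load described above. Unlike the real test it terminates, so it is only a starting point for reproducing the race.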

Please see my test program in the attachment "posix_timer_test.c". You
can compile it with the following command line:

    gcc -static -o posix_timer_test posix_timer_test.c -lrt -lpthread

My test environment is described in three attached files:
"dmesg.txt", "cpuinfo.txt", "config.txt"

On a pristine Linux 2.6.23-rc3 kernel, we get the following oops message:

slab error in cache_alloc_debugcheck_after(): cache `sigqueue': double free, or memory outside object was overwritten
 [<c0103941>] show_trace_log_lvl+0x1a/0x30
 [<c0104593>] show_trace+0x12/0x14
 [<c01045ab>] dump_stack+0x16/0x18
 [<c015637b>] __slab_error+0x26/0x28
 [<c0156852>] cache_alloc_debugcheck_after+0x134/0x204
 [<c0157e12>] kmem_cache_alloc+0x5a/0xac
 [<c0123394>] __sigqueue_alloc+0x25/0x62
 [<c0124ad5>] sigqueue_alloc+0x15/0x1f
 [<c012aca4>] sys_timer_create+0x3d/0x2d8
 [<c01029e6>] syscall_call+0x7/0xb
 =======================
dcdcd000: redzone 1:0xd84156c5635688c0, redzone 2:0xd84156c5635688c0
slab error in verify_redzone_free(): cache `sigqueue': double free detected
 [<c0103941>] show_trace_log_lvl+0x1a/0x30
 [<c0104593>] show_trace+0x12/0x14
 [<c01045ab>] dump_stack+0x16/0x18
 [<c015637b>] __slab_error+0x26/0x28
 [<c0156bbf>] cache_free_debugcheck+0x1d9/0x298
 [<c0156eaa>] kmem_cache_free+0x66/0xb5
 [<c0122f4d>] __sigqueue_free+0x2f/0x32
 [<c0123147>] __dequeue_signal+0xdc/0x174
 [<c0124960>] dequeue_signal+0xbb/0x149
 [<c012569e>] sys_rt_sigtimedwait+0x7f/0x240
 [<c01029e6>] syscall_call+0x7/0xb
 =======================
dd7f7000: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
BUG: unable to handle kernel paging request at virtual address dd7f7f6c
 printing eip:
c012abdb
*pde = 00075067
*pte = 1d7f7000
Oops: 0002 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU:    1
EIP:    0060:[<c012abdb>]    Not tainted VLI
EFLAGS: 00010086   (2.6.23-rc3-g2a677896 #1)
EIP is at posix_timer_event+0x14/0xa0
eax: 00000000   ebx: dcc1e900   ecx: 00000020   edx: 00000003
esi: dcc1e938   edi: dd7f7f6c   ebp: dff5fe50   esp: dff5fe48
ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
Process swapper (pid: 0, ti=dff5e000 task=dff4eac0 task.ti=dff5e000)
Stack: c012b1a7 dcc1e900 dff5fe7c c012b1e0 00000001 c140ef50 dcc1e938 00000002
       00000217 dcc1e908 c012b1a7 dcc1e938 c140ef50 dff5febc c012e3a5 00000000
       dcc1e95c 00000000 e32a8edd ef84ee03 2e028890 000000f1 c140ef20 00000001
Call Trace:
 [<c0103941>] show_trace_log_lvl+0x1a/0x30
 [<c01039fc>] show_stack_log_lvl+0xa5/0xca
 [<c0103c3c>] show_registers+0x21b/0x391
 [<c0103ed3>] die+0x121/0x25e
 [<c011114d>] do_page_fault+0x354/0x627
 [<c02ec4ea>] error_code+0x72/0x78
 [<c012b1e0>] posix_timer_fn+0x39/0x94
 [<c012e3a5>] hrtimer_run_queues+0x150/0x181
 [<c012225c>] run_timer_softirq+0x1d/0x1a9
 [<c011f132>] __do_softirq+0x71/0xe0
 [<c011f1e0>] do_softirq+0x3f/0x41
 [<c011f335>] irq_exit+0x48/0x4a
 [<c010d259>] smp_apic_timer_interrupt+0x5d/0x89
 [<c010344c>] apic_timer_interrupt+0x28/0x30
 [<c0100c0e>] cpu_idle+0x67/0x90
 [<c03a5778>] start_secondary+0x157/0x15e
 [<00000000>] _stext+0x3fefff50/0x19
 =======================
Code: 89 44 24 04 c7 04 24 bc 53 33 c0 e8 85 05 ff ff 83 c4 08 5e 5f 5d c3 55 89 e5 57 53 89 c3 31 c0 b9 20 00 00 00 8b 7b 34 83 c7 0c <f3> ab 8b 43 34 89 50 24 8b 53 34 8b 43 28 89 42 0c 8b 43 34 c7
EIP: [<c012abdb>] posix_timer_event+0x14/0xa0 SS:ESP 0068:dff5fe48
Kernel panic - not syncing: Fatal exception in interrupt

I also applied the four patches from Oleg Nesterov posted on lkml:

http://lkml.org/lkml/2007/8/12/193
http://lkml.org/lkml/2007/8/12/194
http://lkml.org/lkml/2007/8/12/195
http://lkml.org/lkml/2007/8/12/196

After about ten hours, the kernel still panicked. Here is the oops message:

slab error in verify_redzone_free(): cache `sigqueue': double free detected
 [<c0103941>] show_trace_log_lvl+0x1a/0x30
 [<c0104593>] show_trace+0x12/0x14
 [<c01045ab>] dump_stack+0x16/0x18
 [<c015638b>] __slab_error+0x26/0x28
 [<c0156bcf>] cache_free_debugcheck+0x1d9/0x298
 [<c0156eba>] kmem_cache_free+0x66/0xb5
 [<c0122f4d>] __sigqueue_free+0x2f/0x32
 [<c0123147>] __dequeue_signal+0xdc/0x174
 [<c01248ca>] dequeue_signal+0x25/0x156
 [<c01256a4>] sys_rt_sigtimedwait+0x7f/0x240
 [<c01029e6>] syscall_call+0x7/0xb
 =======================
df839000: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
BUG: unable to handle kernel paging request at virtual address df839f68
 printing eip:
c0124af3
*pde = 0007e067
*pte = 1f839000
Oops: 0000 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU:    1
EIP:    0060:[<c0124af3>]    Not tainted VLI
EFLAGS: 00010246   (2.6.23-rc3-g2a677896-dirty #2)
EIP is at sigqueue_free+0x7/0x6f
eax: df839f60   ebx: df839f60   ecx: df3b0000   edx: 00000000
esi: dd3c7120   edi: 00000213   ebp: df3b1e24   esp: df3b1e1c
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process posix_timer_tes (pid: 1110, ti=df3b0000 task=dd77dac0 task.ti=df3b0000)
Stack: 00000213 dd3c7120 df3b1e34 c012a8c2 dd3c7120 dd3c7128 df3b1e50 c012a9ce
       df755e48 df755e88 dd77df9c dd77dac0 00000000 df3b1e9c c011dab0 00000000
       00000000 00000000 dd77df5c df3b1e8c c012307d 00000008 00000002 dd77df44
Call Trace:
 [<c0103941>] show_trace_log_lvl+0x1a/0x30
 [<c01039fc>] show_stack_log_lvl+0xa5/0xca
 [<c0103c3c>] show_registers+0x21b/0x391
 [<c0103ed3>] die+0x121/0x25e
 [<c011114d>] do_page_fault+0x354/0x627
 [<c02ec52a>] error_code+0x72/0x78
 [<c012a8c2>] release_posix_timer+0x13/0x6c
 [<c012a9ce>] exit_itimers+0xb3/0xe7
 [<c011dab0>] do_exit+0x579/0x7d4
 [<c011dd34>] do_group_exit+0x29/0x70
 [<c0124e7c>] get_signal_to_deliver+0x282/0x43e
 [<c0101fd2>] do_notify_resume+0x8b/0x767
 [<c0102a82>] work_notifysig+0x13/0x19
 =======================
Code: 1c 00 5b 5d c3 55 89 e5 64 a1 00 a0 3c c0 31 c9 ba d0 00 00 00 e8 8d e8 ff ff 85 c0 74 04 83 48 08 01 5d c3 55 89 e5 56 53 89 c3 <f6> 40 08 01 74 13 3b 00 75 13 83 63 08 fe 89 d8 e8 16 e4 ff ff
EIP: [<c0124af3>] sigqueue_free+0x7/0x6f SS:ESP 0068:df3b1e1c
Fixing recursive fault but reboot is needed!

Any help is appreciated.

Best regards

yue.tao


View attachment "config.txt" of type "text/plain" (25229 bytes)

View attachment "cpuinfo.txt" of type "text/plain" (2544 bytes)

View attachment "posix_timer_test.c" of type "text/x-csrc" (3208 bytes)

View attachment "dmesg.txt" of type "text/plain" (8579 bytes)