lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 17 Jul 2007 16:12:29 -0700 (PDT)
From:	Jeremy Katz <jeremy.katz@...driver.com>
To:	Thomas Gleixner <tglx@...utronix.de>
cc:	linux-kernel@...r.kernel.org, Andrew Morton <akpm@...l.org>,
	Ingo Molnar <mingo@...e.hu>, Oleg Nesterov <oleg@...sign.ru>,
	Stable Team <stable@...nel.org>
Subject: Re: [PATCH] posix-timer: fix deletion race

On Tue, 17 Jul 2007, Thomas Gleixner wrote:

> On Tue, 2007-07-17 at 11:39 -0700, Jeremy Katz wrote:
>> I tried the patch with my test case, but still see the issue.
>> Here's my explanation of the double free race:
>> CPU 0					CPU 1
>> sys_timer_delete():
>>  	lock_timer();
>>  	...
>>  	unlock_timer();			itimer_delete()
>>  	release_posix_timer():			spin_lock_irqsave(timer...)
>>  		...				...
>>  		sigqueue_free()			unlock_timer()
>>  		...				sigqueue_free()
>>  							BUG_ON(!(q->flags...
>
> With 2.6.14 or with current mainline ?

I haven't been keeping notes quite as studiously as I should have been, 
but this just occurred with 2.6.22.1 + the hrt6 patch + your proposed fix:

------------[ cut here ]------------
Kernel BUG at c0125987 [verbose debug info unavailable]
invalid opcode: 0000 [#1]
SMP
Modules linked in:
CPU:    0
EIP:    0060:[<c0125987>]    Not tainted VLI
EFLAGS: 00010246   (2.6.22-hrt1-WR1.4aq_cgl #1)
EIP is at sigqueue_free+0x23/0x72
eax: 00000000   ebx: f713a7d8   ecx: f79e25e0   edx: 00000202
esi: f6f4fa1c   edi: 00000283   ebp: f7207e54   esp: f7207e4c
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process hrtm_test (pid: 19369, ti=f7207000 task=f729ea50 task.ti=f7207000)
Stack: 00000202 f6f4fa1c f7207e64 c012c7ca f6f4fa1c f6f4fa24 f7207e78 
c012d110
        f77463c0 f77463fc 00000001 f7207e88 c012d14b f77463c0 f729ea50 
f7207eb4
        c011e57c f729eea4 00000006 f729ea50 f7207ea4 c01244e8 00000006 
f77463c0
Call Trace:
  [<c0103504>] show_trace_log_lvl+0x1a/0x30
  [<c01035bb>] show_stack_log_lvl+0x8d/0xaa
  [<c01037f5>] show_registers+0x1cd/0x2cb
  [<c0103a4a>] die+0x113/0x207
  [<c0395cf5>] do_trap+0x8f/0xc6
  [<c0103d35>] do_invalid_op+0x88/0x92
  [<c0395ac2>] error_code+0x72/0x78
  [<c012c7ca>] release_posix_timer+0x1b/0x7a
  [<c012d110>] itimer_delete+0x9b/0xc0
  [<c012d14b>] exit_itimers+0x16/0x21
  [<c011e57c>] do_exit+0x300/0x3e8
  [<c011e6b2>] do_group_exit+0x29/0x6f
  [<c012643a>] get_signal_to_deliver+0x210/0x298
  [<c010256d>] do_signal+0x5c/0x158
  [<c01026a5>] do_notify_resume+0x3c/0x3f
  [<c0102866>] work_notifysig+0x13/0x19

> I doubt that this can happen, as itimer_delete() is only called, when
> there are no more references to the shared sig struct. At this point no
> other related thread can run sys_timer_delete().

I came to the above explanation after failing to find another path that 
clears the SIGQUEUE_PREALLOC flag.  Some form of memory corruption could 
be to blame.

>> the timer fires during deletion:
>> CPU 0					CPU 1
>> sys_timer_delete():
>>  	lock_timer();
>>  	...
>>  	unlock_timer();			posix_timer_fn()
>>  	release_posix_timer():			spin_lock_irqsave(timer...)
>>  		...				...
>>  		sigqueue_free()			posix_timer_event()
>>  		...					...
>>  							send_sigqueue()
>>  								BUG_ON(!(q->flags
>>
>
> I do not buy that either. In sys_timer_delete() the active timer is
> canceled. If we can not cancel it then we retry until the timer callback
> has been processed. This is even true for 2.6.14 (pre hrtimer).
>
>> --- linux-2.6.14-cgl.original/kernel/posix-timers.c	2007-07-11 16:06:47.000000000 -0700
>> +++ linux-2.6.14-cgl/kernel/posix-timers.c	2007-07-16 15:25:43.000000000 -0700
>> @@ -339,7 +339,9 @@ static int posix_timer_fn(void *data)
>>   	int si_private = 0;
>>   	int ret = HRTIMER_NORESTART;
>>
>> -	spin_lock_irqsave(&timr->it_lock, flags);
>> +	if(lock_timer(timr->it_id, &flags) == NULL) {
>> +		return ret;
>> +	}
>
> This is wrong. You look at something which is not valid anymore, if your
> theory is correct. You are just pampering over the real bug.

Noticed right after I sent the patch out.  My goal was to validate and 
lock the timer through a single interface, but itimer_delete() and 
posix_timer_fn() currently recieve a k_itimer * instead of a timer_t.  I 
should have reworked them.

>> @@ -835,7 +844,9 @@ static inline void itimer_delete(struct
>>   	unsigned long flags;
>>
>>   retry_delete:
>> -	spin_lock_irqsave(&timer->it_lock, flags);
>> +	/* timer already deleted? */
>> +	if (lock_timer(timer->it_id, &flags) == NULL)
>> +		return;
>
> Same here.

Thanks for the reply,
Jeremy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ