linux-kernel - Re: [PATCH] posix-timer: fix deletion race

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Pine.LNX.4.63.0707231858420.9759@localhost.localdomain>
Date:	Mon, 23 Jul 2007 19:07:37 -0700 (PDT)
From:	Jeremy Katz <jeremy.katz@...driver.com>
To:	Oleg Nesterov <oleg@...sign.ru>
cc:	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...l.org>, Ingo Molnar <mingo@...e.hu>,
	Stable Team <stable@...nel.org>
Subject: Re: [PATCH] posix-timer: fix deletion race

On Fri, 20 Jul 2007, Oleg Nesterov wrote:

> On 07/18, Jeremy Katz wrote:
>>
>> On Wed, 18 Jul 2007, Oleg Nesterov wrote:
>>
>>> Jeremy, I agree with Thomas that your patch should not be right, but it
>>> does make a difference. Perhaps this is just the timing, but who knows.
>>> Could you add some printk's to be sure that lock_timer() actually fails
>>> while it never should?
>>
>> Agreed.
>>
>> Unfortunately, adding any significant output appears to alter the
>> situation to the point where the issue either does not occur, or takes
>> significantly longer to surface.
>
> No, no, I didn't mean any significant output. You changed itimer_delete()
>
> 	>  -       spin_lock_irqsave(&timer->it_lock, flags);
> 	>  +       /* timer already deleted? */
> 	>  +       if (lock_timer(timer->it_id, &flags) == NULL)
> 	>  +               return;
>
> This change should not help, lock_timer() should always succeed here.
> But since it makes a difference, we can make something like
>
> 	if (lock_timer(timer->it_id, &flags) == NULL) {
> 		printk("Impossible! but it happened.\n");
> 		return;
> 	}
>
> The same for posix_timer_fn().

Ahh, of course.  I did try that at some point, and remember seeing at 
least the occasional failure.  This time, taking the spinlock and then 
checking for a valid timer ID, I did not see the locking fail.  I did see 
the attempt to use a freed sigqueue, further suggesting my 'fix' merely 
altered the timing sufficiently to hide the issue.

> I still can't believe we have a double-free problem, this looks imposiible.
> Do you see the
>
> 	"idr_remove called for id=%d which is not allocated.\n"
>
> in syslog?

No.  I also added some accounting with atomic counters, and don't see 
evidence of a second call to release_posix_timer.

> Could you try the patch below? Perhaps we have some wierd problem with
> ->sigq corruption.

Tried, with apparent effect.

To add to the body of data: Turning off hyperthreading in the hardware 
does not resolve the issue.  Limiting the system to one CPU does appear to 
work.


Jeremy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/