linux-kernel - Re: [PATCH] posix-timers: Do not modify an already queued timer signal

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080720110856.GC143@tv-sign.ru>
Date:	Sun, 20 Jul 2008 15:08:56 +0400
From:	Oleg Nesterov <oleg@...sign.ru>
To:	Roland McGrath <roland@...hat.com>
Cc:	Mark McLoughlin <markmc@...hat.com>, linux-kernel@...r.kernel.org,
	kvm@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] posix-timers: Do not modify an already queued timer signal

On 07/19, Roland McGrath wrote:
>
> I think the solution we want is to make sure that a timer whose
> expiration signal is still pending never gets a second notification.
>
> POSIX says, "The effect of disarming or resetting a timer with pending
> expiration notifications is unspecified."  This gives us lots of
> latitude.  I think the best behavior is the one that comports with the
> general specification, "Only a single signal shall be queued to the
> process for a given timer at any point in time."
>
> The it_requeue_pending/si_sys_private logic is indeed intended to
> address this case.  It predates my involvement of the code, and I
> agree it has always looked oddly hairy.  Its plan is to make sure that
> dequeue_signal's call to do_schedule_next_timer does not re-arm the
> timer when an intervening timer_settime call has already done so.  As
> Mark found, this is buggy because it's not really safe for the timer
> to go off (call posix_timer_event again) before the dequeue.
>
> In the signals code, si_sys_private is used purely as a flag that
> dequeue_signal needs to bother with dropping the siglock to call
> do_schedule_next_timer.  Its use as a serialization counter between
> timer_settime and do_schedule_next_timer is entirely local to the
> posix-timers code.

Yes, thanks, I see. But does it have any meaning for the user-space?

Let me repeat. Can't we make a simple fix for now for this nasty and
ancient bug, before we do the more clever changes,

	--- kernel/posix-timers.c
	+++ kernel/posix-timers.c
	@@ -298,12 +298,10 @@ void do_schedule_next_timer(struct sigin
	 
	 int posix_timer_event(struct k_itimer *timr,int si_private)
	 {
	-	memset(&timr->sigq->info, 0, sizeof(siginfo_t));
		timr->sigq->info.si_sys_private = si_private;
		/* Send signal to the process that owns this timer.*/
	 
		timr->sigq->info.si_signo = timr->it_sigev_signo;
	-	timr->sigq->info.si_errno = 0;
		timr->sigq->info.si_code = SI_TIMER;
		timr->sigq->info.si_tid = timr->it_id;
		timr->sigq->info.si_value = timr->it_sigev_value;
	@@ -435,6 +433,7 @@ static struct k_itimer * alloc_posix_tim
			kmem_cache_free(posix_timers_cache, tmr);
			tmr = NULL;
		}
	+	memset(&timr->sigq->info, 0, sizeof(siginfo_t));
		return tmr;
	 }
	 

?

Suppose ->sigq is queued. In that case "si_sys_private = si_private"
can race with dequeue_signal/do_schedule_next_timer, but everything
looks OK because the timer is locked.

If dequeue_signal() sees the old value of si_sys_private, it won't
call do_schedule_next_timer(). If it sees the new value, do_schedule_next_timer()
will wait for ->it_lock.

Yes, in both cases we can queue ->sigq again instead of incrementing
info.si_overrun. But this is not worse than we have now, and afaics
this is also possible with timr->fired, please see below.

> What userland should observe is that when the new timer expiration
> time arrives while the previously-fired timer signal is still pending,
> there is no new signal, and the timer accrues an overrun.
>
> An obvious idea is simply not to re-arm in timer_settime when the
> timer already has a pending notification.  When the signal gets
> dequeued, do_schedule_next_timer will arm it then.  However, the way
> we dispatch to different clocks' timer_set functions makes it awkward
> to validate and store a new setting and fetch the old value without
> actually arming the timer.  Also, the logic for one-shot timers has
> more wrinkles.
>
> So I think the thing to do is arm the timer in timer_settime even when
> it's already pending (as we do now), but have posix_timer_event detect
> a firing when already queued and just bump the overrun count.
>
> We can do away with it_requeue_pending, and make si_sys_private purely
> a simple 0/1 flag for whether to call do_schedule_next_timer.  To
> optimize the extra checks, we use different bookkeeping fields in
> struct k_itimer.
>
> Have a flag that means "might be queued".  do_schedule_next_timer
> replaces:
>
> 	if (timr && timr->it_requeue_pending == info->si_sys_private) {
> 		...
> 	}
>
> with:
>
> 	if (timr && timr->fired) {
> 		timr->fired = 0;
> 		...
> 	}
>
> That avoids races with timer_settime, which does (after timer_set
> call, timer still locked):
>
> 	if (timr->fired) {
> 		spin_lock(&current->sighand->siglock);
> 		if (list_empty(&timr->sigq->list))
> 			timr->fired = 0;
> 		spin_unlock(&current->sighand->siglock);
> 	}
>
> In posix_timer_event (timer is locked), check:
>
> 	if (timr->fired) {
> 		spin_lock(&current->sighand->siglock);
> 		if (list_empty(&timr->sigq->list))
> 			timr->fired = 0;
> 		spin_unlock(&current->sighand->siglock);

(we can't use current->sighand->siglock, this is irq context, we should
 use timr->it_process->group_leader...)

Suppose that the signal was already dequeued but sigq->info wasn't copied
to the user-space. The thread which does dequeue_signal() unlocked siglock
and waits for ->it_lock.

posix_timer_event() sees list_empty(), proceeds and calls send_sigqueue().
Now we requeue this ->sigq instead of incrementing si_overrun.

The thread which does dequeue_signal() continues and re-schedules the
timer while ->sigq is queued. Then it copies sigq->info to the user space.

The same problem as with the simple patch above. Not very bad though,
afaics. The user space just recieves 2 signals instead of one with
the right value of si_overrun.



While we are here, off-topics question. posix_timer_event() does

	if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
		struct task_struct *leader;
		int ret = send_sigqueue(timr->sigq, timr->it_process, 0);

		if (likely(ret >= 0))
			return ret;

		timr->it_sigev_notify = SIGEV_SIGNAL;
		leader = timr->it_process->group_leader;
		put_task_struct(timr->it_process);
		timr->it_process = leader;
	}

	return send_sigqueue(timr->sigq, timr->it_process, 1);

Is it really useful to convert the SIGEV_THREAD_ID timer to the group-wide
timer when the thread dies?

This uglifies de_thread(), but more importantly this doesn't work reliably.
Suppose that timr->it_process dies with the pending ->sigq. In that case
the timer stops anyway.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/