linux-kernel - Re: [PATCH 08/14] hrtimer: Allow hrtimer::function() to free the timer

Message-ID: <1433922411.23588.132.camel@odin.com>
Date:	Wed, 10 Jun 2015 10:46:51 +0300
From:	Kirill Tkhai <ktkhai@...n.com>
To:	Oleg Nesterov <oleg@...hat.com>
CC:	Peter Zijlstra <peterz@...radead.org>, <umgwanakikbuti@...il.com>,
	<mingo@...e.hu>, <ktkhai@...allels.com>, <rostedt@...dmis.org>,
	<tglx@...utronix.de>, <juri.lelli@...il.com>,
	<pang.xunlei@...aro.org>, <wanpeng.li@...ux.intel.com>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 08/14] hrtimer: Allow hrtimer::function() to free the
 timer

Hi, Oleg,

On Tue, 09/06/2015 at 23:33 +0200, Oleg Nesterov wrote:
> On 06/08, Peter Zijlstra wrote:
> >
> > On Mon, Jun 08, 2015 at 11:14:17AM +0200, Peter Zijlstra wrote:
> > > > Finally. Suppose that timer->function() returns HRTIMER_RESTART
> > > > and hrtimer_active() is called right after __run_hrtimer() sets
> > > > cpu_base->running = NULL. I can't understand why hrtimer_active()
> > > > can't miss ENQUEUED in this case. We have wmb() in between, yes,
> > > > but then hrtimer_active() should do something like
> > > >
> > > > 	active = cpu_base->running == timer;
> > > > 	if (!active) {
> > > > 		rmb();
> > > > 		active = state != HRTIMER_STATE_INACTIVE;
> > > > 	}
> > > >
> > > > No?
> > >
> > > Hmm, good point. Let me think about that. It would be nice to be able to
> > > avoid more memory barriers.
> >
> > So your scenario is:
> >
> > 				[R] seq
> > 				  RMB
> > [S] ->state = ACTIVE
> >   WMB
> > [S] ->running = NULL
> > 				[R] ->running (== NULL)
> > 				[R] ->state (== INACTIVE; fail to observe
> > 				             the ->state store due to
> > 					     lack of order)
> > 				  RMB
> > 				[R] seq (== seq)
> > [S] seq++
> >
> > Conversely, if we re-order the (first) seq++ store such that it comes
> > first:
> >
> > [S] seq++
> >
> > 				[R] seq
> > 				  RMB
> > 				[R] ->running (== NULL)
> > [S] ->running = timer;
> >   WMB
> > [S] ->state = INACTIVE
> > 				[R] ->state (== INACTIVE)
> > 				  RMB
> > 				[R] seq (== seq)
> >
> > And we have another false negative.
> >
> > And in this case we need the read order the other way around, we'd need:
> >
> > 	active = timer->state != HRTIMER_STATE_INACTIVE;
> > 	if (!active) {
> > 		smp_rmb();
> > 		active = cpu_base->running == timer;
> > 	}
> >
> > Now I think we can fix this by either doing:
> >
> > 	WMB
> > 	seq++
> > 	WMB
> >
> > On both sides of __run_hrtimer(), or do
> >
> > bool hrtimer_active(const struct hrtimer *timer)
> > {
> > 	struct hrtimer_cpu_base *cpu_base;
> > 	unsigned int seq;
> >
> > 	do {
> > 		cpu_base = READ_ONCE(timer->base->cpu_base);
> > 		seq = raw_read_seqcount(&cpu_base->seq);
> >
> > 		if (timer->state != HRTIMER_STATE_INACTIVE)
> > 			return true;
> >
> > 		smp_rmb();
> >
> > 		if (cpu_base->running == timer)
> > 			return true;
> >
> > 		smp_rmb();
> >
> > 		if (timer->state != HRTIMER_STATE_INACTIVE)
> > 			return true;
> >
> > 	} while (read_seqcount_retry(&cpu_base->seq, seq) ||
> > 		 cpu_base != READ_ONCE(timer->base->cpu_base));
> >
> > 	return false;
> > }
> 
> You know, I simply can't convince myself I understand why this code
> is correct... or not.
> 
> But contrary to what I said before, I agree that we need to recheck
> timer->base. This probably needs more discussion, to me it is very
> unobvious why we can trust this cpu_base != READ_ONCE() check. Yes,
> we have a lot of barriers, but they do not pair with each other. Let's
> ignore this for now.
> 
> > And since __run_hrtimer() is the more performance critical code, I think
> > it would be best to reduce the amount of memory barriers there.
> 
> Yes, but wmb() is cheap on x86... Perhaps we can make this code
> "obviously correct" ?
> 
> 
> How about the following..... We add cpu_base->seq as before but
> limit its "write" scope so that we can use the regular read/retry.
> 
> So,
> 
> 	hrtimer_active(timer)
> 	{
> 
> 		do {
> 			cpu_base = READ_ONCE(timer->base->cpu_base);
> 			seq = read_seqcount_begin(&cpu_base->seq);
> 
> 			if (timer->state & ENQUEUED ||
> 			    cpu_base->running == timer)
> 				return true;
> 
> 		} while (read_seqcount_retry(&cpu_base->seq, seq) ||
> 			 cpu_base != READ_ONCE(timer->base->cpu_base));
> 
> 		return false;
> 	}
> 
> And we need to avoid the races with 2 transitions in __run_hrtimer().
> 
> The first race is trivial, we change __run_hrtimer() to do
> 
> 	write_seqcount_begin(cpu_base->seq);
> 	cpu_base->running = timer;
> 	__remove_hrtimer(timer);	// clears ENQUEUED
> 	write_seqcount_end(cpu_base->seq);

We use the seqcount because we are afraid that hrtimer_active() may miss
timer->state or cpu_base->running at the moment we clear one of them.

If we use two pairs of write_seqcount_{begin,end} in __run_hrtimer(),
we can limit the protection to just the places where we do the clearing:

	cpu_base->running = timer;
	write_seqcount_begin(&cpu_base->seq);
	__remove_hrtimer(timer);	// clears ENQUEUED
	write_seqcount_end(&cpu_base->seq);

	....

	timer->state |= HRTIMER_STATE_ENQUEUED;
	write_seqcount_begin(&cpu_base->seq);
	cpu_base->running = NULL;	// clears ->running
	write_seqcount_end(&cpu_base->seq);
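
To make the two narrow write sections concrete, here is a minimal
user-space sketch, not kernel code. It assumes sequentially consistent
C11 atomics in place of seqcount_t and the kernel barriers, and every
model_* name is made up for illustration. It replays by hand the
interleaving where a reader samples ->state before the re-enqueue and
->running after it has been cleared, and checks that the sequence count
rejects that stale "inactive" view.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_STATE_INACTIVE 0u
#define MODEL_STATE_ENQUEUED 1u

struct model_timer {
	atomic_uint state;
};

struct model_cpu_base {
	atomic_uint seq;		/* even: no write section open */
	_Atomic(struct model_timer *) running;
};

/* Hypothetical stand-ins for write_seqcount_begin()/write_seqcount_end(). */
static void model_write_begin(struct model_cpu_base *b) { atomic_fetch_add(&b->seq, 1); }
static void model_write_end(struct model_cpu_base *b)   { atomic_fetch_add(&b->seq, 1); }

/* First transition: set ->running, wrap only the clearing of ENQUEUED. */
static void model_run_begin(struct model_cpu_base *b, struct model_timer *t)
{
	atomic_store(&b->running, t);
	model_write_begin(b);
	atomic_store(&t->state, MODEL_STATE_INACTIVE);
	model_write_end(b);
}

/* Second transition (timer re-armed): wrap only the clearing of ->running. */
static void model_run_end_requeued(struct model_cpu_base *b, struct model_timer *t)
{
	atomic_fetch_or(&t->state, MODEL_STATE_ENQUEUED);
	model_write_begin(b);
	atomic_store(&b->running, (struct model_timer *)NULL);
	model_write_end(b);
}

/* Wait until no write section is open, then return the even count. */
static unsigned int model_read_begin(struct model_cpu_base *b)
{
	unsigned int seq;

	while ((seq = atomic_load(&b->seq)) & 1u)
		;
	return seq;
}

static bool model_read_retry(struct model_cpu_base *b, unsigned int seq)
{
	return atomic_load(&b->seq) != seq;
}

static bool model_timer_active(struct model_cpu_base *b, struct model_timer *t)
{
	unsigned int seq;

	do {
		seq = model_read_begin(b);
		if ((atomic_load(&t->state) & MODEL_STATE_ENQUEUED) ||
		    atomic_load(&b->running) == t)
			return true;
	} while (model_read_retry(b, seq));

	return false;
}

/* Returns true if the stale view is rejected and a fresh read sees "active". */
static bool model_stale_view_is_rejected(void)
{
	struct model_timer t;
	struct model_cpu_base b;

	atomic_init(&t.state, MODEL_STATE_ENQUEUED);
	atomic_init(&b.seq, 0);
	atomic_init(&b.running, (struct model_timer *)NULL);

	model_run_begin(&b, &t);	/* callback "runs", state is INACTIVE */

	unsigned int seq = model_read_begin(&b);
	assert((seq & 1u) == 0);
	bool enqueued = (atomic_load(&t.state) & MODEL_STATE_ENQUEUED) != 0;
	model_run_end_requeued(&b, &t);	/* writer slips in between the two reads */
	bool running = atomic_load(&b.running) == &t;

	/* Both samples say "inactive", but the count moved, so the reader must retry. */
	return !enqueued && !running && model_read_retry(&b, seq) &&
	       model_timer_active(&b, &t);
}
```

Calling model_stale_view_is_rejected() from a test main() should return
true. The model says nothing about weaker memory orderings; it only shows
that bumping the count around each clearing store is enough for the
reader to notice that its two loads straddled a transition.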

> 
> and hrtimer_active() obviously can't race with this section.
> 
> Then we change enqueue_hrtimer()
> 
> 
> 	+	bool need_lock = base->cpu_base->running == timer;
> 	+	if (need_lock)
> 	+		write_seqcount_begin(cpu_base->seq);
> 	+
> 		timer->state |= HRTIMER_STATE_ENQUEUED;
> 	+
> 	+	if (need_lock)
> 	+		write_seqcount_end(cpu_base->seq);
> 
> 
> Now. If the timer is re-queued by the time __run_hrtimer() clears
> ->running we have the following sequence:
> 
> 	write_seqcount_begin(cpu_base->seq);
> 	timer->state |= HRTIMER_STATE_ENQUEUED;
> 	write_seqcount_end(cpu_base->seq);
> 
> 	base->running = NULL;
> 
> and I think this should equally work, because in this case we do not
> care if hrtimer_active() misses "running = NULL".
> 
> Yes, we only have this 2nd write_seqcount_begin/end if the timer re-
> arms itself, but otherwise we do not race. If another thread does
> hrtimer_start() in between we can pretend that hrtimer_active() hits
> the "inactive".
> 
> What do you think?
> 
> 
> And. Note that we can rewrite these 2 "write" critical sections in
> __run_hrtimer() and enqueue_hrtimer() as
> 
> 	cpu_base->running = timer;
> 
> 	write_seqcount_begin(cpu_base->seq);
> 	write_seqcount_end(cpu_base->seq);
> 
> 	__remove_hrtimer(timer);
> 
> and
> 
> 	timer->state |= HRTIMER_STATE_ENQUEUED;
> 
> 	write_seqcount_begin(cpu_base->seq);
> 	write_seqcount_end(cpu_base->seq);
> 
> 	base->running = NULL;
> 
> So we can probably use write_seqcount_barrier() except I am not sure
> about the 2nd wmb...
