lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150610223736.GL3644@twins.programming.kicks-ass.net>
Date:	Thu, 11 Jun 2015 00:37:36 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	umgwanakikbuti@...il.com, mingo@...e.hu, ktkhai@...allels.com,
	rostedt@...dmis.org, tglx@...utronix.de, juri.lelli@...il.com,
	pang.xunlei@...aro.org, wanpeng.li@...ux.intel.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 08/14] hrtimer: Allow hrtimer::function() to free the
 timer

On Tue, Jun 09, 2015 at 11:33:18PM +0200, Oleg Nesterov wrote:

> And. Note that we can rewrite these 2 "write" critical sections in
> __run_hrtimer() and enqueue_hrtimer() as
> 
> 	cpu_base->running = timer;
> 
> 	write_seqcount_begin(cpu_base->seq);
> 	write_seqcount_end(cpu_base->seq);
> 
> 	__remove_hrtimer(timer);
> 
> and
> 
> 	timer->state |= HRTIMER_STATE_ENQUEUED;
> 
> 	write_seqcount_begin(cpu_base->seq);
> 	write_seqcount_end(cpu_base->seq);
> 
> 	base->running = NULL;
> 
> So we can probably use write_seqcount_barrier() except I am not sure
> about the 2nd wmb...

Which second wmb?

In any case, you use that transform from your reply to Kirill, and I
cannot currently see a hole in that. Lets call this transformation A. It
gets us the quoted bit above.

Now the above is:

	seq++;
	smp_wmb();
	smp_wmb();
	seq++;

Now, double barriers are pointless, so I think we can all agree that the
above is identical to the below. Lets call this tranformation B.

	seq++;
	smp_wmb();
	seq++;

And then because you use the traditional seqcount read side, which
stalls when seq&1, we can transform the above into this. Transformation
C.

	smp_wmb();
	seq += 2;

Which is write_seqcount_barrier(), as you say above.

And since there are no odd numbers possible in that scheme, its
identical to my modified read side with the single increment. Transform
D.

The only difference at this point is that I have my seq increment on the
'wrong' side on the first state.

	cpu_base->running = timer;

	seq++;
	smp_wmb();

	timer->state = 0;

	...

	timer->state = 1;

	smp_wmb();
	seq++;

	cpu_base->running = NULL;

Which, per my previous mail provides the following:

[S] seq++
                                [R] seq
                                    RMB
                                [R] ->running (== NULL)
[S] ->running = timer;
    WMB
[S] ->state = INACTIVE
                                [R] ->state (== INACTIVE)
                                    RMB
                                [R] seq (== seq)

Which is where we had to modify the read side to do:

		[R] ->state
		    RMB
		[R] ->running

Now, if we use write_seqcount_barrier() that would become:

	__run_hrtimer()				hrtimer_active()

	[S] ->running = timer;			[R] seq
	    WMB					    RMB
	[S] seq += 2;				[R] ->running
	[S] ->state = 0;			[R] ->state
						    RMB
						[R] seq


Which we can reorder like:

						[R] seq
						    RMB
						[R] ->running (== NULL)
	[S] ->running = timer
	    WMB
	[S] ->state = 0
						[R] ->state (== 0)
						   RMB
						[R] seq (== seq)
	[S] seq += 2


Which still gives us that false negative and would still require the
read side to be modified to do:

		[R] ->state
		    RMB
		[R] ->running

IOW, one of our transforms (A-D) is faulty for it requires a
modification to the read side.

I suspect its T-C, where we loose the odd count that holds up the read
side.

Because the moment we go from:

	Y = true;
	seq++
	WMB
	seq++
	X = false;

to:

	Y = true;
	WMB
	seq += 2;
	X = false;

It becomes possible to re-order like:

	Y = true;
	WMB
	X = false

	seq += 2;

And we loose our read order; or rather, where previously we ordered the
read side by seq, the seq increments are no longer ordered.

With this I think we can prove my code correct, however it also suggests
that:

	cpu_base->running = timer;
	seq++;
	smp_wmb();
	seq++;
	timer->state = 0;

	...

	timer->state = 1;
	seq++;
	smp_wmb();
	seq++;
	cpu_base->running = NULL;

vs
	hrtimer_active(timer)
        {

                do {
                        base = READ_ONCE(timer->base->cpu_base);
                        seq = read_seqcount_begin(&cpu_base->seq);

                        if (timer->state & ENQUEUED ||
                            base->running == timer)
                                return true;

                } while (read_seqcount_retry(&cpu_base->seq, seq) ||
                         base != READ_ONCE(timer->base->cpu_base));

                return false;
        }

Is the all-round cheapest solution. Those extra seq increments are
almost free on all archs as the cacheline will be hot and modified on
the local cpu.

Only under the very rare condition of a concurrent hrtimer_active() call
will that seq line be pulled into shared state.


I shall go sleep now, and update my patch tomorrow, lets see if I will
still agree with myself after a sleep :-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ