Message-ID: <20160516105557.GL3193@twins.programming.kicks-ass.net>
Date:	Mon, 16 May 2016 12:55:57 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Vikram Mulukutla <markivx@...eaurora.org>,
	linux-kernel@...r.kernel.org
Subject: Re: Additional compiler barrier required in
 sched_preempt_enable_no_resched?

On Sat, May 14, 2016 at 05:39:37PM +0200, Thomas Gleixner wrote:
> I have a hard time to understand why the compiler optimizes out stuff w/o that
> patch.
> 
> We already have:
> 
> #define preempt_disable() \
> do { \
>         preempt_count_inc(); \
>         barrier(); \
> } while (0)
> 
> #define sched_preempt_enable_no_resched() \
> do { \
>         barrier(); \
>         preempt_count_dec(); \
> } while (0)
> 
> #define preempt_enable() \
> do { \
>         barrier(); \
>         if (unlikely(preempt_count_dec_and_test())) \
>                 __preempt_schedule(); \
> } while (0)
> 
> So the barriers already forbid that the compiler reorders code. How on earth
> is the compiler supposed to optimize the dec/inc out?

Order things like:

> #define sched_preempt_enable_no_resched() \
> do { \
>         barrier(); \
>         preempt_count_dec(); \
> } while (0)

> #define preempt_disable() \
> do { \
>         preempt_count_inc(); \
>         barrier(); \
> } while (0)

And there is no barrier between the dec and inc, and a smarty pants
compiler could just decide to forgo the update, since in program order
there is no observable difference either way.
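
A minimal user-space sketch of that ordering, assuming GCC or Clang (for the
inline asm). Every name below is a hypothetical stand-in, not one of the
kernel's real definitions:

```c
/* Compiler-only barrier: nothing is cached in registers across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Plain int: the compiler may assume nobody else is watching it. */
int demo_count;

/* Same shape as sched_preempt_enable_no_resched(): barrier, then dec. */
void demo_enable_no_resched(void)
{
	barrier();
	demo_count--;
}

/* Same shape as preempt_disable(): inc, then barrier. */
void demo_disable(void)
{
	demo_count++;
	barrier();
}

/*
 * Once inlined this reads:
 *     barrier(); demo_count--; demo_count++; barrier();
 * Nothing separates the dec from the inc, so the compiler is free to
 * cancel the pair and never store the decremented value.
 */
int demo_enable_then_disable(int start)
{
	demo_count = start;
	demo_enable_no_resched();
	demo_disable();
	return demo_count;
}
```

The return value is the same whether or not the intermediate value ever
reaches memory, which is exactly why the compiler is allowed to skip it.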

Making the thing volatile tells the compiler there can be external
observations of the memory and it cannot assume things like that and
must emit the operations.
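
As a rough illustration only (a sketch with hypothetical names, not the patch
itself), routing every access through a volatile-qualified lvalue looks like
this:

```c
/* Hypothetical counter, standing in for the preempt count. */
int vol_count;

/* Every access through this macro is a volatile access. */
#define vol_count_ref() (*(volatile int *)&vol_count)

/*
 * Each line below is a separate volatile load and/or store. The compiler
 * has to emit them all, in this order, and cannot fold the dec and the
 * inc into a no-op.
 */
int vol_enable_then_disable(int start)
{
	vol_count_ref() = start;
	vol_count_ref() -= 1;	/* load, subtract, store */
	vol_count_ref() += 1;	/* load, add, store */
	return vol_count_ref();
}
```

The final value is unchanged, but the intermediate store of start - 1 is now
guaranteed to be emitted.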

You're right in that the 'proper' sequence:

> #define preempt_enable() \
> do { \
>         barrier(); \
>         if (unlikely(preempt_count_dec_and_test())) \
>                 __preempt_schedule(); \
> } while (0)

> #define preempt_disable() \
> do { \
>         preempt_count_inc(); \
>         barrier(); \
> } while (0)

has a better chance of actually emitting the operations to memory; but
an even smarter pants compiler might figure that doing something like:

	if (preempt_count() == 1)
		__preempt_schedule();

is equivalent and emit that instead, not bothering to modify the actual
variable at all -- the program as specified cannot tell the difference.
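
To make the claimed equivalence concrete, here is a small sketch with
hypothetical names. It assumes the stand-in for __preempt_schedule() never
looks at the counter, which is what the "program as specified" argument
relies on:

```c
/* Hypothetical counter and a call counter for the fake scheduler hook. */
int pc_count;
int pc_sched_calls;

/* Stand-in for __preempt_schedule(); deliberately ignores pc_count. */
void pc_schedule(void)
{
	pc_sched_calls++;
}

/* As written: enable (dec and test), immediately followed by disable. */
void pc_as_written(void)
{
	if (--pc_count == 0)
		pc_schedule();
	pc_count++;
}

/* What a compiler could emit instead: no store to pc_count at all. */
void pc_as_transformed(void)
{
	if (pc_count == 1)
		pc_schedule();
}
```

Both functions leave pc_count unchanged and call pc_schedule() under the same
condition; the only difference is whether the zero is ever visible in memory,
and only an outside observer such as an interrupt could notice that.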

Also; in the case of !PREEMPT && PREEMPT_COUNT, the normal:

	preempt_disable();
	preempt_enable();

sequence turns into the first case again.

So I'll go write a proper changelog for the volatile thing and get it
merged with a Cc to stable.