Message-ID: <57376E5E.50207@codeaurora.org>
Date: Sat, 14 May 2016 11:28:46 -0700
From: Vikram Mulukutla <markivx@...eaurora.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org
Subject: Re: Additional compiler barrier required in sched_preempt_enable_no_resched?
On 5/14/2016 8:39 AM, Thomas Gleixner wrote:
> On Fri, 13 May 2016, Vikram Mulukutla wrote:
>> On 5/13/2016 7:58 AM, Peter Zijlstra wrote:
>>> diff --git a/include/asm-generic/preempt.h b/include/asm-generic/preempt.h
>>> index 5d8ffa3e6f8c..c1cde3577551 100644
>>> --- a/include/asm-generic/preempt.h
>>> +++ b/include/asm-generic/preempt.h
>>> @@ -7,10 +7,10 @@
>>>
>>> static __always_inline int preempt_count(void)
>>> {
>>> -	return current_thread_info()->preempt_count;
>>> +	return READ_ONCE(current_thread_info()->preempt_count);
>>> }
>>>
>>> -static __always_inline int *preempt_count_ptr(void)
>>> +static __always_inline volatile int *preempt_count_ptr(void)
>>> {
>>> 	return &current_thread_info()->preempt_count;
>>> }
>>>
>>
>> Thanks Peter, this patch worked for me. The compiler no longer optimizes out
>> the increment/decrement of the preempt_count.
>
> I have a hard time understanding why the compiler optimizes out stuff w/o that
> patch.
>
> We already have:
>
> #define preempt_disable() \
> do { \
> 	preempt_count_inc(); \
> 	barrier(); \
> } while (0)
>
> #define sched_preempt_enable_no_resched() \
> do { \
> 	barrier(); \
> 	preempt_count_dec(); \
> } while (0)
>
> #define preempt_enable() \
> do { \
> 	barrier(); \
> 	if (unlikely(preempt_count_dec_and_test())) \
> 		__preempt_schedule(); \
> } while (0)
>
> So the barriers already forbid the compiler from reordering the code. How on
> earth is the compiler supposed to optimize the dec/inc out?
>
While I cannot claim more than a very rudimentary knowledge of
compilers, that was my initial reaction as well. But the barriers are
in the wrong spots for the way the driver in question used this code:
preempt_disable() has the barrier() _after_ the increment, and
sched_preempt_enable_no_resched() has it _before_ the decrement.
Therefore, if one uses preempt_enable_no_resched followed by
preempt_disable, there is no compiler barrier between the decrement
and the increment. Whether that should translate into such a seemingly
aggressive optimization, I'm still not completely sure.
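To make this concrete, here is a hand-expanded sketch of that ordering
(my expansion, not compiler output; pc is a stand-in for
current_thread_info()->preempt_count):

#define barrier() __asm__ __volatile__("" : : : "memory")

extern int pc;	/* stand-in for the preempt count */

void no_resched_then_disable(void)
{
	/* sched_preempt_enable_no_resched() */
	barrier();
	pc--;
	/* preempt_disable(): no barrier() separates the decrement
	 * from the increment, so the compiler may legally fold the
	 * pair away. */
	pc++;
	barrier();
}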
> There is more code than the preempt stuff depending on barrier() and expecting
> that the compiler does not optimize things out at will. Are we supposed to
> hunt down all occurrences and amend them with READ_ONCE or whatever, one by one?
>
I think the barrier is working as intended for the sequence of
preempt_disable followed by preempt_enable_no_resched.
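Continuing the sketch above, the well-behaved order always has a
barrier between the two accesses:

void disable_then_no_resched(void)
{
	/* preempt_disable() */
	pc++;
	barrier();
	/* sched_preempt_enable_no_resched(): pc cannot be cached
	 * across the memory clobbers, so both the increment and the
	 * decrement must be emitted. */
	barrier();
	pc--;
}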
> Thanks,
>
> tglx
As a simple experiment, I ran the following C program through gcc on
x86. This was really to convince myself rather than you and Peter!
#include <stdio.h>

#define barrier() __asm__ __volatile__("": : :"memory")

struct thread_info {
	int pc;	/* stand-in for the kernel's preempt_count */
};

#define preempt_count() \
	ti_ptr->pc

#define preempt_disable() \
	do \
	{ \
		preempt_count() += 1; \
		barrier(); \
	} \
	while (0)

#define preempt_enable() \
	do \
	{ \
		barrier(); \
		preempt_count() -= 1; \
	} \
	while (0)

struct thread_info *ti_ptr;

int main(void)
{
	struct thread_info ti;
	int b = 0;	/* initialized: reading an uninitialized b
			 * would be undefined behavior */

	ti_ptr = &ti;

	preempt_enable();
	b = b + 500;
	preempt_disable();

	printf("b = %d\n", b);
	return 0;
}
With gcc -O2 I get this (the ARM compiler behaves similarly); note that
the increment and decrement of the preempt count are gone entirely:
0000000000400470 <main>:
400470: 48 83 ec 18 sub $0x18,%rsp
400474: 48 89 25 cd 0b 20 00 mov %rsp,0x200bcd(%rip)
40047b: ba f4 01 00 00 mov $0x1f4,%edx
400480: be 14 06 40 00 mov $0x400614,%esi
400485: bf 01 00 00 00 mov $0x1,%edi
40048a: 31 c0 xor %eax,%eax
40048c: e8 cf ff ff ff callq 400460 <__printf_chk@plt>
400491: 31 c0 xor %eax,%eax
400493: 48 83 c4 18 add $0x18,%rsp
400497: c3 retq
If I swap preempt_enable and preempt_disable, the addl/subl pair is
emitted as expected:
0000000000400470 <main>:
400470: 48 83 ec 18 sub $0x18,%rsp
400474: 48 89 25 cd 0b 20 00 mov %rsp,0x200bcd(%rip)
40047b: 83 04 24 01 addl $0x1,(%rsp)
40047f: 48 8b 05 c2 0b 20 00 mov 0x200bc2(%rip),%rax
400486: ba f4 01 00 00 mov $0x1f4,%edx
40048b: be 14 06 40 00 mov $0x400614,%esi
400490: bf 01 00 00 00 mov $0x1,%edi
400495: 83 28 01 subl $0x1,(%rax)
400498: 31 c0 xor %eax,%eax
40049a: e8 c1 ff ff ff callq 400460 <__printf_chk@plt>
40049f: 31 c0 xor %eax,%eax
4004a1: 48 83 c4 18 add $0x18,%rsp
4004a5: c3 retq
Note: If I place ti_ptr on the stack, the ordering/barriers no longer
matter; the inc/dec is always optimized out. I suppose that is because
the asm memory clobber only constrains objects the compiler must assume
are visible outside the function: current_thread_info() points to memory
that is externally reachable, whereas a purely local thread_info never
escapes the current stack frame. In any case, IMHO the volatile
preempt_count patch is necessary.
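For reference, here is how such a fix could look in the toy program
above. PC_ONCE is a made-up name for an ACCESS_ONCE-style volatile
cast, which is essentially what Peter's volatile preempt_count_ptr()
change achieves in the kernel:

/* Route every access to the count through a volatile-qualified
 * lvalue so the compiler must emit it, whatever the barrier
 * placement. */
#define PC_ONCE() (*(volatile int *)&ti_ptr->pc)

#define preempt_disable() \
	do \
	{ \
		PC_ONCE() += 1; \
		barrier(); \
	} \
	while (0)

#define preempt_enable() \
	do \
	{ \
		barrier(); \
		PC_ONCE() -= 1; \
	} \
	while (0)

With this change, gcc -O2 should emit the inc/dec in both orderings.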
Thanks,
Vikram