Date: Sat, 14 May 2016 11:28:46 -0700
From: Vikram Mulukutla <markivx@...eaurora.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org
Subject: Re: Additional compiler barrier required in sched_preempt_enable_no_resched?

On 5/14/2016 8:39 AM, Thomas Gleixner wrote:
> On Fri, 13 May 2016, Vikram Mulukutla wrote:
>> On 5/13/2016 7:58 AM, Peter Zijlstra wrote:
>>> diff --git a/include/asm-generic/preempt.h b/include/asm-generic/preempt.h
>>> index 5d8ffa3e6f8c..c1cde3577551 100644
>>> --- a/include/asm-generic/preempt.h
>>> +++ b/include/asm-generic/preempt.h
>>> @@ -7,10 +7,10 @@
>>>
>>>  static __always_inline int preempt_count(void)
>>>  {
>>> -        return current_thread_info()->preempt_count;
>>> +        return READ_ONCE(current_thread_info()->preempt_count);
>>>  }
>>>
>>> -static __always_inline int *preempt_count_ptr(void)
>>> +static __always_inline volatile int *preempt_count_ptr(void)
>>>  {
>>>          return &current_thread_info()->preempt_count;
>>>  }
>>>
>>
>> Thanks Peter, this patch worked for me. The compiler no longer optimizes out
>> the increment/decrement of the preempt_count.
>
> I have a hard time understanding why the compiler optimizes out stuff w/o that
> patch.
>
> We already have:
>
> #define preempt_disable() \
> do { \
>         preempt_count_inc(); \
>         barrier(); \
> } while (0)
>
> #define sched_preempt_enable_no_resched() \
> do { \
>         barrier(); \
>         preempt_count_dec(); \
> } while (0)
>
> #define preempt_enable() \
> do { \
>         barrier(); \
>         if (unlikely(preempt_count_dec_and_test())) \
>                 __preempt_schedule(); \
> } while (0)
>
> So the barriers already forbid that the compiler reorders code. How on earth
> is the compiler supposed to optimize the dec/inc out?
>

While I cannot claim more than a very rudimentary knowledge of compilers, this
was my initial reaction as well. But the barriers are in the wrong spots for
the way the code was used in the driver in question. preempt_disable() has the
barrier() _after_ the increment, and sched_preempt_enable_no_resched() has it
_before_ the decrement. Therefore, if one uses preempt_enable_no_resched()
followed by preempt_disable(), there is no compiler barrier between the
decrement and the increment statements. Whether this should translate to such
a seemingly aggressive optimization is something I'm not completely sure of.
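To spell that out, the driver's sequence expands to roughly the following (a
sketch only, with preempt_count_inc()/dec() written as plain increments and
decrements of the preempt count, which is what the generic implementation
boils down to):

        /*
         * sched_preempt_enable_no_resched();
         * preempt_disable();
         *
         * expands roughly to:
         */
        barrier();                /* from sched_preempt_enable_no_resched() */
        current_thread_info()->preempt_count--;
        /* no barrier() between the decrement and the increment */
        current_thread_info()->preempt_count++;
        barrier();                /* from preempt_disable() */

Between the two barrier()s the compiler sees a non-volatile decrement
immediately followed by an increment of the same location, with nothing
observing the intermediate value, so it is free to cancel the pair.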
> There is more code than the preempt stuff depending on barrier() and expecting
> that the compiler does not optimize out things at will. Are we supposed to
> hunt all occurrences and amend them with READ_ONCE or whatever one by one?
>

I think the barrier is working as intended for the sequence of
preempt_disable() followed by preempt_enable_no_resched().

> Thanks,
>
>         tglx
>

As a simple experiment I used gcc on x86 on the following C program. This was
really to convince myself rather than you and Peter!

#include <stdio.h>

/* compiler barrier only; no CPU memory barrier */
#define barrier() __asm__ __volatile__("": : :"memory")

struct thread_info {
        int pc;
};

#define preempt_count() \
        ti_ptr->pc

#define preempt_disable() \
        do \
        { \
                preempt_count() += 1; \
                barrier(); \
        } \
        while (0)

#define preempt_enable() \
        do \
        { \
                barrier(); \
                preempt_count() -= 1; \
        } \
        while (0)

struct thread_info *ti_ptr;

int main(void)
{
        struct thread_info ti;

        ti_ptr = &ti;
        int b = 0;

        /* same order as the driver: enable (dec) first, then disable (inc) */
        preempt_enable();
        b = b + 500;
        preempt_disable();

        printf("b = %d\n", b);

        return 0;
}

With gcc -O2 I get this (the ARM compiler behaves similarly); note that there
is no trace of the increment or the decrement:

0000000000400470 <main>:
  400470:       48 83 ec 18             sub    $0x18,%rsp
  400474:       48 89 25 cd 0b 20 00    mov    %rsp,0x200bcd(%rip)
  40047b:       ba f4 01 00 00          mov    $0x1f4,%edx
  400480:       be 14 06 40 00          mov    $0x400614,%esi
  400485:       bf 01 00 00 00          mov    $0x1,%edi
  40048a:       31 c0                   xor    %eax,%eax
  40048c:       e8 cf ff ff ff          callq  400460 <__printf_chk@plt>
  400491:       31 c0                   xor    %eax,%eax
  400493:       48 83 c4 18             add    $0x18,%rsp
  400497:       c3                      retq

If I swap preempt_enable and preempt_disable I get this:

0000000000400470 <main>:
  400470:       48 83 ec 18             sub    $0x18,%rsp
  400474:       48 89 25 cd 0b 20 00    mov    %rsp,0x200bcd(%rip)
  40047b:       83 04 24 01             addl   $0x1,(%rsp)
  40047f:       48 8b 05 c2 0b 20 00    mov    0x200bc2(%rip),%rax
  400486:       ba f4 01 00 00          mov    $0x1f4,%edx
  40048b:       be 14 06 40 00          mov    $0x400614,%esi
  400490:       bf 01 00 00 00          mov    $0x1,%edi
  400495:       83 28 01                subl   $0x1,(%rax)
  400498:       31 c0                   xor    %eax,%eax
  40049a:       e8 c1 ff ff ff          callq  400460 <__printf_chk@plt>
  40049f:       31 c0                   xor    %eax,%eax
  4004a1:       48 83 c4 18             add    $0x18,%rsp
  4004a5:       c3                      retq

Note: if I place ti_ptr on the stack, the ordering/barriers no longer matter;
the inc/dec is always optimized out. I suppose the compiler does treat
current_thread_info() as pointing to a different memory location rather than
to the current stack frame.
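For completeness: the toy-program analogue of Peter's patch is to make pc
volatile (a sketch of the idea only; the real patch instead makes
preempt_count_ptr() return a volatile int * and uses READ_ONCE()). Volatile
accesses count as observable side effects and may not be elided, so the
decrement and increment must survive even with the barrier()s removed
entirely:

#include <stdio.h>

struct thread_info {
        volatile int pc;        /* the toy-program analogue of the patch */
};

struct thread_info *ti_ptr;

int main(void)
{
        struct thread_info ti = { 0 };
        int b = 0;

        ti_ptr = &ti;

        ti_ptr->pc -= 1;        /* preempt_enable(), sans barrier() */
        b = b + 500;
        ti_ptr->pc += 1;        /* preempt_disable(), sans barrier() */

        printf("b = %d\n", b);
        return 0;
}

With this, gcc has no choice but to emit the subl/addl pair in either
ordering, and regardless of where ti_ptr points.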
In any case, IMHO the volatile preempt_count patch is necessary.

Thanks,
Vikram

--