linux-kernel - Re: [PATCH 1/5] x86/percpu: Differentiate this_cpu_{}() and __this_cpu

Message-ID: <634996DB-18B4-4008-82C0-ADD7A6D84B22@vmware.com>
Date:   Wed, 27 Feb 2019 18:55:37 +0000
From:   Nadav Amit <namit@...are.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
CC:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andrew Lutomirski <luto@...nel.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Matthew Wilcox <willy@...radead.org>
Subject: Re: [PATCH 1/5] x86/percpu: Differentiate this_cpu_{}() and
 __this_cpu_{}()

> On Feb 27, 2019, at 9:57 AM, Nadav Amit <namit@...are.com> wrote:
> 
>> On Feb 27, 2019, at 8:14 AM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>> 
>> On Wed, Feb 27, 2019 at 2:16 AM Peter Zijlstra <peterz@...radead.org> wrote:
>>> Nadav Amit reported that commit:
>>> 
>>> b59167ac7baf ("x86/percpu: Fix this_cpu_read()")
>>> 
>>> added a bunch of constraints to all sorts of code; and while some of
>>> that was correct and desired, some of that seems superfluous.
>> 
>> Trivial (but entirely untested) patch attached.
>> 
>> That said, I didn't actually check how it affects code generation.
>> Nadav, would you check the code sequences you originally noticed?
> 
> The original issue was raised while I was looking into a dropped patch of
> Matthew Wilcox that caused a code size increase [1]. While doing so, I
> noticed that Peter’s patch caused big changes to the generated assembly
> across the kernel; I did not have a specific scenario that I cared about.
> 
> The patch you sent (“+m/-volatile”) increases the code size by 1728
> bytes. Code size is not the only metric for “code optimization”, but
> Peter’s original patch (“volatile”) increased the code size by only 201
> bytes. Peter’s original change also affected only 72 functions, versus
> 228 functions affected by the new patch.
> 
> I’ll look at the assembly of some specific functions, but overall, the
> “+m” approach might prevent even more code optimizations than the
> “volatile” one.
> 
> I’ll send an example or two later.

Here is one example:

Dump of assembler code for function event_filter_pid_sched_wakeup_probe_pre:
   0xffffffff8117c510 <+0>:	push   %rbp
   0xffffffff8117c511 <+1>:	mov    %rsp,%rbp
   0xffffffff8117c514 <+4>:	push   %rbx
   0xffffffff8117c515 <+5>:	mov    0x28(%rdi),%rax
   0xffffffff8117c519 <+9>:	mov    %gs:0x78(%rax),%dl
   0xffffffff8117c51d <+13>:	test   %dl,%dl
   0xffffffff8117c51f <+15>:	je     0xffffffff8117c535 <event_filter_pid_sched_wakeup_probe_pre+37>
   0xffffffff8117c521 <+17>:	mov    %rdi,%rax
   0xffffffff8117c524 <+20>:	mov    0x78(%rdi),%rdi
   0xffffffff8117c528 <+24>:	mov    0x28(%rax),%rbx   # REDUNDANT
   0xffffffff8117c52c <+28>:	callq  0xffffffff81167830 <trace_ignore_this_task>
   0xffffffff8117c531 <+33>:	mov    %al,%gs:0x78(%rbx)
   0xffffffff8117c535 <+37>:	pop    %rbx
   0xffffffff8117c536 <+38>:	pop    %rbp
   0xffffffff8117c537 <+39>:	retq   

The instruction at 0xffffffff8117c528 is redundant and does not exist
without the recent patch. It seems to be a result of -fno-strict-aliasing:
because the new "memory write" ("+m") constraint marks the per-CPU variable
as written, the compiler can no longer assume that 0x28(%rdi), already
loaded at <+5>, is unchanged, so it re-reads it.