Message-ID: <DD7599E0-BB19-4BEC-B598-B0B29ACF5A36@zytor.com>
Date: Mon, 18 Jan 2016 17:33:03 -0800
From: "H. Peter Anvin" <hpa@...or.com>
To: Borislav Petkov <bp@...e.de>, Brian Gerst <brgerst@...il.com>
CC: the arch/x86 maintainers <x86@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Denys Vlasenko <dvlasenk@...hat.com>,
Andy Lutomirski <luto@...capital.net>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH] x86: static_cpu_has_safe: discard dynamic check after init
On January 18, 2016 5:10:26 PM PST, Borislav Petkov <bp@...e.de> wrote:
>On Mon, Jan 18, 2016 at 07:51:07PM +0100, Borislav Petkov wrote:
>> [45] .static_cpu_has PROGBITS ffffffff97aa655b 16ea655b
>> 00000000000002df 0000000000000000 AX 0 0 1
>
>Ok, staring at this section was wrong. I went and looked at
>the .s file and now it clicked: gcc inlines those calls to
>__static_cpu_has_safe of the dynamic jump target:
>
> .loc 4 538 0
> movl $125, %edi #,
> call __static_cpu_has_safe #
>.LBE885:
>.LBE886:
>.LBE914:
> .loc 1 240 0
> testb %al, %al # D.30157
> je .L150 #,
> jmp .L151 #
>.L152:
>.LBB915:
>.LBB909:
>.LBB905:
>.LBB893:
>.LBB892:
> .loc 4 538 0
> movl $154, %edi #,
> call __static_cpu_has_safe #
>.LBE892:
>.LBE893:
>.LBE905:
> .loc 7 431 0
> testb %al, %al # D.30157
> jne .L154 #,
>
>which turn into:
>
> 751: bf 7d 00 00 00 mov $0x7d,%edi
> 756: e8 00 00 00 00 callq 75b <fpu__copy+0xab>
> 75b: 84 c0 test %al,%al
> 75d: 74 a3 je 702 <fpu__copy+0x52>
> 75f: eb 90 jmp 6f1 <fpu__copy+0x41>
> 761: bf 9a 00 00 00 mov $0x9a,%edi
> 766: e8 00 00 00 00 callq 76b <fpu__copy+0xbb>
> 76b: 84 c0 test %al,%al
>
>That's like 28 bytes in this particular case, which is 14 per call
>site, on average.
>
>So grepping through my tailored vmlinux, it has 35 entries with
>X86_FEATURE_ALWAYS in .altinstructions X 14 = 490 bytes.
>
>So yeah, we probably should do this; allyesconfig should give more
>savings.
>
>One thing I'd probably do differently is not call the throwaway section
>.static_cpu_has but something like .altinstr_temporary or so and put it
>after the replacement insns:
>
>.altinstr_replacement : AT(ADDR(.altinstr_replacement) - LOAD_OFFSET) {
> *(.altinstr_replacement)
> *(.altinstr_temporary)
> }
>
>so that we know those instructions belong to the alternatives
>mechanism. They'll get discarded too, of course.
>
>I could just as well be talking a lot of crap, it is waay too late
>here.
Why the f do we call a subroutine for what amounts to a single bt or test instruction?
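To illustrate, here is a minimal sketch in plain C of the kind of inline check meant here, assuming the feature bits live in an array of 32-bit words. The names cap_words, NCAPWORDS, set_feature_bit and feature_bit_set are made up for this example and are not the kernel's; 125 and 154 are just the two feature numbers from the listing above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bitmap; the size is an assumption for the sketch. */
#define NCAPWORDS 8
static uint32_t cap_words[NCAPWORDS];

/* Mark a feature as present (illustrative helper, not a kernel API). */
static inline void set_feature_bit(uint32_t *caps, unsigned int bit)
{
	caps[bit / 32] |= UINT32_C(1) << (bit % 32);
}

/*
 * Test one feature bit. With a constant 'bit', a compiler can typically
 * reduce this to a single test or bt on the right word, with no call.
 */
static inline bool feature_bit_set(const uint32_t *caps, unsigned int bit)
{
	return (caps[bit / 32] >> (bit % 32)) & 1u;
}
```

A call site would then read something like `if (feature_bit_set(cap_words, 125))`, with the word index and bit mask folded at compile time rather than passed in %edi to a subroutine.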
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.