Message-ID: <CALCETrUB--yazgvVhGs6eiJ9DCCQDxfh1xyKQrLrVG=tahoHEA@mail.gmail.com>
Date: Mon, 31 Aug 2015 18:37:58 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Brian Gerst <brgerst@...il.com>
Cc: X86 ML <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 0/7] x86 vdso32 cleanups

On Mon, Aug 31, 2015 at 6:19 PM, Andy Lutomirski <luto@...capital.net> wrote:
>
> On Sun, Aug 30, 2015 at 7:52 PM, Andy Lutomirski <luto@...capital.net> wrote:
>>
>> On Sun, Aug 30, 2015 at 2:18 PM, Brian Gerst <brgerst@...il.com> wrote:
>> > On Sat, Aug 29, 2015 at 12:10 PM, Andy Lutomirski <luto@...capital.net> wrote:
>> >> On Sat, Aug 29, 2015 at 8:20 AM, Brian Gerst <brgerst@...il.com> wrote:
>> >>> This patch set contains several cleanups to the 32-bit VDSO. The
>> >>> main change is to only build one VDSO image, and select the syscall
>> >>> entry point at runtime.
>> >>
>> >> Oh no, we have dueling patches!
>> >>
>> >> I have a 2/3 finished series that cleans up the AT_SYSINFO mess
>> >> differently, as I outlined earlier. I've only done the compat and
>> >> common bits (no 32-bit native support quite yet), and it enters
>> >> successfully on Intel using SYSENTER and on (fake) AMD using SYSCALL.
>> >> The SYSRET bit isn't there yet.
>> >>
>> >> Other than some ifdeffery, the final system_call.S looks like this:
>> >>
>> >> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tree/arch/x86/entry/vdso/vdso32/system_call.S?h=x86/entry_compat
>> >>
>> >> The meat is (sorry for whitespace damage):
>> >>
>> >>     .text
>> >>     .globl __kernel_vsyscall
>> >>     .type __kernel_vsyscall,@function
>> >>     ALIGN
>> >> __kernel_vsyscall:
>> >>     CFI_STARTPROC
>> >>     /*
>> >>      * Reshuffle regs so that all of any of the entry instructions
>> >>      * will preserve enough state.
>> >>      */
>> >>     pushl %edx
>> >>     CFI_ADJUST_CFA_OFFSET 4
>> >>     CFI_REL_OFFSET edx, 0
>> >>     pushl %ecx
>> >>     CFI_ADJUST_CFA_OFFSET 4
>> >>     CFI_REL_OFFSET ecx, 0
>> >>     movl %esp, %ecx
>> >>
>> >> #ifdef CONFIG_X86_64
>> >>     /* If SYSENTER is available, use it. */
>> >>     ALTERNATIVE_2 "", "sysenter", X86_FEATURE_SYSENTER32, \
>> >>                   "syscall", X86_FEATURE_SYSCALL32
>> >> #endif
>> >>
>> >>     /* Enter using int $0x80 */
>> >>     movl (%esp), %ecx
>> >>     int $0x80
>> >> GLOBAL(int80_landing_pad)
>> >>
>> >>     /* Restore ECX and EDX in case they were clobbered. */
>> >>     popl %ecx
>> >>     CFI_RESTORE ecx
>> >>     CFI_ADJUST_CFA_OFFSET -4
>> >>     popl %edx
>> >>     CFI_RESTORE edx
>> >>     CFI_ADJUST_CFA_OFFSET -4
>> >>     ret
>> >>     CFI_ENDPROC
>> >>
>> >>     .size __kernel_vsyscall,.-__kernel_vsyscall
>> >>     .previous
>> >>
>> >> And that's it.
>> >>
>> >> What do you think? This comes with massively cleaned up kernel-side
>> >> asm as well as a test case that actually validates the CFI directives.
>> >>
>> >> Certainly, a bunch of your patches make sense regardless, and I'll
>> >> review them and add them to my queue soon.
>> >>
>> >> --Andy
>> >
>> > How does the performance compare to the original? Looking at the
>> > disassembly, there are two added function calls, and it reloads the
>> > args from the stack instead of just shuffling registers.
>>
>> The replacement is dramatically faster, which means I probably
>> benchmarked it wrong. I'll try again in a day or two.
>
>
> It's enough slower to be problematic. I need to figure out how to trace it properly. (Hmm? Maybe it's time to learn how to get perf on the host to trace a KVM guest.)
>
> Everything is and was hilariously slow with context tracking on. That needs to get fixed, and hopefully once this entry stuff is done someone will do the other end of it.
>
I got random errors from perf kvm, but I think I found at least part
of the issue. The two irqs_disabled() calls in common.c are kind of
expensive. I should disable them on non-lockdep kernels.
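
To make the idea concrete, here is a rough userspace sketch of gating such a check on a build-time option. Everything in it is made up for illustration: ENTRY_DEBUG stands in for a lockdep-style config option, and fake_irqs_disabled() and fake_syscall_entry() are hypothetical stand-ins, not the actual code in common.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: ENTRY_DEBUG plays the role of a lockdep-style config option. */
#ifndef ENTRY_DEBUG
#define ENTRY_DEBUG 0
#endif

/* Hypothetical stand-ins for the real IRQ state; not kernel APIs. */
bool fake_irqs_off = true;
unsigned long checks_run;

/* Models the "kind of expensive" query and counts how often it runs. */
bool fake_irqs_disabled(void)
{
	checks_run++;
	return fake_irqs_off;
}

void fake_syscall_entry(void)
{
	/*
	 * ENTRY_DEBUG is a compile-time constant, so when it is 0 the
	 * compiler can drop the whole branch, including the call.
	 */
	if (ENTRY_DEBUG)
		assert(fake_irqs_disabled());
}
```

Built without -DENTRY_DEBUG=1, calling fake_syscall_entry() never reaches fake_irqs_disabled(), so the check costs nothing on the fast path while still being type-checked.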

The context tracking hooks are also too expensive, even when disabled.
I should do something to optimize those. Hello, static keys? This
doesn't affect syscalls, though.
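
Real static keys rely on the kernel patching its own code, which can't be shown in a standalone program. The sketch below only models the shape of the optimization: an inline, predicted-not-taken test guarding an out-of-line slow path. All names are hypothetical, and __builtin_expect is a GCC/Clang builtin used here as a stand-in, not what the kernel hooks actually do.

```c
#include <stdbool.h>

/*
 * Hypothetical flag. With a real static key this load-and-test would
 * be replaced by a jump that is patched when the key is flipped.
 */
bool ctx_tracking_enabled;
unsigned long slow_hook_calls;

/* Out-of-line slow path: the expensive bookkeeping would live here. */
void ctx_user_exit_slow(void)
{
	slow_hook_calls++;
}

/* Inline fast path: a single branch, hinted as unlikely, when disabled. */
static inline void ctx_user_exit(void)
{
	if (__builtin_expect(ctx_tracking_enabled, 0))
		ctx_user_exit_slow();
}
```

The point is just that the disabled case never leaves the caller; the function call and its bookkeeping only happen once the flag is set.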

With context tracking off and the irqs_disabled checks commented out,
we're probably doing well enough. We can always tweak the C code and
aggressively force inlining if we want a few cycles back.

--Andy