linux-kernel - Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrUafpWfnbfZzgu3qSGqyxcG0+6A=A1RE8g++=GrQKD93Q@mail.gmail.com>
Date:	Wed, 30 Jul 2014 10:25:10 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Frederic Weisbecker <fweisbec@...il.com>
Cc:	Oleg Nesterov <oleg@...hat.com>, "H. Peter Anvin" <hpa@...or.com>,
	linux-arch <linux-arch@...r.kernel.org>, X86 ML <x86@...nel.org>,
	LSM List <linux-security-module@...r.kernel.org>,
	Linux MIPS Mailing List <linux-mips@...ux-mips.org>,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>,
	Alexei Starovoitov <ast@...mgrid.com>,
	Will Drewry <wad@...omium.org>,
	Kees Cook <keescook@...omium.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 0/5] x86: two-phase syscall tracing and seccomp fastpath

On Wed, Jul 30, 2014 at 9:59 AM, Frederic Weisbecker <fweisbec@...il.com> wrote:
> On Tue, Jul 29, 2014 at 04:30:58PM -0700, Andy Lutomirski wrote:
>> On Tue, Jul 29, 2014 at 1:54 PM, Andy Lutomirski <luto@...capital.net> wrote:
>> > On Jul 29, 2014 12:22 PM, "Oleg Nesterov" <oleg@...hat.com> wrote:
>> >>
>> >> Andy, to avoid the confusion: I am not trying to review this changes.
>> >> As you probably know my understanding of asm code in entry.S is very
>> >> limited.
>> >>
>> >> Just a couple of questions to ensure I understand this correctly.
>> >>
>> >> On 07/28, Andy Lutomirski wrote:
>> >> >
>> >> > This is both a cleanup and a speedup.  It reduces overhead due to
>> >> > installing a trivial seccomp filter by 87%.  The speedup comes from
>> >> > avoiding the full syscall tracing mechanism for filters that don't
>> >> > return SECCOMP_RET_TRACE.
>> >>
>> >> And only after I look at 5/5 I _seem_ to actually understand where
>> >> this speedup comes from.
>> >>
>> >> So. Currently tracesys: path always lead to "iret" after syscall, with
>> >> this change we can avoid it if phase_1() returns zero, correct?
>> >>
>> >> And, this also removes the special TIF_SYSCALL_AUDIT-only case in entry.S,
>> >> cool.
>> >>
>> >> I am wondering if we can do something similar with do_notify_resume() ?
>> >>
>> >>
>> >> Stupid question. To simplify, lets forget that syscall_trace_enter()
>> >> already returns the value. Can't we simplify the asm code if we do
>> >> not export 2 functions, but make syscall_trace_enter() return
>> >> "bool slow_path_is_needed". So that "tracesys:" could do
>> >>
>> >>         // pseudo code
>> >>
>> >> tracesys:
>> >>         SAVE_REST
>> >>         FIXUP_TOP_OF_STACK
>> >>
>> >>         call syscall_trace_enter
>> >>
>> >>         if (!slow_path_is_needed) {
>> >>                 addq REST_SKIP, %rsp
>> >>                 jmp system_call_fastpath
>> >>         }
>> >>
>> >>         ...
>> >>
>> >> ?
>> >>
>> >> Once again, I am just curious, it is not that I actually suggest to consider
>> >> this option.
>> >
>> > We could, but this would lose a decent amount of the speedup.  I could
>> > try it and benchmark it, but I'm guessing that the save and restore is
>> > kind of expensive.  This will make audit slower than it currently is,
>> > which may also annoy some people.  (Not me.)
>> >
>> > I'm also not convinced that it would be much simpler.  My code is currently:
>> >
>> > tracesys:
>> >     leaq -REST_SKIP(%rsp), %rdi
>> >     movq $AUDIT_ARCH_X86_64, %rsi
>> >     call syscall_trace_enter_phase1
>> >     test %rax, %rax
>> >     jnz tracesys_phase2        /* if needed, run the slow path */
>> >     LOAD_ARGS 0            /* else restore clobbered regs */
>> >     jmp system_call_fastpath    /*      and return to the fast path */
>> >
>> > tracesys_phase2:
>> >     SAVE_REST
>> >     FIXUP_TOP_OF_STACK %rdi
>> >     movq %rsp, %rdi
>> >     movq $AUDIT_ARCH_X86_64, %rsi
>> >     movq %rax,%rdx
>> >     call syscall_trace_enter_phase2
>> >
>> >     LOAD_ARGS ARGOFFSET, 1
>> >     RESTORE_REST
>> >
>> >     ... slow path here ...
>> >
>> > It would end up looking more like (totally untested):
>> >
>> > tracesys:
>> >     SAVE_REST
>> >     FIXUP_TOP_OF_STACK %rdi
>> >     mov %rsp, %rdi
>> >     movq $AUDIT_ARCH_X86_64, %rsi
>> >     call syscall_trace_enter
>> >     LOAD_ARGS
>> >     RESTORE_REST
>> >     test [whatever condition]
>> >     j[cond] system_call_fastpath
>> >
>> >     ... slow path here ...
>> >
>> > So it's a bit simpler.  Oddly, the ia32entry code doesn't have this
>> > multiple syscall path distinction.
>> >
>> > SAVE_REST is 6 movq instructions and a subq.  FIXUP_TOP_OF_STACK is 7
>> > movqs (and 8 if I ever get my way).  RESTORE_TOP_OF_STACK is 4.
>> > RESTORE_REST is 6 movqs and an adsq.  So we're talking about avoiding
>> > 21 movqs, and addq, and a subq.  That may be significant.  (And I
>> > suspect that the difference is much larger on platforms like arm64,
>> > but that's a separate issue.)
>>
>> To put some more options on the table: there's an argument to be made
>> that the whole fast-path/slow-path split isn't worth it.  We could
>> unconditionally set up a full frame for all syscalls.  This means:
>>
>> No FIXUP_TOP_OF_STACK.  Instead, the system_call entry sets up RSP,
>> SS, CS, RCX, and EFLAGS right away.  That's five stores for all
>> syscalls instead of two loads and five stores for syscalls that need
>> it.  But it also gets rid of RESTORE_TOP_OF_STACK completely.
>>
>> No more bugs involving C code that assumes a full stack frame when no
>> such frame exists inside syscall code.
>>
>> We could possibly remove a whole bunch of duplicated code.
>>
>> The upshot would be simpler code, faster slow-path syscalls, and
>> slower fast-path syscalls (but probably not much slower).  On the
>> other hand, there's zero chance that this would be ready for 3.17.
>
> Indeed.
>
> If we ever take that direction (ie: remove that partial frame optimization),
> there is that common part that we can find in many archs when they return
> from syscalls, exceptions or interrupts which could be more or less consolidated as:
>
> void sysret_check(struct pt_regs *regs)
> {
>           while(test_thread_flag(TIF_ALLWORK_MASK)) {
>               int resched = need_resched();
>
>               local_irq_enable();
>               if (resched)
>                   schedule();
>               else
>                   do_notify_resume(regs);
>               local_irq_disable()
>            }
> }
>
> But well, probably the syscall fastpath is still worth it.

And yet x86_64 has this code implemented in assembly even in the
slowpath.  Go figure.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/