Message-ID: <20150825081841.GA19412@gmail.com>
Date: Tue, 25 Aug 2015 10:18:41 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Andy Lutomirski <luto@...capital.net>
Cc: X86 ML <x86@...nel.org>, Denys Vlasenko <dvlasenk@...hat.com>,
Brian Gerst <brgerst@...il.com>,
Borislav Petkov <bp@...en8.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Jan Beulich <jbeulich@...e.com>
Subject: Re: Proposal for finishing the 64-bit x86 syscall cleanup
* Andy Lutomirski <luto@...capital.net> wrote:
> Hi all-
>
> I want to (try to) mostly or fully get rid of the messy bits (as
> opposed to the hardware-bs-forced bits) of the 64-bit syscall asm.
> There are two major conceptual things that are in the way.
>
> Thing 1: partial pt_regs
>
> 64-bit fast path syscalls don't fully initialize pt_regs: bx, bp, and
> r12-r15 are uninitialized. Some syscalls require them to be
> initialized, and they have special awful stubs to do it. The entry
> and exit tracing code (except for phase1 tracing) also needs them
> initialized, and it has its own messy initialization. Compat
> syscalls are their own private little mess here.
>
> This gets in the way of all kinds of cleanups, because C code can't
> switch between the full and partial pt_regs states.
>
> I can see two ways out. We could remove the optimization entirely,
> which would mean pushing and popping six more registers and would add
> about ten cycles to fast path syscalls on Sandy Bridge. It would also
> simplify and presumably speed up the slow paths.
So out of hundreds of regular system calls, there are only a handful of such syscalls:
triton:~/tip> git grep stub arch/x86/entry/syscalls/
arch/x86/entry/syscalls/syscall_32.tbl:2 i386 fork sys_fork stub32_fork
arch/x86/entry/syscalls/syscall_32.tbl:11 i386 execve sys_execve stub32_execve
arch/x86/entry/syscalls/syscall_32.tbl:119 i386 sigreturn sys_sigreturn stub32_sigreturn
arch/x86/entry/syscalls/syscall_32.tbl:120 i386 clone sys_clone stub32_clone
arch/x86/entry/syscalls/syscall_32.tbl:173 i386 rt_sigreturn sys_rt_sigreturn stub32_rt_sigreturn
arch/x86/entry/syscalls/syscall_32.tbl:190 i386 vfork sys_vfork stub32_vfork
arch/x86/entry/syscalls/syscall_32.tbl:358 i386 execveat sys_execveat stub32_execveat
arch/x86/entry/syscalls/syscall_64.tbl:15 64 rt_sigreturn stub_rt_sigreturn
arch/x86/entry/syscalls/syscall_64.tbl:56 common clone stub_clone
arch/x86/entry/syscalls/syscall_64.tbl:57 common fork stub_fork
arch/x86/entry/syscalls/syscall_64.tbl:58 common vfork stub_vfork
arch/x86/entry/syscalls/syscall_64.tbl:59 64 execve stub_execve
arch/x86/entry/syscalls/syscall_64.tbl:322 64 execveat stub_execveat
arch/x86/entry/syscalls/syscall_64.tbl:513 x32 rt_sigreturn stub_x32_rt_sigreturn
arch/x86/entry/syscalls/syscall_64.tbl:520 x32 execve stub_x32_execve
arch/x86/entry/syscalls/syscall_64.tbl:545 x32 execveat stub_x32_execveat
and none of them is a super performance-critical system call, so there's no way
I'd go for unconditionally saving/restoring all of pt_regs just to make things a
bit simpler for these syscalls.
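
(For reference, a sketch of which part of pt_regs the 64-bit fast path actually
fills in - layout roughly as in arch/x86/include/asm/ptrace.h; the fast path
just skips over the bottom six fields with a subq $(6*8), %rsp instead of
pushing them:)

    struct pt_regs {
            /* not saved on the 64-bit fast path: */
            unsigned long r15;
            unsigned long r14;
            unsigned long r13;
            unsigned long r12;
            unsigned long bp;
            unsigned long bx;
            /* always saved on syscall entry: */
            unsigned long r11;
            unsigned long r10;
            unsigned long r9;
            unsigned long r8;
            unsigned long ax;
            unsigned long cx;
            unsigned long dx;
            unsigned long si;
            unsigned long di;
            unsigned long orig_ax;
            unsigned long ip;
            unsigned long cs;
            unsigned long flags;
            unsigned long sp;
            unsigned long ss;
    };
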
> We could also annotate which syscalls need full regs and jump to the
> slow path for them. This would leave the fast path unchanged (we
> could duplicate the syscall table so that regs-requiring syscalls
> would turn into some asm that switches to the slow path). We'd make
> the syscall table say something like:
>
> 59 64 execve sys_execve:regs
>
> The fast path would have exactly identical performance and the slow
> path would presumably speed up. The down side would be additional
> complexity.
The 'fast path performance unchanged' aspect definitely gives me warm fuzzy
feelings.
Your suggested annotation would essentially be a syntactic cleanup, in that we'd
auto-generate the stubs at build time instead of the current ugly open-coded
stubs? Or did you have something else in mind?
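
For example, the build could turn the ':regs' qualifier into an extra macro
argument when generating the syscall table, so that only the annotated entries
go through a regs-saving stub. A rough sketch - all macro and stub names below
are made up for illustration, this is not the current code:

    typedef long (*sys_call_ptr_t)(void);

    extern long sys_read(void);
    extern long sys_execve(void);
    /* asm stub that would save bx, bp, r12-r15 and then call sys_execve: */
    extern long stub_sys_execve(void);

    #define __SYSCALL_QUAL_(sym)        sym             /* plain entry */
    #define __SYSCALL_QUAL_regs(sym)    stub_##sym      /* ':regs' entry */

    #define __SYSCALL_64(nr, sym, qual) \
            [nr] = (sys_call_ptr_t)__SYSCALL_QUAL_##qual(sym),

    const sys_call_ptr_t sys_call_table[] = {
            __SYSCALL_64(0, sys_read, )
            __SYSCALL_64(59, sys_execve, regs)
    };
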
> Thing 2: vdso compilation with binutils that doesn't support .cfi directives
>
> Userspace debuggers really like having the vdso properly
> CFI-annotated, and the 32-bit fast syscall entries are annotated
> manually in hexadecimal. AFAIK Jan Beulich is the only person who
> understands it.
>
> I want to be able to change the entries a little bit to clean them up
> (and possibly rework the SYSCALL32 and SYSENTER register tricks, which
> currently suck), but it's really, really messy right now because of
> the hex CFI stuff. Could we just drop the CFI annotations if the
> binutils version is too old or even just require new enough binutils
> to build 32-bit and compat kernels?
We could also test at build time whether the assembler supports those directives
and simply not generate debuginfo on such tooling. Not generating debuginfo is
still much better than failing the build.
I'm all for removing the hex-encoded debuginfo.
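
The 'test for the directives' variant could look something like this: a
build-time assembler probe sets a config symbol, and the CFI macros expand to
no-ops on old tooling. A sketch only - the config symbol and macro names are
illustrative, and such a header would be included from the vdso .S files:

    #ifdef CONFIG_AS_CFI
    #define CFI_STARTPROC           .cfi_startproc
    #define CFI_ENDPROC             .cfi_endproc
    #define CFI_ADJUST_CFA_OFFSET   .cfi_adjust_cfa_offset
    #define CFI_REL_OFFSET          .cfi_rel_offset
    #else
    /*
     * The assembler doesn't know the .cfi_* directives: map them to an empty
     * assembler macro that swallows its operands.  The vdso loses its unwind
     * info, but the build still succeeds.
     */
            .macro cfi_ignore a=0, b=0, c=0, d=0
            .endm
    #define CFI_STARTPROC           cfi_ignore
    #define CFI_ENDPROC             cfi_ignore
    #define CFI_ADJUST_CFA_OFFSET   cfi_ignore
    #define CFI_REL_OFFSET          cfi_ignore
    #endif
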
Thanks,
Ingo