Message-ID: <20160526222943.GA16729@MBP.local>
Date: Thu, 26 May 2016 23:29:45 +0100
From: Catalin Marinas <catalin.marinas@....com>
To: Yury Norov <ynorov@...iumnetworks.com>
Cc: David Miller <davem@...emloft.net>, arnd@...db.de,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-arch@...r.kernel.org,
linux-s390@...r.kernel.org, libc-alpha@...rceware.org,
schwidefsky@...ibm.com, heiko.carstens@...ibm.com,
pinskia@...il.com, broonie@...nel.org, joseph@...esourcery.com,
christoph.muellner@...obroma-systems.com,
bamvor.zhangjian@...wei.com, szabolcs.nagy@....com,
klimov.linux@...il.com, Nathan_Lynch@...tor.com, agraf@...e.de,
Prasun.Kapoor@...iumnetworks.com, kilobyte@...band.pl,
geert@...ux-m68k.org, philipp.tomsich@...obroma-systems.com
Subject: Re: [PATCH 01/23] all: syscall wrappers: add documentation
On Thu, May 26, 2016 at 11:48:19PM +0300, Yury Norov wrote:
> On Wed, May 25, 2016 at 02:28:21PM -0700, David Miller wrote:
> > From: Arnd Bergmann <arnd@...db.de>
> > Date: Wed, 25 May 2016 23:01:06 +0200
> >
> > > On Wednesday, May 25, 2016 1:50:39 PM CEST David Miller wrote:
> > >> From: Arnd Bergmann <arnd@...db.de>
> > >> Date: Wed, 25 May 2016 22:47:33 +0200
> > >>
> > >> > If we use the normal calling conventions, we could remove these overrides
> > >> > along with the respective special-case handling in glibc. None of them
> > >> > look particularly performance-sensitive, but I could be wrong there.
> > >>
> > >> You could set the lowest bit in the system call entry pointer to indicate
> > >> the upper-half clears should be elided.
> > >
> > > Right, but that would introduce an extra conditional branch in the syscall
> > > hotpath, and likely eliminate the gains from passing the loff_t arguments
> > > in a single register instead of a pair.
> >
> > Ok, then, how much are you really gaining from avoiding a 'shift' and
> > an 'or' to build the full 64-bit value? 3 cycles? Maybe 4?
>
> 4 cycles in kernel and ~same cost in glibc to create a pair.
In the kernel, the shift+or takes a single instruction per argument,
plus maybe 1-2 more instructions to move the remaining arguments into
place (we already do this for a few wrappers in
arch/arm64/kernel/entry32.S). Add to that the glibc counterpart.
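For reference, the shift+or under discussion can be sketched in C as below. The helper name join_halves is hypothetical and not taken from the kernel; it only illustrates rebuilding a 64-bit loff_t-style value from two 32-bit register halves. On arm64 a compiler may well fold this into a single orr with a shifted operand when the upper halves are already clear, but that depends on the compiler and the surrounding code.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): rebuild a 64-bit value
 * from the two 32-bit halves a compat caller passes in separate registers.
 */
static inline uint64_t join_halves(uint32_t hi, uint32_t lo)
{
    /* Widen before shifting so the high half is not lost. */
    return ((uint64_t)hi << 32) | lo;
}
```

For example, join_halves(0x1, 0x80000000) yields 0x180000000.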
> And 8 'mov's that exist for every syscall, even yield().
>
> > And then executing the wrappers, those have a non-trivial cost too.
>
> The cost is pretty trivial though. See kernel/compat_wrapper.o:
> COMPAT_SYSCALL_WRAP2(creat, const char __user *, pathname, umode_t, mode);
> 0: a9bf7bfd stp x29, x30, [sp,#-16]!
> 4: 910003fd mov x29, sp
> 8: 2a0003e0 mov w0, w0
> c: 94000000 bl 0 <sys_creat>
> 10: a8c17bfd ldp x29, x30, [sp],#16
> 14: d65f03c0 ret
I would say the above could be more expensive than 8 movs: 16 bytes to
write to the stack and read back, plus a branch and a ret. There is also
I-cache locality to consider: a wrapper for each syscall instead of a
single place that zeroes the upper halves (where no other wrapper is
necessary).
Can we trick the compiler into doing a tail call optimisation? This
could have simply been:
COMPAT_SYSCALL_WRAP2(creat, ...):
mov w0, w0
b <sys_creat>
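A C-level sketch of the same idea follows, as an illustration only. The names demo_sys_creat, demo_compat_creat and seen_pathname are hypothetical stand-ins, not kernel symbols. The call sits in tail position with a matching signature, so a compiler that performs sibling-call optimisation (GCC has -foptimize-sibling-calls, enabled at -O2) is free to emit a plain branch instead of bl/ret. Whether it actually does so depends on the compiler flags in use.

```c
#include <stdint.h>

/* Records what the stand-in handler received, for illustration only. */
uint64_t seen_pathname;

/* Stand-in for the native handler; the real sys_creat is kernel code. */
long demo_sys_creat(uint64_t pathname, uint64_t mode)
{
    (void)mode;
    seen_pathname = pathname;
    return 0;
}

/*
 * Hypothetical compat wrapper: clear the upper 32 bits of the pointer
 * argument (the "mov w0, w0" step), then call the native handler in
 * tail position so the compiler may turn the call into a branch.
 */
long demo_compat_creat(uint64_t pathname, uint64_t mode)
{
    return demo_sys_creat((uint32_t)pathname, mode);
}
```

Only the first argument is truncated here, mirroring the single mov w0, w0 in the disassembly quoted above.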
> > Cost wise, this seems like it all cancels out in the end, but what
> > do I know?
>
> I think you know something, and I also think Heiko and other s390 guys
> know something as well. So I'd like to listen to their arguments here.
>
> To me the sparc64 way looks reasonable only because it's really simple
> and takes less coding. I'll try it on some branch and share here what happened.
The kernel code will definitely look simpler ;). It would be good to see
if there actually is any performance impact. Even with 16 more cycles on
syscall entry, would they be lost in the noise? You don't need a full
implementation, just some dummy mov x0, x0 on the entry path.
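One possible way to look for that impact from userspace is a tight loop around a cheap syscall, run once on an unmodified kernel and once with the dummy movs added on the entry path. The sketch below assumes Linux with glibc; ns_per_syscall is a hypothetical helper name, and getppid is picked only because it does very little work in the kernel. A real measurement would need many runs, CPU pinning and a fixed CPU frequency to say anything about a difference of a few cycles.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() */
#endif
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: average wall-clock nanoseconds per getppid
 * syscall over 'iters' back-to-back invocations.
 */
double ns_per_syscall(long iters)
{
    struct timespec t0, t1;
    long i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iters; i++)
        syscall(SYS_getppid); /* raw syscall, avoids any libc caching */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Elapsed time in nanoseconds, divided by the iteration count. */
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}
```

Comparing the two kernels' averages over several runs would show whether the extra instructions rise above the noise.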
--
Catalin