Message-ID: <20160310164104.GM9349@brightrain.aerifal.cx>
Date:	Thu, 10 Mar 2016 11:41:05 -0500
From:	Rich Felker <dalias@...c.org>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andy Lutomirski <luto@...nel.org>,
	the arch/x86 maintainers <x86@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Borislav Petkov <bp@...en8.de>,
	"musl@...ts.openwall.com" <musl@...ts.openwall.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [musl] Re: [RFC PATCH] x86/vdso/32: Add AT_SYSINFO cancellation
 helpers

On Thu, Mar 10, 2016 at 12:16:46PM +0100, Ingo Molnar wrote:
> 
> * Rich Felker <dalias@...c.org> wrote:
> 
> > [...]
> >
> > I believe a new kernel cancellation API with a sticky cancellation flag (rather 
> > than a signal), and a flag or'd onto the syscall number to make it cancellable 
> > at the call point, could work, but then userspace needs to support fairly 
> > different old and new kernel APIs in order to be able to run on old kernels 
> > while also taking advantage of new ones, and it's not clear to me that it would 
> > actually be worthwhile to do so. I could see doing it for a completely new 
> > syscall API, but as a second syscall API for a system that already has one it 
> > seems gratuitous. From my perspective the existing approach (checking program 
> > counter from signal handler) is very clean and simple. After all it made enough 
> > sense that I was able to convince the glibc folks to adopt it.
> 
> I concur with your overall analysis, but things get a bit messy once we consider 
> AT_SYSINFO which is a non-atomic mix of user-space and kernel-space code. Trying 
> to hand cancellation status through that results in extra complexity:
> 
>  arch/x86/entry/vdso/Makefile                      |   3 +-
>  arch/x86/entry/vdso/vdso32/cancellation_helpers.c | 116 ++++++++++++++++++++++
>  arch/x86/entry/vdso/vdso32/vdso32.lds.S           |   2 +
>  tools/testing/selftests/x86/unwind_vdso.c         |  57 +++++++++--
>  4 files changed, 171 insertions(+), 7 deletions(-)
> 
> So instead of a sticky cancellation flag, we could introduce a sticky cancellation 
> signal.
> 
> A 'sticky signal' is not cleared from signal_pending() when the signal handler 
> executes, but it's automatically blocked so no signal handler recursion occurs.
> (A sticky signal could still be cleared via a separate mechanism, by the 
>  cancellation cleanup code.)
> 
> Such a 'sticky cancellation signal' would, in the racy situation, cause new 
> blocking system calls to immediately return with -EINTR. Non-blocking syscalls 
> could still be used. (So the cancellation signal handler itself would still have 
> access to various fundamental system calls.)
> 
> I think this would avoid messy coupling between the kernel's increasingly more 
> varied system call entry code and C libraries.
> 
> Sticky signals could be requested via a new SA_ flag.
> 
> What do you think?

This still doesn't address the issue that the code making the syscall
needs to be able to control whether it's cancellable or not. Not only
do some syscalls whose public functions are cancellation points need
to be used internally in non-cancellable ways, but there's also the
pthread_setcancelstate interface, which allows deferring cancellation
so that it's possible to call functions which are cancellation points
without invoking cancellation.
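To illustrate the point about caller-controlled cancellability, here is a minimal sketch of how code can call a cancellation point (read(2)) without acting on a pending request, using only the standard pthread_setcancelstate interface. The helper name read_noncancel and the demo functions are hypothetical and are not taken from any libc; a real libc would typically use an internal non-cancellable syscall path instead.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical helper (not a real libc symbol): call read(2), which is a
 * cancellation point, without acting on any pending cancellation request. */
static ssize_t read_noncancel(int fd, void *buf, size_t len)
{
    int old, ignored;
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
    ssize_t n = read(fd, buf, len);
    pthread_setcancelstate(old, &ignored);  /* restore the caller's state */
    return n;
}

static int demo_fds[2];
static ssize_t demo_result = -1;

/* Worker: make a cancellation request pending against itself, then read.
 * Because cancellation is disabled around read(), the request stays pending
 * and the thread returns normally (it reaches no further cancellation point). */
static void *demo_worker(void *arg)
{
    char buf[8];
    pthread_cancel(pthread_self());
    demo_result = read_noncancel(demo_fds[0], buf, sizeof buf);
    return arg;
}

/* Returns the number of bytes the worker read, or -1 if it was cancelled. */
static ssize_t run_noncancel_demo(void)
{
    pthread_t t;
    void *ret = NULL;
    int rc = pipe(demo_fds);
    assert(rc == 0);
    ssize_t w = write(demo_fds[1], "hi", 2);
    assert(w == 2);
    rc = pthread_create(&t, NULL, demo_worker, NULL);
    assert(rc == 0);
    rc = pthread_join(t, &ret);
    assert(rc == 0);
    close(demo_fds[0]);
    close(demo_fds[1]);
    return ret == PTHREAD_CANCELED ? -1 : demo_result;
}
```

Here run_noncancel_demo() should report the 2 bytes read rather than a cancellation, since the deferred request is never acted on while disabled.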

Ideally all syscalls would be like pselect/ppoll and take a sigset_t
to unmask/remask atomically with respect to the syscall action. Then
implementing cancellation (as well as using EINTR race-free) would be
trivial. But this is obviously not a practical change to make.

From my standpoint the simplest and cleanest solution is for the vdso
to provide a predicate function that takes a ucontext_t and returns
true/false for whether it represents a state prior to entering (or
reentering, for restart state) the vdso syscall. If the vdso exports
this symbol, libc can use the vdso syscall with cancellation. If not,
it can just fall back to a straight inline syscall like now.
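A rough sketch of how libc's cancellation signal handler might combine the existing program-counter check with such a predicate follows. Everything here is hypothetical: the predicate's name and exact signature, the way libc would look it up, and the address ranges are assumptions for illustration, and extracting the interrupted program counter from the ucontext_t is architecture-specific, so it is passed in separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <ucontext.h>

/* Hypothetical signature for the proposed vdso predicate: nonzero if uc
 * describes a state before the vdso syscall has been entered (or re-entered
 * after a restart). No such export is assumed to exist in any kernel. */
typedef int (*vdso_before_syscall_fn)(const ucontext_t *uc);

/* Hypothetically filled in at startup by looking the symbol up in the vdso;
 * stays NULL when the running kernel does not export it. */
static vdso_before_syscall_fn vdso_before_syscall = NULL;

/* Illustrative address ranges; how libc obtains them is libc-specific. */
struct cancel_ranges {
    uintptr_t inline_begin, inline_end; /* libc's own cancellable syscall stub */
    uintptr_t vdso_begin, vdso_end;     /* vdso syscall entry code */
};

/* Without the predicate, libc keeps using its inline syscall for
 * cancellable calls, as it does now. */
static int use_vdso_for_cancellable(void)
{
    return vdso_before_syscall != NULL;
}

/* Decision made in the cancellation signal handler for the interrupted
 * program counter pc. */
static int may_act_on_cancel(uintptr_t pc, const ucontext_t *uc,
                             const struct cancel_ranges *r)
{
    if (pc >= r->inline_begin && pc < r->inline_end)
        return 1;  /* existing approach: inline syscall has not taken effect */
    if (pc >= r->vdso_begin && pc < r->vdso_end)
        return vdso_before_syscall != NULL && vdso_before_syscall(uc);
    return 0;      /* in this sketch: do not act here; request stays pending */
}

/* Test stand-in for the vdso predicate; ignores its argument. */
static int stub_before_syscall(const ucontext_t *uc)
{
    (void)uc;
    return 1;
}
```

The design keeps the fallback automatic: if the symbol is absent, the vdso range is never used for cancellable syscalls, so the handler never needs to reason about vdso code it cannot classify.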

Rich
