Message-ID: <YNDnY0niP+IfSx+X@zeniv-ca.linux.org.uk>
Date:   Mon, 21 Jun 2021 19:24:19 +0000
From:   Al Viro <viro@...iv.linux.org.uk>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     "Eric W. Biederman" <ebiederm@...ssion.com>,
        Michael Schmitz <schmitzmic@...il.com>,
        linux-arch <linux-arch@...r.kernel.org>,
        Jens Axboe <axboe@...nel.dk>, Oleg Nesterov <oleg@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Richard Henderson <rth@...ddle.net>,
        Ivan Kokshaysky <ink@...assic.park.msu.ru>,
        Matt Turner <mattst88@...il.com>,
        alpha <linux-alpha@...r.kernel.org>,
        Geert Uytterhoeven <geert@...ux-m68k.org>,
        linux-m68k <linux-m68k@...ts.linux-m68k.org>,
        Arnd Bergmann <arnd@...nel.org>,
        Ley Foon Tan <ley.foon.tan@...el.com>,
        Tejun Heo <tj@...nel.org>, Kees Cook <keescook@...omium.org>
Subject: Re: Kernel stack read with PTRACE_EVENT_EXIT and io_uring threads

On Mon, Jun 21, 2021 at 06:59:01PM +0000, Al Viro wrote:
> On Mon, Jun 21, 2021 at 01:54:56PM +0000, Al Viro wrote:
> > On Tue, Jun 15, 2021 at 02:58:12PM -0700, Linus Torvalds wrote:
> > 
> > > And I think our horrible "kernel threads return to user space when
> > > done" is absolutely horrifically nasty. Maybe of the clever sort, but
> > > mostly of the historical horror sort.
> > 
> > How would you prefer to handle that, then?  Separate magical path from
> > kernel_execve() to switch to userland?  We used to have something of
> > that sort, and that had been a real horror...
> > 
> > As it is, it's "kernel thread is spawned at the point similar to
> > ret_from_fork(), runs the payload (which almost never returns) and
> > then proceeds out to userland, same way fork(2) would've done."
> > That way kernel_execve() doesn't have to do anything magical.
> > 
> > Al, digging through the old notes and current call graph...
> 
> 	There's a large mess around do_exit() - we have a bunch of
> callers all over arch/*; if nothing else, I very much doubt that we really
> want to let a tracer play with a thread in the middle of die_if_kernel()
> or similar.
> 
> We sure as hell do not want to arrange for anything on the kernel
> stack in such situations, no matter what's done in exit(2)...

FWIW, on alpha it's die_if_kernel(), do_entUna() and do_page_fault(),
all in not-from-userland cases.  On m68k - die_if_kernel(), do_page_fault()
(both for non-from-userland cases) and something really odd - fpsp040_die().
Exception handling for floating point stuff on 68040?  Looks like it has
an open-coded copy_to_user()/copy_from_user(), with faults doing hard
do_exit(SIGSEGV) instead of raising a signal and trying to do something
sane...

I really don't want to try and figure out how painful it would be to
teach that code how to deal with faults - _testing_ anything in that
area sure as hell will be.  IIRC, the details of recovery from FPU exceptions
on 68040 in the manual left the impression of a minefield...