linux-kernel - Re: Kernel stack read with PTRACE_EVENT_EXIT and io

Message-ID: <c7144d54-539d-6aef-ed15-beea91836468@gmail.com>
Date:   Tue, 22 Jun 2021 12:01:06 +1200
From:   Michael Schmitz <schmitzmic@...il.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Al Viro <viro@...iv.linux.org.uk>
Cc:     "Eric W. Biederman" <ebiederm@...ssion.com>,
        linux-arch <linux-arch@...r.kernel.org>,
        Jens Axboe <axboe@...nel.dk>, Oleg Nesterov <oleg@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Richard Henderson <rth@...ddle.net>,
        Ivan Kokshaysky <ink@...assic.park.msu.ru>,
        Matt Turner <mattst88@...il.com>,
        alpha <linux-alpha@...r.kernel.org>,
        Geert Uytterhoeven <geert@...ux-m68k.org>,
        linux-m68k <linux-m68k@...ts.linux-m68k.org>,
        Arnd Bergmann <arnd@...nel.org>,
        Ley Foon Tan <ley.foon.tan@...el.com>,
        Tejun Heo <tj@...nel.org>, Kees Cook <keescook@...omium.org>,
        Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Subject: Re: Kernel stack read with PTRACE_EVENT_EXIT and io_uring threads

Hi Linus,

On 22/06/21 11:14 am, Linus Torvalds wrote:
> On Mon, Jun 21, 2021 at 12:45 PM Al Viro <viro@...iv.linux.org.uk> wrote:
>>> Looks like sys_exit() and do_group_exit() would be the two places to
>>> do it (do_group_exit() would handle the signal case and
>>> sys_group_exit()).
>> Maybe...  I'm digging through that pile right now, will follow up when
>> I get a reasonably complete picture
> We might have another possible way to solve this:
>
>   (a) make it the rule that everybody always saves the full (integer)
> register set in pt_regs
>
>   (b) make m68k just always create that switch-stack for all system
> calls (it's really not that big, I think it's like six words or
> something)

Correct - six words for registers, one for the return address. Probably 
still a win compared to setting and clearing flag bits all over the 
place in an attempt to catch any as yet undetected unsafe cases of 
ptrace_stop.

I'll have to see how much of a performance impact I can see (not that I 
can even remotely measure that accurately - it's more of a 'does it now 
feel real sluggish' thing).

Cheers,

     Michael

>
>   (c) admit that alpha is broken, but nobody really cares
>
>> In the meanwhile, do kernel/kthread.c uses look even remotely sane?
>> Intentional - sure, but it really looks wrong to use thread exit code
>> as communication channel there...
> I really doubt that it is even "intentional".
>
> I think it's "use some errno as a random exit code" and nobody ever
> really thought about it, or thought about how that doesn't really
> work. People are used to the error numbers, not thinking about how
> do_exit() doesn't take an error number, but a signal number (and an
> 8-bit positive error code in bits 8-15).
>
> Because no, it's not even remotely sane.
>
> I think the do_exit(-EINTR) could be do_exit(SIGINT) and it would make
> more sense. And the -ENOMEM might be SIGBUS, perhaps.
>
> It does look like the usermode-helper code does save the exit code
> with things like
>
>                  kernel_wait(pid, &sub_info->retval);
>
> and I see call_usermodehelper_exec() doing
>
>          retval = sub_info->retval;
>
> and treating it as an error code. But I think those have never been
> tested with that (bogus) exit code thing from kernel_wait(), because
> it wouldn't have worked.  It has only ever been tested with the (real)
> exit code things like
>
>                  if (pid < 0) {
>                          sub_info->retval = pid;
>
> which does actually assign a negative error code to it.
>
> So I think that
>
>                  kernel_wait(pid, &sub_info->retval);
>
> line is buggy, and should be something like
>
>                  int wstatus;
>                  kernel_wait(pid, &wstatus);
>                  sub_info->retval = WEXITSTATUS(wstatus) ? -EINVAL : 0;
>
> or something.
>
>              Linus
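
To illustrate the point about do_exit() codes above, here is a minimal userspace sketch of how a wait status is laid out and why a negative errno does not survive the round trip. It assumes the usual Linux encoding (signal number in the low 7 bits, core-dump flag in bit 7, exit code in bits 8-15) and EINTR == 4. The helpers make_exit_status() and status_to_retval() are hypothetical names made up for this example, not kernel functions; status_to_retval() only mirrors the "or something" conversion suggested in the quoted text.

```c
#include <errno.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: build the wait status of a task that exited
 * normally with the given 8-bit exit code (code lives in bits 8-15,
 * low 7 bits are zero because no signal was involved).
 */
static int make_exit_status(int code)
{
	return (code & 0xff) << 8;
}

/*
 * Hypothetical helper mirroring the conversion sketched in the quoted
 * mail: collapse any non-zero exit code into -EINVAL, otherwise 0.
 */
static int status_to_retval(int wstatus)
{
	return WEXITSTATUS(wstatus) ? -EINVAL : 0;
}

/*
 * What a waiter sees if the exiting thread passed -EINTR as its code:
 * the low 7 bits of -4 are 0x7c, so it decodes as "killed by signal
 * 124" (with the core-dump bit set), not as a normal exit.
 */
static int bogus_status_signal(void)
{
	int wstatus = -EINTR;

	return WIFEXITED(wstatus) ? 0 : WTERMSIG(wstatus);
}
```

With these definitions, status_to_retval(make_exit_status(0)) is 0 and status_to_retval(make_exit_status(3)) is -EINVAL, while a raw -EINTR status is never reported as a normal exit at all.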