linux-kernel - Re: [GIT pull] sched/core for v5.16-rc1


Message-ID: <YYD5ti23DQUjdQdz@hirez.programming.kicks-ass.net>
Date:   Tue, 2 Nov 2021 09:41:26 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Kees Cook <keescook@...omium.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        Mark Rutland <mark.rutland@....com>
Subject: Re: [GIT pull] sched/core for v5.16-rc1

On Mon, Nov 01, 2021 at 02:27:49PM -0700, Linus Torvalds wrote:
> On Mon, Nov 1, 2021 at 2:01 PM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > Unwinders that need locks because they can do bad things if they are
> > working on unstable data are EVIL and WRONG.
> 
> Note that this is fundamental: if you can fool an unwinder to do
> something bad just because the data isn't stable, then the unwinder is
> truly horrendously buggy, and not usable.

From what I've been led to believe, quite a few of our arch unwinders
seem to fall in that category. They're mostly only happy when unwinding
self and don't have many guardrails on otherwise.

> It could be a user process doing bad things to the user stack frame
> from another thread when profiling is enabled.

Most of the unwinders seem to only care about the kernel stack. Not the
user stack.

> It could be debug code unwinding without locks for random reasons.
> 
> So I really don't like "take a lock for unwinding". It's a pretty bad
> bug if the lock is required.

Fair enough; the x86 unwinder is pretty robust in this regard, but it
seems to be one of the few :/
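
To make "robust" concrete, here is a minimal userspace sketch of a
frame-pointer walk that treats everything it reads from the stack as
untrusted. It is not the x86 unwinder or any kernel code; the two-word
frame layout, the unwind_frames() name and its parameters are all
hypothetical, chosen only to illustrate the idea. Each frame must lie
inside the stack bounds, be word aligned, and sit strictly above the
previous one, and the walk is capped at max_entries, so a corrupted or
concurrently changing stack can shorten or garble the trace but cannot
cause a wild read or an endless loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical frame layout, for illustration only: the saved frame
 * pointer of the caller followed by the return address. Real
 * architectures differ.
 */
struct frame {
	uintptr_t next_fp;
	uintptr_t ret_addr;
};

/*
 * Walk a frame-pointer chain inside [stack_lo, stack_hi) without
 * trusting any value read from the stack. Returns the number of return
 * addresses stored in out[]; stops at the first frame that fails
 * validation. Assumes a downward-growing stack, so callers live at
 * higher addresses.
 */
size_t unwind_frames(const unsigned char *stack_lo,
		     const unsigned char *stack_hi,
		     uintptr_t fp, uintptr_t *out, size_t max_entries)
{
	uintptr_t lo = (uintptr_t)stack_lo;
	uintptr_t hi = (uintptr_t)stack_hi;
	size_t n = 0;

	/* Reject a stack too small to hold even one frame. */
	if (hi < lo || hi - lo < sizeof(struct frame))
		return 0;

	while (n < max_entries) {
		struct frame f;

		/* The whole frame must lie inside the stack... */
		if (fp < lo || fp > hi - sizeof(f))
			break;
		/* ...and be word aligned. */
		if (fp % sizeof(uintptr_t) != 0)
			break;

		/*
		 * Take one snapshot and validate only the snapshot. A real
		 * remote unwinder would need a tear-free, fault-tolerant
		 * read here; memcpy() merely stands in for that.
		 */
		memcpy(&f, (const void *)fp, sizeof(f));

		out[n++] = f.ret_addr;

		/* Strictly increasing frame pointers rule out cycles. */
		if (f.next_fp <= fp)
			break;
		fp = f.next_fp;
	}
	return n;
}
```

With a chain that is made to loop back on itself, or that points
outside the stack, the walk simply ends early with the entries gathered
so far. Checking that each return address actually points into text is
left out of this sketch.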

> The "Link" in the commit also is entirely useless, pointing back to
> the emailed submission of the patch, rather than any useful discussion
> about why the patch happened.

So the initial discussion started here:

  https://lkml.kernel.org/r/20210923233105.4045080-1-keescook@chromium.org

A later thread that might also be of interest is:

  https://lkml.kernel.org/r/YWgyy+KvNLQ7eMIV@shell.armlinux.org.uk

Also, an even later thread proposes to push that lock into more stack
unwinding functions (anything doing remote unwinds):

  https://lkml.kernel.org/r/20211022150933.883959987@infradead.org

But it seems to me you're thinking that's fundamentally buggered and
people should instead invest in fixing their unwinders already.

Now, as is, this stuff is user exposed through /proc/$pid/{wchan,stack}
and as such I think it *can* do with a few extra guardrails in generic
code. OTOH, /proc/$pid/stack is root only.

Also, the remote stack-trace code is hooked into bpf (because
kitchen-sink) and while I didn't look too hard, I can imagine it could
be used to trigger crashes on our less robust architectures if prodded
just right.

Should I care about all this from a generic code PoV, or simply let the
architectures that got it 'wrong' deal with it?