Message-ID: <20211103135249.GA38767@C02TD0UTHF1T.local>
Date: Wed, 3 Nov 2021 13:52:49 +0000
From: Mark Rutland <mark.rutland@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Kees Cook <keescook@...omium.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
the arch/x86 maintainers <x86@...nel.org>
Subject: Re: [GIT pull] sched/core for v5.16-rc1
On Tue, Nov 02, 2021 at 09:41:26AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 01, 2021 at 02:27:49PM -0700, Linus Torvalds wrote:
> > On Mon, Nov 1, 2021 at 2:01 PM Linus Torvalds
> > <torvalds@...ux-foundation.org> wrote:
> > >
> > > Unwinders that need locks because they can do bad things if they are
> > > working on unstable data are EVIL and WRONG.
> >
> > Note that this is fundamental: if you can fool an unwinder to do
> > something bad just because the data isn't stable, then the unwinder is
> > truly horrendously buggy, and not usable.
>
> From what I've been led to believe, quite a few of our arch unwinders
> seem to fall in that category. They're mostly only happy when unwinding
> self and don't have many guardrails otherwise.
>
> > It could be a user process doing bad things to the user stack frame
> > from another thread when profiling is enabled.
>
> Most of the unwinders seem to only care about the kernel stack. Not the
> user stack.
Yup; there are usually separate unwinders for user/kernel, since there
are different constraints (and potentially different ABIs for unwinding).
> > It could be debug code unwinding without locks for random reasons.
> >
> > So I really don't like "take a lock for unwinding". It's a pretty bad
> > bug if the lock is required.
>
> Fair enough; the x86 unwinder is pretty robust in this regard, but it
> seems to be one of few :/
FWIW, the arm64 kernel unwinder also shouldn't blow up (so long as the
target stack is pinned via try_get_task_stack() or similar).
However, depending on how the task reuses the stack, the results can be
entirely bogus rather than just stale, since data on the stack can look
like a kernel pointer (even if that's fairly unlikely). I'm happy to
believe that we don't care about that for wchan, but it's not something
I'd like to see spread.
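To spell out what I mean by "pinned" above, the rough shape (a sketch of
the generic pattern, not the actual arm64 code, with the function name
invented for illustration) is to take an extra reference on the task's
stack around the walk, e.g.:

#include <linux/sched/task_stack.h>
#include <linux/stacktrace.h>

static unsigned int remote_unwind_sketch(struct task_struct *tsk,
					 unsigned long *entries,
					 unsigned int max_entries)
{
	unsigned int nr;

	/*
	 * Pin the stack so it can't be freed under us; this fails once
	 * the task's stack has already been released.
	 */
	if (!try_get_task_stack(tsk))
		return 0;

	nr = stack_trace_save_tsk(tsk, entries, max_entries, 0);

	/* Drop the reference; may free the stack of a dead task. */
	put_task_stack(tsk);

	return nr;
}

That keeps us from faulting on a freed stack, but as above it does
nothing about the *contents* being stale or bogus.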
> > The "Link" in the commit also is entirely useless, pointing back to
> > the emailed submission of the patch, rather than any useful discussion
> > about why the patch happened.
>
> So the initial discussion started here:
>
> https://lkml.kernel.org/r/20210923233105.4045080-1-keescook@chromium.org
>
> A later thread that might also be of interest is:
>
> https://lkml.kernel.org/r/YWgyy+KvNLQ7eMIV@shell.armlinux.org.uk
>
> Also, an even later thread proposes to push that lock into more stack
> unwinding functions (anything doing remote unwinds):
>
> https://lkml.kernel.org/r/20211022150933.883959987@infradead.org
>
> But it seems to me you're thinking that's fundamentally buggered and
> people should instead invest in fixing their unwinders already.
>
> Now, as is, this stuff is user exposed through /proc/$pid/{wchan,stack}
> and as such I think it *can* do with a few extra guardrails in generic
> code. OTOH, /proc/$pid/stack is root only.
>
> Also, the remote stack-trace code is hooked into bpf (because
> kitchen-sink) and while I didn't look too hard, I can imagine it could
> be used to trigger crashes on our less robust architectures if prodded
> just right.
I do worry that remote unwinds from BPF are just silently generating
junk, but it's not clear to me what they're actually used for and how
much that matters. I don't understand why a remote unwind is necessary
at all.
> Should I care about all this from a generic code PoV, or simply let the
> architectures that got it 'wrong' deal with it?
FWIW I'm happy either way. There are some upcoming improvements to the
arm64 unwinder that currently conflict and I need to know whether to
wait and rebase or assume that we take those first.
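If we do grow guardrails in generic code, I'd imagine something roughly
like the below (a hypothetical helper, names invented purely for
illustration); note the running-task check is inherently racy without
extra serialization, which is presumably what the lock in your series
is for:

#include <linux/sched.h>
#include <linux/sched/task_stack.h>
#include <linux/stacktrace.h>

static unsigned int guarded_stack_trace_save_tsk(struct task_struct *tsk,
						 unsigned long *store,
						 unsigned int size)
{
	unsigned int nr;

	/*
	 * Don't walk the stack of a task which may be running; the data
	 * underneath us could be changing. Without further
	 * serialization this check is racy.
	 */
	if (tsk != current && task_is_running(tsk))
		return 0;

	/* Pin the stack so it can't be freed while we walk it. */
	if (!try_get_task_stack(tsk))
		return 0;

	nr = stack_trace_save_tsk(tsk, store, size, 0);

	put_task_stack(tsk);

	return nr;
}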
Thanks,
Mark.