linux-kernel - Re: printk badness with VMAP

Message-ID: <0196764b-c2a6-cce6-7b82-5a4895f18365@redhat.com>
Date:   Wed, 26 Oct 2016 17:04:02 -0700
From:   Laura Abbott <labbott@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Kees Cook <keescook@...omium.org>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: printk badness with VMAP_STACK

On 10/26/2016 04:34 PM, Linus Torvalds wrote:
> On Wed, Oct 26, 2016 at 3:55 PM, Laura Abbott <labbott@...hat.com> wrote:
>>
>> I was playing around with overflowing stacks and I managed to generate a
>> test case that hung the kernel with vmapped stacks. The test case is just
>>
>> static void noinline foo1(void)
>> {
>>        pr_info("%p\n", (void *)current_stack_pointer());
>>        foo2();
>> }
>>
>> where foo$n is the same function with the name changed. I'm super
>> creative. I have a couple thousand of these for testing with the final
>> one doing a WARN. The kernel eventually hangs in printk on logbuf_lock
>
> So just to get this right - your test-case is intentionally doing that
> mutually recursive thing with foo1/foo2 calling each other until they
> run out of stack?

No, it's 1000 distinct functions, foo1, foo2, up to foo1000, each
calling the next. The idea is to have a very deep call chain that
would eventually terminate if stack space were unlimited.
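
For illustration only, here is a minimal userspace sketch of that kind of
call chain. It is not the actual kernel test module: the DEFINE_LINK macro,
foo_end(), run_chain() and the depth counter are hypothetical names, the
chain has four links instead of a thousand, and the address of a local
variable stands in for current_stack_pointer().

```c
#include <stdio.h>

#define NOINLINE __attribute__((noinline))

/* Hypothetical counter: how many links of the chain actually ran. */
static int depth_reached;

/* Terminal link. The kernel test described above ends with a WARN instead. */
static NOINLINE void foo_end(void)
{
	depth_reached++;
}

/* Define foo<n>: print an approximate stack address, then call <next>. */
#define DEFINE_LINK(n, next)                          \
	static NOINLINE void foo##n(void)             \
	{                                             \
		volatile char marker;                 \
		depth_reached++;                      \
		printf("%p\n", (void *)&marker);      \
		next();                               \
	}

/* Links are defined last-to-first so each callee is already declared. */
DEFINE_LINK(4, foo_end)
DEFINE_LINK(3, foo4)
DEFINE_LINK(2, foo3)
DEFINE_LINK(1, foo2)

/* Walk the whole chain and report how many functions were entered. */
int run_chain(void)
{
	depth_reached = 0;
	foo1();
	return depth_reached;
}
```

With enough links and a small enough stack, a chain like this runs out of
stack at some arbitrary point, which may well be inside the print call.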

>
> And yes, occasionally the stack will run out while in the middle of
> "printk()", and then when we take a fault, we'll be screwed.
>
> Note that we do *not* guarantee that "printk()" works in all contexts,
> so it might not really be considered a bug. It's very much a "best
> effort", but the scheduler and timekeeping, for example, use
> "printk_deferred()" exactly because one of the contexts where printk()
> does *not* work is when you hold the rq lock.
>
> And the reason for *that* is that printk() ends up relying on a few
> different locks:
>
>  - logbuf_lock, obviously.
>  - console_sem for actual output
>  - cond_resched() requires rq->lock
>
> And we do have some hacks in place - the recursive printk test
> (logbuf_cpu, as you note) and oops_in_progress and that "zap_locks()".
>
> But zap_locks only zaps logbuf_lock and console_sem, for example.
>
> If you run out of stack somewhere in the middle of the scheduler when
> the "cond_resched()" case of printk triggers, and we hold "rq->lock"
> when the double fault occurs, the machine *will* be dead. It will
> still try to print things out (thanks to that zap_locks thing), but
> rq->lock will be wrong, and nothing will ever recover.
>
> And it _sounds_ like that's the case you hit.
>

A similar one at least.

> Basically, zap_locks and the other printk "try to at least print
> things out" can only handle so much.
>
>              Linus
>
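
The point about zap_locks() can be pictured with a toy userspace model. This
is a sketch under loud assumptions, not the kernel's implementation: the
locks are plain flags with no real concurrency, and toy_lock, toy_trylock(),
toy_zap_locks() and simulate_fault() are hypothetical names. It only shows
that resetting logbuf_lock and console_sem lets the emergency print proceed
while a held rq->lock stays stuck.

```c
#include <stdbool.h>

/* Toy stand-in for a lock: just a "held" flag, no real concurrency. */
struct toy_lock { bool held; };

static struct toy_lock logbuf_lock, console_sem, rq_lock;

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

/* Models the idea behind zap_locks(): reset only printk's own locks. */
static void toy_zap_locks(void)
{
	logbuf_lock.held = false;
	console_sem.held = false;
	/* rq_lock is deliberately left alone, as in the description above. */
}

struct recovery {
	bool can_print;    /* emergency output can take its locks */
	bool can_schedule; /* scheduler lock is still obtainable */
};

/* Simulate a fault that interrupts printk, optionally under rq_lock. */
struct recovery simulate_fault(bool rq_lock_held_at_fault)
{
	struct recovery r;

	/* State at the moment of the fault: printk was mid-flight. */
	logbuf_lock.held = true;
	console_sem.held = true;
	rq_lock.held = rq_lock_held_at_fault;

	toy_zap_locks();

	bool got_logbuf = toy_trylock(&logbuf_lock);
	bool got_console = toy_trylock(&console_sem);

	r.can_print = got_logbuf && got_console;
	r.can_schedule = toy_trylock(&rq_lock);
	return r;
}
```

In this model simulate_fault(true) reports that printing can go ahead but
scheduling cannot, which mirrors the "it will still try to print things out,
but nothing will ever recover" case.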

I wonder if the addition of hardening features will lead to an actual
increase in problems for printk since things will be faulting more
often instead of chugging along. Or maybe it will just be limited
to developers trying to purposely break things.

Anyway, thanks for explaining the limitations of printk; that will be
useful for future reference.

Laura