linux-kernel - Re: [PATCH] RFC: console: hack up console

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190503151437.dc2ty2mnddabrz4r@pathway.suse.cz>
Date:   Fri, 3 May 2019 17:14:37 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Daniel Vetter <daniel.vetter@...ll.ch>
Cc:     DRI Development <dri-devel@...ts.freedesktop.org>,
        Intel Graphics Development <intel-gfx@...ts.freedesktop.org>,
        Daniel Vetter <daniel.vetter@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Will Deacon <will.deacon@....com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        John Ogness <john.ogness@...utronix.de>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] RFC: console: hack up console_trylock more

On Thu 2019-05-02 16:16:43, Daniel Vetter wrote:
> console_trylock, called from within printk, can be called from pretty
> much anywhere. Including try_to_wake_up. Note that this isn't common,
> usually the box is in pretty bad shape at that point already. But it
> really doesn't help when then lockdep jumps in and spams the logs,
> potentially obscuring the real backtrace we're really interested in.
> One case I've seen (slightly simplified backtrace):
> 
>  Call Trace:
>   <IRQ>
>   console_trylock+0xe/0x60
>   vprintk_emit+0xf1/0x320
>   printk+0x4d/0x69
>   __warn_printk+0x46/0x90
>   native_smp_send_reschedule+0x2f/0x40
>   check_preempt_curr+0x81/0xa0
>   ttwu_do_wakeup+0x14/0x220
>   try_to_wake_up+0x218/0x5f0
>   pollwake+0x6f/0x90
>   credit_entropy_bits+0x204/0x310
>   add_interrupt_randomness+0x18f/0x210
>   handle_irq+0x67/0x160
>   do_IRQ+0x5e/0x130
>   common_interrupt+0xf/0xf
>   </IRQ>
> 
> This alone isn't a problem, but the spinlock in the semaphore is also
> still held while waking up waiters (up() -> __up() -> try_to_wake_up()
> callchain), which then closes the runqueue vs. semaphore.lock loop,
> and upsets lockdep, which issues a circular locking splat to dmesg.
> Worse it upsets developers, since we don't want to spam dmesg with
> clutter when the machine is dying already.
>
> Fix this by creating a __down_trylock which only trylocks the
> semaphore.lock. This isn't correct in full generality, but good enough
> for console_lock:
> 
> - there's only ever one console_lock holder, we won't fail spuriously
>   because someone is doing a down() or up() while there's still room
>   (unlike other semaphores with count > 1).
>
> - console_unlock() has one massive retry loop, which will catch anyone
>   who races the trylock against the up(). This makes sure that no
>   printk lines will get lost. Making the trylock more racy therefore
>   has no further impact.

To be honest, I do not see how this could solve the problem.

The circular dependency is still there. If the new __down_trylock()
succeeds then console_unlock() will get called in the same context
and it will still need to call up() -> try_to_wake_up().

Note that there are many other console_lock() callers that might
happen in parallel and might appear in the wait queue.

Best Regards,
Petr