linux-kernel - Re: [PATCH printk v5 1/1] printk: extend console

Message-ID: <87a6bwapij.fsf@jogness.linutronix.de>
Date:   Thu, 05 May 2022 00:48:28 +0206
From:   John Ogness <john.ogness@...utronix.de>
To:     Marek Szyprowski <m.szyprowski@...sung.com>,
        Petr Mladek <pmladek@...e.com>
Cc:     Sergey Senozhatsky <senozhatsky@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        linux-amlogic@...ts.infradead.org
Subject: Re: [PATCH printk v5 1/1] printk: extend console_lock for
 per-console locking

On 2022-05-04, John Ogness <john.ogness@...utronix.de> wrote:
> I can reproduce the apparent stack corruption with qemu:
>
> [    5.545268] task:pr/ttyAMA0      state:S stack:    0 pid:   26 ppid:     2 flags:0x00000008
> [    5.545520] Call trace:
> [    5.545620]  __switch_to+0x104/0x160
> [    5.545796]  __schedule+0x2f4/0x9f0
> [    5.546122]  schedule+0x54/0xd0
> [    5.546206]  0x0

I believe I am chasing a ghost. I can rather easily reproduce these
strange call traces, but if another sysrq-t is sent afterwards, the call
trace is OK. Also, I added trace_dump_stack() into the printk-kthread
main loop to dump the stack on every iteration. There I never see any
corruption, even though the timestamps are near the sysrq-t dump showing
corruption. Moving trace_dump_stack() into
amba-pl011:pl011_console_write() also showed no stack corruption, even at
times very close to those where sysrq-t did.

And it should be noted that the console-hanging issues reported in this
thread _cannot_ be reproduced with qemu.

So I will stop focussing on this "corrupt stack" thing and instead
investigate what the meson driver is doing that causes it to get
stuck. Since interrupts do not even fire, I'm guessing that the RX
interrupts are not being re-enabled (AML_UART_RX_INT_EN) for some code
path. This bit is only explicitly set once, in
meson_uart_startup(). Whenever the bit is cleared, the previous value is
later restored. This is assumed to mean the interrupt gets
re-enabled. But if there is some code path where multiple CPUs can
modify the register, then the interrupt could end up permanently
disabled.

I will go through and check if all access to AML_UART_CONTROL is
protected by port->lock.

John