Message-ID: <45849b63-d7a8-5cc3-26ad-256a28d09991@samsung.com>
Date:   Tue, 3 May 2022 01:13:19 +0200
From:   Marek Szyprowski <m.szyprowski@...sung.com>
To:     Petr Mladek <pmladek@...e.com>
Cc:     John Ogness <john.ogness@...utronix.de>,
        Sergey Senozhatsky <senozhatsky@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        linux-amlogic@...ts.infradead.org
Subject: Re: [PATCH printk v5 1/1] printk: extend console_lock for
 per-console locking

Hi Petr,

On 02.05.2022 15:17, Petr Mladek wrote:
> On Mon 2022-05-02 11:19:07, Marek Szyprowski wrote:
>> On 30.04.2022 18:00, John Ogness wrote:
>>> On 2022-04-29, Marek Szyprowski <m.szyprowski@...sung.com> wrote:
>>>> The same issue happens if I boot with init=/bin/bash
>>> Very interesting. Since you are seeing all the output up until you try
>>> doing something, I guess receiving UART data is triggering the issue.
>> Right, this is what it looks like.
>>
>>>> I found something really interesting. When lockup happens, I'm still
>>>> able to log via ssh and trigger any magic sysrq action via
>>>> /proc/sysrq-trigger.
>>> If you boot the system and directly login via ssh (without sending any
>>> data via serial), can you trigger printk output on the UART? For
>>> example, with:
>>>
>>>       echo hello > /dev/kmsg
>>>
>>> (You might need to increase your loglevel to see it.)
>> Data written to /dev/kmsg and all kernel logs were always displayed
>> correctly. Also data written directly to /dev/ttyAML0 is displayed
>> properly on the console. The latter, however, doesn't trigger any
>> input-related activity.
>>
>> It looks like the data read from the UART is delivered only if other
>> activity happens on the kernel console. If I type 'reboot' and press
>> enter, nothing happens immediately. If I then type 'date >/dev/ttyAML0'
>> via ssh, I only see the date printed on the console. However, if I type
>> 'date >/dev/kmsg', then the date is printed and the reboot happens.
> This is really interesting.
>
> 'date >/dev/kmsg' should be handled like a normal printk().
> It should get pushed to the console by the printk kthread,
> which calls call_console_driver(), which in turn calls the
> con->write() callback. In our case, that should be
> meson_serial_console_write().
>
> I am not sure whether meson_serial_console_write() is also
> used when writing via /dev/ttyAML0.
>
>>>> It turned out that the UART console is somehow blocked, but it
>>>> receives and buffers all the input. For example, after issuing "echo
>>>>    >/proc/sysrq-trigger" from the ssh console, the UART console has been
>>>> updated and I see the magic sysrq banner and then all the commands I
>>>> blindly typed in the UART console! However, this doesn't unblock the
>>>> console.
>>> sysrq falls back to direct printing. This would imply that the kthread
>>> printer is somehow unable to print.
>>>
>>>> Here is the output of 't' magic sys request:
>>>>
>>>> https://pastebin.com/fjbRuy4f
>>> It looks like the call trace for the printing kthread (pr/ttyAML0) is
>>> corrupt.
>> Right, good catch. For comparison, here is a 't' sysrq result from the
>> 'working' serial console (next-20220429), which happens in roughly 1 of
>> 4 boots:
>>
>> https://pastebin.com/mp8zGFbW
> Strange. The backtrace is weird here too:
>
> [   50.514509] task:pr/ttyAML0      state:R  running task     stack:    0 pid:   65 ppid:     2 flags:0x00000008
> [   50.514540] Call trace:
> [   50.514548]  __switch_to+0xe8/0x160
> [   50.514561]  meson_serial_console+0x78/0x118
>
> There should be kthread() and printk_kthread_func() on the stack.
>
> Hmm, meson_serial_console+0x78/0x118 is weird on its own.
> meson_serial_console is the name of the structure. I would
> expect the name of the .write callback, like
> meson_serial_console_write().
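
For context, the relationship Petr points out looks roughly like this in
the driver (a simplified sketch; the actual struct in
drivers/tty/serial/meson_uart.c has additional fields such as .setup and
.data):

static void meson_serial_console_write(struct console *co, const char *s,
                                       u_int count)
{
        /* Write the buffer to the UART. This is the con->write()
         * callback that call_console_driver() ends up invoking for
         * printk output. */
}

static struct console meson_serial_console = {
        .name   = "ttyAML",
        .write  = meson_serial_console_write,   /* printk path */
        .device = uart_console_device,
        .flags  = CON_PRINTBUFFER,
        .index  = -1,
};

Writes to /dev/ttyAML0, in contrast, normally go through the tty layer and
the port's uart_ops, so they don't have to pass through this callback.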

Well, this sounds a bit like stack corruption. For reference, I've
checked what magic sysrq 't' reports on my other test boards:

RaspberryPi4:

[  166.702431] task:pr/ttyS0        state:R stack:    0 pid:   64 ppid:     2 flags:0x00000008
[  166.711069] Call trace:
[  166.713647]  __switch_to+0xe8/0x160
[  166.717216]  __schedule+0x2f4/0x9f0
[  166.720862]  log_wait+0x0/0x50
[  166.724081] task:vfio-irqfd-clea state:I stack:    0 pid:   65 ppid:     2 flags:0x00000008
[  166.732698] Call trace:


ARM Juno R1:

[   74.356562] task:pr/ttyAMA0      state:R  running task stack:    0 pid:   47 ppid:     2 flags:0x00000008
[   74.356605] Call trace:
[   74.356617]  __switch_to+0xe8/0x160
[   74.356637]  amba_console+0x78/0x118
[   74.356657] task:kworker/2:1     state:I stack:    0 pid:   48 ppid:     2 flags:0x00000008
[   74.356695] Workqueue:  0x0 (mm_percpu_wq)
[   74.356738] Call trace:


QEMU virt/arm64:

[  174.155760] task:pr/ttyAMA0      state:S stack:    0 pid:   26 ppid:     2 flags:0x00000008
[  174.156305] Call trace:
[  174.156529]  __switch_to+0xe8/0x160
[  174.157131]  0xffff5ebbbfdd62d8


In the last case it doesn't always happen. In other runs I got the
following log from QEMU virt/arm64:

[  200.537579] task:pr/ttyAMA0      state:S stack:    0 pid:   26 ppid:     2 flags:0x00000008
[  200.538121] Call trace:
[  200.538341]  __switch_to+0xe8/0x160
[  200.538583]  __schedule+0x2f4/0x9f0
[  200.538822]  schedule+0x54/0xd0
[  200.539047]  printk_kthread_func+0x2d8/0x3bc
[  200.539301]  kthread+0x118/0x11c
[  200.539523]  ret_from_fork+0x10/0x20


I hope that at least the QEMU case will let you analyze it yourself.
I run my test system with the following command:

qemu-system-aarch64 -kernel virt/Image \
    -append "console=ttyAMA0 root=/dev/vda rootwait" \
    -M virt -cpu cortex-a57 -smp 2 -m 1024 \
    -device virtio-blk-device,drive=virtio-blk0 \
    -drive file=qemu-virt-rootfs.raw,id=virtio-blk0,if=none,format=raw \
    -netdev user,id=user -device virtio-net-device,netdev=user \
    -display none
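
Once it boots (e.g. after logging in over ssh), the task dump shown
above can be triggered the same way as earlier in this thread, assuming
CONFIG_MAGIC_SYSRQ is enabled in the kernel config:

echo 1 > /proc/sys/kernel/sysrq      # allow all sysrq functions
echo t > /proc/sysrq-trigger         # dump task states, incl. pr/ttyAMA0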


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland
