linux-kernel - Re: [PATCH] printk: stop spining waiter when console resume to flush prb

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <46df3838-e1a6-ee95-b398-bef0896d2b03@huawei.com>
Date:   Mon, 10 May 2021 16:29:11 +0800
From:   luojiaxing <luojiaxing@...wei.com>
To:     Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <senozhatsky@...omium.org>
CC:     <sergey.senozhatsky@...il.com>, <rostedt@...dmis.org>,
        <john.ogness@...utronix.de>, <linux-kernel@...r.kernel.org>,
        <linuxarm@...wei.com>, <bobo.shaobowang@...wei.com>
Subject: Re: [PATCH] printk: stop spining waiter when console resume to flush
 prb


On 2021/5/8 0:13, Petr Mladek wrote:
> On Thu 2021-05-06 23:07:19, Sergey Senozhatsky wrote:
>> Thanks for Cc-ing Petr
>>
>> On (21/05/06 15:39), Petr Mladek wrote:
>>> Many printk messages might get accumulated when consoles were suspended.
>>> They are proceed when console_unlock() is called in resume_console().
>>>
>>> The possibility to pass the console lock owner was added to reduce the risk
>>> of softlockup when too many messages were handled in an atomic context.
>>>
>>> Now, resume_console() is always in a preemptible context that is safe
>>> to handle all accumulated messages. The possibility to pass the console
>>> lock owner actually makes things worse. The new owner might be in an atomic
>>> context and might cause softlockup when processing all messages accumulated
>>> when the console was suspended.
>>>
>>> Create new console_unlock_preemptible() that will not allow to pass
>>> the console lock owner. As a result, all accumulated messages will
>>> be proceed in the safe preemptible process.
>> If we have a lot of pending messages in the logbuf, then there is
>> something chatty - some context (task, irq) or maybe several contexts.
>> And those contexts can continue adding messages, while we print them
>> _exclusively_ from preemptible context only. without ever throttling down
>> printk() callers - something that console_owner spinning and handover
>> does for us. And those printk() callers can even preempt
>> console_unlock_preemptible() and cause delays and lost messages.
> Luo, please, correct me if I am wrong.


Hi, Petr, I reply the test result on your first reply for this patch.

Please check it when you are free.


>
> This patch a about one well defined scenario. The messages are
> accumulated between suspend_console() and resume_console().
> It is "small" part of the system hibernation. And we need
> to get them out now. There might be many if something special
> was debugged.
>
> I am pretty sure that Luo did not see any flood of messages:
>
>     + Flood in more contexts would be balanced by switching
>       the console_owner.
>
>     + Flood in one context would be naturally throttled because
>       this context will become the console_owner.
>
> In each case, these messages would be generated after
> console_resume(). Luo's original patch was explicitly talking
> about messages accumulated during the suspend.
>
> Luo, could you please provide some log showing the problem?


Sure, But, it is not easy to find that the printk got such problem in 
the user's logs.

So I'm attaching a log of the simulation test. You can clearly see that 
the driver thread calling dev_info() is blocked.


My test method is as follows:
Kernel thread A causes the console to enter suspend and then resume it 
10 seconds later.
Kernel thread B repeatedly invokes dev_info() for 15 seconds after the 
console suspend.


Part of dmesg log I save before:

[  288.013869] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013870] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013871] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013872] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013873] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013874] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013875] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013876] hisi_sas_v3_hw 0000:b4:02.0: print test
[  288.013877] hisi_sas_v3_hw 0000:b4:02.0: print test
[  289.670256] hisi_sas_v3_hw 0000:b4:02.0: print test
[  299.286325] hisi_sas_v3_hw 0000:b4:02.0: print test
[  299.291198] hisi_sas_v3_hw 0000:b4:02.0: print test
[  299.296063] hisi_sas_v3_hw 0000:b4:02.0: print test
[  299.300924] hisi_sas_v3_hw 0000:b4:02.0: print test
[  299.305787] hisi_sas_v3_hw 0000:b4:02.0: print test

The log shows that when thread B invokes dev_info, dev_info is blocked 
for several seconds.


Thanks

Jiaxing


>
> Best Regards,
> Petr
>
> .
>