Date:   Sun, 04 Mar 2018 23:08:23 +0800
From:   "Qixuan.Wu" <qixuan.wu@...ux.alibaba.com>
To:     "linux-kernel-owner" <linux-kernel-owner@...r.kernel.org>
Cc:     "Petr Mladek" <pmladek@...e.com>, "Jan Kara" <jack@...e.cz>,
        "Steven Rostedt" <rostedt@...dmis.org>,
        "linux-kernel" <linux-kernel@...r.kernel.org>,
        "Sergey Senozhatsky" <sergey.senozhatsky@...il.com>,
        "chenggang.qin" <chenggang.qin@...ux.alibaba.com>,
        "caijingxian" <caijingxian@...ux.alibaba.com>,
        "yuanliang.wyl" <yuanliang.wyl@...baba-inc.com>
Subject: Re: Would you help to tell why async printk solution was not taken to upstream kernel ?

Hi Sergey, 

Thank you for your fast reply. 
 
On (03/04/18 21:02), Sergey Senozhatsky wrote:

> On (03/04/18 20:10), Qixuan.Wu wrote:
>>    Hi Sergey, petr, and Jan,
>>      I find you wrote a patch set of "[PATCH v12 0/3] printk: Make printk()
>>    completely async"(https://lkml.org/lkml/2016/5/13/275), and many people
>>    have reviewed. But I did not see them be taken to upstream kernel. Would
>>    you please help to tell me the reason ? Is it just only because of the
>>    LOG_CONT scenario (4th patch) ?
> 
> Hello,
> 
>   Thanks for your email, we desperately need more feedback from
>  people who are facing printk() related issues. While, certainly, I'm not
>  happy to hear that printk() causes troubles on your side.

>   Regarding the async printk patch set. It's still "work in
>  progress", and probably will take some time (due to various reasons,
>  LOG_CONT is not one of them).

That's fine. printk() is important and is used in many places throughout the
kernel, so it is hard to cover every scenario, and it is to be expected that
some places still need improvement.
As for the async printk patch set, do you know when it might be finished?
I think it would be very useful for avoiding softlockups and RCU stalls.

>   Yes. 4.16 has Steven's patch which tweaks printk() in a very smart
>  way and addresses some of the issues printk() has. If you can't test 4.16
>  (quite possible), then the commits you'd want to take a look at are
> (Linus's tree):
>  dbdda842fe96f89  printk: Add console owner and waiter logic to load balance console writes
>  c162d5b4338d72d  printk: Hide console waiter logic into helpers
>  fd5f7cde1b85d4c  printk: Never set console_may_schedule in console_trylock()
>  c14376de3a1befa  printk: Wake klogd when passing console_lock owner

Thank you for pointing me to Steven's patches. I have looked through them, and
the idea is good. I think it can mitigate the softlockup problem in 99.999% of
cases. But I have one comment on it, though I may be wrong.

Suppose a system has 100 CPUs (0~99). While CPU 0 is writing to a slow console,
CPUs 1~99 call printk() at the same time. Suppose CPU 1 becomes the waiter; as
per the patch, CPUs 2~99 return directly. After CPU 0 finishes writing its log
to the console, it sees that CPU 1 is waiting and returns. CPU 1 then has to
flush all the logs of CPUs 1~99 to the console, which may cause a softlockup or
an RCU stall. This scenario is very unusual and very unlikely to happen.
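
The scenario above can be sketched as a toy simulation. This is only an
illustrative model under simplifying assumptions, not the kernel's actual code:
each CPU logs one message, the owner checks for a waiter after every line it
writes, and no new waiter shows up after the handoff. The names
simulate_handoff, num_cpus and msgs_per_cpu are hypothetical.

```python
from collections import deque


def simulate_handoff(num_cpus=100, msgs_per_cpu=1):
    """Toy model of the console owner/waiter handoff (not kernel code).

    Returns a dict mapping CPU id -> number of lines that CPU had to
    write to the slow console.
    """
    log = deque()                       # stands in for the shared log buffer
    printed = {cpu: 0 for cpu in range(num_cpus)}

    # CPU 0 logs its message(s) and becomes the console owner.
    log.extend([0] * msgs_per_cpu)
    owner = 0
    waiter = None

    # Assumption: CPUs 1..N-1 call printk() while CPU 0 is still writing.
    for cpu in range(1, num_cpus):
        log.extend([cpu] * msgs_per_cpu)
        if waiter is None:
            waiter = cpu                # the first CPU spins as the waiter
        # every later CPU sees a waiter already exists and returns directly

    # The owner writes one line at a time and hands off if someone waits.
    while log:
        log.popleft()
        printed[owner] += 1
        if waiter is not None:
            owner, waiter = waiter, None  # handoff: the waiter becomes owner

    return printed


if __name__ == "__main__":
    result = simulate_handoff()
    busy = {cpu: n for cpu, n in result.items() if n}
    print(busy)
```

With the defaults, CPU 0 writes one line and CPU 1 writes the remaining 99,
which is the worst case I describe above.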

> If you can backport those, test and tell us about your experience - would be
> great and very much appreciated.

Anyway, the code in 4.16 is also very useful for this problem. We will consider
backporting it. If any other problem occurs, I will let you know.

Thanks & Regards
Qixuan
