linux-kernel - Re: [RFC] how to perform a safe NMI stack trace on all CPUs on x86?


Message-ID: <55548503.2050406@huawei.com>
Date:	Thu, 14 May 2015 19:20:35 +0800
From:	"long.wanglong" <long.wanglong@...wei.com>
To:	Jiri Kosina <jkosina@...e.cz>
CC:	王龙 <wanglong@...qinren.net>,
	rostedt <rostedt@...dmis.org>,
	paulmck <paulmck@...ux.vnet.ibm.com>, pmladek <pmladek@...e.cz>,
	dzickus <dzickus@...hat.com>,
	johannes <johannes@...solutions.net>, koct9i <koct9i@...il.com>,
	tglx <tglx@...utronix.de>, mingo <mingo@...hat.com>,
	hpa <hpa@...or.com>, x86 <x86@...nel.org>,
	atomlin <atomlin@...hat.com>, akpm <akpm@...ux-foundation.org>,
	"sasha.levin" <sasha.levin@...cle.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	peifeiyue <peifeiyue@...wei.com>,
	"morgan.wang" <morgan.wang@...wei.com>
Subject: Re: [RFC] how to perform a safe NMI stack trace on all CPUs on x86?

On 2015/5/13 22:26, Jiri Kosina wrote:
> On Wed, 13 May 2015, 王龙 wrote:
> 
>> Hi all,
>>
>> In kernels before 3.19, when trigger_all_cpu_backtrace() is called on x86,
>> it triggers an NMI on each CPU and calls show_regs(). But this can lead
>> to a hard lockup if the NMI interrupts another printk() in progress.
>>
>> Commit a9edc88093287183ac934be44f295f183b2c62dd (x86/nmi: Perform a safe
>> NMI stack trace on all CPUs) fixes this problem in mainline. When the NMI
>> triggers, it switches the printk routine for that CPU to an NMI-safe printk
>> function that records the output in a per_cpu seq_buf descriptor. After all
>> NMIs have finished recording their data, the seq_bufs are printed in a safe
>> context. But how do we fix this problem in older kernels (e.g. 3.10 stable)?
>> 3.10 stable has neither the "switch printk routine" nor the "seq_buf" infrastructure.
>>
>> Could anyone give me some ideas?
> 
> Either you backport the seq_buf-based approach to the older kernel, or, if you
> are working on a 3.4 kernel or earlier (basically any kernel preceding the
> printk() revamp that happened in 7ff9554bb57 and after), you can use a
> slightly simpler approach.
> 
> It's an approach we used initially when we first found the issue, and it
> is proven to work as well (but it's not applicable after Kay added all the
> complexity to printk()).
> 
> You can see it in our SLE11 kernel tree, available on
> 	
> 	http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP4&id=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9
> 
> for example.
> 
> It's up to you to judge which is the least painful way :)
> 

Hi Jiri Kosina,

For 3.10 stable, the only way to solve this problem is to backport the seq_buf-based approach.

I will backport the necessary patches to 3.10 stable. You are welcome to review my backport patches.
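
For anyone following the thread, the idea behind the seq_buf-based approach can be sketched in user-space C roughly as below. This is only an illustration under stated assumptions, not the kernel code from the commit: struct nmi_buf, nmi_vprintk(), sketch_printk(), nmi_handler() and trigger_all_cpu_backtrace_sketch() are hypothetical names, the "CPUs" are plain array indices, and the "console" is an in-memory string instead of a locked log buffer.

```c
/* User-space sketch only: every name below is hypothetical, not a kernel API. */
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NR_CPUS      4    /* assumed CPU count for the sketch */
#define NMI_BUF_SIZE 256  /* assumed per-CPU buffer size */

/* Stand-in for a per-CPU seq_buf: a fixed buffer plus the bytes used so far. */
struct nmi_buf {
    char   data[NMI_BUF_SIZE];
    size_t len;
};

typedef int (*vprintk_fn)(int cpu, const char *fmt, va_list args);

static struct nmi_buf nmi_bufs[NR_CPUS];
static vprintk_fn printk_route[NR_CPUS];          /* NULL means "normal printk" */
static char console_log[NR_CPUS * NMI_BUF_SIZE];  /* stands in for the console */
static size_t console_len;

/* Append formatted text to buf, truncating instead of overflowing. */
static int append(char *buf, size_t size, size_t *len,
                  const char *fmt, va_list args)
{
    size_t room = size - *len;  /* always >= 1: one byte is kept for the NUL */
    int n = vsnprintf(buf + *len, room, fmt, args);

    if (n < 0)
        return 0;
    if ((size_t)n >= room)
        n = (int)(room - 1);
    *len += (size_t)n;
    return n;
}

/* Normal path. In the kernel this takes locks, so it is unsafe from an NMI. */
static int default_vprintk(int cpu, const char *fmt, va_list args)
{
    (void)cpu;
    return append(console_log, sizeof(console_log), &console_len, fmt, args);
}

/* NMI path: touches only this CPU's private buffer, so no lock is needed. */
static int nmi_vprintk(int cpu, const char *fmt, va_list args)
{
    struct nmi_buf *b = &nmi_bufs[cpu];

    return append(b->data, sizeof(b->data), &b->len, fmt, args);
}

/* printk entry point: dispatch to whichever routine this CPU currently uses. */
static int sketch_printk(int cpu, const char *fmt, ...)
{
    vprintk_fn fn;
    va_list args;
    int n;

    assert(cpu >= 0 && cpu < NR_CPUS);
    fn = printk_route[cpu] ? printk_route[cpu] : default_vprintk;
    va_start(args, fmt);
    n = fn(cpu, fmt, args);
    va_end(args);
    return n;
}

/* What each CPU does when the (simulated) NMI arrives. */
static void nmi_handler(int cpu)
{
    printk_route[cpu] = nmi_vprintk;   /* switch to the NMI-safe routine */
    sketch_printk(cpu, "NMI backtrace for cpu %d\n", cpu);
    printk_route[cpu] = NULL;          /* restore the normal routine */
}

/* Record on every CPU first, then print all buffers from a safe context. */
static void trigger_all_cpu_backtrace_sketch(void)
{
    int cpu;

    memset(nmi_bufs, 0, sizeof(nmi_bufs));
    for (cpu = 0; cpu < NR_CPUS; cpu++)
        nmi_handler(cpu);              /* simulated NMIs, run one by one here */
    for (cpu = 0; cpu < NR_CPUS; cpu++) {
        sketch_printk(cpu, "%s", nmi_bufs[cpu].data);
        nmi_bufs[cpu].len = 0;
    }
}
```

Calling trigger_all_cpu_backtrace_sketch() from a small main() leaves one line per simulated CPU in console_log. The point of the split is that nothing executed inside nmi_handler() goes near the shared console state; only the final flush loop does, and that loop runs outside NMI context.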

Best Regards
Wang Long



