linux-kernel - RE: [PATCH] Fix the race between smp_call

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <27240C0AC20F114CBF8149A2696CBE4A05A6BF@SHSMSX101.ccr.corp.intel.com>
Date:	Tue, 20 Mar 2012 00:22:28 +0000
From:	"Liu, Chuansheng" <chuansheng.liu@...el.com>
To:	Peter Zijlstra <peterz@...radead.org>
CC:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Yanmin Zhang <yanmin_zhang@...ux.intel.com>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"Liu, Chuansheng" <chuansheng.liu@...el.com>
Subject: RE: [PATCH] Fix the race between smp_call_function and CPU booting



> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@...radead.org]
> Sent: Monday, March 19, 2012 6:03 PM
> To: Liu, Chuansheng
> Cc: linux-kernel@...r.kernel.org; Yanmin Zhang; tglx@...utronix.de
> Subject: RE: [PATCH] Fix the race between smp_call_function and CPU booting
> 
> On Mon, 2012-03-19 at 00:58 +0000, Liu, Chuansheng wrote:
> > Your patch advance the setting active bit before online setting, that
> > will cause an warning error,
> 
> WHY!?
I have done the stress tests based on your patches. The following warning error is very easy to
be reproduced, paste my result again. Thanks to give some time to have a look.

I did a stress test that starting two different scripts concurrently:
1/ onoff_line script like below:
while true
do
 echo 0 > /sys/devices/system/cpu/cpu1/online
 echo 1 > /sys/devices/system/cpu/cpu1/online
done
2/ Adding a simple sys interface to trigger calling smp_call_function:
test_set()
{
  smp_call_function(...);
}

The script is writing the interface to trigger the calling in loop every 500ms;


The result is:
1/ without any patch, the deadlock issue is very easy to be reproduced;

2/ With your patch http://lkml.org/lkml/2011/12/15/255, the below issue is always found, and the system is hanging there.
I think it is because the booted CPU1 is set to active too early and the online do not be set yet.
[  721.759736] cpu_down
[  721.822193] LCS test smp_call_function [  721.864892] CPU 1 is now offline [  721.868270] SMP alternatives: switching to UP code [  721.886925] _cpu_up [  721.892222] SMP alternatives: switching to SMP code [  721.906420] Booting Node 0 Processor 1 APIC 0x1 [  721.921177] Initializing CPU#1 [  721.981898] ------------[ cut here ]------------ [  721.989553] WARNING: at /root/r3_ics/hardware/intel/linux-2.6/arch/x86/kernel/smp.c:118 native_smp_send_reschedule+0x50/0x60()
[  722.000923] Hardware name: Medfield
[  722.004401] Modules linked in: atomisp lm3554 mt9m114 mt9e013 videobuf_vmalloc videobuf_core mac80211 cfg80211 compat btwilink st_drv [  722.016408] Pid: 18865, comm: workqueue_trust Not tainted 3.0.8-137166-g2639a16-dirty #1 
[  722.024486] Call Trace: 
[  722.026939]  [<c1252287>] warn_slowpath_common+0x77/0x130 
[  722.032321][<c121df70>] ? native_smp_send_reschedule+0x50/0x60
[  722.038314]  [<c121df70>] ? native_smp_send_reschedule+0x50/0x60
[  722.044316]  [<c1252362>] warn_slowpath_null+0x22/0x30 
[  722.049445]  [<c121df70>] native_smp_send_reschedule+0x50/0x60
[  722.055268]  [<c124bacf>] try_to_wake_up+0x17f/0x390 
[  722.060225]  [<c124bd34>] wake_up_process+0x14/0x20 
[  722.065091]  [<c1277107>] kthread_stop+0x37/0x100 
[  722.069789]  [<c126f5e0>] destroy_worker+0x50/0x90 
[  722.074573]  [<c18c1b4d>] trustee_thread+0x3e3/0x4bf 
[  722.079524]  [<c1277410>] ? wake_up_bit+0x90/0x90 
[  722.084224]  [<c18c176a>] ? wait_trustee_state+0x91/0x91 
[  722.089520]  [<c1276fc4>] kthread+0x74/0x80 [  722.093694]  
[<c1276f50>] ? __init_kthread_worker+0x30/0x30 [  722.099264]  
[<c18c7cfa>] kernel_thread_helper+0x6/0x10 [  722.104474] 
---[ end trace fa5bcc15ece677c6 ]---

3/ With my patch, the system kept there for 1 hour ,did not find issue yet.
I will keep the stress test running for a long long time;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/