linux-kernel - Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <519B292F.5020603@linux.vnet.ibm.com>
Date:	Tue, 21 May 2013 15:58:39 +0800
From:	Michael Wang <wangyun@...ux.vnet.ibm.com>
To:	Borislav Petkov <bp@...en8.de>
CC:	Viresh Kumar <viresh.kumar@...aro.org>, Tejun Heo <tj@...nel.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Jiri Kosina <jkosina@...e.cz>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Tony Luck <tony.luck@...el.com>, linux-kernel@...r.kernel.org,
	x86@...nel.org, Thomas Gleixner <tglx@...utronix.de>, rjw@...k.pl,
	cpufreq@...r.kernel.org, linux-pm@...r.kernel.org
Subject: Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule,
 round 2

On 05/21/2013 03:21 PM, Borislav Petkov wrote:
> On Tue, May 21, 2013 at 10:20:51AM +0800, Michael Wang wrote:
>> This is not enough to prove that policy->cpus is wrong, the cpu could
>> be online when get from policy->cpus, but offline when checked here,
>> since hotplug is able to happen during the period.
> 
> Strictly speaking you're correct but I don't do any hotplug besides the
> one-time thing which is part of halting the box.

Well, they share the same cpu_down() I suppose...

> 
>> I don't get it...
>>
>> get_online_cpus() is just stop hotplug happen after it was invoked, so
>> unless policy->cpus is really wrong, otherwise all the cpu it masked
>> won't go offline any more.
> 
> Yes, that's my impression too - at the point we do gov_queue_work,
> policy->cpus already contains offline cpus.
> 
>> This protect nothing...before we go here, the cpu could already
>> offline, nothing changed...
> 
> Yes, but I don't want to schedule work on an offlined cpu and that is
> ensured here.

IMHO, the problem seems mostly like the wrong usage of policy->cpus,
it's providing the right info, but just at that time, we don't need
worry about work on offlined cpu if we don't allow cpu disappear.

Your approach could be good respect to performance, but if we could
prove that policy->cpus is correct firstly, than we could fix the
problem without any concern, don't we?

> 
>> If you really want to confirm the policy->cpus was wrong, the way
>> should be apply the fix I suggested, than check online in here.
> 
> Sure, feel free to get a box, enable NO_HZ_FULL and do all the
> experimentations you desire. I surely cannot be the only one who
> triggers this.

I'm fine if the problem get solved, that means your box doesn't show
WARN any more :)

Regards,
Michael Wang

> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/