Message-ID: <20140516115737.GP11096@twins.programming.kicks-ass.net>
Date:	Fri, 16 May 2014 13:57:37 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Lai Jiangshan <laijs@...fujitsu.com>
Cc:	jjherne@...ux.vnet.ibm.com, Sasha Levin <sasha.levin@...cle.com>,
	Tejun Heo <tj@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
	Dave Jones <davej@...hat.com>, Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: workqueue: WARN at at kernel/workqueue.c:2176

On Fri, May 16, 2014 at 11:50:42AM +0800, Lai Jiangshan wrote:
> Hi, Peter and other scheduler Gurus:
> 
> When I was trying to test wq-VS-hotplug, I always hit a problem in scheduler
> with the following WARNING:
> 
> [   74.765519] WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
> [   74.765520] Modules linked in: wq_hotplug(O) fuse cpufreq_ondemand ipv6 kvm_intel kvm uinput snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi e1000e snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer ptp iTCO_wdt iTCO_vendor_support lpc_ich snd mfd_core pps_core soundcore acpi_cpufreq i2c_i801 microcode wmi radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core
> [   74.765545] CPU: 1 PID: 13 Comm: migration/1 Tainted: G           O  3.15.0-rc3+ #153
> [   74.765546] Hardware name: LENOVO ThinkCentre M8200T/  , BIOS 5JKT51AUS 11/02/2010
> [   74.765547]  000000000000007c ffff880236199c88 ffffffff814d7d2c 0000000000000000
> [   74.765550]  0000000000000000 ffff880236199cc8 ffffffff8103add4 ffff880236199cb8
> [   74.765552]  ffffffff81023e1b ffff8802361861c0 0000000000000001 ffff88023fd92b40
> [   74.765555] Call Trace:
> [   74.765559]  [<ffffffff814d7d2c>] dump_stack+0x51/0x75
> [   74.765562]  [<ffffffff8103add4>] warn_slowpath_common+0x81/0x9b
> [   74.765564]  [<ffffffff81023e1b>] ? native_smp_send_reschedule+0x2d/0x4b
> [   74.765566]  [<ffffffff8103ae08>] warn_slowpath_null+0x1a/0x1c
> [   74.765568]  [<ffffffff81023e1b>] native_smp_send_reschedule+0x2d/0x4b
> [   74.765571]  [<ffffffff8105c2ea>] smp_send_reschedule+0xa/0xc
> [   74.765574]  [<ffffffff8105fe46>] resched_task+0x5e/0x62
> [   74.765576]  [<ffffffff81060238>] check_preempt_curr+0x43/0x77
> [   74.765578]  [<ffffffff81060680>] __migrate_task+0xda/0x100
> [   74.765580]  [<ffffffff810606a6>] ? __migrate_task+0x100/0x100
> [   74.765582]  [<ffffffff810606c3>] migration_cpu_stop+0x1d/0x22
> [   74.765585]  [<ffffffff810a33c6>] cpu_stopper_thread+0x84/0x116
> [   74.765587]  [<ffffffff814d8642>] ? __schedule+0x559/0x581
> [   74.765590]  [<ffffffff814dae3c>] ? _raw_spin_lock_irqsave+0x12/0x3c
> [   74.765592]  [<ffffffff8105bd75>] ? __smpboot_create_thread+0x109/0x109
> [   74.765594]  [<ffffffff8105bf46>] smpboot_thread_fn+0x1d1/0x1d6
> [   74.765598]  [<ffffffff81056665>] kthread+0xad/0xb5
> [   74.765600]  [<ffffffff810565b8>] ? kthread_freezable_should_stop+0x41/0x41
> [   74.765603]  [<ffffffff814e0e2c>] ret_from_fork+0x7c/0xb0
> [   74.765605]  [<ffffffff810565b8>] ? kthread_freezable_should_stop+0x41/0x41
> [   74.765607] ---[ end trace 662efb362b4e8ed0 ]---
> 
> After debugging, I found that the hotplugged-in CPU is active but !online in this case.
> The problem was introduced by 5fbd036b.
> Some code assumes that any CPU in cpu_active_mask is also online, but 5fbd036b breaks
> this assumption, so the code that relies on this assumption should be changed too.
> 

This of course leaves the question of how the workqueue code manages to
call set_cpus_allowed_ptr() on a cpu _before_ it's online.

That too sounds fishy.. with the proposed patch the
set_cpus_allowed_ptr() will 'gracefully' fail, but calling it in the
first place is of course dubious too.
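
To illustrate the assumption discussed above, here is a toy userspace
model in C. It is not kernel code: toy_cpumask, toy_cpu_test(),
toy_target_ok_assuming_online(), toy_target_ok_checked() and toy_demo()
are hypothetical names invented for this sketch, and the actual patch
may look quite different. It only shows why a test against the active
mask alone stops being sufficient once a cpu can be active before it is
online.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a cpu mask: bit N set means cpu N is in the mask. */
typedef unsigned long toy_cpumask;

static bool toy_cpu_test(toy_cpumask mask, int cpu)
{
	return ((mask >> cpu) & 1UL) != 0;
}

/* Relies on the old assumption: active implies online. */
static bool toy_target_ok_assuming_online(toy_cpumask active, int cpu)
{
	return toy_cpu_test(active, cpu);
}

/* Defensive variant: the target must be both active and online. */
static bool toy_target_ok_checked(toy_cpumask active, toy_cpumask online,
				  int cpu)
{
	return toy_cpu_test(active, cpu) && toy_cpu_test(online, cpu);
}

/* cpu 1 is mid-hotplug: already active, not yet online. */
static void toy_demo(void)
{
	toy_cpumask online = 0x1UL;	/* only cpu 0 */
	toy_cpumask active = 0x3UL;	/* cpus 0 and 1 */

	/* The active-only check accepts cpu 1 although it is offline. */
	assert(toy_target_ok_assuming_online(active, 1));
	/* The combined check rejects it and still accepts cpu 0. */
	assert(!toy_target_ok_checked(active, online, 1));
	assert(toy_target_ok_checked(active, online, 0));
}
```

In this model the active-only check would let a task be pushed to cpu 1
while it is still offline, which is the kind of situation the WARNING in
the quoted trace reports. Whether callers should ever get that far is
the separate question raised above.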
