Message-ID: <5266B5F2.5070102@hp.com>
Date:	Tue, 22 Oct 2013 13:29:22 -0400
From:	Don Morris <don.morris@...com>
To:	LKML <linux-kernel@...r.kernel.org>
Subject: BUG: soft lockup - CPU#8 stuck for 22s!

Greetings, all.

Just wanted to drop this out there to see if it rang any bells.
I've been getting a soft lockup (numad thread stuck on a cpu
while attempting to attach a task to a cgroup) for a while now,
but I thought it was only happening when I applied Mel Gorman's
set of AutoNUMA patches. Today, however, it happened on a stock
3.12-rc3 kernel as well, so it is in the baseline. And before
anyone asks, I wanted to make sure directed NUMA activities
such as numad would interact safely with the AutoNUMA
stuff, so that's why I was running with both enabled.

I believe this started in the 3.11 timeframe (and I'll try to
bisect to narrow things down).

The problem/reproduction environment is:
	+ CentOS 6.4
	/* The next three lines are to get numad running */
	+ mkdir /cgroup/cpuset
	+ mount cgroup -t cgroup -o cpuset /cgroup/cpuset
	+ service numad start
	+ loop running the AutoNUMA tests available at:
	git://gitorious.org/autonuma-benchmark/autonuma-benchmark.git

How long it takes to hit this varies -- since it looks like it
is not due to Mel's changes at all, a stress test for cgroup
interactions would likely trigger it faster (anyone care to point
me at one?).
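
In the absence of a pointer to an existing test, a minimal sketch of such a
stress loop might look like the following. It drives the same write-to-"tasks"
path that shows up in the call trace (cgroup_tasks_write ->
attach_task_by_pid -> cgroup_attach_task). The helper names and the child
cgroup directories are hypothetical, and whether this actually reproduces the
lockup is untested.

```python
import os


def attach_task(cgroup_dir, pid):
    """Write pid to <cgroup_dir>/tasks, the cgroup v1 attach interface."""
    with open(os.path.join(cgroup_dir, "tasks"), "w") as f:
        f.write(str(pid))


def bounce_task(cgroup_dirs, pid, iterations):
    """Move pid round-robin through cgroup_dirs; return the attach count.

    cgroup_dirs is an assumption: e.g. two child cpusets created by hand
    under /cgroup/cpuset. Nothing here creates or configures them.
    """
    attached = 0
    for i in range(iterations):
        attach_task(cgroup_dirs[i % len(cgroup_dirs)], pid)
        attached += 1
    return attached
```

Usage would be something like bounce_task(["/cgroup/cpuset/a",
"/cgroup/cpuset/b"], os.getpid(), 100000), run as root. For the cpuset
controller, each child needs cpuset.cpus and cpuset.mems populated before a
task can be attached to it.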

/var/log/messages output attached, trimmed to just one boot+instance
of the problem.

Oct 22 11:05:10 hornet2 kernel: BUG: soft lockup - CPU#8 stuck for 22s!
[numad:27384]
Oct 22 11:05:10 hornet2 kernel: Modules linked in: ebtable_nat ebtables
xt_CHECKSUM iptable_mangle bridge autofs4 sunrpc 8021q garp stp llc
ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack
ip6table_filter ip6_tables ipv6 ext2 vhost_net macvtap macvlan vhost tun
kvm_intel kvm uinput hp_wmi sparse_keymap rfkill snd_usb_audio
snd_usbmidi_lib snd_rawmidi acpi_cpufreq freq_table iTCO_wdt
iTCO_vendor_support sg microcode serio_raw pcspkr sb_edac edac_core wmi
i2c_i801 lpc_ich mfd_core xhci_hcd e1000e ptp pps_core ioatdma dca
snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore
snd_page_alloc ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif
crct10dif_common firewire_ohci firewire_core crc_itu_t ahci libahci
pata_acpi ata_generic isci libsas scsi_transport_sas radeon ttm
drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log
dm_mod
Oct 22 11:05:10 hornet2 kernel: CPU: 8 PID: 27384 Comm: numad Not
tainted 3.12.0-rc3+ #1
Oct 22 11:05:10 hornet2 kernel: Hardware name: Hewlett-Packard HP Z620
Workstation/158A, BIOS J61 v03.15 05/09/2013
Oct 22 11:05:10 hornet2 kernel: task: ffff88070e9c60c0 ti:
ffff88070e520000 task.ti: ffff88070e520000
Oct 22 11:05:10 hornet2 kernel: RIP: 0010:[<ffffffff8154256c>]
[<ffffffff8154256c>] _raw_read_lock+0xc/0x20
Oct 22 11:05:10 hornet2 kernel: RSP: 0018:ffff88070e521cc8  EFLAGS: 00000217
Oct 22 11:05:10 hornet2 kernel: RAX: 0000000000000000 RBX:
ffffffff81117b52 RCX: ffff880c073ca6e8
Oct 22 11:05:10 hornet2 kernel: RDX: ffff880c12a1f040 RSI:
ffff88070c264000 RDI: ffffffff81a46cc8
Oct 22 11:05:10 hornet2 kernel: RBP: ffff88070e521cc8 R08:
ffff880c2fc55ba0 R09: ffff880c030fa000
Oct 22 11:05:10 hornet2 kernel: R10: f4f0000000000000 R11:
f000000000000000 R12: 0000000000000d0f
Oct 22 11:05:10 hornet2 kernel: R13: ffff88070e521ce8 R14:
0000000000000030 R15: ffff88070e520000
Oct 22 11:05:10 hornet2 kernel: FS:  00007f5cf7023700(0000)
GS:ffff880c2fc40000(0000) knlGS:0000000000000000
Oct 22 11:05:10 hornet2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Oct 22 11:05:10 hornet2 kernel: CR2: ffffffffff600400 CR3:
000000070f357000 CR4: 00000000000407e0
Oct 22 11:05:10 hornet2 kernel: Stack:
Oct 22 11:05:10 hornet2 kernel: ffff88070e521ce8 ffffffff810c0ee9
ffff880c0f5c9020 0000000000000000
Oct 22 11:05:10 hornet2 kernel: ffff88070e521da8 ffffffff810c4896
0000000000000030 ffff88070e521d28
Oct 22 11:05:10 hornet2 kernel: 000000005266940e 0000000000000000
ffff88070c264000 ffff880600000001
Oct 22 11:05:10 hornet2 kernel: Call Trace:
Oct 22 11:05:10 hornet2 kernel: [<ffffffff810c0ee9>]
task_cgroup_from_root+0x29/0xa0
Oct 22 11:05:10 hornet2 kernel: [<ffffffff810c4896>]
cgroup_attach_task+0xe6/0x3b0
Oct 22 11:05:10 hornet2 kernel: [<ffffffff810c4ccf>]
attach_task_by_pid+0x16f/0x1b0
Oct 22 11:05:10 hornet2 kernel: [<ffffffff810c4d26>]
cgroup_tasks_write+0x16/0x20
Oct 22 11:05:10 hornet2 kernel: [<ffffffff810c1b3c>]
cgroup_write_X64+0xec/0x150
Oct 22 11:05:10 hornet2 kernel: [<ffffffff81210cb3>] ?
security_file_permission+0x23/0x90
Oct 22 11:05:10 hornet2 kernel: [<ffffffff810c4338>]
cgroup_file_write+0x58/0xc0
Oct 22 11:05:10 hornet2 kernel: [<ffffffff81178683>] ?
file_start_write+0x33/0x40
Oct 22 11:05:10 hornet2 kernel: [<ffffffff81178c68>] vfs_write+0xc8/0x170
Oct 22 11:05:10 hornet2 kernel: [<ffffffff8117927f>] SyS_write+0x5f/0xb0
Oct 22 11:05:10 hornet2 kernel: [<ffffffff8154af92>]
system_call_fastpath+0x16/0x1b
Oct 22 11:05:10 hornet2 kernel: Code: 17 b8 01 00 00 00 ff ca 78 05 c9
c3 0f 1f 00 f0 ff 07 30 c0 c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5
66 66 66 66 90 f0 ff 0f <79> 05 e8 bd f0 d2 ff c9 c3 66 66 2e 0f 1f 84
00 00 00 00 00 55
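
For anyone sifting through the full attachment, a small sketch for pulling
the key fields out of a report like the one above may help. The regular
expressions are assumptions based only on the format shown in this message
(including the mail client's line wrapping), not on any kernel-defined
grammar, and the function names are made up for illustration.

```python
import re

# Matches the watchdog header; \s* tolerates the line wrap between
# "22s!" and "[numad:27384]".
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!"
    r"\s*\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)

# Matches "symbol+0xOFF/0xSIZE" as printed in the RIP and Call Trace lines.
FRAME_RE = re.compile(r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_lockup(text):
    """Return (cpu, seconds, comm, pid) for the first soft lockup, or None."""
    m = LOCKUP_RE.search(text)
    if m is None:
        return None
    return int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"])


def frame_symbols(text):
    """Return every symbol printed as symbol+offset/size, in order."""
    return [m["sym"] for m in FRAME_RE.finditer(text)]
```

Feeding the whole log text to frame_symbols() returns the RIP symbol first
and then the call trace entries, which makes it easy to check whether every
instance in the attachment is stuck in the same path.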

Thanks in advance for any input/interest.
Don Morris

View attachment "cgroup_hang_trimmed" of type "text/plain" (359432 bytes)
