Message-ID: <4E550057.9070609@linux.vnet.ibm.com>
Date:	Wed, 24 Aug 2011 19:14:55 +0530
From:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To:	linux-kernel@...r.kernel.org
CC:	linux-fsdevel@...r.kernel.org, linux-pm@...ts.linux-foundation.org
Subject: [BUG] Soft-lockup during cpu-hotplug in VFS callpaths

Hi,

While running stressful CPU hotplug tests with a kernel compilation
running in the background, soft lockups are detected on multiple CPUs.
Sometimes this also leads to hard lockups and a kernel panic.
All the soft lockups seem to occur in vfsmount_lock_local_lock_cpu()
or in other VFS callpaths.
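
For reference, the kind of hotplug stress involved boils down to
repeatedly offlining and re-onlining CPUs through sysfs. The sketch
below is only illustrative (the CPU range, the delays and the harness
details are assumptions, not the actual test scripts):

/*
 * Illustrative CPU hotplug stress loop: toggle each CPU's sysfs
 * "online" file while a kernel compilation runs in the background.
 * The CPU range (1..15 on this 16-thread machine) and the delays
 * are assumptions, not the exact test harness.
 */
#include <stdio.h>
#include <unistd.h>

#define NR_CPUS_TESTED	16	/* dual socket, quad core, HT */

static void set_cpu_online(int cpu, int online)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/online", cpu);
	f = fopen(path, "w");
	if (!f)
		return;	/* e.g. CPU 0 may not support hotplug */
	fprintf(f, "%d\n", online);
	fclose(f);
}

int main(void)
{
	int cpu;

	for (;;) {
		for (cpu = 1; cpu < NR_CPUS_TESTED; cpu++) {
			set_cpu_online(cpu, 0);
			usleep(100 * 1000);
			set_cpu_online(cpu, 1);
			usleep(100 * 1000);
		}
	}
	return 0;
}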


[37108.410813] BUG: soft lockup - CPU#5 stuck for 22s! [cc1:29669]
<snip>
[37108.694781] Call Trace:
[37108.697306]  [<ffffffff81199e70>] ? vfsmount_lock_local_lock_cpu+0x70/0x70
[37108.704258]  [<ffffffff81187cb5>] path_init+0x315/0x400
[37108.709558]  [<ffffffff8127c398>] ? __raw_spin_lock_init+0x38/0x70
[37108.715812]  [<ffffffff8118961c>] path_openat+0x8c/0x3f0
[37108.721203]  [<ffffffff81012129>] ? sched_clock+0x9/0x10
[37108.726597]  [<ffffffff8109416d>] ? sched_clock_cpu+0xcd/0x110
[37108.732508]  [<ffffffff810a178d>] ? trace_hardirqs_off+0xd/0x10
[37108.738498]  [<ffffffff8109421f>] ? local_clock+0x6f/0x80
[37108.743970]  [<ffffffff81189a99>] do_filp_open+0x49/0xa0
[37108.749362]  [<ffffffff811982f3>] ? alloc_fd+0xc3/0x210
[37108.754665]  [<ffffffff8152584b>] ? _raw_spin_unlock+0x2b/0x40
[37108.760575]  [<ffffffff811982f3>] ? alloc_fd+0xc3/0x210
[37108.765875]  [<ffffffff81179607>] do_sys_open+0x107/0x1e0
[37108.771352]  [<ffffffff810d610f>] ? audit_syscall_entry+0x1bf/0x1f0
[37108.777695]  [<ffffffff81179720>] sys_open+0x20/0x30
[37108.782741]  [<ffffffff8152e202>] system_call_fastpath+0x16/0x1b

Kernel versions: 3.0.1, 3.0.3
Hardware: Dual socket quad-core hyper-threaded Intel x86 machine
Scenario:
(a) Stressful CPU hotplug tests + kernel compilation in the background

(b) IRQ balancing was disabled and all the IRQs were routed to CPU 0
    (except the ones that could not be re-routed); a sketch of one
    way to set this up is included below.

(c) Lockdep was enabled during kernel configuration.

Steps (b) and (c) were done to dig deeper into the issue; however,
the same issue was observed with step (a) alone.
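
For completeness, the IRQ routing in (b) can be set up roughly as
below, assuming the usual /proc/irq/<n>/smp_affinity interface (the
exact commands used are not part of this report):

/*
 * Illustrative sketch of step (b): after stopping the irqbalance
 * daemon, write a CPU mask of 1 (CPU 0 only) to every IRQ's
 * smp_affinity file. IRQs that the kernel refuses to re-route just
 * fail the write, matching the "except the ones that could not be
 * re-routed" caveat above.
 */
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	DIR *d = opendir("/proc/irq");
	struct dirent *de;
	char path[64];
	int fd;

	if (!d) {
		perror("/proc/irq");
		return 1;
	}
	while ((de = readdir(d)) != NULL) {
		/* Skip ".", ".." and non-numeric entries. */
		if (de->d_name[0] < '0' || de->d_name[0] > '9')
			continue;
		snprintf(path, sizeof(path),
			 "/proc/irq/%s/smp_affinity", de->d_name);
		fd = open(path, O_WRONLY);
		if (fd < 0)
			continue;
		if (write(fd, "1\n", 2) < 0)
			fprintf(stderr, "IRQ %s could not be re-routed\n",
				de->d_name);
		close(fd);
	}
	closedir(d);
	return 0;
}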

There definitely seems to be a race condition here: the issue is hit
only some time after the tests are started, and the time it takes to
reproduce increases as the number of debug print statements is
increased. In some cases (especially when the number of debug print
statements was quite high), the stress on the machine had to be
increased in order to hit the issue within a reasonable time. In my
tests, a maximum of about 2 to 2.5 hours was sufficient to hit this
bug.

Please find the console log attached to this mail.

Any ideas on how to go about fixing this bug?

-- 
Regards,
Srivatsa S. Bhat  <srivatsa.bhat@...ux.vnet.ibm.com>
Linux Technology Center,
IBM India Systems and Technology Lab

View attachment "soft-lockup_log.txt" of type "text/plain" (58725 bytes)
