Message-ID: <53981100.7030404@huawei.com>
Date:	Wed, 11 Jun 2014 16:19:12 +0800
From:	Weng Meiling <wengmeiling.weng@...wei.com>
To:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:	<jack@...e.cz>, <akpm@...ux-foundation.org>,
	<adilger.kernel@...ger.ca>, Jens Axboe <axboe@...nel.dk>,
	Li Zefan <lizefan@...wei.com>,
	Huang Qiang <h.huangqiang@...wei.com>,
	Zhao Hongjiang <zhaohongjiang@...wei.com>
Subject: [linux 3.4 question] reboot command stall when vdbench test

Hi guys,

We are running vdbench tests on our SUSE system with kernel 3.4. The tests cover sequential and
random reads/writes with different block sizes. Before the vdbench test, we had run some other tests:
disk message lookup and RAID rebuild (note that we use hardware RAID: SAS2008 RAID).

We used nohup to run the vdbench test script:

#nohup ./vdbench_batch_test &

During the test, we ran cat on the result file:

#cat nohup.out

At this point the cat command stalled. We then tried to reboot, but the system
did not reboot and the reboot command stalled as well; shutdown went into
uninterruptible sleep:

root     21716  0.0  0.0   4276   556 ?        D    18:31   0:00 cat nohup.out
root     21726  0.0  0.0  17880  2876 ?        Ds   18:33   0:00 -bash
root     21868  0.0  0.0   8224   740 ?        D    19:03   0:00 shutdown -r 0 w
root     21892  0.0  0.0  17880  2884 ?        Ds   19:11   0:00 -bash
root     21967  0.0  0.0   8224   740 ?        D    19:19   0:00 shutdown -r 0 w
root     21970  0.0  0.0  86044  3680 ?        Ss   19:19   0:00 sshd: root@.../4
root     21975  0.0  0.0  17880  2880 pts/4    Ss   19:19   0:00 -bash
root     22000  0.0  0.0  12932  1280 pts/4    T    19:20   0:00 top
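
For readers who want to pick the stuck tasks out of a listing like the one above,
here is a minimal sketch. It assumes the standard `ps aux` column order
(USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); the function name
find_uninterruptible and the SAMPLE constant are hypothetical and not part of the
original report.

```python
# Sketch only: filter `ps aux`-style lines for tasks in uninterruptible sleep.
# Assumes the usual 11-column `ps aux` layout; names here are illustrative.

SAMPLE = """\
root     21716  0.0  0.0   4276   556 ?        D    18:31   0:00 cat nohup.out
root     21726  0.0  0.0  17880  2876 ?        Ds   18:33   0:00 -bash
root     21868  0.0  0.0   8224   740 ?        D    19:03   0:00 shutdown -r 0 w
root     21892  0.0  0.0  17880  2884 ?        Ds   19:11   0:00 -bash
root     21967  0.0  0.0   8224   740 ?        D    19:19   0:00 shutdown -r 0 w
root     21970  0.0  0.0  86044  3680 ?        Ss   19:19   0:00 sshd: root@.../4
root     21975  0.0  0.0  17880  2880 pts/4    Ss   19:19   0:00 -bash
root     22000  0.0  0.0  12932  1280 pts/4    T    19:20   0:00 top
"""


def find_uninterruptible(ps_lines):
    """Return (pid, command) for every line whose STAT field starts with 'D'."""
    blocked = []
    for line in ps_lines:
        # Split into at most 11 fields so the command keeps its own spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header line and anything malformed
        pid, stat, command = int(fields[1]), fields[7], fields[10]
        if stat.startswith("D"):
            blocked.append((pid, command))
    return blocked


if __name__ == "__main__":
    for pid, command in find_uninterruptible(SAMPLE.splitlines()):
        print(pid, command)
```

On the listing above this selects the cat, the two shutdown commands and the two
bash shells whose state is "Ds"; the sshd, the third bash and top are skipped.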

After several hours the system became completely unresponsive: all ssh connections stalled and we
could not connect to the server any more. It stayed in this state for a week, and finally we had to
reboot the system with the power button. After the reboot, we repeated the same steps for more than
a month to try to reproduce the problem, but it did not happen again.

We have analysed the code and lock information according to the call trace, and also reviewed the
linux 3.4+ mainline patches looking for a fix for a similar problem, but found nothing.

Many others have hit a similar problem when using SAN/NFS/multipath devices, but we use none of these.

The attachments are our test program and the dmesg output we captured via sysrq before the system died.
Has anyone met this problem before? Any suggestions would be appreciated. Thanks!
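
For anyone who wants to capture comparable information on a stalled machine, a
minimal sketch follows. It assumes root privileges and that sysrq is enabled;
writing "w" to /proc/sysrq-trigger asks the kernel to log the tasks that are in
blocked (uninterruptible) state, and the output can then be read with dmesg.
The report does not say which sysrq key was used, so the choice of "w" and the
function name request_blocked_task_dump are assumptions for illustration.

```python
# Sketch only: trigger the sysrq "show blocked tasks" dump from a script.
# Assumption: run as root on Linux with sysrq enabled; the name is illustrative.

def request_blocked_task_dump(trigger_path="/proc/sysrq-trigger"):
    """Write 'w' to the sysrq trigger so the kernel logs blocked tasks."""
    with open(trigger_path, "w") as trigger:
        trigger.write("w")
```

The path is a parameter so the function can be exercised against an ordinary
file without touching the running kernel.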

Download attachment "vdbench.rar" of type "application/octet-stream" (14694 bytes)

View attachment "log_09140514-2-org.log" of type "text/plain" (280389 bytes)
