Message-ID: <53981100.7030404@huawei.com>
Date: Wed, 11 Jun 2014 16:19:12 +0800
From: Weng Meiling <wengmeiling.weng@...wei.com>
To: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: <jack@...e.cz>, <akpm@...ux-foundation.org>,
<adilger.kernel@...ger.ca>, Jens Axboe <axboe@...nel.dk>,
Li Zefan <lizefan@...wei.com>,
Huang Qiang <h.huangqiang@...wei.com>,
Zhao Hongjiang <zhaohongjiang@...wei.com>
Subject: [linux 3.4 question] reboot command stall when vdbench test
Hi guys,
We ran a vdbench test on our SUSE system with kernel 3.4. The test covers sequential and
random read/write at different block sizes. Before the vdbench test, we had run some other
tests: disk message lookup and RAID rebuild (note that we use hardware RAID: SAS2008 RAID).
We used nohup to run the vdbench test script:
#nohup ./vdbench_batch_test &
During the test, we ran cat on the result file:
#cat nohup.out
At this point the cat command stalled. We then tried to reboot, but the system
did not reboot: the reboot also stalled, and shutdown went into uninterruptible
sleep:
root 21716 0.0 0.0 4276 556 ? D 18:31 0:00 cat nohup.out
root 21726 0.0 0.0 17880 2876 ? Ds 18:33 0:00 -bash
root 21868 0.0 0.0 8224 740 ? D 19:03 0:00 shutdown -r 0 w
root 21892 0.0 0.0 17880 2884 ? Ds 19:11 0:00 -bash
root 21967 0.0 0.0 8224 740 ? D 19:19 0:00 shutdown -r 0 w
root 21970 0.0 0.0 86044 3680 ? Ss 19:19 0:00 sshd: root@.../4
root 21975 0.0 0.0 17880 2880 pts/4 Ss 19:19 0:00 -bash
root 22000 0.0 0.0 12932 1280 pts/4 T 19:20 0:00 top
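For anyone trying to spot the same state, here is a minimal sketch of how the stuck tasks can be picked out of a listing like the one above. It assumes `ps aux` column layout, where the STAT field is column 8; the function name list_dstate is only illustrative and is not part of our test program.

```shell
# list_dstate: read `ps aux`-style lines on stdin and print only the
# tasks in uninterruptible sleep, i.e. those whose STAT field
# (column 8 in `ps aux` output) starts with "D" (matches "D", "Ds", ...).
list_dstate() {
    awk '$8 ~ /^D/'
}

# Typical use (not executed here):
#   ps aux | list_dstate
# As root, the kernel stack of one stuck task can usually be read from
# /proc/<pid>/stack (if the kernel exposes it), and
#   echo w > /proc/sysrq-trigger
# asks the kernel to dump all blocked tasks to dmesg.
```

On the listing above this would keep the cat, -bash (Ds) and shutdown lines and drop sshd, the pts/4 shell and top.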
After several hours the system became completely unresponsive: all ssh connections stalled and
we could not connect to the server any more. It stayed in this state for a week, and finally we
had to reboot the system with the power button. After the reboot, we repeated the same steps for
more than a month to try to reproduce the problem, but it did not happen again.
We have analysed the code and the lock information according to the call trace, and also reviewed
the linux 3.4+ mainline patches looking for a fix for a similar problem, but found nothing.
Many others have hit similar problems when using SAN/NFS/multipath devices, but we don't use any of these.
The attachments are our test program and the dmesg output we captured via sysrq before the system died.
Has anyone met this problem before? Any suggestions are appreciated. Thanks!
Download attachment "vdbench.rar" of type "application/octet-stream" (14694 bytes)
View attachment "log_09140514-2-org.log" of type "text/plain" (280389 bytes)