Message-ID: <46EABE19.27660.2EE1940D@localhost>
Date: Fri, 14 Sep 2007 17:00:09 +0200
From: "Frantisek Rysanek" <Frantisek.Rysanek@...t.cz>
To: linux-kernel@...r.kernel.org
Subject: Re: [newbie:] Bonnie++2 hangs recent 2.6 kernels? Bash keeps looping in waitpid(), eating 100% CPU
Dear Mr. Piggin,
first of all, thanks for your response :-)
On 13 Sep 2007 at 2:30, Nick Piggin wrote:
>
> Can you see if it is looping in userspace or kernel? Can you kill -9
> the process?
>
I can't run any command at all: whatever I start either hangs or dumps core.
> Are you able to test with the latest 2.6.23-rc kernel? If not (or if it
> still has the same problem), then can you get the output of sysrq+T
> and three sysrq+P calls, please? (this might help work out where in
> kernel it is spinning).
>
I've compiled 2.6.23-rc6, enabled serial console and captured
the output of sysrq+P (on the affected virtual VGA console)
and sysrq+T.
http://www.fccps.cz/download/adv/frr/bonnie/2.6.23-rc6.txt
The interesting bit of information, related to the erratic "bash"
processes, is always a single line, such as:
bash R running 0 2358 1
I've also taken a photo of `top` running
on another virtual console. I can't get any data out of the
affected box, as I can't run any shell commands...
http://www.fccps.cz/download/adv/frr/bonnie/top.jpg
Note that rather few processes are running in user space.
I can't say whether that makes any difference compared to a full-blown distro.
Maybe I could set up the bootable CD for download somewhere
(gzipped ISO of maybe 50 Megs).
In this scenario, Linux 2.6.16.18 once reported a soft lockup.
http://www.fccps.cz/download/adv/frr/bonnie/soft-lockup1.txt
That has never happened again.
I also managed to catch the misbehavior in strace once. I didn't
get a capture, but essentially it was stuck in a single unfinished
syscall; I believe it was "waitpid(1, ". (I never managed that again:
whenever I tried to watch bash with strace -p, I got segfaults
instead of the looping bash.)
Exactly where does the context switch from user to kernel take place?
I know that I can call ioctl() from user space, and that I can write
ioctl() handlers in kernel space as part of device drivers (the
handlers run entirely in kernel space). waitpid() is a syscall too:
it is entered only once from user space, and the bash process seems
to keep looping inside it.
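To make the distinction concrete, here is a minimal Python sketch (not
bash's actual code) of the two ways a parent can end up "in waitpid()".
The helper names spawn, wait_blocking and wait_polling are made up for
illustration. The blocking form crosses into the kernel once and sleeps
there until the child exits; the WNOHANG form returns immediately and is
re-entered from a user-space loop, so strace would show a stream of
calls rather than a single stuck one.

```python
import os
import time

def spawn(exit_code, delay):
    """Fork a child that sleeps for `delay` seconds, then exits with `exit_code`."""
    pid = os.fork()
    if pid == 0:
        time.sleep(delay)
        os._exit(exit_code)      # leave the child without running parent cleanup
    return pid

def wait_blocking(child_pid):
    """One trip into the kernel: the caller sleeps there until the child exits."""
    _, status = os.waitpid(child_pid, 0)
    return os.WEXITSTATUS(status)

def wait_polling(child_pid):
    """Many trips into the kernel: each WNOHANG call returns immediately."""
    calls = 0
    while True:
        calls += 1
        pid, status = os.waitpid(child_pid, os.WNOHANG)
        if pid != 0:             # pid == 0 means the child is still running
            return os.WEXITSTATUS(status), calls

if __name__ == "__main__":
    print("blocking:", wait_blocking(spawn(3, 0.2)))
    print("polling: ", wait_polling(spawn(4, 0.2)))
```

In the healthy case the blocking variant uses essentially no CPU while it
waits. The polling variant pegs a CPU, with the time split between user
space (the loop) and the kernel (the repeated syscall entries).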
Does the single "running" line in Alt+SysRq+T mean that the
process is looping in user space?
Take a look at the CPU consumption % numbers though...
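On a box where a shell still works (which is not the case on the
affected machine), the user/kernel split for a single process can be
read from /proc/<pid>/stat: fields 14 and 15 are utime and stime, in
clock ticks, as documented in proc(5). A minimal sketch, with the
hypothetical helper names parse_cpu_ticks and cpu_ticks:

```python
import os

def parse_cpu_ticks(stat_line):
    """Return (state, utime, stime) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split on the last ')' and skip the space after it.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return rest[0], int(rest[11]), int(rest[12])

def cpu_ticks(pid):
    """Read and parse /proc/<pid>/stat for a live process."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_cpu_ticks(f.read())

if __name__ == "__main__":
    state, utime, stime = cpu_ticks(os.getpid())
    hz = os.sysconf("SC_CLK_TCK")   # clock ticks per second
    print(f"state={state} user={utime / hz:.2f}s kernel={stime / hz:.2f}s")
```

Sampling this twice a few seconds apart shows where the time goes: if
stime keeps growing while utime stays flat, the process is spinning in
the kernel, and the other way round for a user-space loop.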
Note that the OOM killer never kicks in. (I've seen it do so before,
under different circumstances - when OCFS2 didn't like machines
with less than 1 GB RAM.)
My impression is that the erratic behavior could be a secondary
symptom of a kernel-space memory leak taking place somewhere else
than in the loopy code itself. Can't say if the leak takes place in
memory management or EXT3 for instance...
Frank Rysanek