Date:	Fri, 14 Sep 2007 17:00:09 +0200
From:	"Frantisek Rysanek" <Frantisek.Rysanek@...t.cz>
To:	linux-kernel@...r.kernel.org
Subject: Re: [newbie:] Bonnie++2 hangs recent 2.6 kernels? Bash keeps looping in waitpid(), eating 100% CPU

Dear Mr. Piggin,

first of all, thanks for your response :-)

On 13 Sep 2007 at 2:30, Nick Piggin wrote:
>
> Can you see if it is looping in userspace or kernel? Can you kill -9
> the process?
> 
I can't run any commands; every command either hangs or dumps core.

> Are you able to test with the latest 2.6.23-rc kernel? If not (or if it
> still has the same problem), then can you get the output of sysrq+T
> and three sysrq+P calls, please? (this might help work out where in
> kernel it is spinning).
>
I've compiled 2.6.23-rc6, enabled the serial console and captured
the output of sysrq+P (on the affected virtual VGA console)
and sysrq+T:

http://www.fccps.cz/download/adv/frr/bonnie/2.6.23-rc6.txt
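On a machine where a shell still works, the same dumps can also be requested by writing the command key to /proc/sysrq-trigger as root (the file is only present with CONFIG_MAGIC_SYSRQ enabled); the dump goes to the kernel log and console, not back into the file. A minimal sketch, where trigger_sysrq() is a hypothetical helper that takes the path as a parameter so it can also be pointed at a scratch file:

```c
#include <stdio.h>

/* Hypothetical helper: write one SysRq command key (e.g. 't' for the task
 * dump, 'p' for the current registers) to the given trigger file.
 * On a live system the path would be /proc/sysrq-trigger (root only).
 * Returns 0 on success, -1 on any open/write/close failure. */
int trigger_sysrq(const char *path, char key)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fputc(key, f) == EOF) {
        fclose(f);
        return -1;
    }
    /* fclose() flushes the buffered key, so check its result too. */
    return fclose(f) == 0 ? 0 : -1;
}
```

For example, trigger_sysrq("/proc/sysrq-trigger", 't') requests the same task list as Alt+SysRq+T.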

The interesting bit, as far as the erratic "bash" processes are
concerned, is always a single line, such as:

bash          R running      0  2358      1
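If I read the 2.6.23 show_task() code correctly, that line is printed (on 32-bit) as "%-13.13s %c", then " running  " or the saved PC, then "%5lu %5d %6d": free kernel stack (0 unless CONFIG_DEBUG_STACK_USAGE is set), PID and parent PID; the stack trace is only printed for tasks that are not in the running state, which would explain the single line. A minimal parsing sketch under that layout assumption; struct task_line and parse_task_line() are hypothetical names:

```c
#include <stdio.h>
#include <string.h>

/* One line of a SysRq+T task dump, assuming the 32-bit 2.6.23 layout. */
struct task_line {
    char comm[14];            /* 13-char command name + NUL (may have spaces) */
    char state;               /* 'R', 'S', 'D', ... */
    char where[17];           /* "running", or the saved PC in hex */
    unsigned long free_stack; /* 0 unless stack-usage debugging is enabled */
    int pid;
    int ppid;
};

/* Returns 0 on success, -1 if the line does not match the assumed layout. */
int parse_task_line(const char *line, struct task_line *t)
{
    size_t n;

    /* 13 chars of name, a separator space, then the state letter. */
    if (strlen(line) < 15 || line[13] != ' ')
        return -1;

    /* Fixed-width name: copy it, then trim the padding blanks. */
    memcpy(t->comm, line, 13);
    t->comm[13] = '\0';
    n = strlen(t->comm);
    while (n > 0 && t->comm[n - 1] == ' ')
        t->comm[--n] = '\0';

    t->state = line[14];

    /* Remaining fields: "running"/PC, free stack, pid, parent pid. */
    if (sscanf(line + 15, "%16s %lu %d %d",
               t->where, &t->free_stack, &t->pid, &t->ppid) != 4)
        return -1;
    return 0;
}
```

Applied to the line above, this yields comm "bash", state 'R', PID 2358 and parent PID 1.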

I've also taken a photo of `top` running on another virtual
console, because I can't get any data out of the affected box:
I can't run any shell commands...

http://www.fccps.cz/download/adv/frr/bonnie/top.jpg

Note that there are rather few processes running in user space.
I can't say whether that makes any difference compared to a
full-blown distro.

Maybe I could put the bootable CD up for download somewhere
(a gzipped ISO of about 50 MB).

In this scenario, Linux 2.6.16.18 once reported a soft lockup:
http://www.fccps.cz/download/adv/frr/bonnie/soft-lockup1.txt
It never happened again.

I also managed to catch the misbehavior in strace once. I didn't
get a capture, but essentially it was stuck in a single unfinished
syscall; I believe it was "waitpid(1, ". (I never managed that
again: whenever I tried to attach to bash with strace -p, I got
segfaults instead of the looping bash.)

Exactly where does the switch from user mode to kernel mode take
place? I know that I can call ioctl() from user space, and that I
can write ioctl() handlers in kernel space as part of device drivers
(the handlers run entirely in kernel space). waitpid() is a syscall,
entered only once from user space, and the bash process seems to
keep looping inside it.
Does the single "running" line in Alt+SysRq+T mean that the
process is looping in user space?
Take a look at the CPU consumption percentages, though...
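The R ("running") state in the task dump only says the task is runnable; by itself it does not say whether the CPU time is spent in user or kernel mode. Where a shell still works, one way to tell is to sample /proc/<pid>/stat twice and see which counter grows: field 14 (utime) counts clock ticks in user mode, field 15 (stime) ticks in kernel mode. A minimal parsing sketch; parse_proc_stat() is a hypothetical name, and it searches for the last ')' because the command name may itself contain spaces or parentheses:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the state letter, utime and stime (in clock
 * ticks) from the text of /proc/<pid>/stat. Returns 0 on success. */
int parse_proc_stat(const char *text, char *state,
                    unsigned long *utime, unsigned long *stime)
{
    /* Field 2 is "(comm)"; the fixed-format fields start after the LAST ')'. */
    const char *p = strrchr(text, ')');
    if (p == NULL)
        return -1;

    /* Field 3 = state; fields 4..13 are skipped (ppid, pgrp, session,
     * tty_nr, tpgid, flags, minflt, cminflt, majflt, cmajflt);
     * field 14 = utime, field 15 = stime. */
    if (sscanf(p + 1, " %c %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %lu %lu",
               state, utime, stime) != 3)
        return -1;
    return 0;
}
```

If stime keeps climbing between two samples while utime stays flat, the time is going into the kernel.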

Note that the OOM killer never kicks in. (I've seen that one before,
under different circumstances: when OCFS2 didn't like machines
with less than 1 GB of RAM.)

My impression is that the erratic behavior could be a secondary
symptom of a kernel-space memory leak somewhere other than in the
looping code itself. I can't say whether the leak is in memory
management or in ext3, for instance...
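A leak in kernel slab allocations would typically show up as steady growth of the Slab line in /proc/meminfo (and of individual caches in /proc/slabinfo) while user-space usage stays flat. A minimal sketch for pulling one field out of the meminfo text so it can be logged periodically; meminfo_kb() is a hypothetical helper:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the value in kB of a /proc/meminfo field
 * such as "Slab" from the file's text, or -1 if the field is absent.
 * Lines look like "Slab:            12345 kB". */
long meminfo_kb(const char *text, const char *field)
{
    size_t len = strlen(field);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        long kb;
        /* Match "<field>:" only at the start of a line, so that
         * "Sla" does not match "Slab". */
        if (strncmp(line, field, len) == 0 && line[len] == ':' &&
            sscanf(line + len + 1, "%ld", &kb) == 1)
            return kb;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;               /* advance to the start of the next line */
    }
    return -1;
}
```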

Frank Rysanek
