lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 29 Jan 2008 23:30:11 +0100
From:	"Guillaume Chazarain" <guichaz@...il.com>
To:	LKML <linux-kernel@...r.kernel.org>
Subject: Hang in work_resched

Hi,

In recent git (0ba6c33bcddc64a54b5f1c25a696c4767dc76292) I experience
a hard to reproduce problem where a single process hangs in
work_resched. The method I use so far to reproduce it is to launch
mplayer in a gnome-terminal (Fedora 8,i386) and wait for mplayer to
freeze when it cannot write into gnome-terminal, the latter being
frozen in work_resched.

sysrq-t gives valid backtraces for all tasks, but only a single line
for gnome-terminal:

 =======================
gnome-termina S 00000027     0  2201      1
       f6711fb0 00200082 cb330d62 00000027 f664105c 00000b1e 00000000 cb331880
       00000027 f660d780 009e3840 080ab7d8 080ab298 f6711000 c0103e7e 009e3840
       000e0002 00000002 080ab7d8 080ab298 bfb41be8 080ab7d8 0000007b c010007b
Call Trace:
 [<c0103e7e>] work_resched+0x5/0x16
 =======================

This corresponds to the cli instruction:
c0103e7e:       fa                      cli

When hung, stracing or killing gnome-terminal wakes it up, but not
kill -CONT without a previous kill -STOP.

While being straced gnome-terminal seems to never hang, and a strace
used to unfreeze it starts with:

23:06:53.182000 gettimeofday({1201644413, 182184}, NULL) = 0 <0.000401>
23:06:53.182644 poll([{fd=3, events=POLLIN, revents=POLLIN}, {fd=4,
events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=11,
events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=14,
events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=16,
events=POLLIN|POLLPRI}, {fd=18, events=POLLIN|POLLPRI}, {fd=17,
events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|POLLPRI}], 11, 0) = 1
<0.000028>
23:06:53.182840 read(3,
"\34\326\324\20\\\0\0\0W\1\0\0&~\226\307\0\1\0\0\2\0\0\0\0\0\0\0\\\0\0\0"...,
4096) = 4096 <0.000166>
23:06:53.183504 read(3,
"\6\0\324\20K\206\226\307\\\0\0\0008\v`\2\0\0\0\0\311\2\232\1\n\0R\1\0\0\1\0"...,
4096) = 3616 <0.000168>
23:06:53.184073 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000022>
23:06:53.184188 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
23:06:53.184280 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000020>
23:06:53.184370 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
23:06:53.184460 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000020>
23:06:53.184570 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000022>
23:06:53.184661 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
23:06:53.184751 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
23:06:53.184840 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000651>
23:06:53.185610 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
23:06:53.185707 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
23:06:53.185797 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
23:06:53.185888 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
23:06:53.185977 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
23:06:53.186067 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000022>
23:06:53.186158 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
23:06:53.186247 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
23:06:53.186337 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
23:06:53.186426 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
23:06:53.186516 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
23:06:53.186627 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000020>
23:06:53.186717 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.005377>
23:06:53.192208 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000022>
23:06:53.192310 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
23:06:53.192401 read(3, 0x80ab868, 4096) = -1 EAGAIN (Resource
temporarily unavailable) <0.000021>
...
then back to normal


A normal strace of gnome-terminal never shows more than one
consecutive read()->EAGAIN, unlike here, after a hang in work_resched.

I think file descriptor 3 is the local connection to the X server, and
the hang happens in a poll() call.

I understand this is a bit vague, but maybe I'll get some me-too.

-- 
Guillaume

Download attachment "dmesg" of type "application/octet-stream" (21166 bytes)

Download attachment "config" of type "application/octet-stream" (55599 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ