Message-ID: <Pine.LNX.4.64.0710212201190.3254@axis700.grange>
Date:	Sun, 21 Oct 2007 22:11:11 +0200 (CEST)
From:	Guennadi Liakhovetski <g.liakhovetski@....de>
To:	Ray Lee <ray-lk@...rabbit.org>
cc:	Jeff Garzik <jeff@...zik.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: ext3 deadlock or Re: [2.6.23] tasks stuck in running state?

On Fri, 19 Oct 2007, Ray Lee wrote:

> On 10/19/07, Jeff Garzik <jeff@...zik.org> wrote:
> > On my main devel box, vanilla 2.6.23 on x86-64/Fedora-7, I'm seeing a
> > certain behavior at least once a day.  I'll start a kernel build (make
> > -sj5 on this box), and it will "hang" in the following way:
> >
> > > 31003 ?        S      0:04 sshd: jgarzik@.../0
> > > 31004 pts/0    Ss     0:02  \_ -bash
> > >  8280 pts/0    S+     0:00      \_ make ARCH=i386 -sj4
> > >  8690 pts/0    Z+     0:00          \_ [rm] <defunct>
> > >  8691 pts/0    S+     0:00          \_ /bin/sh -c cat include/config/kernel.release 2> /dev/null
> > >  8692 pts/0    R+     6:12              \_ cat include/config/kernel.release
> >
> > Specifically, the symptom is a process, often a simple one like cat(1)
> > or rm(1) or somewhere in check-headers, will stay in the running state,
> > accumulating CPU time.
> >
> > If I Ctrl-C the build, and start over, the build will normally -not- get
> > stuck at the same point, but proceed to chew through one of a bazillion
> > allmodconfig builds.
> 
> I *think* I'm seeing this with firefox under 2.6.23-rc6. I tried a
> `killall -SIGSTOP firefox; killall -SIGCONT firefox` and when I looked
> back it was back to life again, but that may have been a fluke.
> Regardless, try that the next time it happens?

I don't know whether this is the same problem as above, but a few minutes
ago my mail server locked up completely. First pine froze, then more
processes started freezing, and then the system became unusable: ssh logins
got stuck, and so did the USB and PS/2 keyboards. I managed to get a trace
with the "w" sysrq:

SysRq : Show Blocked State
  task                PC stack   pid father
syslogd       D c01275a3     0  2818      1
       e8a1fe4c 00000086 e8a1fe4c c01275a3 f7e9a200 00000282 e8a1fe5c 00b5ce60
       c1b05cc0 e8a1fe7c c030c8e7 e8a1ff30 c8d553c4 c0407348 c1b58b18 00b5ce60
       c01277d0 c1b1ba90 c0407040 000000da d5f806c0 e8a1fe84 c030c974 e8a1feb4
Call Trace:
 [<c030c8e7>] schedule_timeout+0x47/0xc0
 [<c030c974>] schedule_timeout_uninterruptible+0x14/0x20
 [<c01b94cb>] journal_stop+0xcb/0x270
 [<c01ba1fd>] journal_force_commit+0x1d/0x30
 [<c01b2265>] ext3_force_commit+0x25/0x30
 [<c01ac7dc>] ext3_write_inode+0x2c/0x40
 [<c0183f8b>] __writeback_single_inode+0x30b/0x3e0
 [<c01849b4>] sync_inode+0x24/0x60
 [<c01a8d42>] ext3_sync_file+0xc2/0xd0
 [<c0187340>] do_fsync+0x60/0xa0
 [<c01873a8>] __do_fsync+0x28/0x40
 [<c01873ed>] sys_fsync+0xd/0x10
 [<c010424e>] sysenter_past_esp+0x5f/0x85
 =======================
pine          D c01275a3     0  6910   6243
       d1f55e4c 00200082 c03c0e40 c01275a3 f7e42c80 00200282 d1f55e5c 00b5ce60
       c1b05cc0 d1f55e7c c030c8e7 d1f55f30 c8d553d8 c1b58b18 c90a9e5c 00b5ce60
       c01277d0 d8ae8a90 c0407040 000000c3 d5f806c0 d1f55e84 c030c974 d1f55eb4
Call Trace:
 [<c030c8e7>] schedule_timeout+0x47/0xc0
 [<c030c974>] schedule_timeout_uninterruptible+0x14/0x20
 [<c01b94cb>] journal_stop+0xcb/0x270
 [<c01ba1fd>] journal_force_commit+0x1d/0x30
 [<c01b2265>] ext3_force_commit+0x25/0x30
 [<c01ac7dc>] ext3_write_inode+0x2c/0x40
 [<c0183f8b>] __writeback_single_inode+0x30b/0x3e0
 [<c01849b4>] sync_inode+0x24/0x60
 [<c01a8d42>] ext3_sync_file+0xc2/0xd0
 [<c0187340>] do_fsync+0x60/0xa0
 [<c01873a8>] __do_fsync+0x28/0x40
 [<c01873ed>] sys_fsync+0xd/0x10
 [<c010424e>] sysenter_past_esp+0x5f/0x85
 =======================
sendmail      D c01275a3     0  7448   7446
       c90a9e4c 00000082 c03c0e40 c01275a3 f7e00580 00000282 c90a9e5c 00b5ce60
       c1b05cc0 c90a9e7c c030c8e7 c90a9f30 c8d553ec d1f55e5c c0407348 00b5ce60
       c01277d0 d8ae8030 c0407040 0000004b d5f806c0 c90a9e84 c030c974 c90a9eb4
Call Trace:
 [<c030c8e7>] schedule_timeout+0x47/0xc0
 [<c030c974>] schedule_timeout_uninterruptible+0x14/0x20
 [<c01b94cb>] journal_stop+0xcb/0x270
 [<c01ba1fd>] journal_force_commit+0x1d/0x30
 [<c01b2265>] ext3_force_commit+0x25/0x30
 [<c01ac7dc>] ext3_write_inode+0x2c/0x40
 [<c0183f8b>] __writeback_single_inode+0x30b/0x3e0
 [<c01849b4>] sync_inode+0x24/0x60
 [<c01a8d42>] ext3_sync_file+0xc2/0xd0
 [<c0187340>] do_fsync+0x60/0xa0
 [<c01873a8>] __do_fsync+0x28/0x40
 [<c01873ed>] sys_fsync+0xd/0x10
 [<c010424e>] sysenter_past_esp+0x5f/0x85
 =======================

Now you see why I wrote "ext3 deadlock." It's my VIA C7 system, running
2.6.23-rc9-g804b3f9a, with no problems since 5 October, when the kernel
was built. No Oops / warnings in dmesg. Or has this been fixed since
23-rc9?
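
All three traces above follow the same path from sys_fsync down to
journal_stop. As an illustration only, here is a minimal Python sketch that
groups the tasks in a "Show Blocked State" dump by call stack, which makes
such a pattern easy to spot. The name group_by_stack, the two regular
expressions, and the trimmed SAMPLE are assumptions made for this example;
the regexes target only the 32-bit dump layout shown in this message and
would likely need adjusting for other kernels.

```python
import re
from collections import defaultdict

# Task header, e.g. "syslogd       D c01275a3     0  2818      1"
# (name, state, PC, free stack, pid, father). Assumed layout, see above.
TASK_RE = re.compile(r"^(\S+)\s+([A-Z])\s+[0-9a-f]{8}\s+\d+\s+(\d+)\s+\d+\s*$")
# Stack frame, e.g. " [<c01b94cb>] journal_stop+0xcb/0x270"
FRAME_RE = re.compile(r"^\s*\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def group_by_stack(dump):
    """Map each distinct call stack (tuple of symbols) to the tasks showing it."""
    groups = defaultdict(list)
    task, frames = None, []
    for line in dump.splitlines():
        m = TASK_RE.match(line)
        if m:
            # A new task header closes the previous task's trace.
            if task and frames:
                groups[tuple(frames)].append(task)
            task, frames = (m.group(1), int(m.group(3))), []
            continue
        m = FRAME_RE.match(line)
        if m and task:
            frames.append(m.group(1))
    # Flush the last task in the dump.
    if task and frames:
        groups[tuple(frames)].append(task)
    return dict(groups)


# Trimmed excerpt of the dump above: two tasks, innermost three frames only.
SAMPLE = """\
syslogd       D c01275a3     0  2818      1
Call Trace:
 [<c030c8e7>] schedule_timeout+0x47/0xc0
 [<c030c974>] schedule_timeout_uninterruptible+0x14/0x20
 [<c01b94cb>] journal_stop+0xcb/0x270
 =======================
pine          D c01275a3     0  6910   6243
Call Trace:
 [<c030c8e7>] schedule_timeout+0x47/0xc0
 [<c030c974>] schedule_timeout_uninterruptible+0x14/0x20
 [<c01b94cb>] journal_stop+0xcb/0x270
 =======================
"""

if __name__ == "__main__":
    for stack, tasks in group_by_stack(SAMPLE).items():
        print(len(tasks), "task(s) blocked in", " <- ".join(stack))
        for name, pid in tasks:
            print("   ", name, pid)
```

Grouping by the whole stack rather than only the innermost frame keeps
tasks that sleep in schedule_timeout for unrelated reasons apart.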

Thanks
Guennadi
---
Guennadi Liakhovetski