lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-Id: <201101231306.23069.Martin@lichtvoll.de>
Date:	Sun, 23 Jan 2011 13:06:15 +0100
From:	Martin Steigerwald <Martin@...htvoll.de>
To:	linux-kernel@...r.kernel.org, linux-btrfs@...r.kernel.org
Subject: [regression?] hang with 2.6.37 on a BTRFS test machine

Not reported on bugzilla.kernel.org yet as its so vague.


Hi!

I have something I do not have the slightest clue on how to track down:

Occasionally with 2.6.37 plain vanilla my ThinkPad T23 freezes in such a 
way, that when I type a command at the prompt, nothing will happen 
anymore. Same goes for login into a tty. Even dmesg is not executed 
anymore. My KDE session doesn't react to clicks anymore, except for window 
movement. Most of the time the mouse pointer is still movable, but 
sometimes it is frozen as well. But the latter might be a different problem 
with X.org.

My theory is that at least I/O related kernel functions do not return 
anymore at that time.

Nothing is in the logs after a reboot.

Problem now is: This happens rather irregularily. That kernel run for days 
without problems. But today two times it hung after the first time it hung. 
One time it was on TTY 1 instead of in X.org. So when this happens it 
seems to persist a while.

But I now booted with 2.6.36 and the machine is as stable as if nothing 
has happened at all.

OTOH a 2.6.37 with recursive mtime patches from Jan Kara - that hung on 
the T23 as well - runs just fine on my T42 for nine days (minus hibernation 
times) already.

Now it could be something hardware related, the T23 is different from the 
T42. Also the T23 has a fan that should be replaced or oiled - but it 
still works right up to now. Also with 2.6.36 or earlier kernels the 
machine never hung that way so far.

Or it could be something BTRFS related. The T23 is using BTRFS on /home 
for months already and for / since beginning this year or end of last year 
while the T42 is still with Ext4 - until there is a robust fsck for BTRFS 
and BTRFS is marked stable in the kernel at last.

A bisection is too much effort for me - especially as it could take days 
till the problems shows up. Some sysrq stuff doesn't work without serial or 
netconsole since a call to dmesg does not return anymore when it happens. 
Having netconsole and serial running for days for just in case it happens 
is also quite some effort, since I have to remember to reinitialize it 
after each hibernation cycle to have a permanent setup.

Since I have no other idea on how to track this with reasonable effort - I 
spent tons of time with trying to help debugging various, partly 
intermittent, kernel problems in the last months and I have enough of that 
ATM - I will stick to 2.6.36 for the time being and see whether the T23 is 
really stable with that one. I still wanted to report my - admittedly 
vague - findings in case it rings a bell for someone. I will monitor 
2.6.37.y updates in case they contain BTRFS and or T23 hardware related 
fixes.

Last but not least: I have no seen BTRFS related errors or other oddities 
in syslog. The following looks quite normal to me, I did a btrfs filesystem 
balance some time ago and had some hard switch off cold reboots due to the 
issue mentioned:

artin@...pdance:~> zgrep -i btrfs /var/log/syslog*
/var/log/syslog.1:Jan 17 20:43:06 localhost kernel: Btrfs loaded
/var/log/syslog.1:Jan 23 12:01:12 localhost kernel: Btrfs loaded
/var/log/syslog.1:Jan 23 12:01:12 localhost kernel: btrfs: unlinked 3 
orphans
/var/log/syslog.1:Jan 23 12:09:36 localhost kernel: Btrfs loaded
/var/log/syslog.1:Jan 23 12:09:36 localhost kernel: btrfs: unlinked 2 
orphans
/var/log/syslog.1:Jan 23 12:20:52 localhost kernel: Btrfs loaded
/var/log/syslog.1:Jan 23 12:20:52 localhost kernel: btrfs: unlinked 18 
orphans
/var/log/syslog.2.gz:Jan 14 12:32:16 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 14 12:32:16 localhost kernel: btrfs: unlinked 116 
orphans
/var/log/syslog.2.gz:Jan 14 12:40:02 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 14 14:44:28 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 16 18:14:06 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 16 18:14:06 localhost kernel: btrfs: unlinked 2 
orphans
/var/log/syslog.2.gz:Jan 16 18:26:46 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 16 18:26:46 localhost kernel: btrfs: unlinked 2 
orphans
/var/log/syslog.2.gz:Jan 16 18:41:11 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 16 18:41:11 localhost kernel: btrfs: unlinked 3 
orphans
/var/log/syslog.2.gz:Jan 16 18:53:47 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 16 18:53:47 localhost kernel: btrfs: unlinked 2 
orphans
/var/log/syslog.2.gz:Jan 16 19:00:14 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 16 19:00:14 localhost kernel: btrfs: unlinked 2 
orphans
/var/log/syslog.2.gz:Jan 16 19:05:53 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 16 19:05:54 localhost kernel: btrfs: unlinked 2 
orphans
/var/log/syslog.7.gz:Dec  9 19:05:38 localhost kernel: Btrfs loaded
/var/log/syslog.7.gz:Dec  9 19:05:38 localhost kernel: btrfs: unlinked 2 
orphans
/var/log/syslog.7.gz:Dec  9 19:05:38 localhost kernel: btrfs: truncated 1 
orphans
/var/log/syslog.7.gz:Dec  9 20:54:10 localhost kernel: btrfs: relocating 
block group 10900996096 flags 36
/var/log/syslog.7.gz:Dec  9 20:55:02 localhost kernel: btrfs: found 2036 
extents
/var/log/syslog.7.gz:Dec  9 20:55:02 localhost kernel: btrfs: relocating 
block group 10766778368 flags 36
/var/log/syslog.7.gz:Dec  9 20:56:15 localhost kernel: btrfs: found 10432 
extents
/var/log/syslog.7.gz:Dec  9 20:56:16 localhost kernel: btrfs: relocating 
block group 9693036544 flags 1
/var/log/syslog.7.gz:Dec  9 20:56:26 localhost kernel: btrfs: found 155 
extents
/var/log/syslog.7.gz:Dec  9 20:56:31 localhost kernel: btrfs: found 154 
extents
/var/log/syslog.7.gz:Dec  9 20:56:31 localhost kernel: btrfs: relocating 
block group 8619294720 flags 1
/var/log/syslog.7.gz:Dec  9 20:57:37 localhost kernel: btrfs: found 5027 
extents
/var/log/syslog.7.gz:Dec  9 20:57:49 localhost kernel: btrfs: found 5027 
extents
/var/log/syslog.7.gz:Dec  9 20:57:51 localhost kernel: btrfs: relocating 
block group 7545552896 flags 1
/var/log/syslog.7.gz:Dec  9 20:59:48 localhost kernel: btrfs: found 12851 
extents
/var/log/syslog.7.gz:Dec  9 21:00:28 localhost kernel: btrfs: found 12846 
extents
/var/log/syslog.7.gz:Dec  9 21:00:29 localhost kernel: btrfs: relocating 
block group 6471811072 flags 1
/var/log/syslog.7.gz:Dec  9 21:01:59 localhost kernel: btrfs: found 46780 
extents
/var/log/syslog.7.gz:Dec  9 21:03:21 localhost kernel: btrfs: found 46780 
extents
/var/log/syslog.7.gz:Dec  9 21:03:26 localhost kernel: btrfs: relocating 
block group 5398069248 flags 1
/var/log/syslog.7.gz:Dec  9 21:04:34 localhost kernel: btrfs: found 19425 
extents
/var/log/syslog.7.gz:Dec  9 21:05:15 localhost kernel: btrfs: found 19425 
extents
/var/log/syslog.7.gz:Dec  9 21:05:16 localhost kernel: btrfs: relocating 
block group 4324327424 flags 1
/var/log/syslog.7.gz:Dec  9 21:06:33 localhost kernel: btrfs: found 26059 
extents
/var/log/syslog.7.gz:Dec  9 21:07:22 localhost kernel: btrfs: found 26059 
extents
/var/log/syslog.7.gz:Dec  9 21:07:25 localhost kernel: btrfs: relocating 
block group 3250585600 flags 1
/var/log/syslog.7.gz:Dec  9 21:08:47 localhost kernel: btrfs: found 15614 
extents
/var/log/syslog.7.gz:Dec  9 21:09:29 localhost kernel: btrfs: found 15614 
extents
/var/log/syslog.7.gz:Dec  9 21:09:31 localhost kernel: btrfs: relocating 
block group 2176843776 flags 1
/var/log/syslog.7.gz:Dec  9 21:10:58 localhost kernel: btrfs: found 4275 
extents
/var/log/syslog.7.gz:Dec  9 21:11:34 localhost kernel: btrfs: found 4275 
extents
/var/log/syslog.7.gz:Dec  9 21:11:35 localhost kernel: btrfs: relocating 
block group 1103101952 flags 1
/var/log/syslog.7.gz:Dec  9 21:13:01 localhost kernel: btrfs: found 7296 
extents
/var/log/syslog.7.gz:Dec  9 21:13:35 localhost kernel: btrfs: found 7296 
extents
/var/log/syslog.7.gz:Dec  9 21:13:36 localhost kernel: btrfs: relocating 
block group 29360128 flags 36
/var/log/syslog.7.gz:Dec  9 21:16:53 localhost kernel: btrfs: found 78308 
extents
/var/log/syslog.7.gz:Dec  9 21:16:54 localhost kernel: btrfs: relocating 
block group 20971520 flags 34
/var/log/syslog.7.gz:Dec  9 21:16:54 localhost kernel: btrfs: found 1 
extents
/var/log/syslog.7.gz:Dec  9 21:16:56 localhost kernel: btrfs: relocating 
block group 12582912 flags 1
/var/log/syslog.7.gz:Dec  9 21:16:57 localhost kernel: btrfs: found 85 
extents
/var/log/syslog.7.gz:Dec  9 21:16:59 localhost kernel: btrfs: found 85 
extents
/var/log/syslog.7.gz:Dec  9 21:17:01 localhost kernel: btrfs: relocating 
block group 4194304 flags 4

Hmmm, one thing there is:

Jan 13 13:27:15 localhost kernel: INFO: task dpkg:21162 blocked for more 
than 120 seconds.
Jan 13 13:27:15 localhost kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 13 13:27:15 localhost kernel: dpkg          D ef891b0c     0 21162   
2334 0x00000000
Jan 13 13:27:15 localhost kernel: d2e4be7c 00000082 ef8918cc ef891b0c 
ef8918a0 ef891b0c ef891b10 1b8b88c5
Jan 13 13:27:15 localhost kernel: 00018796 ef8918a0 7fffffff d2e4bf4c 7fffffff 
d2e4bee4 c13c4495 cb7862ac
Jan 13 13:27:15 localhost kernel: d2e4be98 c102ab56 cb7862ac c1514fa0 
d2e4beb4 c102d740 ee303140 d2e4bec0
Jan 13 13:27:15 localhost kernel: Call Trace:
Jan 13 13:27:15 localhost kernel: [<c13c4495>] 
schedule_timeout+0x175/0x280
Jan 13 13:27:15 localhost kernel: [<c102ab56>] ? 
__dequeue_entity+0x26/0x50
Jan 13 13:27:15 localhost kernel: [<c102d740>] ? set_next_entity+0xc0/0xe0
Jan 13 13:27:15 localhost kernel: [<c10017b3>] ? __switch_to+0xa3/0x160
Jan 13 13:27:15 localhost kernel: [<c1030132>] ? T.1277+0x32/0xa0
Jan 13 13:27:15 localhost kernel: [<c13c394c>] ? schedule+0x1fc/0x5d0
Jan 13 13:27:15 localhost kernel: [<c13c4006>] wait_for_common+0xa6/0x160
Jan 13 13:27:15 localhost kernel: [<c10316a0>] ? 
default_wake_function+0x0/0x20
Jan 13 13:27:15 localhost kernel: [<c13c4197>] 
wait_for_completion+0x17/0x20
Jan 13 13:27:15 localhost kernel: [<c10f8d52>] sync_inodes_sb+0x92/0x1a0
Jan 13 13:27:15 localhost kernel: [<c111f230>] ? 
dquot_quota_sync+0x0/0x310
Jan 13 13:27:15 localhost kernel: [<c10fd4f7>] __sync_filesystem+0x87/0xa0
Jan 13 13:27:15 localhost kernel: [<c10fd52f>] sync_one_sb+0x1f/0x30
Jan 13 13:27:15 localhost kernel: [<c10d9e6b>] iterate_supers+0x7b/0xf0
Jan 13 13:27:15 localhost kernel: [<c10fd510>] ? sync_one_sb+0x0/0x30
Jan 13 13:27:15 localhost kernel: [<c10fd57c>] sys_sync+0x3c/0x60
Jan 13 13:27:15 localhost kernel: [<c1002f23>] sysenter_do_call+0x12/0x2c
Jan 13 13:33:38 localhost init: Trying to re-exec init

There is only one occurance of it:

martin@...pdance:~> zgrep -i "blocked" /var/log/syslog*
/var/log/syslog.2.gz:Jan 13 13:27:15 localhost kernel: INFO: task 
dpkg:21162 blocked for more than 120 seconds.
martin@...pdance:~#1>

I put the logs aside in case anyone wants to have a closer look.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

Download attachment "signature.asc " of type "application/pgp-signature" (199 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ