Date:	Wed, 20 Aug 2008 11:19:18 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Greg Donald <gdonald@...il.com>, linux-kernel@...r.kernel.org,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: INFO: task reiserfs/0:1322 blocked for more than 120 seconds


* Andrew Morton <akpm@...ux-foundation.org> wrote:

> On Sat, 16 Aug 2008 23:36:03 -0500 "Greg Donald" <gdonald@...il.com> wrote:
> 
> > I got this while rsync'ng an NFS share onto a local disk:
> > 
> > [42374.151062] INFO: task reiserfs/0:1322 blocked for more than 120 seconds.
> > [42374.186295] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [42374.229433] reiserfs/0    D c1f36180     0  1322      2
> > [42374.265246]        f5dbdedc 00000046 c1f36180 c1f36180 f5e932c0 1c823428 00002669 f5e932c0
> > [42374.273706]        f5e93514 c1f36180 00000000 f5dbc000 f62cc780 f5e932c0 00000002 00000001
> > [42374.313709]        00000000 00000000 f5e932c0 c013cc01 00000246 f5dbded4 c013cbce e31e12ec
> > [42374.356837] Call Trace:
> > [42374.417842]  [<c013cc01>] ? trace_hardirqs_on+0xb/0xd
> > [42374.451201]  [<c013cbce>] ? trace_hardirqs_on_caller+0xe9/0x111
> > [42374.489735]  [<c02e876b>] mutex_lock_nested+0x14b/0x22b
> > [42374.525760]  [<c01c9727>] ? flush_commit_list+0x119/0x505
> > [42374.560839]  [<c01c9727>] flush_commit_list+0x119/0x505
> > [42374.594183]  [<c01cca8e>] flush_async_commits+0x41/0x4b
> > [42374.629770]  [<c012ec1a>] run_workqueue+0xc3/0x18e
> > [42374.662893]  [<c012ebfe>] ? run_workqueue+0xa7/0x18e
> > [42374.697814]  [<c01cca4d>] ? flush_async_commits+0x0/0x4b
> > [42374.732504]  [<c012f609>] ? worker_thread+0x0/0x8a
> > [42374.765765]  [<c012f688>] worker_thread+0x7f/0x8a
> > [42374.797749]  [<c0131d61>] ? autoremove_wake_function+0x0/0x38
> > [42374.833713]  [<c0131c93>] kthread+0x40/0x69
> > [42374.865772]  [<c0131c53>] ? kthread+0x0/0x69
> > [42374.897774]  [<c010392f>] kernel_thread_helper+0x7/0x10
> > [42374.929777]  =======================
> > [42374.957001] 3 locks held by reiserfs/0/1322:
> > [42374.990140]  #0:  (reiserfs){--..}, at: [<c012ebe1>] run_workqueue+0x8a/0x18e
> > [42375.025754]  #1:  (&(&journal->j_work)->work){--..}, at: [<c012ebfe>] run_workqueue+0xa7/0x18e
> > [42375.062963]  #2:  (&jl->j_commit_mutex){--..}, at: [<c01c9727>] flush_commit_list+0x119/0x505
> > 
> > 
> > I deleted a few GBs of data and ran it again but was unable to
> > reproduce it.  This was on 2.6.27-rc3.
> > 
> > I don't see any corruption.  Fluke?
> > 
> 
> Seems that about 100% of the reports we get of this warning triggering 
> are sys_sync, transaction commit, etc.
> 
> Does kerneloops.org disagree with me?
> 
> If not, I vote we kill it.

OK. How about quadrupling the timeout, as per the patch below?

With it, the warning fires only after an uninterruptible wait of more
than 8 minutes. Is that a reasonable limit?

I had this warning trigger a couple of times during development, 
alerting me to hung tasks.
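
To make the numbers concrete, here is a minimal userspace sketch of the
threshold arithmetic. It is not code from kernel/softlockup.c: the
function and macro names are hypothetical, and the strict "more than"
comparison is an assumption based on the wording of the warning message.
Only the two default values and the "zero means no checking" rule come
from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is not code from kernel/softlockup.c. */

/* Old and new defaults of sysctl_hung_task_timeout_secs, as in the patch. */
#define OLD_TIMEOUT_SECS 120UL
#define NEW_TIMEOUT_SECS 480UL

/* Scale a timeout by a whole factor, refusing to overflow. */
static unsigned long scale_timeout(unsigned long secs, unsigned long factor)
{
	assert(factor == 0 || secs <= ULONG_MAX / factor);
	return secs * factor;
}

/*
 * Decide whether a task blocked for blocked_secs should be reported.
 * A timeout of zero means "infinite timeout - no checking done",
 * matching the comment above the variable in the patch.
 * Assumption: the comparison is strict ("blocked for more than N seconds").
 */
static bool should_report_hung(unsigned long blocked_secs,
			       unsigned long timeout_secs)
{
	if (timeout_secs == 0)
		return false;
	return blocked_secs > timeout_secs;
}
```

Under these assumptions, scale_timeout(OLD_TIMEOUT_SECS, 4) gives the new
480 second default, and a task blocked for 121 seconds (like the reiserfs
thread in the report) is flagged with the old default but not with the
new one.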

	Ingo

------------------>
From 3fb4198766c38aa03492cc3996475076073c22ea Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@...e.hu>
Date: Wed, 20 Aug 2008 11:17:40 +0200
Subject: [PATCH] softlockup: increase hung tasks check from 2 minutes to 8 minutes

Andrew says:

> Seems that about 100% of the reports we get of this warning triggering
> are sys_sync, transaction commit, etc.

Increase the default timeout from 120 to 480 seconds. If the warning
still triggers for people, we can kill it.

Signed-off-by: Ingo Molnar <mingo@...e.hu>
---
 kernel/softlockup.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index b75b492..17a0580 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -164,7 +164,7 @@ unsigned long __read_mostly sysctl_hung_task_check_count = 1024;
 /*
  * Zero means infinite timeout - no checking done:
  */
-unsigned long __read_mostly sysctl_hung_task_timeout_secs = 120;
+unsigned long __read_mostly sysctl_hung_task_timeout_secs = 480;
 
 unsigned long __read_mostly sysctl_hung_task_warnings = 10;
 