Message-Id: <200703281844.58698.m.kozlowski@tuxland.pl>
Date: Wed, 28 Mar 2007 18:44:57 +0200
From: Mariusz Kozłowski <m.kozlowski@...land.pl>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org
Subject: Re: 2.6.21-rc5-mm1
Hello,
I ran 2.6.21-rc4-mm1 for a week with no hangs.
When 2.6.21-rc5-mm1 showed up I switched to it. Unfortunately,
today my laptop hung twice, in a way similar to what is described here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/index.html#1165
The difference is that it happened when I closed the laptop lid.
When I reopened it the box was frozen (ACPI?). Again, disk I/O was dead,
so nothing made it to syslog.
I tried to reproduce it and capture something with netconsole.
I tortured the box for a few hours but the system did not hang. I pushed
the box really hard and all I got was the oom-killer firing etc ;-)
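For anyone who wants to repeat the netconsole capture, here is a minimal sketch of a receiver for the logging machine. It is not the setup used in this report. It assumes netconsole is pointed at UDP port 6666 on the receiving host (adjust to whatever the sender is configured with); the names open_listener and capture are made up for illustration.

```python
import socket


def open_listener(port=6666, host="0.0.0.0"):
    """Bind a UDP socket for incoming netconsole datagrams.

    Port 6666 is an assumption; use the port configured on the sender.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def capture(sock, log, max_packets=None):
    """Write each received datagram to `log` (a binary file object).

    Flushes after every packet so the last lines before a hang are not
    lost in a buffer. Runs forever unless max_packets is given.
    Returns the number of datagrams written.
    """
    received = 0
    while max_packets is None or received < max_packets:
        data, _addr = sock.recvfrom(65535)
        log.write(data)
        log.flush()
        received += 1
    return received
```

To run it on the receiving box, open a file in binary append mode and call capture(open_listener(), log). Since the hangs above kill disk I/O on the laptop, the point is that the log lands on a different machine.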
Anyway I found something else you might be interested in:
1) This happened when I ran 'echo 3 > /proc/sys/vm/drop_caches' on a really
busy system.
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.21-rc5-mm1 #1
-------------------------------------------------------
bash/20633 is trying to acquire lock:
(&journal->j_list_lock){--..}, at: [<c01bb60e>] journal_try_to_free_buffers+0x151/0x1bc
but task is already holding lock:
(inode_lock){--..}, at: [<c0187d46>] drop_pagecache+0x58/0xf9
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (inode_lock){--..}:
[<c0132a0a>] __lock_acquire+0xe31/0xfe9
[<c0132c2b>] lock_acquire+0x69/0x83
[<c0406fb5>] _spin_lock+0x35/0x42
[<c0187734>] __mark_inode_dirty+0x4f/0x163
[<c01533c2>] __set_page_dirty_nobuffers+0x99/0xf5
[<c018b0db>] mark_buffer_dirty+0x1f/0x25
[<c01b8e3d>] __journal_temp_unlink_buffer+0x88/0x1a8
[<c01b9192>] __journal_unfile_buffer+0xb/0x15
[<c01b9289>] __journal_refile_buffer+0xed/0xef
[<c01bc57c>] journal_commit_transaction+0xd8b/0x127f
[<c01c033b>] kjournald+0xac/0x1ed
[<c0127a05>] kthread+0xa2/0xc9
[<c01042af>] kernel_thread_helper+0x7/0x18
[<ffffffff>] 0xffffffff
-> #0 (&journal->j_list_lock){--..}:
[<c0132873>] __lock_acquire+0xc9a/0xfe9
[<c0132c2b>] lock_acquire+0x69/0x83
[<c0406fb5>] _spin_lock+0x35/0x42
[<c01bb60e>] journal_try_to_free_buffers+0x151/0x1bc
[<c01ad9f3>] ext3_releasepage+0x3f/0x76
[<c014f6e4>] try_to_release_page+0x2f/0x4a
[<c0156588>] invalidate_mapping_pages+0xb6/0xee
[<c0187d94>] drop_pagecache+0xa6/0xf9
[<c0187e3b>] drop_caches_sysctl_handler+0x54/0x69
[<c01a2ccb>] proc_sys_write+0x80/0x8a
[<c016ce3d>] vfs_write+0x8b/0x11f
[<c016d371>] sys_write+0x3d/0x64
[<c0103f44>] sysenter_past_esp+0x5d/0x99
[<ffffffff>] 0xffffffff
other info that might help us debug this:
2 locks held by bash/20633:
#0: (&type->s_umount_key#16){----}, at: [<c0187d35>] drop_pagecache+0x47/0xf9
#1: (inode_lock){--..}, at: [<c0187d46>] drop_pagecache+0x58/0xf9
stack backtrace:
[<c0104614>] show_trace_log_lvl+0x1a/0x30
[<c01052c9>] show_trace+0x12/0x14
[<c0105355>] dump_stack+0x16/0x18
[<c01309f0>] print_circular_bug_tail+0x68/0x71
[<c0132873>] __lock_acquire+0xc9a/0xfe9
[<c0132c2b>] lock_acquire+0x69/0x83
[<c0406fb5>] _spin_lock+0x35/0x42
[<c01bb60e>] journal_try_to_free_buffers+0x151/0x1bc
[<c01ad9f3>] ext3_releasepage+0x3f/0x76
[<c014f6e4>] try_to_release_page+0x2f/0x4a
[<c0156588>] invalidate_mapping_pages+0xb6/0xee
[<c0187d94>] drop_pagecache+0xa6/0xf9
[<c0187e3b>] drop_caches_sysctl_handler+0x54/0x69
[<c01a2ccb>] proc_sys_write+0x80/0x8a
[<c016ce3d>] vfs_write+0x8b/0x11f
[<c016d371>] sys_write+0x3d/0x64
[<c0103f44>] sysenter_past_esp+0x5d/0x99
=======================
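The report above boils down to two lock orderings that contradict each other: chain #1 (kjournald) takes inode_lock on a path where, as the report implies, j_list_lock is already held, while chain #0 (drop_pagecache) takes j_list_lock while holding inode_lock. The toy model below sketches that kind of check. It is only an illustration of the idea, not how lockdep is implemented, and the class and method names are hypothetical.

```python
from collections import defaultdict


class LockOrderGraph:
    """Toy lock-order tracker: an edge A -> B means B was taken while holding A."""

    def __init__(self):
        self.after = defaultdict(set)

    def _reachable(self, src, dst):
        # Depth-first search over the recorded ordering edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.after[node])
        return False

    def acquire(self, held, new):
        """Record `new` being taken while `held` is held.

        Returns True if `held` was already reachable from `new`, i.e. the
        new edge closes a cycle (a possible AB-BA deadlock).
        """
        circular = self._reachable(new, held)
        self.after[held].add(new)
        return circular


graph = LockOrderGraph()
# Chain #1: inode_lock taken with j_list_lock held (assumed from the report).
first = graph.acquire("j_list_lock", "inode_lock")
# Chain #0: j_list_lock taken with inode_lock held.
second = graph.acquire("inode_lock", "j_list_lock")
```

Here first is False and second is True: the second ordering is the one that closes the cycle, which matches the point at which the warning fired. Neither path has to actually deadlock for the inconsistency to be flagged.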
2) This showed up a couple of minutes later, when the system was
really busy and close to an OOM condition.
INFO: lockdep is turned off.
BUG: soft lockup detected on CPU#0!
[<c0104614>] show_trace_log_lvl+0x1a/0x30
[<c01052c9>] show_trace+0x12/0x14
[<c0105355>] dump_stack+0x16/0x18
[<c01467a0>] softlockup_tick+0x81/0xa8
[<c011e4dc>] run_local_timers+0x12/0x14
[<c011e8dd>] update_process_times+0x2b/0x63
[<c012e4be>] tick_sched_timer+0x4d/0x9e
[<c012af00>] hrtimer_interrupt+0x12e/0x1a6
[<c0106f56>] timer_interrupt+0xe/0x15
[<c0146af3>] handle_IRQ_event+0x28/0x59
[<c01480a7>] handle_level_irq+0x6e/0xe7
[<c0105d3e>] do_IRQ+0x3d/0x7f
[<c01041b2>] common_interrupt+0x2e/0x34
[<c011afef>] do_softirq+0x4d/0x50
[<c011b263>] irq_exit+0x7e/0x80
[<c0105d43>] do_IRQ+0x42/0x7f
[<c01041b2>] common_interrupt+0x2e/0x34
[<c0178bf2>] core_sys_select+0x1c6/0x310
[<c0179101>] sys_select+0x39/0x18f
[<c0103f44>] sysenter_past_esp+0x5d/0x99
=======================
Clocksource tsc unstable (delta = 9372804176 ns)
Time: acpi_pm clocksource has been installed.
Please find .config attached. Not sure who to CC on this (as usual ;-)).
Regards,
Mariusz Kozlowski
View attachment ".config" of type "text/plain" (42516 bytes)