linux-kernel - Re: BUG: soft lockup detected on CPU#0! in sys

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20061114095818.GA2541@zip.com.au>
Date:	Tue, 14 Nov 2006 20:58:18 +1100
From:	CaT <cat@....com.au>
To:	"Dennis J.A. Bijwaard" <dennis@...22032063.dsl.speedlinq.nl>
Cc:	Andrew Morton <akpm@...l.org>, bijwaard@...il.com, sct@...hat.com,
	adilger@...sterfs.com, linux-kernel@...r.kernel.org
Subject: Re: BUG: soft lockup detected on CPU#0! in sys_close and ext3

On Sun, Oct 15, 2006 at 11:58:55PM +0200, Dennis J.A. Bijwaard wrote:
> Hi Andrew,
> 
> Thanks for your reply. The machine has 512MB and some more swap:
> 
> Mem:    510960k total,   504876k used,     6084k free,     1868k buffers
> Swap:   674640k total,     2652k used,   671988k free,   354832k cached
> 
> Machine may be slow for current standards, it has 2 * 500Mhz

This looks like what I have come across yesterda, except my machine is
far from slow. It's a dual core woodcrest 5130 cpu (2Ghz) with 4gb of
ram. My call trace looks very similar and occurs when sda1 (for eg) is 
involved or md0 (raid0 in this case) is. It happens multiple times in 
each case (from memory).

Nov 13 09:46:04 localhost kernel: Call Trace:
Nov 13 09:46:04 localhost kernel:  [show_trace+185/628] show_trace+0xb9/0x274
Nov 13 09:46:04 localhost kernel:  [dump_stack+21/26] dump_stack+0x15/0x1a
Nov 13 09:46:04 localhost kernel:  [softlockup_tick+231/264] softlockup_tick+0xe7/0x108
Nov 13 09:46:04 localhost kernel:  [run_local_timers+19/21] run_local_timers+0x13/0x15
Nov 13 09:46:04 localhost kernel:  [update_process_times+83/137] update_process_times+0x53/0x89
Nov 13 09:46:04 localhost kernel:  [smp_local_timer_interrupt+46/89] smp_local_timer_interrupt+0x2e/0x59
Nov 13 09:46:04 localhost kernel:  [smp_apic_timer_interrupt+92/106] smp_apic_timer_interrupt+0x5c/0x6a
Nov 13 09:46:04 localhost kernel:  [apic_timer_interrupt+107/112] apic_timer_interrupt+0x6b/0x70
Nov 13 09:46:04 localhost kernel:  [_write_unlock_irqrestore+81/95] _write_unlock_irqrestore+0x51/0x5f
Nov 13 09:46:04 localhost kernel:  [test_clear_page_dirty+189/224] test_clear_page_dirty+0xbd/0xe0
Nov 13 09:46:04 localhost kernel:  [try_to_free_buffers+123/174] try_to_free_buffers+0x7b/0xae
Nov 13 09:46:04 localhost kernel:  [try_to_release_page+68/74] try_to_release_page+0x44/0x4a
Nov 13 09:46:04 localhost kernel:  [invalidate_complete_page+54/191] invalidate_complete_page+0x36/0xbf
Nov 13 09:46:04 localhost kernel:  [invalidate_mapping_pages+157/280] invalidate_mapping_pages+0x9d/0x118
Nov 13 09:46:04 localhost kernel:  [invalidate_inode_pages+18/20] invalidate_inode_pages+0x12/0x14
Nov 13 09:46:04 localhost kernel:  [invalidate_bdev+47/56] invalidate_bdev+0x2f/0x38
Nov 13 09:46:04 localhost kernel:  [kill_bdev+25/52] kill_bdev+0x19/0x34
Nov 13 09:46:04 localhost kernel:  [__blkdev_put+88/314] __blkdev_put+0x58/0x13a
Nov 13 09:46:04 localhost kernel:  [blkdev_put+11/13] blkdev_put+0xb/0xd
Nov 13 09:46:04 localhost kernel:  [blkdev_close+52/61] blkdev_close+0x34/0x3d
Nov 13 09:46:04 localhost kernel:  [__fput+170/306] __fput+0xaa/0x132
Nov 13 09:46:04 localhost kernel:  [fput+21/23] fput+0x15/0x17
Nov 13 09:46:04 localhost kernel:  [filp_close+109/129] filp_close+0x6d/0x81
Nov 13 09:46:04 localhost kernel:  [sys_close+140/168] sys_close+0x8c/0xa8
Nov 13 09:46:04 localhost kernel:  [system_call+126/131] system_call+0x7e/0x83
Nov 13 09:46:04 localhost kernel:  [phys_startup_64+-1072859838/2147483392]

> > So the kernel was doing a lot of work, on a slow CPU.  Perhaps that simply
> > exceeded the softlockup timeout.  If that's true then the machine should
> > have recovered.  Once it did, and once it didn't.  I don't know why it
> > didn't.

It did for me in all cases. Should it be taking long enough to trigger
the softlock timeout to do this? The size of the device is approx 280gig.

-- 
    "To the extent that we overreact, we proffer the terrorists the
    greatest tribute."
    	- High Court Judge Michael Kirby
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/