linux-kernel - Re: BUG: soft lockup detected on CPU#0! in sys

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20061114020125.636c9006.akpm@osdl.org>
Date:	Tue, 14 Nov 2006 02:01:25 -0800
From:	Andrew Morton <akpm@...l.org>
To:	CaT <cat@....com.au>
Cc:	"Dennis J.A. Bijwaard" <dennis@...22032063.dsl.speedlinq.nl>,
	bijwaard@...il.com, sct@...hat.com, adilger@...sterfs.com,
	linux-kernel@...r.kernel.org
Subject: Re: BUG: soft lockup detected on CPU#0! in sys_close and ext3

On Tue, 14 Nov 2006 20:58:18 +1100
CaT <cat@....com.au> wrote:

> On Sun, Oct 15, 2006 at 11:58:55PM +0200, Dennis J.A. Bijwaard wrote:
> > Hi Andrew,
> > 
> > Thanks for your reply. The machine has 512MB and some more swap:
> > 
> > Mem:    510960k total,   504876k used,     6084k free,     1868k buffers
> > Swap:   674640k total,     2652k used,   671988k free,   354832k cached
> > 
> > Machine may be slow for current standards, it has 2 * 500Mhz
> 
> This looks like what I have come across yesterda, except my machine is
> far from slow. It's a dual core woodcrest 5130 cpu (2Ghz) with 4gb of
> ram. My call trace looks very similar and occurs when sda1 (for eg) is 
> involved or md0 (raid0 in this case) is. It happens multiple times in 
> each case (from memory).
> 
> Nov 13 09:46:04 localhost kernel: Call Trace:
> Nov 13 09:46:04 localhost kernel:  [show_trace+185/628] show_trace+0xb9/0x274
> Nov 13 09:46:04 localhost kernel:  [dump_stack+21/26] dump_stack+0x15/0x1a
> Nov 13 09:46:04 localhost kernel:  [softlockup_tick+231/264] softlockup_tick+0xe7/0x108
> Nov 13 09:46:04 localhost kernel:  [run_local_timers+19/21] run_local_timers+0x13/0x15
> Nov 13 09:46:04 localhost kernel:  [update_process_times+83/137] update_process_times+0x53/0x89
> Nov 13 09:46:04 localhost kernel:  [smp_local_timer_interrupt+46/89] smp_local_timer_interrupt+0x2e/0x59
> Nov 13 09:46:04 localhost kernel:  [smp_apic_timer_interrupt+92/106] smp_apic_timer_interrupt+0x5c/0x6a
> Nov 13 09:46:04 localhost kernel:  [apic_timer_interrupt+107/112] apic_timer_interrupt+0x6b/0x70
> Nov 13 09:46:04 localhost kernel:  [_write_unlock_irqrestore+81/95] _write_unlock_irqrestore+0x51/0x5f
> Nov 13 09:46:04 localhost kernel:  [test_clear_page_dirty+189/224] test_clear_page_dirty+0xbd/0xe0
> Nov 13 09:46:04 localhost kernel:  [try_to_free_buffers+123/174] try_to_free_buffers+0x7b/0xae
> Nov 13 09:46:04 localhost kernel:  [try_to_release_page+68/74] try_to_release_page+0x44/0x4a
> Nov 13 09:46:04 localhost kernel:  [invalidate_complete_page+54/191] invalidate_complete_page+0x36/0xbf
> Nov 13 09:46:04 localhost kernel:  [invalidate_mapping_pages+157/280] invalidate_mapping_pages+0x9d/0x118
> Nov 13 09:46:04 localhost kernel:  [invalidate_inode_pages+18/20] invalidate_inode_pages+0x12/0x14
> Nov 13 09:46:04 localhost kernel:  [invalidate_bdev+47/56] invalidate_bdev+0x2f/0x38
> Nov 13 09:46:04 localhost kernel:  [kill_bdev+25/52] kill_bdev+0x19/0x34
> Nov 13 09:46:04 localhost kernel:  [__blkdev_put+88/314] __blkdev_put+0x58/0x13a
> Nov 13 09:46:04 localhost kernel:  [blkdev_put+11/13] blkdev_put+0xb/0xd
> Nov 13 09:46:04 localhost kernel:  [blkdev_close+52/61] blkdev_close+0x34/0x3d
> Nov 13 09:46:04 localhost kernel:  [__fput+170/306] __fput+0xaa/0x132
> Nov 13 09:46:04 localhost kernel:  [fput+21/23] fput+0x15/0x17
> Nov 13 09:46:04 localhost kernel:  [filp_close+109/129] filp_close+0x6d/0x81
> Nov 13 09:46:04 localhost kernel:  [sys_close+140/168] sys_close+0x8c/0xa8
> Nov 13 09:46:04 localhost kernel:  [system_call+126/131] system_call+0x7e/0x83
> Nov 13 09:46:04 localhost kernel:  [phys_startup_64+-1072859838/2147483392]
> 
> > > So the kernel was doing a lot of work, on a slow CPU.  Perhaps that simply
> > > exceeded the softlockup timeout.  If that's true then the machine should
> > > have recovered.  Once it did, and once it didn't.  I don't know why it
> > > didn't.
> 
> It did for me in all cases. Should it be taking long enough to trigger
> the softlock timeout to do this? The size of the device is approx 280gig.
> 

It's a bit of a worry if it's taking all that time to shoot down 4g of
pagecache.  The 280G will affect things - the radix-tree will be sparse and
the invalidate has firther to walk.  But still...

I assume that a fsck does the same thing?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/