linux-kernel - Re: endless loop in native_flush_tlb_others in smp

Message-Id: <200803112243.27056.chunkeey@web.de>
Date:	Tue, 11 Mar 2008 22:43:26 +0100
From:	Chr <chunkeey@....de>
To:	"Pallipadi, Venkatesh" <venkatesh.pallipadi@...el.com>
Cc:	"Jike Song" <albcamus@...il.com>,
	"Linux Kernel" <linux-kernel@...r.kernel.org>,
	"Ingo Molnar" <mingo@...e.hu>,
	"Thomas Gleixner" <tglx@...utronix.de>,
	"Brown, Len" <len.brown@...el.com>
Subject: Re: endless loop in native_flush_tlb_others in smp_64.c

On Tuesday 11 March 2008 12:09:24 you wrote:
> On Tue, 11 Mar 2008, Jike Song wrote:
>
> Any chance that you can capture SYSRQ-T output via serial or
> netconsole, so we can see the stacktrace and what the other CPUs are
> doing, if they are doing anything.

This time with a 2.6.25-rc4-wl kernel (unfortunately tainted again).
The serial console seems to work: GPFs all over the place...
Take a look here:
http://www.pastebin.ca/938757

Since I get so many different Oopses, I'm beginning to suspect that my
fancy JFS/ReiserFS/Ext3:DM-Crypt:LVM2:MD(Raid1) combo causes
memory corruption/leaks/voodoo...

like this other tragic incident: 
loop0         D ffff810079331bd0     0 15716      2
 ffff810079331b40 0000000000000046 ffff810062295c90 ffffffff804028e0
 ffff810069608800 ffff810079331af0 ffffc20010af7040 ffffffff805f6700
 ffffffff805f6700 ffffffff805f2f50 ffffffff805f6700 ffff81007a7df830
Call Trace:
 [<ffffffff804028e0>] __split_bio+0x367/0x378
 [<ffffffff8033e442>] generic_unplug_device+0x18/0x24
 [<ffffffff804040b5>] dm_table_unplug_all+0x2a/0x3d
 [<ffffffff802930c5>] sync_buffer+0x0/0x3f
 [<ffffffff8048476d>] io_schedule+0x28/0x34
 [<ffffffff80293100>] sync_buffer+0x3b/0x3f
 [<ffffffff8048499e>] __wait_on_bit+0x40/0x6e
 [<ffffffff802930c5>] sync_buffer+0x0/0x3f
 [<ffffffff80484a38>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8023eb3d>] wake_bit_function+0x0/0x23
 [<ffffffff802932b7>] ll_rw_block+0x8c/0xaf
 [<ffffffff8029385b>] __block_prepare_write+0x366/0x3b9
 [<ffffffff802e2a1c>] ext3_get_block+0x0/0xf9
 [<ffffffff8029394b>] block_write_begin+0x78/0xc9
 [<ffffffff802e3f1f>] ext3_write_begin+0xeb/0x1aa
 [<ffffffff802e2a1c>] ext3_get_block+0x0/0xf9
 [<ffffffff803b5928>] do_lo_send_aops+0x9f/0x177
 [<ffffffff803b5889>] do_lo_send_aops+0x0/0x177
 [<ffffffff803b5732>] loop_thread+0x2ce/0x425
 [<ffffffff803b5464>] loop_thread+0x0/0x425
 [<ffffffff8023e9ed>] kthread+0x47/0x76
 [<ffffffff80229404>] schedule_tail+0x28/0x5c
 [<ffffffff8020be68>] child_rip+0xa/0x12
 [<ffffffff8023e9a6>] kthread+0x0/0x76
 [<ffffffff8020be5e>] child_rip+0x0/0x12

Situation: the system died after writing >2 GB from /dev/zero
(gosh, at only about 500 kB/s to 1 MB/s!!) into a file on a _mounted_
loop device backed by an old HDD image file on a jfs/dm-crypt/lvm2 combo.
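
For anyone who wants to try a similar write load, here is a minimal
sketch in Python. It only imitates the "write zeros into a file and
watch the throughput" part; the function name and the target path are
made up for illustration, and the loop device, dm-crypt and LVM2 setup
are assumed to exist already.

```python
import os
import time


def write_zeros(path, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of zeros to path; return throughput in bytes/s.

    Hypothetical helper, not taken from the original report.
    """
    chunk = bytes(chunk_size)  # a buffer of zero bytes, like /dev/zero
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # push the data down to the block layer
    elapsed = time.monotonic() - start
    return written / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Assumed mount point of the loop device; adjust to the real setup.
    target = "/mnt/loop/zeros.bin"
    rate = write_zeros(target, 2 * 1024**3)
    print(f"{rate / 1e6:.2f} MB/s")
```

Running it against a file on the mounted loop device should produce
the same kind of sustained write traffic as described above; whether it
triggers the hang is of course not guaranteed.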

BTW: bisect is still running... the regression seems to have sneaked in
between 2.6.24 and 2.6.25-rc1; however, 4000 diffs will take a while...

(It takes so long since the RAID has to resync on each reboot...
Thank *** that this is just a stress-testing system that can take some
beating without _falling_ apart. ;-) )
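
As a rough estimate of how long the bisect will run: a bisection halves
the candidate range on every step, so about 4000 candidates need roughly
log2(4000) build-and-boot cycles. The sketch below assumes ideal halving;
merges and skipped commits can change the real count, and the function
name is only illustrative.

```python
import math


def bisect_steps(n_candidates):
    """Approximate number of test cycles to bisect n_candidates changes."""
    if n_candidates <= 1:
        return 0
    return math.ceil(math.log2(n_candidates))


if __name__ == "__main__":
    print(bisect_steps(4000))  # about a dozen reboots, each with a resync
```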

Regards,
	Chr