Message-ID: <4DE60918.3010008@redhat.com>
Date:	Wed, 01 Jun 2011 12:40:40 +0300
From:	Avi Kivity <avi@...hat.com>
To:	Brad Campbell <lists2009@...rfbargle.com>
CC:	Hugh Dickins <hughd@...gle.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Borislav Petkov <bp@...en8.de>, linux-kernel@...r.kernel.org,
	kvm@...r.kernel.org, linux-mm <linux-mm@...ck.org>
Subject: Re: KVM induced panic on 2.6.38[2367] & 2.6.39

On 06/01/2011 12:29 PM, Brad Campbell wrote:
> On 01/06/11 14:56, Avi Kivity wrote:
>> On 06/01/2011 09:31 AM, Brad Campbell wrote:
>>> On 01/06/11 12:52, Hugh Dickins wrote:
>>>
>>>>
>>>> I guess Brad could try SLUB debugging, boot with slub_debug=P
>>>> for poisoning perhaps; though it might upset alignments and
>>>> drive the problem underground. Or see if the same happens
>>>> with SLAB instead of SLUB.
>>>
>>> Not much use I'm afraid.
>>> This is all I get in the log
>>>
>>> [ 3161.300073] =============================================================================
>>>
>>> [ 3161.300147] BUG kmalloc-512: Freechain corrupt
>>>
>>> The qemu process is then frozen, unkillable but reported in state "R"
>>>
>>> 13881 ? R 3:27 /usr/bin/qemu -S -M pc-0.13 -enable-kvm -m 1024 -smp
>>> 2,sockets=2,cores=1,threads=1 -nam
>>>
>>> The machine then progressively dies until it's frozen solid with no
>>> further error messages.
>>>
>>> I stupidly forgot to do an alt-sysrq-t prior to doing an alt-sysrq-b,
>>> but at least it responded to that.
>>>
>>> On the bright side I can reproduce it at will.
>>
>> Please try slub_debug=FZPU; that should point the finger (hopefully at
>> somebody else).
>>
>
> Well the first attempt locked the machine solid. No network, no console..
>
> I saw "==========================================================================" on the console.. nothing after that.
> Would not respond to sysrq-t or any other sysrq combination other than -b,
> which rebooted the box.
>
>
> No output on netconsole at all, I had to walk to the other building to 
> look at the monitor and reboot it.
>
> The second attempt jammed netconsole again, but I managed to get this 
> from an ssh session I already had established. The machine died a slow 
> and horrible death, but remained interactive enough for me to reboot 
> it with
>
> echo b > /proc/sysrq-trigger
>
> Nothing else worked.
>
>
> [  413.756416]  [<ffffffff81318f1c>] ? pskb_expand_head+0x15c/0x250
> [  413.756424]  [<ffffffff813a6c45>] ? nf_bridge_copy_header+0x145/0x160
> [  413.756431]  [<ffffffff8139f78d>] ? br_dev_queue_push_xmit+0x6d/0x80
> [  413.756439]  [<ffffffff813a55a0>] ? br_nf_post_routing+0x2a0/0x2f0
> [  413.756447]  [<ffffffff81346bc4>] ? nf_iterate+0x84/0xb0
> [  413.756453]  [<ffffffff8139f720>] ? br_flood_deliver+0x20/0x20
> [  413.756459]  [<ffffffff81346c64>] ? nf_hook_slow+0x74/0x120
> [  413.756465]  [<ffffffff8139f720>] ? br_flood_deliver+0x20/0x20
> [  413.756472]  [<ffffffff8139f7da>] ? br_forward_finish+0x3a/0x60
> [  413.756479]  [<ffffffff813a5758>] ? br_nf_forward_finish+0x168/0x170
> [  413.756487]  [<ffffffff813a5c90>] ? br_nf_forward_ip+0x360/0x3a0
> [  413.756492]  [<ffffffff81346bc4>] ? nf_iterate+0x84/0xb0
> [  413.756498]  [<ffffffff8139f7a0>] ? br_dev_queue_push_xmit+0x80/0x80
> [  413.756504]  [<ffffffff81346c64>] ? nf_hook_slow+0x74/0x120
> [  413.756510]  [<ffffffff8139f7a0>] ? br_dev_queue_push_xmit+0x80/0x80
> [  413.756516]  [<ffffffff8139f800>] ? br_forward_finish+0x60/0x60
> [  413.756522]  [<ffffffff8139f800>] ? br_forward_finish+0x60/0x60
> [  413.756528]  [<ffffffff8139f875>] ? __br_forward+0x75/0xc0
> [  413.756534]  [<ffffffff8139f426>] ? deliver_clone+0x36/0x60
> [  413.756540]  [<ffffffff8139f69d>] ? br_flood+0xbd/0x100
> [  413.756546]  [<ffffffff813a05b0>] ? br_handle_local_finish+0x40/0x40
> [  413.756552]  [<ffffffff813a080e>] ? br_handle_frame_finish+0x25e/0x280
> [  413.756560]  [<ffffffff813a60f0>] ? br_nf_pre_routing_finish+0x1a0/0x330
> [  413.756568]  [<ffffffff813a6958>] ? br_nf_pre_routing+0x6d8/0x800
> [  413.756577]  [<ffffffff8102d46a>] ? enqueue_task+0x3a/0x90
> [  413.756582]  [<ffffffff81346bc4>] ? nf_iterate+0x84/0xb0
> [  413.756589]  [<ffffffff813a05b0>] ? br_handle_local_finish+0x40/0x40
> [  413.756594]  [<ffffffff81346c64>] ? nf_hook_slow+0x74/0x120
> [  413.756600]  [<ffffffff813a05b0>] ? br_handle_local_finish+0x40/0x40
> [  413.756607]  [<ffffffff810339b0>] ? try_to_wake_up+0x2c0/0x2c0
> [  413.756613]  [<ffffffff813a09d9>] ? br_handle_frame+0x1a9/0x280
> [  413.756620]  [<ffffffff813a0830>] ? br_handle_frame_finish+0x280/0x280
> [  413.756627]  [<ffffffff81320ef7>] ? __netif_receive_skb+0x157/0x5c0
> [  413.756634]  [<ffffffff81321443>] ? process_backlog+0xe3/0x1d0
> [  413.756641]  [<ffffffff81321da5>] ? net_rx_action+0xc5/0x1d0
> [  413.756650]  [<ffffffff8103df11>] ? __do_softirq+0x91/0x120
> [  413.756657]  [<ffffffff813d838c>] ? call_softirq+0x1c/0x30
> [  413.756660] <EOI>  [<ffffffff81003cbd>] ? do_softirq+0x4d/0x80
> [  413.756673]  [<ffffffff81321ece>] ? netif_rx_ni+0x1e/0x30
> [  413.756681]  [<ffffffff812b3ae2>] ? tun_chr_aio_write+0x332/0x4e0
> [  413.756688]  [<ffffffff812b37b0>] ? tun_sendmsg+0x4d0/0x4d0
> [  413.756697]  [<ffffffff810c24e9>] ? do_sync_readv_writev+0xa9/0xf0
> [  413.756704]  [<ffffffff81063f9c>] ? do_futex+0x13c/0xa70
> [  413.756711]  [<ffffffff811d6730>] ? timerqueue_add+0x60/0xb0
> [  413.756719]  [<ffffffff81056ab7>] ? __hrtimer_start_range_ns+0x1e7/0x410
> [  413.756726]  [<ffffffff810c231b>] ? rw_copy_check_uvector+0x7b/0x140
> [  413.756734]  [<ffffffff810c2bcf>] ? do_readv_writev+0xdf/0x210
> [  413.756742]  [<ffffffff810c2e7e>] ? sys_writev+0x4e/0xc0
> [  413.756750]  [<ffffffff813d753b>] ? system_call_fastpath+0x16/0x1b
> [  413.756756] FIX kmalloc-1024: Restoring 0xffff880417179566-0xffff880417179567=0x5a

That's bridge and netfilter; IIRC this was also the problem last time.

Do you have any ebtables loaded?

Can you try building a kernel without ebtables?  Without netfilter at all?
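
One rough way to answer the ebtables question is to look through /proc/modules for ebtables-related entries. The sketch below assumes ebtables is built as modules (built-in code will not show up there) and that the module names follow the usual ebtables / ebtable_* / ebt_* pattern; the helper name is made up for illustration.

```python
# Hypothetical helper: pick out module names that look ebtables-related
# from text in /proc/modules format (module name is the first field).
def ebtables_modules(proc_modules_text):
    names = [line.split()[0]
             for line in proc_modules_text.splitlines()
             if line.strip()]
    # Assumption: ebtables modules are named ebtables, ebtable_* or ebt_*.
    return [n for n in names
            if n == "ebtables" or n.startswith(("ebtable_", "ebt_"))]


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            print(ebtables_modules(f.read()))
    except OSError as err:
        # /proc/modules is missing on non-Linux hosts or restricted setups.
        print("could not read /proc/modules:", err)
```

An empty list only means no such modules are loaded; it says nothing about ebtables compiled into the kernel image.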

Please run all tests with slub_debug=FZPU.
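
For reference, a small sketch of what the letters in slub_debug=FZPU select, following the flag descriptions in the kernel's SLUB documentation. The table covers only a few common flags, and the function name is hypothetical.

```python
# Meanings of common slub_debug flag letters, per the kernel's SLUB docs.
# This table is deliberately partial; it is an illustration only.
SLUB_DEBUG_FLAGS = {
    "F": "sanity checks",
    "Z": "red zoning",
    "P": "poisoning (object and padding)",
    "U": "user tracking (alloc and free)",
    "T": "trace",
}


def describe_slub_debug(value):
    """Return (letter, meaning) pairs for a slub_debug= value.

    Anything after a comma (an optional slab cache name) is ignored.
    """
    flags, _, _slabs = value.partition(",")
    return [(c, SLUB_DEBUG_FLAGS.get(c, "not covered by this sketch"))
            for c in flags]


if __name__ == "__main__":
    for letter, meaning in describe_slub_debug("FZPU"):
        print(letter, "-", meaning)
```

With all four enabled, a report such as the "FIX kmalloc-1024: Restoring ..." line above should come with allocation and free owner information for the affected object.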

-- 
error compiling committee.c: too many arguments to function
