Message-ID: <f7ac00c4-9314-b6da-d783-78d8696bb9db@univention.de>
Date: Wed, 28 Sep 2016 18:23:50 +0200
From: Philipp Hahn <hahn@...vention.de>
To: linux-kernel@...r.kernel.org
Subject: 4.1.16 crash: list_add corruption
Hello,
One of our servers crashed repeatedly this week. After setting up serial
console logging, we were able to capture the following stack traces:
> [3689736.061539] WARNING: CPU: 0 PID: 29284 at linux-4.1.6/lib/list_debug.c:33 __list_add+0xc0/0xd0()
> [3689736.061541] list_add corruption. prev->next should be next (ffffffff81ab3ca8), but was ffffffff81ab3cc8. (prev=ffff8804d9910d58).
Compare this ...
> [3689736.061602] CPU: 0 PID: 29284 Comm: slapd Tainted: G W 4.1.0-ucs190-amd64 #1 Debian 4.1.6-1.190.201604142226
> [3689736.061603] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
Maybe VMware has a bug?
> [3689736.061604] 0000000000000000 ffffffff817531c0 ffffffff81597807 ffff88083fc038a8
> [3689736.061606] ffffffff81076c45 ffff88004b553e00 ffffffff81ab3ca8 ffff8804d9910d58
> [3689736.061608] 0000000000000001 0000000137090762 ffffffff81076d4a ffffffff81753310
> [3689736.061609] Call Trace:
...
> [3689736.061624] [<ffffffff8130be50>] ? __list_add+0xc0/0xd0
> [3689736.061627] [<ffffffff810da5a6>] ? internal_add_timer+0x36/0xa0
> [3689736.061629] [<ffffffff810dc6fa>] ? mod_timer_pending+0xfa/0x140
> [3689736.061635] [<ffffffffa048c441>] ? __nf_ct_refresh_acct+0xb1/0xc0 [nf_conntrack]
> [3689736.061640] [<ffffffffa04945bc>] ? tcp_packet+0x66c/0x1500 [nf_conntrack]
> [3689736.061643] [<ffffffff810b5fff>] ? autoremove_wake_function+0x2f/0x50
> [3689736.061647] [<ffffffffa0493ef2>] ? tcp_error+0x1b2/0x210 [nf_conntrack]
> [3689736.061650] [<ffffffffa048e725>] ? nf_conntrack_in+0x3a5/0xb30 [nf_conntrack]
> [3689736.061654] [<ffffffff81481cb4>] ? sk_reset_timer+0x14/0x20
> [3689736.061657] [<ffffffff814cdeef>] ? nf_iterate+0x4f/0x80
> [3689736.061659] [<ffffffff814cdfb8>] ? nf_hook_slow+0x98/0xf0
> [3689736.061662] [<ffffffff814d52f4>] ? ip_rcv+0x314/0x400
> [3689736.061664] [<ffffffff814d48a0>] ? inet_add_protocol+0x50/0x50
> [3689736.061668] [<ffffffff81498ae3>] ? __netif_receive_skb_core+0x703/0x920
> [3689736.061670] [<ffffffff8101f405>] ? read_tsc+0x5/0x10
> [3689736.061672] [<ffffffff81498ecf>] ? netif_receive_skb_internal+0x1f/0x90
> [3689736.061673] [<ffffffff81499af0>] ? napi_gro_receive+0xb0/0xe0
> [3689736.061678] [<ffffffffa0097fe4>] ? e1000_clean_rx_irq+0x2b4/0x500 [e1000]
> [3689736.061681] [<ffffffffa0099ccc>] ? e1000_clean+0x26c/0x900 [e1000]
> [3689736.061683] [<ffffffff81499629>] ? net_rx_action+0x159/0x330
> [3689736.061685] [<ffffffff8107aace>] ? __do_softirq+0xde/0x260
> [3689736.061687] [<ffffffff8107ae95>] ? irq_exit+0x95/0xa0
> [3689736.061689] [<ffffffff815a0b74>] ? do_IRQ+0x64/0x110
> [3689736.061691] [<ffffffff8159e9ee>] ? common_interrupt+0x6e/0x6e
...
> [3689738.157677] WARNING: CPU: 0 PID: 29284 at linux-4.1.6/lib/list_debug.c:33 __list_add+0xc0/0xd0()
> [3689738.157678] list_add corruption. prev->next should be next (ffffffff81ab3cc8), but was ffffffff81ab3ca8. (prev=ffff8804d9910d58).
with that one: the expected and actual addresses are swapped, while prev
(ffff8804d9910d58) is the same in both warnings.
...
> [3689738.157740] [<ffffffff8130be50>] ? __list_add+0xc0/0xd0
> [3689738.157742] [<ffffffff810da5a6>] ? internal_add_timer+0x36/0xa0
> [3689738.157744] [<ffffffff810dc6fa>] ? mod_timer_pending+0xfa/0x140
> [3689738.157748] [<ffffffffa048c441>] ? __nf_ct_refresh_acct+0xb1/0xc0 [nf_conntrack]
> [3689738.157751] [<ffffffffa04945bc>] ? tcp_packet+0x66c/0x1500 [nf_conntrack]
> [3689738.157753] [<ffffffff8101f9d5>] ? sched_clock+0x5/0x10
> [3689738.157755] [<ffffffff8109ea48>] ? resched_curr+0x38/0xc0
> [3689738.157758] [<ffffffff810b5fff>] ? autoremove_wake_function+0x2f/0x50
> [3689738.157760] [<ffffffffa0493ef2>] ? tcp_error+0x1b2/0x210 [nf_conntrack]
> [3689738.157763] [<ffffffffa048e725>] ? nf_conntrack_in+0x3a5/0xb30 [nf_conntrack]
> [3689738.157765] [<ffffffff81481cb4>] ? sk_reset_timer+0x14/0x20
> [3689738.157768] [<ffffffff814cdeef>] ? nf_iterate+0x4f/0x80
> [3689738.157769] [<ffffffff814cdfb8>] ? nf_hook_slow+0x98/0xf0
> [3689738.157771] [<ffffffff814d52f4>] ? ip_rcv+0x314/0x400
> [3689738.157773] [<ffffffff814d48a0>] ? inet_add_protocol+0x50/0x50
> [3689738.157775] [<ffffffff81498ae3>] ? __netif_receive_skb_core+0x703/0x920
> [3689738.157777] [<ffffffff8101f405>] ? read_tsc+0x5/0x10
> [3689738.157778] [<ffffffff81498ecf>] ? netif_receive_skb_internal+0x1f/0x90
> [3689738.157780] [<ffffffff81499af0>] ? napi_gro_receive+0xb0/0xe0
> [3689738.157784] [<ffffffffa0097fe4>] ? e1000_clean_rx_irq+0x2b4/0x500 [e1000]
> [3689738.157787] [<ffffffffa0099ccc>] ? e1000_clean+0x26c/0x900 [e1000]
> [3689738.157789] [<ffffffff81499629>] ? net_rx_action+0x159/0x330
> [3689738.157791] [<ffffffff8107aace>] ? __do_softirq+0xde/0x260
> [3689738.157792] [<ffffffff8107ae95>] ? irq_exit+0x95/0xa0
> [3689738.157794] [<ffffffff815a0b74>] ? do_IRQ+0x64/0x110
> [3689738.157797] [<ffffffff8159e9ee>] ? common_interrupt+0x6e/0x6e
Has anyone seen a similar issue, and does anyone know whether it has been fixed after 4.1.16?
If you need more data, just ask and I will see what else I can gather.
Thank you in advance.
Philipp
--
Philipp Hahn
Open Source Software Engineer
Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
Tel.: +49 421 22232-0
Fax : +49 421 22232-99
hahn@...vention.de
http://www.univention.de/
Geschäftsführer: Peter H. Ganten
HRB 20755 Amtsgericht Bremen
Steuer-Nr.: 71-597-02876
View attachment "syslog.txt" of type "text/plain" (7697 bytes)