Message-ID: <528A5282.7090505@gmail.com>
Date: Mon, 18 Nov 2013 12:46:42 -0500
From: Vlad Yasevich <vyasevich@...il.com>
To: Stephen Hemminger <stephen@...workplumber.org>,
netdev@...r.kernel.org
Subject: Re: Fw: [Bug 65131] New: kernel panic (BUG_ON raised) in SCTP function sctp_cmd_interpreter
On 11/18/2013 12:14 PM, Stephen Hemminger wrote:
>
>
> Begin forwarded message:
>
> Date: Sun, 17 Nov 2013 19:38:56 -0800
> From: "bugzilla-daemon@...zilla.kernel.org" <bugzilla-daemon@...zilla.kernel.org>
> To: "stephen@...workplumber.org" <stephen@...workplumber.org>
> Subject: [Bug 65131] New: kernel panic (BUG_ON raised) in SCTP function sctp_cmd_interpreter
>
>
> https://bugzilla.kernel.org/show_bug.cgi?id=65131
>
> Bug ID: 65131
> Summary: kernel panic (BUG_ON raised) in SCTP function
> sctp_cmd_interpreter
> Product: Networking
> Version: 2.5
> Kernel Version: 3.11.8 custom build, repeated on 3.11.2
> Hardware: All
> OS: Linux
> Tree: Mainline
> Status: NEW
> Severity: blocking
> Priority: P1
> Component: IPV4
> Assignee: shemminger@...ux-foundation.org
> Reporter: yuras@....net
> Regression: No
>
> Created attachment 114991
> --> https://bugzilla.kernel.org/attachment.cgi?id=114991&action=edit
> Screenshot of panic
>
> Two-node cluster configured using the latest corosync (also DRBD 8.4.4, LVM2,
> and GFS2, but these are not essential to the problem).
> Steps to reproduce:
> 1. Start corosync on both nodes.
> 2. Start dlm_controld (version 4.0.2) on both nodes (SCTP is used because TCP
> cannot be used on multi-homed hosts). This adds the following lines to kern.log:
> kernel: [ 580.428664] sctp: Hash tables configured (established 65536 bind 65536)
> kernel: [ 580.441779] DLM installed
> 3. Start clvmd on either node. This adds the following lines to kern.log:
> kernel: [ 1345.259502] dlm: Using SCTP for communications
> kernel: [ 1345.260699] dlm: clvmd: joining the lockspace group...
> kernel: [ 1345.262962] dlm: clvmd: dlm_recover 1
> kernel: [ 1345.262968] dlm: clvmd: group event done 0 0
> kernel: [ 1345.262992] dlm: clvmd: add member 1024
> kernel: [ 1345.262995] dlm: clvmd: dlm_recover_members 1 nodes
> kernel: [ 1345.262996] dlm: clvmd: join complete
> kernel: [ 1345.262998] dlm: clvmd: generation 1 slots 1 1:1024
> kernel: [ 1345.262999] dlm: clvmd: dlm_recover_directory
> kernel: [ 1345.263000] dlm: clvmd: dlm_recover_directory 0 in 0 new
> kernel: [ 1345.263002] dlm: clvmd: dlm_recover_directory 0 out 0 messages
> kernel: [ 1345.263019] dlm: clvmd: dlm_recover 1 generation 1 done: 0 ms
> 4. Start clvmd on the second node. With high probability, one or both nodes
> panic in a similar way. See the attached screenshot.
>
> The stack trace can differ slightly above the EOI line, but RIP was always the
> same. I suppose the code bytes in the panic output correspond to one of the
> BUG_ON macros inside sctp_cmd_interpreter, so this is a bug.
>
> This bug completely prevents me from using my cluster, as DLM refuses to use
> TCP on multi-homed hosts.
>
Should be fixed by:

commit 7926c1d5be0b7cbe5b8d5c788d7d39237e7b212c
Author: Daniel Borkmann <dborkman@...hat.com>
Date:   Thu Oct 31 09:13:32 2013 +0100

    net: sctp: do not trigger BUG_ON in sctp_cmd_delete_tcb

-vlad
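
For anyone who wants to check whether a particular kernel tree already carries the commit named above, here is a minimal sketch. It assumes a local git clone of the kernel source and git 1.8.0 or newer on PATH. The helper name commit_in_tree and its injectable run parameter are illustrative only and not part of any existing tool.

```python
import subprocess

# Commit hash quoted in the reply above.
FIX_COMMIT = "7926c1d5be0b7cbe5b8d5c788d7d39237e7b212c"


def commit_in_tree(commit, ref="HEAD", repo=".", run=subprocess.run):
    """Return True if `commit` is an ancestor of `ref` in the git repo at `repo`.

    Relies on `git merge-base --is-ancestor`, which exits 0 when the first
    commit is an ancestor of the second and 1 when it is not. Any other exit
    status is treated as an error (for example, an unknown commit or a
    directory that is not a git repository).
    `run` is a hypothetical injection point so the helper can be exercised
    without a real repository.
    """
    result = run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, ref],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(
        f"git exited with status {result.returncode}; "
        f"is {repo!r} a git tree that knows {commit!r} and {ref!r}?"
    )


if __name__ == "__main__":
    import sys

    # Usage (path is an assumption): python check_fix.py /path/to/linux
    repo_path = sys.argv[1] if len(sys.argv) > 1 else "."
    present = commit_in_tree(FIX_COMMIT, repo=repo_path)
    print("fix present" if present else "fix missing")
```

A tree built from a distribution tarball without git history cannot be checked this way. In that case, look for the patch subject in the distribution's changelog instead.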