Message-ID: <4F0F202F.8060901@enea.com>
Date: Thu, 12 Jan 2012 19:02:23 +0100
From: Arvid Brodin <arvid.brodin@...a.com>
To: <netdev@...r.kernel.org>
CC: arbr <Arvid.Brodin@...a.com>
Subject: Re: bridge: HSR support - possible recursive locking?
Arvid Brodin wrote:
> Arvid Brodin wrote:
>>> On Tue, 11 Oct 2011 20:25:08 +0200
>>> Arvid Brodin <arvid.brodin@...a.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to add support for HSR ("High-availability Seamless Redundancy",
>>>> IEC-62439-3) to the bridge code. With HSR, all connected units have two network
>>>> ports and are connected in a ring. All new Ethernet packets are sent on both
>>>> ports (or passed through if the current unit is not the originating unit). The
>>>> same packet is never passed twice. Non-HSR units are not allowed in the ring.
>>>>
>>>> This gives instant, reconfiguration-free failover.
>>>>
> *snip*
>> I need to do two things:
>>
>> 1) Bind two network interfaces into one (say, eth0 & eth1 => hsr0). Frames sent on
>> hsr0 should get an HSR tag (including the correct EtherType) and go out on both
>> eth0 and eth1.
>>
>> 2) Ingress frames on eth0 & eth1, with EtherType 0x88fb, should be captured and
>> handled specially (either received on hsr0 or forwarded to the other bound
>> physical interface).
>>
>
> I'm slowly getting there! :)
>
> But what is net_device->header_ops->rebuild supposed to do?
>
I get a "possible recursive locking" warning when I send cloned packets, and I can't
figure out why. Here are the stack dump and some debug printouts:
hsr_dev_xmit:286: sent on first slave
=============================================
[ INFO: possible recursive locking detected ]
2.6.37 #43
---------------------------------------------
swapper/0 is trying to acquire lock:
(_xmit_ETHER#2){+.-...}, at: [<901b9aae>] sch_direct_xmit+0x24/0x152
but task is already holding lock:
(_xmit_ETHER#2){+.-...}, at: [<901afc4a>] dev_queue_xmit+0x2ce/0x37c
other info that might help us debug this:
4 locks held by swapper/0:
#0: (&n->timer){+.-...}, at: [<9002b2b4>] run_timer_softirq+0x98/0x184
#1: (rcu_read_lock_bh){.+....}, at: [<901af97c>] dev_queue_xmit+0x0/0x37c
#2: (_xmit_ETHER#2){+.-...}, at: [<901afc4a>] dev_queue_xmit+0x2ce/0x37c
#3: (rcu_read_lock_bh){.+....}, at: [<901af97c>] dev_queue_xmit+0x0/0x37c
stack backtrace:
Call trace:
[<9001c264>] dump_stack+0x18/0x20
[<9003fdbc>] validate_chain+0x40c/0x9ac
[<90040968>] __lock_acquire+0x60c/0x670
[<90041cda>] lock_acquire+0x3a/0x48
[<90216c5c>] _raw_spin_lock+0x20/0x44
[<901b9aae>] sch_direct_xmit+0x24/0x152
[<901afb44>] dev_queue_xmit+0x1c8/0x37c
[<90213090>] nf_hook_xmit+0x8/0xc
[<902130a2>] slave_xmit+0xe/0x10
[<902131d6>] hsr_dev_xmit+0xa6/0xcc
[<901af8c2>] dev_hard_start_xmit+0x382/0x43c
[<901afc64>] dev_queue_xmit+0x2e8/0x37c
[<901dc8a0>] arp_xmit+0x8/0xc
[<901dcf86>] arp_send+0x2a/0x2c
[<901dd978>] arp_solicit+0x110/0x130
[<901b54a4>] neigh_timer_handler+0x1c2/0x206
[<9002b31e>] run_timer_softirq+0x102/0x184
[<90027eb8>] __do_softirq+0x64/0xe0
[<9002804a>] do_softirq+0x26/0x48
[<90028146>] irq_exit+0x2e/0x64
[<90019bae>] do_IRQ+0x46/0x5c
[<90018424>] irq_level0+0x18/0x60
[<902136ae>] rest_init+0x72/0x90
[<9000063c>] start_kernel+0x21c/0x258
[<00000000>] 0x0
hsr_dev_xmit:289: sent on second slave
The code looks like this (from my hsr_dev_xmit() function):
	...
	skb2 = skb_clone(skb, GFP_ATOMIC);
	slave_xmit(skb, hsr_priv->slave_data[0].dev);
	printk(KERN_INFO "%s:%d: sent on first slave\n", __func__, __LINE__);
	if (skb2)
		slave_xmit(skb2, hsr_priv->slave_data[1].dev);
	printk(KERN_INFO "%s:%d: sent on second slave\n", __func__, __LINE__);
	...
and slave_xmit looks like this:
int nf_hook_xmit(struct sk_buff *skb)
{
	dev_queue_xmit(skb);
	return 0;
}

static int slave_xmit(struct sk_buff *skb, struct net_device *dev)
{
	int res;

	skb->dev = dev;
	skb->priority = 1; // FIXME: what does this mean?
	res = NF_HOOK(NFPROTO_BRIDGE, NF_BR_POST_ROUTING, skb, NULL, skb->dev, nf_hook_xmit);
//	res = dev_queue_xmit(skb);
	/* Buffer is consumed on errors too, so nothing to do here, really... */
	return res;
}
I believe I'm doing exactly the same thing as the bridging code (but of course I
can't be). So what am I doing wrong?
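
One possible reading of the trace above, offered as a sketch and not a confirmed
diagnosis: lockdep validates lock classes, not individual lock instances, and both
locks in the report are printed with the same class name, _xmit_ETHER#2. If the hsr
device is set up as an Ethernet-type device (an assumption here), its transmit lock
would fall in the same class as the slaves' transmit locks, so taking a slave's lock
while holding the hsr device's lock could look recursive even though the two locks
are different. The user-space model below only illustrates that class-based check.
It is not kernel code; the struct and function names (lock_class, model_lock,
model_acquire, nested_xmit_warns) and the "hsr_xmit" class are hypothetical.

```c
/* User-space model of a same-class lock check. NOT kernel code. */
#include <assert.h>
#include <stdbool.h>

#define MAX_HELD 8

/* Hypothetical stand-ins for a lock class and a lock instance. */
struct lock_class { const char *name; };
struct model_lock { const struct lock_class *cls; const char *owner; };

/* Stack of locks currently held by the (single) modelled context. */
static const struct model_lock *held[MAX_HELD];
static int held_depth;

/*
 * Record that 'l' is acquired. Returns true if a lock of the same class
 * is already held, i.e. the case a class-based checker would report as
 * "possible recursive locking", even when the instances differ.
 */
static bool model_acquire(const struct model_lock *l)
{
	bool warn = false;

	for (int i = 0; i < held_depth; i++)
		if (held[i]->cls == l->cls)
			warn = true;

	assert(held_depth < MAX_HELD);
	held[held_depth++] = l;
	return warn;
}

/* Drop the most recently acquired lock. */
static void model_release(void)
{
	assert(held_depth > 0);
	held_depth--;
}

/*
 * Model: the virtual device's xmit lock is held while a slave's xmit
 * lock is taken. With shared_class set, both locks are in one class;
 * otherwise the virtual device has its own (hypothetical) class.
 */
static bool nested_xmit_warns(bool shared_class)
{
	static const struct lock_class ether = { "_xmit_ETHER" };
	static const struct lock_class hsr = { "hsr_xmit (hypothetical)" };
	struct model_lock hsr0 = { shared_class ? &ether : &hsr, "hsr0" };
	struct model_lock eth0 = { &ether, "eth0" };
	bool warn;

	model_acquire(&hsr0);		/* outer: virtual device */
	warn = model_acquire(&eth0);	/* inner: slave device */
	model_release();
	model_release();
	return warn;
}
```

In this model, nested_xmit_warns(true) reports a warning and nested_xmit_warns(false)
does not. That matches the general idea of giving a stacked virtual device its own
lock class for its transmit lock, which is one approach to consider; whether it
applies to this driver would need to be checked against the kernel sources.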
--
Arvid Brodin
Enea Services Stockholm AB