[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <m1mx6otk85.fsf@fess.ebiederm.org>
Date: Fri, 06 Apr 2012 18:16:10 -0700
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Sasha Levin <levinsasha928@...il.com>
Cc: davem <davem@...emloft.net>, Eric Dumazet <eric.dumazet@...il.com>,
Eric Van Hensbergen <ericvh@...il.com>,
Dave Jones <davej@...hat.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
netdev <netdev@...r.kernel.org>
Subject: Re: net: kernel BUG() in net/netns/generic.h:45
Sasha Levin <levinsasha928@...il.com> writes:
> On Fri, Apr 6, 2012 at 1:53 AM, Eric W. Biederman <ebiederm@...ssion.com> wrote:
>> Sasha Levin <levinsasha928@...il.com> writes:
>>
>>> Hi all,
>>>
>>> When an initialization of a network namespace in setup_net() fails, we
>>> try to undo everything by executing each of the exit callbacks of every
>>> namespace in the network.
>>>
>>> The problem is, it might be possible that the net_generic array wasn't
>>> initialized before we fail and try to undo everything. At that point,
>>> some of the networks assume that since we're already calling the exit
>>> callback, the net_generic structure is initialized and we hit the BUG()
>>> in net/netns/generic.h:45 .
>>>
>>> I'm not quite sure whether the right fix from the following three
>>> options is, and would be happy to figure it out before fixing it:
>>>
>>> 1. Don't assume net_generic was initialized in the exit callback, which
>>> is a bit problematic since we can't query that nicely anyway (a
>>> sub-option here would be adding an API to query whether the net_generic
>>> structure is initialized.
>>>
>>> 2. Remove the BUG(), switch it to a WARN() and let each subsystem
>>> handle the case of NULL on it's own. While it sounds a bit wrong, it's
>>> worth mentioning that that BUG() was initially added in an attempt to
>>> fix an issue in CAIF, which was fixed in a completely different way
>>> afterwards, so it's not strictly necessary here.
>>>
>>> 3. Only call the exit callback for subsystems we have called the init
>>> callback for.
>>
>> Your option 3 only calling the exit callbacks for subsystems we have
>> initialized should be what is implemented. Certainly it looks like we
>> are attempting to only call the exit callbacks for code whose init
>> callback has succeeded.
>>
>> What problem are you seeing?
>>
>> This smells suspiciously like a problem we had a little while ago caif
>> was registering as a pernet device instead of a pernet subsystem,
>> and because of that we had packets flying around after it had been
>> unregistered and was trying access it's net_generic data.
>
> It looks different from the caif problem a bit, here is a sample stacktrace:
>
> [ 163.733755] ------------[ cut here ]------------
> [ 163.734501] kernel BUG at include/net/netns/generic.h:45!
> [ 163.734501] invalid opcode: 0000 [#1] PREEMPT SMP
> [ 163.734501] CPU 2
> [ 163.734501] Pid: 19145, comm: trinity Tainted: G W
> 3.4.0-rc1-next-20120405-sasha-dirty #57
> [ 163.734501] RIP: 0010:[<ffffffff824d6062>] [<ffffffff824d6062>]
> phonet_pernet+0x182/0x1a0
> [ 163.734501] RSP: 0018:ffff8800674d5ca8 EFLAGS: 00010246
> [ 163.734501] RAX: 000000003fffffff RBX: 0000000000000000 RCX: ffff8800678c88d8
> [ 163.734501] RDX: 00000000003f4000 RSI: ffff8800678c8910 RDI: 0000000000000282
> [ 163.734501] RBP: ffff8800674d5cc8 R08: 0000000000000000 R09: 0000000000000000
> [ 163.734501] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880068bec920
> [ 163.734501] R13: ffffffff836b90c0 R14: 0000000000000000 R15: 0000000000000000
> [ 163.734501] FS: 00007f055e8de700(0000) GS:ffff88007d000000(0000)
> knlGS:0000000000000000
> [ 163.734501] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 163.734501] CR2: 00007f055e6bb518 CR3: 0000000070c16000 CR4: 00000000000406e0
> [ 163.734501] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 163.734501] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 163.734501] Process trinity (pid: 19145, threadinfo
> ffff8800674d4000, task ffff8800678c8000)
> [ 163.734501] Stack:
> [ 163.734501] ffffffff824d5f00 ffffffff810e2ec1 ffff880067ae0000
> 00000000ffffffd4
> [ 163.734501] ffff8800674d5cf8 ffffffff824d667a ffff880067ae0000
> 00000000ffffffd4
> [ 163.734501] ffffffff836b90c0 0000000000000000 ffff8800674d5d18
> ffffffff824d707d
> [ 163.734501] Call Trace:
> [ 163.734501] [<ffffffff824d5f00>] ? phonet_pernet+0x20/0x1a0
> [ 163.734501] [<ffffffff810e2ec1>] ? get_parent_ip+0x11/0x50
> [ 163.734501] [<ffffffff824d667a>] phonet_device_destroy+0x1a/0x100
> [ 163.734501] [<ffffffff824d707d>] phonet_device_notify+0x3d/0x50
> [ 163.734501] [<ffffffff810dd96e>] notifier_call_chain+0xee/0x130
> [ 163.734501] [<ffffffff810dd9d1>] raw_notifier_call_chain+0x11/0x20
> [ 163.734501] [<ffffffff821cce12>] call_netdevice_notifiers+0x52/0x60
> [ 163.734501] [<ffffffff821cd235>] rollback_registered_many+0x185/0x270
> [ 163.734501] [<ffffffff821cd334>] unregister_netdevice_many+0x14/0x60
> [ 163.734501] [<ffffffff823123e3>] ipip_exit_net+0x1b3/0x1d0
> [ 163.734501] [<ffffffff82312230>] ? ipip_rcv+0x420/0x420
> [ 163.734501] [<ffffffff821c8515>] ops_exit_list+0x35/0x70
> [ 163.734501] [<ffffffff821c911b>] setup_net+0xab/0xe0
> [ 163.734501] [<ffffffff821c9416>] copy_net_ns+0x76/0x100
> [ 163.734501] [<ffffffff810dc92b>] create_new_namespaces+0xfb/0x190
> [ 163.734501] [<ffffffff810dca21>] unshare_nsproxy_namespaces+0x61/0x80
> [ 163.734501] [<ffffffff810afd1f>] sys_unshare+0xff/0x290
> [ 163.734501] [<ffffffff8187622e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 163.734501] [<ffffffff82665539>] system_call_fastpath+0x16/0x1b
> [ 163.734501] Code: e0 c3 fe 66 0f 1f 44 00 00 48 c7 c2 40 60 4d 82
> be 01 00 00 00 48 c7 c7 80 d1 23 83 e8 48 2a c4 fe e8 73 06 c8 fe 48
> 85 db 75 0e <0f> 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 48 83 c4 10 48
> 89 d8
> [ 163.734501] RIP [<ffffffff824d6062>] phonet_pernet+0x182/0x1a0
> [ 163.734501] RSP <ffff8800674d5ca8>
> [ 163.861289] ---[ end trace fb5615826c548066 ]---
>
> It would appear that there's at least a one-off problem with the
> reverse iteration.
I don't see anything pointing to reverse iteration being the problem,
or even the call graph order.
What I see is phonet registering a netdevice notifier.
Then I see phonet register pernet_device operations.
Then I see phonet responding to all network devices no matter what
the network namespace is, and looking in the phonet net_generic data
for them.
Looking phonet does not implement any network devices phonet just has
per network device state. Which means phonet should be using
register_pernet_subsys and this roughly is the same issue we saw with
caif. Confusion of when you should use register_pernet_subsys and
register_pernet_device.
Issues that I see.
- There is a bunch of weird and stuff going on in phonet_net_exit
to handle the module unregister case where we don't synthesize
network device removal events. ( A generic bug we can fix).
- phonet should be using register_pernet_subsys instead of
register_pernet_device.
Patches to follow in a minute. Is there any chance you can reproduce
this so you can verify the problems don't continue to reproduce?
Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists