[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <52A77784.1040309@ericsson.com>
Date: Tue, 10 Dec 2013 12:20:20 -0800
From: Jon Maloy <jon.maloy@...csson.com>
To: Pablo Neira Ayuso <pablo@...monks.org>, <netdev@...r.kernel.org>
CC: <davem@...emloft.net>, <allan.stephens@...driver.com>
Subject: Re: [PATCH net] net: tipc: fix possible CPU stall while removing
module
There is already a patch in the queue fixing this issue.
http://marc.info/?l=linux-netdev&m=138665890819879&w=2
Regards
///jon
On 12/10/2013 09:54 AM, Pablo Neira Ayuso wrote:
> The following stall is possible while removing the tipc module:
>
> [ 4244.091196] INFO: rcu_sched self-detected stall on CPU { 41} (t=30001 jiffies)
> [ 4244.091414] Pid: 5311, comm: rmmod Tainted: G O 3.4.51 #1
> [ 4244.091524] Call Trace:
> [ 4244.091618] <IRQ> [<ffffffff810b3741>] ? __rcu_pending+0x1a1/0x4d0
> [ 4244.091741] [<ffffffff81085030>] ? tick_nohz_handler+0xe0/0xe0
> [ 4244.091848] [<ffffffff810b3b18>] ? rcu_check_callbacks+0xa8/0x150
> [ 4244.091957] [<ffffffff81046f3f>] ? update_process_times+0x3f/0x80
> [ 4244.092065] [<ffffffff8108508b>] ? tick_sched_timer+0x5b/0xb0
> [ 4244.092172] [<ffffffff8105d967>] ? __run_hrtimer+0x77/0x1c0
> [ 4244.092278] [<ffffffff8105dd1f>] ? hrtimer_interrupt+0xef/0x270
> [ 4244.092386] [<ffffffff8106714d>] ? ttwu_do_wakeup+0x3d/0x100
> [ 4244.092494] [<ffffffff81020d43>] ? smp_apic_timer_interrupt+0x63/0xa0
> [ 4244.092605] [<ffffffff8167fc4a>] ? apic_timer_interrupt+0x6a/0x70
> [ 4244.092714] [<ffffffff81677985>] ? _raw_spin_lock_bh+0x25/0x30
> [ 4244.092820] [<ffffffff81677969>] ? _raw_spin_lock_bh+0x9/0x30
> [ 4244.092936] [<ffffffffa0333c45>] ? tipc_nodesub_unsubscribe+0x15/0x50 [tipc]
> [ 4244.093049] [<ffffffffa03303ea>] ? named_purge_publ+0x3a/0x90 [tipc]
> [ 4244.093158] [<ffffffff8103ee15>] ? __do_softirq+0xd5/0x1e0
> [ 4244.093266] [<ffffffffa032a25b>] ? process_signal_queue+0x7b/0xc0 [tipc]
> [ 4244.093376] [<ffffffff8103f41b>] ? tasklet_action+0xbb/0xd0
> [ 4244.093482] [<ffffffff8103edf1>] ? __do_softirq+0xb1/0x1e0
> [ 4244.093589] [<ffffffff8168059c>] ? call_softirq+0x1c/0x30
> [ 4244.093691] <EOI> [<ffffffff810041e5>] ? do_softirq+0x65/0xa0
> [ 4244.095460] [<ffffffff8103f834>] ? local_bh_enable_ip+0x94/0xa0
> [ 4244.095570] [<ffffffffa0332c93>] ? tipc_net_stop+0x73/0x90 [tipc]
> [ 4244.095679] [<ffffffffa0339d61>] ? tipc_exit+0x9/0x29 [tipc]
> [ 4244.095786] [<ffffffff8108e8bf>] ? sys_delete_module+0x1af/0x2b0
> [ 4244.095894] [<ffffffff81677d75>] ? page_fault+0x25/0x30
> [ 4244.095999] [<ffffffff8167f1b9>] ? system_call_fastpath+0x16/0x1b
>
> The two things that trigger this oops are related to tipc_net_stop(), they
> are:
>
> * tipc_net_stop() schedules the removal of the bearers via workqueue, which
> includes the removal of the packet handler for the TIPC protocol family. So,
> we have no time guarantees when the packet handler is removed.
>
> * tipc_net_stop() cleans up the the tipc_node_list, so it releases all
> tipc_node structures. However, the tipc_node_subscr structure still holds
> a reference to the tipc_node structures, which is now invalid.
>
> After leaving tipc_net_stop, BH are enabled again. If we have a TIPC
> message that is pending to be handled (in the softirq path), the packet
> handler will likely be still there, so it passes the packet to tipc_recv_msg().
> In that path, if the TIPC message announces a new service publication,
> named_purge_publ() is invoked, then tipc_nodesub_unsubscribe() to remove
> the node subscription happens. This function tries to get the node lock, but
> that structure was already released by tipc_net_stop(), so it stalls.
>
> The proposed fix removes the bearers first so we make sure we get no more
> TIPC packets running through the input path, accessing the name-service
> base in inconsistent state (as tipc_node structures are not there anymore).
>
> Signed-off-by: Pablo Neira Ayuso <pablo@...monks.org>
> ---
> net/tipc/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/tipc/core.c b/net/tipc/core.c
> index fd4eeea..7373fd9 100644
> --- a/net/tipc/core.c
> +++ b/net/tipc/core.c
> @@ -81,9 +81,9 @@ struct sk_buff *tipc_buf_acquire(u32 size)
> */
> static void tipc_core_stop_net(void)
> {
> - tipc_net_stop();
> tipc_eth_media_stop();
> tipc_ib_media_stop();
> + tipc_net_stop();
> }
>
> /**
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists