Message-Id: <20160810015900.GB3482@linux.vnet.ibm.com>
Date:	Tue, 9 Aug 2016 18:59:00 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Ding Tianhong <dingtianhong@...wei.com>
Cc:	josh@...htriplett.org, rostedt@...dmis.org,
	mathieu.desnoyers@...icios.com, jiangshanlai@...il.com,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] rcu: Fix soft lockup for rcu_nocb_kthread

On Wed, Aug 10, 2016 at 09:13:14AM +0800, Ding Tianhong wrote:
> On 2016/6/16 22:19, Paul E. McKenney wrote:
> > On Thu, Jun 16, 2016 at 02:09:47PM +0800, Ding Tianhong wrote:
> >> On 2016/6/15 23:49, Paul E. McKenney wrote:
> >>> On Wed, Jun 15, 2016 at 03:27:36PM +0800, Ding Tianhong wrote:
> >>>> I hit this problem when using Testgine to send packets to an ixgbevf NIC
> >>>> with these steps:
> >>>> 1. Connect to the ixgbevf NIC and set the speed to 10Gb/s; it works fine.
> >>>> 2. Use ifconfig to bring the NIC down and up again, looping several times.
> >>>> 3. The system panics with a soft lockup.
> >>>
> >>> Good catch, queued for review and testing.  But what .config was your
> >>> kernel built with?
> >>>
> >>
> >> I used the redhat7.1 defconfig to build my kernel, and the RCU config is as follows:
> >> #
> >> # RCU Subsystem
> >> #
> >> CONFIG_TREE_RCU=y
> >> # CONFIG_PREEMPT_RCU is not set
> >> CONFIG_RCU_STALL_COMMON=y
> >> CONFIG_CONTEXT_TRACKING=y
> >> CONFIG_RCU_USER_QS=y
> >> # CONFIG_CONTEXT_TRACKING_FORCE is not set
> >> CONFIG_RCU_FANOUT=64
> >> CONFIG_RCU_FANOUT_LEAF=16
> >> # CONFIG_RCU_FANOUT_EXACT is not set
> >> # CONFIG_RCU_FAST_NO_HZ is not set
> >> # CONFIG_TREE_RCU_TRACE is not set
> >> CONFIG_RCU_NOCB_CPU=y
> >> CONFIG_RCU_NOCB_CPU_ALL=y
> >> CONFIG_BUILD_BIN2C=y
> > 
> > Thank you!  You were running with preemption disabled, so your system
> > would indeed be very susceptible to this problem.
> > 
> >>> Also, I did tweak both the commit log and the patch.  Your cond_resched()
> >>> would prevent soft lockups, but not RCU stalls, so I substituted
> >>> cond_resched_rcu_qs().  Please let me know if either of those changes
> >>> causes problems at your end.
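
For illustration, a minimal sketch of the pattern in question, assuming a
kernel context (<linux/rcupdate.h>).  drain_callbacks() and
invoke_one_callback() are hypothetical stand-ins, not code from this patch;
the point is that cond_resched() at the end of each iteration would avoid
the soft lockup but not an RCU CPU stall, whereas cond_resched_rcu_qs()
also reports a quiescent state to RCU:

#include <linux/rcupdate.h>

/* Hypothetical helper standing in for the real callback invocation. */
extern void invoke_one_callback(struct rcu_head *rhp);

/* Sketch: drain a long callback list without monopolizing the CPU. */
static void drain_callbacks(struct rcu_head *list)
{
	struct rcu_head *next;

	while (list) {
		next = list->next;
		invoke_one_callback(list);
		/* Yield if needed and report an RCU quiescent state. */
		cond_resched_rcu_qs();
		list = next;
	}
}
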
> >>
> >> Looks fine to me, I will apply this to my branch and test it, thanks.
> > 
> > Please let me know how it goes!
> > 
> > 							Thanx, Paul
> > 
> 
> Hi Paul:
> 
> It has been a long time since I applied this patch, and I haven't found any problems.  I believe this patch is fine, thanks.

Very good!  I will push this one upstream during the next merge window.

							Thanx, Paul

> Ding
> 
> >> Ding
> >>
> >>>
> >>> 							Thanx, Paul
> >>>
> >>> ------------------------------------------------------------------------
> >>>
> >>> commit c317cf19b34c0d2787b787c38bd2c8fe433215da
> >>> Author: Ding Tianhong <dingtianhong@...wei.com>
> >>> Date:   Wed Jun 15 15:27:36 2016 +0800
> >>>
> >>>     rcu: Fix soft lockup for rcu_nocb_kthread
> >>>     
> >>>     Carrying out the following steps results in a softlockup in the
> >>>     RCU callback-offload (rcuo) kthreads:
> >>>     
> >>>     1. Connect to ixgbevf, and set the speed to 10Gb/s.
> >>>     2. Use ifconfig to bring the nic up and down repeatedly.
> >>>     
> >>>     [  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
> >>>     [  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
> >>>     [  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> >>>     [  368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
> >>>     [  368.106005] RIP: 0010:[<ffffffff81579e04>]  [<ffffffff81579e04>] fib_table_lookup+0x14/0x390
> >>>     [  368.106005] RSP: 0018:ffff88061fc83ce8  EFLAGS: 00000286
> >>>     [  368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
> >>>     [  368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
> >>>     [  368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
> >>>     [  368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
> >>>     [  368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
> >>>     [  368.106005] FS:  0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
> >>>     [  368.106005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>     [  368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
> >>>     [  368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>     [  368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >>>     [  368.106005] Stack:
> >>>     [  368.106005]  00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
> >>>     [  368.106005]  ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
> >>>     [  368.106005]  ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
> >>>     [  368.106005] Call Trace:
> >>>     [  368.106005]  <IRQ>
> >>>     [  368.106005]
> >>>     [  368.106005]  [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0
> >>>     [  368.106005]  [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110
> >>>     [  368.106005]  [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0
> >>>     [  368.106005]  [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350
> >>>     [  368.106005]  [<ffffffff81537034>] ip_rcv+0x234/0x380
> >>>     [  368.106005]  [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870
> >>>     [  368.106005]  [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60
> >>>     [  368.106005]  [<ffffffff814fe4de>] process_backlog+0xae/0x180
> >>>     [  368.106005]  [<ffffffff814fdcb2>] net_rx_action+0x152/0x240
> >>>     [  368.106005]  [<ffffffff81077b3f>] __do_softirq+0xef/0x280
> >>>     [  368.106005]  [<ffffffff8161619c>] call_softirq+0x1c/0x30
> >>>     [  368.106005]  <EOI>
> >>>     [  368.106005]
> >>>     [  368.106005]  [<ffffffff81015d95>] do_softirq+0x65/0xa0
> >>>     [  368.106005]  [<ffffffff81077174>] local_bh_enable+0x94/0xa0
> >>>     [  368.106005]  [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370
> >>>     [  368.106005]  [<ffffffff81098250>] ? wake_up_bit+0x30/0x30
> >>>     [  368.106005]  [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40
> >>>     [  368.106005]  [<ffffffff8109728f>] kthread+0xcf/0xe0
> >>>     [  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
> >>>     [  368.106005]  [<ffffffff816147d8>] ret_from_fork+0x58/0x90
> >>>     [  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
> >>>     
> >>>     ==================================cut here==============================
> >>>     
> >>>     It turns out that the rcuos callback-offload kthread is busy processing
> >>>     a very large quantity of RCU callbacks, and it is not relinquishing the
> >>>     CPU while doing so.  This commit therefore adds a cond_resched_rcu_qs()
> >>>     within the loop to allow other tasks to run.
> >>>     
> >>>     Signed-off-by: Ding Tianhong <dingtianhong@...wei.com>
> >>>     [ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
> >>>     Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> >>>
> >>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> >>> index 0082fce402a0..85c5a883c6e3 100644
> >>> --- a/kernel/rcu/tree_plugin.h
> >>> +++ b/kernel/rcu/tree_plugin.h
> >>> @@ -2173,6 +2173,7 @@ static int rcu_nocb_kthread(void *arg)
> >>>  				cl++;
> >>>  			c++;
> >>>  			local_bh_enable();
> >>> +			cond_resched_rcu_qs();
> >>>  			list = next;
> >>>  		}
> >>>  		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
> >>>
> >>>
> >>> .
> >>>
> >>
> >>
> > 
> > 
> > .
> > 
> 
> 
