Message-ID: <20160324204658.GB4287@linux.vnet.ibm.com>
Date: Thu, 24 Mar 2016 13:46:58 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Bart Van Assche <bart.vanassche@...disk.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: RCU stall
On Thu, Mar 24, 2016 at 01:24:02PM -0700, Bart Van Assche wrote:
> On 03/22/2016 07:29 PM, Paul E. McKenney wrote:
> >Note that a soft lockup triggered at 10509.568010, well before the RCU
> >CPU stall warning. And you have a second soft lockup at 10537.567212,
> >with the same function scsi_request_fn() at the top of the stack in both
> >stack traces. That function has a nice big "for (;;)" loop that does
> >not appear to have any iteration-limiting mechanism.
>
> Hello Paul,
>
> Your feedback is really appreciated. I have started testing the
> four attached patches. In the tests I have run so far, these patches
> were sufficient to avoid any soft lockup or RCU stall complaints. I
> will submit them to the appropriate maintainers once I have finished
> testing.
Hello, Bart,
Glad it helped!
The patches look sane to me, but that should be taken with a large
quantity of salt given that I know very little about Linux's storage
subsystem.
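
For what it's worth, the common theme looks right to me.  The failure
mode I was pointing at is a service loop with no bound on the amount of
work done per invocation, roughly like this sketch (purely illustrative;
fetch_next_request() and dispatch_request() are made-up helpers, not
functions from the actual code):

        /* Illustrative only -- not taken from any real driver. */
        static void example_request_fn(struct request_queue *q)
        {
                struct request *req;

                for (;;) {                      /* no iteration limit */
                        req = fetch_next_request(q);
                        if (!req)
                                break;
                        dispatch_request(req);  /* completions may refill q */
                }
        }

If completions refill the queue at least as fast as the loop drains it,
the loop never returns, and with debug options slowing everything down
the soft-lockup and RCU CPU stall watchdogs eventually complain.
Patches 2-4 apply essentially the same cure: either report a quiescent
state periodically or process at most a fixed budget of items per
invocation and hand the remainder back to be picked up later, while
patch 1 fixes a separate locking problem.  A few notes on the
individual patches are interspersed below.
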
Thanx, Paul
> Bart.
> >From 0f72b28329342346980ae99c69d19b7adb0123bc Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bart.vanassche@...disk.com>
> Date: Thu, 24 Mar 2016 11:01:14 -0700
> Subject: [PATCH 1/4] IB/cm: Fix a recently introduced locking bug
>
> ib_cm_notify() can be called from interrupt context. Hence do not
> reenable interrupts unconditionally in cm_establish().
>
> This patch prevents lockdep from reporting the following warning:
>
> WARNING: CPU: 0 PID: 23317 at kernel/locking/lockdep.c:2624 trace_hardirqs_on_caller+0x112/0x1b0
> DEBUG_LOCKS_WARN_ON(current->hardirq_context)
> Call Trace:
> <IRQ> [<ffffffff812bd0e5>] dump_stack+0x67/0x92
> [<ffffffff81056f21>] __warn+0xc1/0xe0
> [<ffffffff81056f8a>] warn_slowpath_fmt+0x4a/0x50
> [<ffffffff810a5932>] trace_hardirqs_on_caller+0x112/0x1b0
> [<ffffffff810a59dd>] trace_hardirqs_on+0xd/0x10
> [<ffffffff815992c7>] _raw_spin_unlock_irq+0x27/0x40
> [<ffffffffa0382e9c>] ib_cm_notify+0x25c/0x290 [ib_cm]
> [<ffffffffa068fbc1>] srpt_qp_event+0xa1/0xf0 [ib_srpt]
> [<ffffffffa04efb97>] mlx4_ib_qp_event+0x67/0xd0 [mlx4_ib]
> [<ffffffffa034ec0a>] mlx4_qp_event+0x5a/0xc0 [mlx4_core]
> [<ffffffffa03365f8>] mlx4_eq_int+0x3d8/0xcf0 [mlx4_core]
> [<ffffffffa0336f9c>] mlx4_msi_x_interrupt+0xc/0x20 [mlx4_core]
> [<ffffffff810b0914>] handle_irq_event_percpu+0x64/0x100
> [<ffffffff810b09e4>] handle_irq_event+0x34/0x60
> [<ffffffff810b3a6a>] handle_edge_irq+0x6a/0x150
> [<ffffffff8101ad05>] handle_irq+0x15/0x20
> [<ffffffff8101a66c>] do_IRQ+0x5c/0x110
> [<ffffffff8159a2c9>] common_interrupt+0x89/0x89
> [<ffffffff81297a17>] blk_run_queue_async+0x37/0x40
> [<ffffffffa0163e53>] rq_completed+0x43/0x70 [dm_mod]
> [<ffffffffa0164896>] dm_softirq_done+0x176/0x280 [dm_mod]
> [<ffffffff812a26c2>] blk_done_softirq+0x52/0x90
> [<ffffffff8105bc1f>] __do_softirq+0x10f/0x230
> [<ffffffff8105bec8>] irq_exit+0xa8/0xb0
> [<ffffffff8103653e>] smp_trace_call_function_single_interrupt+0x2e/0x30
> [<ffffffff81036549>] smp_call_function_single_interrupt+0x9/0x10
> [<ffffffff8159a959>] call_function_single_interrupt+0x89/0x90
> <EOI>
>
> Fixes: be4b499323bf ("IB/cm: Do not queue work to a device that's going away")
> Signed-off-by: Bart Van Assche <bart.vanassche@...disk.com>
> Cc: Erez Shitrit <erezsh@...lanox.com>
> Cc: Doug Ledford <dledford@...hat.com>
> Cc: stable <stable@...r.kernel.org> # v4.2+
> ---
> drivers/infiniband/core/cm.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 1d92e09..c995255 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -3452,14 +3452,14 @@ static int cm_establish(struct ib_cm_id *cm_id)
> work->cm_event.event = IB_CM_USER_ESTABLISHED;
>
> /* Check if the device started its remove_one */
> - spin_lock_irq(&cm.lock);
> + spin_lock_irqsave(&cm.lock, flags);
> if (!cm_dev->going_down) {
> queue_delayed_work(cm.wq, &work->work, 0);
> } else {
> kfree(work);
> ret = -ENODEV;
> }
> - spin_unlock_irq(&cm.lock);
> + spin_unlock_irqrestore(&cm.lock, flags);
>
> out:
> return ret;
> --
> 2.7.3
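
On patch 1: the reasoning looks right to me.  For anyone following
along, the difference between the two locking forms is roughly this
(illustrative only; "lock" and "flags" are placeholders):

        spin_lock_irq(&lock);             /* correct only if IRQs were enabled */
        /* ... critical section ... */
        spin_unlock_irq(&lock);           /* unconditionally re-enables IRQs */

        spin_lock_irqsave(&lock, flags);  /* records the caller's IRQ state */
        /* ... critical section ... */
        spin_unlock_irqrestore(&lock, flags); /* restores the recorded state */

Because ib_cm_notify() can be reached from hardirq context, the
unconditional re-enable in cm_establish() trips lockdep's
DEBUG_LOCKS_WARN_ON(current->hardirq_context) check shown above, and
the irqsave/irqrestore pair is the safe choice.
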
>
> >From 5fd6aedadc04d102cd261507ff61071071455fb6 Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bart.vanassche@...disk.com>
> Date: Thu, 24 Mar 2016 12:04:01 -0700
> Subject: [PATCH 2/4] kernel/kthread.c: Avoid CPU lockups
>
> Prevent complaints similar to the one below from being reported
> against a debug kernel:
>
> NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kdmwork-254:2:23313]
> irq event stamp: 16320042
> hardirqs last enabled at (16320041): [<ffffffff815992c7>] _raw_spin_unlock_irq+0x27/0x40
> hardirqs last disabled at (16320042): [<ffff8803ffbe3cd8>] 0xffff8803ffbe3cd8
> softirqs last enabled at (16319960): [<ffffffff8105bcdb>] __do_softirq+0x1cb/0x230
> softirqs last disabled at (16319715): [<ffffffff8105bec8>] irq_exit+0xa8/0xb0
> CPU: 1 PID: 23313 Comm: kdmwork-254:2
> RIP: 0010:[<ffffffff815992cf>] [<ffffffff815992cf>] _raw_spin_unlock_irq+0x2f/0x40
> Call Trace:
> [<ffffffff813f702f>] scsi_request_fn+0x11f/0x630
> [<ffffffff812977fe>] __blk_run_queue+0x2e/0x40
> [<ffffffff81297065>] __elv_add_request+0x75/0x1f0
> [<ffffffff8129ba91>] blk_insert_cloned_request+0x101/0x190
> [<ffffffffa0189f1a>] map_request+0x16a/0x1b0 [dm_mod]
> [<ffffffffa0189f7d>] map_tio_request+0x1d/0x40 [dm_mod]
> [<ffffffff81077812>] kthread_worker_fn+0x82/0x1a0
> [<ffffffff8107771a>] kthread+0xea/0x100
> [<ffffffff81599b72>] ret_from_fork+0x22/0x40
>
> Signed-off-by: Bart Van Assche <bart.vanassche@...disk.com>
> ---
> kernel/kthread.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 9ff173d..516ca6b 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -593,6 +593,7 @@ repeat:
> if (work) {
> __set_current_state(TASK_RUNNING);
> work->func(work);
> + cond_resched_rcu_qs();
> } else if (!freezing(current))
> schedule();
>
> --
> 2.7.3
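
On patch 2: from the RCU side this looks right.  cond_resched_rcu_qs()
both gives the scheduler a chance to run something else and reports an
RCU quiescent state for the CPU, so a kthread worker that stays busy
for a long time no longer holds up grace periods.  The general idiom
for a long-running loop in process context is something like the
following (everything other than cond_resched_rcu_qs() is illustrative):

        while (more_work_pending()) {
                do_one_work_item();
                /* Let the scheduler in and report an RCU quiescent state. */
                cond_resched_rcu_qs();
        }

The usual caveat applies: as with cond_resched(), this must be invoked
only where it is legal to schedule, which appears to be the case at
that point in kthread_worker_fn() given that the work function has
returned and the task was set to TASK_RUNNING beforehand.
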
>
> >From 44985e4b2f3124bf87e84a4c7572efa00ac28d3b Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bart.vanassche@...disk.com>
> Date: Wed, 23 Mar 2016 17:14:57 -0700
> Subject: [PATCH 3/4] block: Limit work processed in softirq context
>
> Prevent complaints like the one below from being reported against a
> debug kernel:
>
> NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [disk11_0:2708]
> irq event stamp: 17120809
> hardirqs last enabled at (17120808): [<ffffffff81599191>] _raw_spin_unlock_irqrestore+0x31/0x50
> hardirqs last disabled at (17120809): [<ffff88046f223bd0>] 0xffff88046f223bd0
> softirqs last enabled at (17120794): [<ffffffffa060aa67>] scst_check_blocked_dev+0x77/0x1c0 [scst]
> softirqs last disabled at (17120795): [<ffffffff8159acbc>] do_softirq_own_stack+0x1c/0x30
> RIP: 0010:[<ffffffff81599193>] [<ffffffff81599193>] _raw_spin_unlock_irqrestore+0x33/0x50
> Call Trace:
> <IRQ>
> [<ffffffff81171450>] free_debug_processing+0x270/0x3a0
> [<ffffffff8117277a>] __slab_free+0x17a/0x2c0
> [<ffffffff81172a74>] kmem_cache_free+0x1b4/0x1d0
> [<ffffffff8111a6c2>] mempool_free_slab+0x12/0x20
> [<ffffffff8111a846>] mempool_free+0x26/0x80
> [<ffffffff81294cb9>] bio_free+0x49/0x60
> [<ffffffff81294cee>] bio_put+0x1e/0x30
> [<ffffffffa0199d31>] end_clone_bio+0x21/0x70 [dm_mod]
> [<ffffffff81294d52>] bio_endio+0x52/0x60
> [<ffffffff8129aaec>] blk_update_request+0x7c/0x2a0
> [<ffffffff813f4a8e>] scsi_end_request+0x2e/0x1d0
> [<ffffffff813f7674>] scsi_io_completion+0xb4/0x610
> [<ffffffff813ee79a>] scsi_finish_command+0xca/0x120
> [<ffffffff813f6ef0>] scsi_softirq_done+0x120/0x140
> [<ffffffff812a26e6>] blk_done_softirq+0x76/0x90
> [<ffffffff8105bc1f>] __do_softirq+0x10f/0x230
> [<ffffffff8159acbc>] do_softirq_own_stack+0x1c/0x30
> <EOI>
> ---
> block/blk-softirq.c | 24 +++++++++++++++---------
> 1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/block/blk-softirq.c b/block/blk-softirq.c
> index 53b1737..d739949 100644
> --- a/block/blk-softirq.c
> +++ b/block/blk-softirq.c
> @@ -20,20 +20,26 @@ static DEFINE_PER_CPU(struct list_head, blk_cpu_done);
> */
> static void blk_done_softirq(struct softirq_action *h)
> {
> - struct list_head *cpu_list, local_list;
> + struct list_head *cpu_list = this_cpu_ptr(&blk_cpu_done);
> + struct request *rq;
> + int i;
>
> local_irq_disable();
> - cpu_list = this_cpu_ptr(&blk_cpu_done);
> - list_replace_init(cpu_list, &local_list);
> - local_irq_enable();
> -
> - while (!list_empty(&local_list)) {
> - struct request *rq;
> -
> - rq = list_entry(local_list.next, struct request, ipi_list);
> + for (i = 64; i > 0; i--) {
> + if (list_empty(cpu_list))
> + goto done;
> + rq = list_first_entry(cpu_list, struct request, ipi_list);
> list_del_init(&rq->ipi_list);
> + local_irq_enable();
> +
> rq->q->softirq_done_fn(rq);
> +
> + local_irq_disable();
> }
> + raise_softirq_irqoff(BLOCK_SOFTIRQ);
> +
> +done:
> + local_irq_enable();
> }
>
> #ifdef CONFIG_SMP
> --
> 2.7.3
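
On patch 3: the budget-plus-re-raise structure looks reasonable, and
calling raise_softirq_irqoff() while interrupts are still disabled is
the right variant there.  Two small nits.  First, the softirq is
re-raised whenever the budget runs out, even if the 64th entry happened
to empty the list; a quick check would avoid one spurious invocation,
along the lines of:

        if (!list_empty(cpu_list))
                raise_softirq_irqoff(BLOCK_SOFTIRQ);

Second, this patch appears to be missing your Signed-off-by line.
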
>
> >From a73fdf710b98922fa02d464af96b499ea2740832 Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bart.vanassche@...disk.com>
> Date: Wed, 23 Mar 2016 14:38:13 -0700
> Subject: [PATCH 4/4] Avoid that I/O completion processing triggers lockup
> complaints
>
> Prevent I/O completion processing from triggering the following
> complaints when kernel debug options that slow down the kernel
> significantly are enabled:
>
> NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kdmwork-254:2:358]
> irq event stamp: 15233868
> hardirqs last enabled at (15233867): [<ffffffff81578167>] _raw_spin_unlock_irq+0x27/0x40
> hardirqs last disabled at (15233868): [<ffffffff81579664>] apic_timer_interrupt+0x84/0x90
> softirqs last enabled at (15233850): [<ffffffff8105a55b>] __do_softirq+0x1cb/0x230
> softirqs last disabled at (15233743): [<ffffffff8105a748>] irq_exit+0xa8/0xb0
> CPU: 3 PID: 358 Comm: kdmwork-254:2
> RIP: 0010:[<ffffffff8157816f>] [<ffffffff8157816f>] _raw_spin_unlock_irq+0x2f/0x40
> Call Trace:
> [<ffffffff813dbe28>] scsi_request_fn+0x118/0x600
> [<ffffffff8128190e>] __blk_run_queue+0x2e/0x40
> [<ffffffff81281175>] __elv_add_request+0x75/0x1f0
> [<ffffffff81285b11>] blk_insert_cloned_request+0x101/0x190
> [<ffffffffa0170eee>] map_request+0x18e/0x210 [dm_mod]
> [<ffffffffa0170f8d>] map_tio_request+0x1d/0x40 [dm_mod]
> [<ffffffff81075a0d>] kthread_worker_fn+0x7d/0x1a0
> [<ffffffff8107591a>] kthread+0xea/0x100
> [<ffffffff81578c7f>] ret_from_fork+0x3f/0x70
>
> INFO: rcu_sched self-detected stall on CPU
> 3-...: (6497 ticks this GP) idle=fb9/140000000000002/0 softirq=2044956/2045037 fqs=5414
> (t=6500 jiffies g=219289 c=219288 q=7233211)
> Task dump for CPU 3:
> kdmwork-254:2 R running task 0 358 2 0x00000008
> Call Trace:
> <IRQ> [<ffffffff8108195f>] sched_show_task+0xbf/0x150
> [<ffffffff81084742>] dump_cpu_task+0x32/0x40
> [<ffffffff810b5e79>] rcu_dump_cpu_stacks+0x89/0xe0
> [<ffffffff810b9999>] rcu_check_callbacks+0x439/0x730
> [<ffffffff810bc1c4>] update_process_times+0x34/0x60
> [<ffffffff810caaa0>] tick_sched_handle.isra.18+0x20/0x50
> [<ffffffff810cb148>] tick_sched_timer+0x38/0x70
> [<ffffffff810bc865>] __hrtimer_run_queues+0xa5/0x1c0
> [<ffffffff810bcef6>] hrtimer_interrupt+0xa6/0x1b0
> [<ffffffff81038ba3>] smp_trace_apic_timer_interrupt+0x63/0x90
> [<ffffffff81038bd9>] smp_apic_timer_interrupt+0x9/0x10
> [<ffffffff81579669>] apic_timer_interrupt+0x89/0x90
> [<ffffffff811640d6>] __slab_free+0xc6/0x270
> [<ffffffff811643d9>] kmem_cache_free+0x159/0x160
> [<ffffffff811d0a12>] kiocb_free+0x32/0x40
> [<ffffffff811d1b75>] aio_complete+0x1e5/0x3c0
> [<ffffffff811bef45>] dio_complete+0x75/0x1d0
> [<ffffffff811bf11a>] dio_bio_end_aio+0x7a/0x130
> [<ffffffff812800da>] bio_endio+0x3a/0x60
> [<ffffffff81284b7c>] blk_update_request+0x7c/0x2a0
> [<ffffffffa016fcc1>] end_clone_bio+0x41/0x70 [dm_mod]
> [<ffffffff812800da>] bio_endio+0x3a/0x60
> [<ffffffff81284b7c>] blk_update_request+0x7c/0x2a0
> [<ffffffff813da01e>] scsi_end_request+0x2e/0x1d0
> [<ffffffff813dc444>] scsi_io_completion+0xb4/0x610
> [<ffffffff813d3e0a>] scsi_finish_command+0xca/0x120
> [<ffffffff813dbcf0>] scsi_softirq_done+0x120/0x140
> [<ffffffff8128c742>] blk_done_softirq+0x72/0x90
> [<ffffffff8105a49f>] __do_softirq+0x10f/0x230
> [<ffffffff8105a748>] irq_exit+0xa8/0xb0
> [<ffffffff810065a5>] do_IRQ+0x65/0x110
> [<ffffffff815793c9>] common_interrupt+0x89/0x89
> <EOI>
> [<ffffffffa010f815>] __multipath_map.isra.16+0x145/0x260 [dm_multipath]
> [<ffffffffa010f962>] multipath_map+0x12/0x20 [dm_multipath]
> [<ffffffffa0170da3>] map_request+0x43/0x210 [dm_mod]
> [<ffffffffa0170f8d>] map_tio_request+0x1d/0x40 [dm_mod]
> [<ffffffff81075a0d>] kthread_worker_fn+0x7d/0x1a0
> [<ffffffff8107591a>] kthread+0xea/0x100
> [<ffffffff81578c7f>] ret_from_fork+0x3f/0x70
>
> Signed-off-by: Bart Van Assche <bart.vanassche@...disk.com>
> Cc: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> ---
> drivers/scsi/scsi_lib.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 8106515..8f264a0 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1770,13 +1770,14 @@ static void scsi_request_fn(struct request_queue *q)
> struct Scsi_Host *shost;
> struct scsi_cmnd *cmd;
> struct request *req;
> + int i;
>
> /*
> - * To start with, we keep looping until the queue is empty, or until
> - * the host is no longer able to accept any more requests.
> + * Loop until the queue is empty, until the host is no longer able to
> + * accept any more requests or until 64 requests have been processed.
> */
> shost = sdev->host;
> - for (;;) {
> + for (i = 64; i > 0; i--) {
> int rtn;
> /*
> * get next queueable request. We do this early to make sure
> @@ -1861,6 +1862,9 @@ static void scsi_request_fn(struct request_queue *q)
> spin_lock_irq(q->queue_lock);
> }
>
> + if (unlikely(i == 0))
> + blk_delay_queue(q, 0);
> +
> return;
>
> host_not_ready:
> --
> 2.7.3
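
On patch 4: this one takes care of the for (;;) loop directly, and if I
am reading blk_delay_queue() correctly, a zero delay simply queues the
queue's delay_work on kblockd, so the requests left over when the
budget runs out are picked up again from workqueue context rather than
being dropped.  One small suggestion: the magic number 64 now appears
both here and in blk_done_softirq(), so a shared constant might make
future tuning easier, for example (the name is only a suggestion):

        /* Maximum number of requests processed per invocation. */
        #define BLK_REQ_BUDGET  64

Either way, the overall approach looks sound to me.
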
>