linux-kernel - non-response to IPI kills wrong(?) task, confuses logs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20160921040353.GA25369@openwall.com>
Date:   Wed, 21 Sep 2016 06:03:53 +0200
From:   Solar Designer <solar@...nwall.com>
To:     linux-kernel@...r.kernel.org
Cc:     Adam Zabrocki <pi3@....com.pl>
Subject: non-response to IPI kills wrong(?) task, confuses logs

Hi,

Adam Zabrocki, CC'ed here, informed me of the following:

There is an undesirable situation in SMP Linux machines when sending an
IPI via the smp_call_function_single() API:

/*
 * smp_call_function_single - Run a function on a specific CPU
 * @func: The function to run. This must be fast and non-blocking.
 * @info: An arbitrary pointer to pass to the function.
 * @wait: If true, wait until function has completed on other CPUs.
 *
 * Returns 0 on success, else a negative status code.
 */
int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
                             int wait)

It runs "func" on a specific CPU, and if the "wait" flag is true, it
waits until "func" has completed work before returning to the caller.
This is a useful feature used to control when it is safe to continue and
potentially parse data generated by another CPU.

Unfortunately, Linux can kill the task waiting on its
smp_call_function_single().  The NMI watchdog is detecting a "soft
lockup" on the CPU which calls smp_call_function_single() with the
"wait" flag set.  In practice it is not always the desirable behavior.
It is possible that the target CPU of the IPI was non-responsive and
would never (or not in time) pick up the IPI sent to it.  From the
perspective of the task sending the IPI, the target CPU never finished
the work, so smp_call_function_single() continues its busy wait loop.
In the meantime, the NMI watchdog can kill the busy-waiting task instead
of somehow "unlocking" the CPU that did not handle the IPI in time.
This is often visible in the logs like e.g.:

Jul 18 00:43:06 ubuntu kernel: [1578895.078397] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 25s! [thermald:714]
...
Jul 18 00:43:06 ubuntu kernel: [1578895.078470] CPU: 1 PID: 714 Comm: thermald Tainted: G        W  OEL  4.2.0-30-generic #36-Ubuntu
Jul 18 00:43:06 ubuntu kernel: [1578895.078472] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
07/02/2015
Jul 18 00:43:06 ubuntu kernel: [1578895.078475] task: ffff88003776e040 ti: ffff880038390000 task.ti: ffff880038390000
Jul 18 00:43:06 ubuntu kernel: [1578895.078477] RIP: 0010:[<ffffffff810fc9d8>]  [<ffffffff810fc9d8>] smp_call_function_single+0xd8/0x130
Jul 18 00:43:06 ubuntu kernel: [1578895.078486] RSP: 0018:ffff880038393c98  EFLAGS: 00000202
Jul 18 00:43:06 ubuntu kernel: [1578895.078487] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Jul 18 00:43:06 ubuntu kernel: [1578895.078489] RDX: 0000000000000001 RSI: 0000000000000100 RDI: 0000000000000292
Jul 18 00:43:06 ubuntu kernel: [1578895.078490] RBP: ffff880038393ce8 R08: 0000000000000000 R09: 0000000000000001
Jul 18 00:43:06 ubuntu kernel: [1578895.078492] R10: ffff8800175de000 R11: ffff880035123180 R12: ffff880034c1c2d0
Jul 18 00:43:06 ubuntu kernel: [1578895.078493] R13: 0000000000000292 R14: 0000000000000292 R15: ffff880038393c38
Jul 18 00:43:06 ubuntu kernel: [1578895.078495] FS:  00007fa118895700(0000) GS:ffff88003d640000(0000) knlGS:0000000000000000
Jul 18 00:43:06 ubuntu kernel: [1578895.078497] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 18 00:43:06 ubuntu kernel: [1578895.078499] CR2: 00007fabda352000 CR3: 0000000034c22000 CR4: 00000000001406e0
Jul 18 00:43:06 ubuntu kernel: [1578895.078508] Stack:
Jul 18 00:43:06 ubuntu kernel: [1578895.078510]  0000000000000000 ffffffff8121e2c4 ffff880038393cd8 0000000000000000
Jul 18 00:43:06 ubuntu kernel: [1578895.078513]  ffffffff813f86b0 ffff880038393d00 0000000000000003 000000005a1685d7
Jul 18 00:43:06 ubuntu kernel: [1578895.078515]  ffff880038393d4c ffff880038393d48 ffff880038393d38 ffffffff813f884d
Jul 18 00:43:06 ubuntu kernel: [1578895.078518] Call Trace:
Jul 18 00:43:06 ubuntu kernel: [1578895.078526]  [<ffffffff8121e2c4>] ? mntput+0x24/0x40
Jul 18 00:43:06 ubuntu kernel: [1578895.078532]  [<ffffffff813f86b0>] ? ucs2_strncmp+0x50/0x50
Jul 18 00:43:06 ubuntu kernel: [1578895.078642]  [<ffffffff813f884d>] rdmsr_on_cpu+0x5d/0x90
Jul 18 00:43:06 ubuntu kernel: [1578895.078650]  [<ffffffffc030b2e7>] show_temp+0xa7/0xe0 [coretemp]
Jul 18 00:43:06 ubuntu kernel: [1578895.078656]  [<ffffffff8151d1d0>] dev_attr_show+0x20/0x50
Jul 18 00:43:06 ubuntu kernel: [1578895.078662]  [<ffffffff817f0c16>] ? mutex_lock+0x16/0x40
Jul 18 00:43:06 ubuntu kernel: [1578895.078667]  [<ffffffff8127a3dc>] sysfs_kf_seq_show+0xbc/0x130
Jul 18 00:43:06 ubuntu kernel: [1578895.078669]  [<ffffffff81278c13>] kernfs_seq_show+0x23/0x30
Jul 18 00:43:06 ubuntu kernel: [1578895.078672]  [<ffffffff81221c9c>] seq_read+0xec/0x390
Jul 18 00:43:06 ubuntu kernel: [1578895.078675]  [<ffffffff8127953a>] kernfs_fop_read+0x10a/0x160
Jul 18 00:43:06 ubuntu kernel: [1578895.078679]  [<ffffffff81326a00>] ? security_file_permission+0xa0/0xc0
Jul 18 00:43:06 ubuntu kernel: [1578895.078682]  [<ffffffff811fd5b8>] __vfs_read+0x18/0x40
Jul 18 00:43:06 ubuntu kernel: [1578895.078685]  [<ffffffff811fdb87>] vfs_read+0x87/0x130
Jul 18 00:43:06 ubuntu kernel: [1578895.078687]  [<ffffffff811fe905>] SyS_read+0x55/0xc0
Jul 18 00:43:07 ubuntu kernel: [1578895.078690]  [<ffffffff817f2d72>] entry_SYSCALL_64_fastpath+0x16/0x75
Jul 18 00:43:07 ubuntu kernel: [1578895.078692] Code: 25 28 00 00 00 75 70 48 83 c4 40 5b 41 5c 5d c3 48 8d 75 c8 48 89 d1 89 df 4c 89 e2 e8 12 fe ff ff 8b 55 e0 83 e2 01 74 cf f3 90 <8b> 55 e0 83 e2 01 75 f6 eb c3 8b 05 f0 6c df 00 85 c0 75 85 80

In this specific case, the "thermald" process, which was trying to send
an IPI via smp_call_function_single(), is not the most problematic one,
but it has been killed.

This problem can manifest itself in several different ways.  Imagine the
following situation:

    CPU0:              CPU1:
  -- work --         -- work --
  -- work --      -- STUCK / bug --
  -- work --           ...
-- send IPI --         ...
-- busy loop --        ...
      ...              ...
-- killed by
   NMI watchdog --     ...
 -- new work --        ...
 -- new work --      -- killed by
                        watchdog --

In this situation you will see two task dumps in the logs: first for the
smp_call_function_single() busy loop task, and then for the IPI target
task, with different "stuck" time for the CPU/core, e.g.:

Jul 12 14:32:32 ubuntu kernel: [1110493.211223] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/u128:1:23379]
...
Jul 12 14:32:32 ubuntu kernel: [1110502.681253] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 172s! [swapper/0:0]

(As an aside, it is unclear to us why both were syslogged with exactly
the same timestamp whereas their kernel's timestamps differ by 9 seconds.
We didn't notice this detail until much later.)

It is also possible to "mask" the problematic task/CPU in the following
scenario:

    CPU0:              CPU1:
  -- work --         -- work --
  -- work --      -- STUCK / bug --
  -- work --           ...
-- send IPI --         ...
-- busy loop --        ...
      ...              ...
-- killed by
   NMI watchdog --     ...
 -- new work --        ...
 -- new work --    -- CPU unstuck --
 -- new work --    -- continue work --

In that case, before the watchdog killed the problematic task on CPU1,
it "recovered" on its own and continued normal work.  But at the same
time the NMI watchdog killed the task which sent the IPI and was waiting
for it to finish.  In that case, in the logs you will only see a dump of
the task that was busy-waiting, and nothing about the original culprit.

In the source code:

270 int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
271                              int wait)
272 {
273         struct call_single_data *csd;
274         struct call_single_data csd_stack = { .flags = CSD_FLAG_LOCK | CSD_FLAG_SYNCHRONOUS };
275         int this_cpu;
276         int err;
277 
278         /*
279          * prevent preemption and reschedule on another processor,
280          * as well as CPU removal
281          */
282         this_cpu = get_cpu();
283 
284         /*
285          * Can deadlock when called with interrupts disabled.
286          * We allow cpu's that are not yet online though, as no one else can
287          * send smp call function interrupt to this cpu and as such deadlocks
288          * can't happen.
289          */
290         WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
291                      && !oops_in_progress);
292 
293         csd = &csd_stack;
294         if (!wait) {
295                 csd = this_cpu_ptr(&csd_data);
296                 csd_lock(csd);
297         }
298 
299         err = generic_exec_single(cpu, csd, func, info);
300 
301         if (wait)
302                 csd_lock_wait(csd);
303 
304         put_cpu();
305 
306         return err;
307 }
308 EXPORT_SYMBOL(smp_call_function_single);

csd_lock_wait() is defined as:

108 static __always_inline void csd_lock_wait(struct call_single_data *csd)
109 {
110         smp_cond_acquire(!(csd->flags & CSD_FLAG_LOCK));
111 }

Next:

318 #define smp_cond_acquire(cond)  do {            \
319         while (!(cond))                         \
320                 cpu_relax();                    \
321         smp_rmb(); /* ctrl + rmb := acquire */  \
322 } while (0)

so it is calling cpu_relax() until the condition is met.
On the x86 architecture, it's the following assembly stub:

564         asm volatile("rep; nop" ::: "memory");

smp_call_function_single()'s use of get_cpu() internally prevents the
core from being preemptive.

During testing, 20 Linux VMs with different distributions / kernel
versions were run under the VmWare hypervisor.  The most common "trouble
maker" tasks were:

- swapper:

Jul 12 14:32:32 ubuntu kernel: [1110502.682439] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W  OEL  4.2.0-30-generic #36-Ubuntu
Jul 12 14:32:32 ubuntu kernel: [1110502.682440] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
07/02/2015
Jul 12 14:32:32 ubuntu kernel: [1110502.682443] task: ffffffff81c13500 ti: ffffffff81c00000 task.ti: ffffffff81c00000
Jul 12 14:32:32 ubuntu kernel: [1110502.682445] RIP: 0010:[<ffffffff817f2785>]  [<ffffffff817f2785>] _raw_spin_unlock_irqrestore+0x15/0x20
Jul 12 14:32:32 ubuntu kernel: [1110502.682455] RSP: 0018:ffff88003d603da8  EFLAGS: 00000296
Jul 12 14:32:32 ubuntu kernel: [1110502.682456] RAX: 0000000000000001 RBX: ffff880035cdc2e0 RCX: 0000000000000002
Jul 12 14:32:32 ubuntu kernel: [1110502.682458] RDX: 0000000000002090 RSI: 0000000000000296 RDI: 0000000000000296
Jul 12 14:32:32 ubuntu kernel: [1110502.682459] RBP: ffff88003d603da8 R08: 0000000000000296 R09: 000000000000080a
Jul 12 14:32:32 ubuntu kernel: [1110502.682461] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88003d603d18
Jul 12 14:32:32 ubuntu kernel: [1110502.682462] R13: ffffffff817f3bdb R14: ffff88003d603da8 R15: ffff880035cdc000
Jul 12 14:32:32 ubuntu kernel: [1110502.682464] FS:  0000000000000000(0000) GS:ffff88003d600000(0000) knlGS:0000000000000000
Jul 12 14:32:32 ubuntu kernel: [1110502.682466] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 12 14:32:32 ubuntu kernel: [1110502.682468] CR2: 00007f847a0db000 CR3: 0000000035598000 CR4: 00000000001406f0
Jul 12 14:32:32 ubuntu kernel: [1110502.682609] Stack:
Jul 12 14:32:32 ubuntu kernel: [1110502.682611]  ffff88003d603df8 ffffffff81619ddd ffff880035a43200 0000000000000296
Jul 12 14:32:32 ubuntu kernel: [1110502.682615]  ffffffff81d2a7c0 ffff880035cdc000 ffff880035cdc0d0 0000000000000100
Jul 12 14:32:32 ubuntu kernel: [1110502.682617]  ffffffff815dd7b0 ffff880035cdc000 ffff88003d603e38 ffffffff815dd6ab
Jul 12 14:32:32 ubuntu kernel: [1110502.682619] Call Trace:
Jul 12 14:32:32 ubuntu kernel: [1110502.682621]  <IRQ>
Jul 12 14:32:32 ubuntu kernel: [1110502.682629]  [<ffffffff81619ddd>] uhci_hub_status_data+0x6d/0x270
Jul 12 14:32:32 ubuntu kernel: [1110502.682636]  [<ffffffff815dd7b0>] ? usb_hcd_poll_rh_status+0x160/0x160
Jul 12 14:32:32 ubuntu kernel: [1110502.682638]  [<ffffffff815dd6ab>] usb_hcd_poll_rh_status+0x5b/0x160
Jul 12 14:32:32 ubuntu kernel: [1110502.682641]  [<ffffffff815dd7b0>] ? usb_hcd_poll_rh_status+0x160/0x160
Jul 12 14:32:32 ubuntu kernel: [1110502.682644]  [<ffffffff815dd7be>] rh_timer_func+0xe/0x10
Jul 12 14:32:32 ubuntu kernel: [1110502.682649]  [<ffffffff810e59d9>] call_timer_fn+0x39/0xf0
Jul 12 14:32:32 ubuntu kernel: [1110502.682652]  [<ffffffff815dd7b0>] ? usb_hcd_poll_rh_status+0x160/0x160
Jul 12 14:32:32 ubuntu kernel: [1110502.682655]  [<ffffffff810e6461>] run_timer_softirq+0x221/0x2d0
Jul 12 14:32:32 ubuntu kernel: [1110502.682659]  [<ffffffff8107fe46>] __do_softirq+0xf6/0x250
Jul 12 14:32:32 ubuntu kernel: [1110502.682662]  [<ffffffff81080113>] irq_exit+0xa3/0xb0
Jul 12 14:32:32 ubuntu kernel: [1110502.682667]  [<ffffffff817f5a46>] smp_apic_timer_interrupt+0x46/0x60
Jul 12 14:32:32 ubuntu kernel: [1110502.682670]  [<ffffffff817f3bdb>] apic_timer_interrupt+0x6b/0x70
Jul 12 14:32:32 ubuntu kernel: [1110502.682671]  <EOI>
Jul 12 14:32:32 ubuntu kernel: [1110502.682677]  [<ffffffff81060526>] ? native_safe_halt+0x6/0x10
Jul 12 14:32:32 ubuntu kernel: [1110502.682682]  [<ffffffff8101f24e>] default_idle+0x1e/0xa0
Jul 12 14:32:32 ubuntu kernel: [1110502.682685]  [<ffffffff8101f9ff>] arch_cpu_idle+0xf/0x20
Jul 12 14:32:32 ubuntu kernel: [1110502.682689]  [<ffffffff810bdb1a>] default_idle_call+0x2a/0x40
Jul 12 14:32:32 ubuntu kernel: [1110502.682691]  [<ffffffff810bde5e>] cpu_startup_entry+0x2ce/0x330
Jul 12 14:32:32 ubuntu kernel: [1110502.682696]  [<ffffffff817e047c>] rest_init+0x7c/0x80
Jul 12 14:32:32 ubuntu kernel: [1110502.682702]  [<ffffffff81d50025>] start_kernel+0x48b/0x4ac
Jul 12 14:32:32 ubuntu kernel: [1110502.682705]  [<ffffffff81d4f120>] ? early_idt_handler_array+0x120/0x120
Jul 12 14:32:32 ubuntu kernel: [1110502.682708]  [<ffffffff81d4f339>] x86_64_start_reservations+0x2a/0x2c
Jul 12 14:32:32 ubuntu kernel: [1110502.682711]  [<ffffffff81d4f485>] x86_64_start_kernel+0x14a/0x16d

- e1000_watchdog:

Jul 15 02:12:19 ubuntu kernel: [1325198.227183] INFO: rcu_sched self-detected stall on CPU { 0}  (t=79984 jiffies g=22282232 c=22282231 q=0)
Jul 15 02:12:19 ubuntu kernel: [1325198.228218] rcu_sched kthread starved for 79984 jiffies! g22282232 c22282231 f0x0
Jul 15 02:12:19 ubuntu kernel: [1325198.228221] Task dump for CPU 0:
Jul 15 02:12:19 ubuntu kernel: [1325198.228224] kworker/0:2     R  running task        0 57658      2 0x00000008
Jul 15 02:12:19 ubuntu kernel: [1325198.228359] Workqueue: events e1000_watchdog [e1000]
Jul 15 02:12:19 ubuntu kernel: [1325198.228363]  ffff880032508dc0 000000002c551eb3 ffff88003d603d78 ffffffff810a8799
Jul 15 02:12:19 ubuntu kernel: [1325198.228366]  0000000000000000 ffffffff81c51b80 ffff88003d603d98 ffffffff810aafbb
Jul 15 02:12:19 ubuntu kernel: [1325198.228369]  0000000000000083 0000000000000001 ffff88003d603dc8 ffffffff810ddf38
Jul 15 02:12:19 ubuntu kernel: [1325198.228372] Call Trace:
Jul 15 02:12:19 ubuntu kernel: [1325198.228374]  <IRQ>  [<ffffffff810a8799>] sched_show_task+0xa9/0x110
Jul 15 02:12:19 ubuntu kernel: [1325198.228390]  [<ffffffff810aafbb>] dump_cpu_task+0x3b/0x50
Jul 15 02:12:19 ubuntu kernel: [1325198.228395]  [<ffffffff810ddf38>] rcu_dump_cpu_stacks+0x88/0xd0
Jul 15 02:12:19 ubuntu kernel: [1325198.228399]  [<ffffffff810e19d5>] rcu_check_callbacks+0x485/0x760
Jul 15 02:12:19 ubuntu kernel: [1325198.228405]  [<ffffffff811375ac>] ? acct_account_cputime+0x1c/0x20
Jul 15 02:12:19 ubuntu kernel: [1325198.228407]  [<ffffffff810aba71>] ? account_system_time+0x81/0x120
Jul 15 02:12:19 ubuntu kernel: [1325198.228413]  [<ffffffff810f7450>] ? tick_sched_handle.isra.14+0x60/0x60
Jul 15 02:12:19 ubuntu kernel: [1325198.228416]  [<ffffffff810e7de9>] update_process_times+0x39/0x60
Jul 15 02:12:19 ubuntu kernel: [1325198.228419]  [<ffffffff810f7415>] tick_sched_handle.isra.14+0x25/0x60
Jul 15 02:12:19 ubuntu kernel: [1325198.228423]  [<ffffffff810f7494>] tick_sched_timer+0x44/0x80
Jul 15 02:12:19 ubuntu kernel: [1325198.228428]  [<ffffffff810e8703>] __hrtimer_run_queues+0xf3/0x220
Jul 15 02:12:19 ubuntu kernel: [1325198.228431]  [<ffffffff810e8e38>] hrtimer_interrupt+0xa8/0x1a0
Jul 15 02:12:19 ubuntu kernel: [1325198.228435]  [<ffffffff8104effc>] local_apic_timer_interrupt+0x3c/0x70
Jul 15 02:12:19 ubuntu kernel: [1325198.228440]  [<ffffffff817f5a41>] smp_apic_timer_interrupt+0x41/0x60
Jul 15 02:12:19 ubuntu kernel: [1325198.228443]  [<ffffffff817f3bdb>] apic_timer_interrupt+0x6b/0x70
Jul 15 02:12:19 ubuntu kernel: [1325198.228444]  <EOI>  [<ffffffff817f2785>] ? _raw_spin_unlock_irqrestore+0x15/0x20
Jul 15 02:12:19 ubuntu kernel: [1325198.228454]  [<ffffffffc007f742>] e1000_update_stats+0x702/0x740 [e1000]
Jul 15 02:12:19 ubuntu kernel: [1325198.228458]  [<ffffffffc007f7ea>] e1000_watchdog+0x6a/0x530 [e1000]
Jul 15 02:12:19 ubuntu kernel: [1325198.228465]  [<ffffffff810947aa>] process_one_work+0x1aa/0x440
Jul 15 02:12:19 ubuntu kernel: [1325198.228468]  [<ffffffff81094a8b>] worker_thread+0x4b/0x4c0
Jul 15 02:12:19 ubuntu kernel: [1325198.228471]  [<ffffffff81094a40>] ? process_one_work+0x440/0x440
Jul 15 02:12:19 ubuntu kernel: [1325198.228475]  [<ffffffff81094a40>] ? process_one_work+0x440/0x440
Jul 15 02:12:19 ubuntu kernel: [1325198.228478]  [<ffffffff8109ae28>] kthread+0xd8/0xf0
Jul 15 02:12:19 ubuntu kernel: [1325198.228481]  [<ffffffff8109ad50>] ? kthread_create_on_node+0x1f0/0x1f0
Jul 15 02:12:19 ubuntu kernel: [1325198.228483]  [<ffffffff817f319f>] ret_from_fork+0x3f/0x70
Jul 15 02:12:19 ubuntu kernel: [1325198.228486]  [<ffffffff8109ad50>] ? kthread_create_on_node+0x1f0/0x1f0

- disk_events_workfn:

Jul 18 00:41:52 ubuntu kernel: [1578849.085673] CPU: 1 PID: 31576 Comm: kworker/u128:0 Tainted: G        W  OEL  4.2.0-30-generic #36-Ubuntu
Jul 18 00:41:52 ubuntu kernel: [1578849.085676] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
07/02/2015
Jul 18 00:41:52 ubuntu kernel: [1578849.085686] Workqueue: events_freezable_power_ disk_events_workfn
Jul 18 00:41:52 ubuntu kernel: [1578849.085688] task: ffff880036de5280 ti: ffff880006ff0000 task.ti: ffff880006ff0000
Jul 18 00:41:52 ubuntu kernel: [1578849.085690] RIP: 0010:[<ffffffff817f2785>]  [<ffffffff817f2785>] _raw_spin_unlock_irqrestore+0x15/0x20
Jul 18 00:41:52 ubuntu kernel: [1578849.085699] RSP: 0018:ffff880006ff3a08  EFLAGS: 00000293
Jul 18 00:41:52 ubuntu kernel: [1578849.085700] RAX: 0000000000000000 RBX: ffffffff81c1abc0 RCX: 000000000000001c
Jul 18 00:41:52 ubuntu kernel: [1578849.085702] RDX: ffff880034781e70 RSI: 0000000000000293 RDI: 0000000000000293
Jul 18 00:41:52 ubuntu kernel: [1578849.085703] RBP: ffff880006ff3a08 R08: 000000100000014a R09: 0000000800000010
Jul 18 00:41:52 ubuntu kernel: [1578849.085705] R10: ffff880034781af8 R11: 0000000000000000 R12: ffffffff8159ce7c
Jul 18 00:41:52 ubuntu kernel: [1578849.085706] R13: ffff880006ff39c8 R14: ffff880034781ab8 R15: 0000000000000002
Jul 18 00:41:52 ubuntu kernel: [1578849.085709] FS:  0000000000000000(0000) GS:ffff88003d640000(0000) knlGS:0000000000000000
Jul 18 00:41:52 ubuntu kernel: [1578849.085710] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 18 00:41:52 ubuntu kernel: [1578849.085712] CR2: 00007fb3a269c000 CR3: 000000003cb8d000 CR4: 00000000001406e0
Jul 18 00:41:52 ubuntu kernel: [1578849.085727] Stack:
Jul 18 00:41:52 ubuntu kernel: [1578849.085729]  ffff880006ff3a38 ffffffff815a5901 ffff88003313cf00 ffff88003492a260
Jul 18 00:41:52 ubuntu kernel: [1578849.085732]  ffff880035680800 ffff880036d4d050 ffff880006ff3a78 ffffffff8158010a
Jul 18 00:41:52 ubuntu kernel: [1578849.085734]  ffff880006ff3a78 ffff880035680800 ffff88003492a260 ffff880035680800
Jul 18 00:41:52 ubuntu kernel: [1578849.085737] Call Trace:
Jul 18 00:41:52 ubuntu kernel: [1578849.085785]  [<ffffffff815a5901>] ata_scsi_queuecmd+0xb1/0x250
Jul 18 00:41:52 ubuntu kernel: [1578849.085790]  [<ffffffff8158010a>] scsi_dispatch_cmd+0x10a/0x200
Jul 18 00:41:52 ubuntu kernel: [1578849.085794]  [<ffffffff815829f2>] scsi_request_fn+0x472/0x610
Jul 18 00:41:52 ubuntu kernel: [1578849.085798]  [<ffffffff813951e7>] __blk_run_queue+0x37/0x50
Jul 18 00:41:52 ubuntu kernel: [1578849.085802]  [<ffffffff8139eb71>] blk_execute_rq_nowait+0xb1/0x160
Jul 18 00:41:52 ubuntu kernel: [1578849.085808]  [<ffffffff811d3a41>] ? alloc_pages_current+0x91/0x100
Jul 18 00:41:52 ubuntu kernel: [1578849.085811]  [<ffffffff8139ecab>] blk_execute_rq+0x8b/0x140
Jul 18 00:41:52 ubuntu kernel: [1578849.085814]  [<ffffffff8139ab33>] ? blk_rq_bio_prep+0x63/0x80
Jul 18 00:41:52 ubuntu kernel: [1578849.085818]  [<ffffffff8139ea36>] ? blk_rq_map_kern+0xe6/0x140
Jul 18 00:41:52 ubuntu kernel: [1578849.085820]  [<ffffffff8157f637>] scsi_execute+0x137/0x1d0
Jul 18 00:41:52 ubuntu kernel: [1578849.085823]  [<ffffffff81581269>] ? scsi_execute_req_flags+0x49/0xf0
Jul 18 00:41:52 ubuntu kernel: [1578849.085826]  [<ffffffff815812ae>] scsi_execute_req_flags+0x8e/0xf0
Jul 18 00:41:52 ubuntu kernel: [1578849.085830]  [<ffffffff81591db7>] sr_check_events+0xb7/0x2d0
Jul 18 00:41:52 ubuntu kernel: [1578849.085835]  [<ffffffff815cdd5c>] cdrom_check_events+0x1c/0x40
Jul 18 00:41:52 ubuntu kernel: [1578849.085838]  [<ffffffff815921ea>] sr_block_check_events+0x2a/0x30
Jul 18 00:41:52 ubuntu kernel: [1578849.085841]  [<ffffffff813a9410>] disk_check_events+0x60/0x150
Jul 18 00:41:52 ubuntu kernel: [1578849.085844]  [<ffffffff813a9516>] disk_events_workfn+0x16/0x20
Jul 18 00:41:52 ubuntu kernel: [1578849.085850]  [<ffffffff810947aa>] process_one_work+0x1aa/0x440
Jul 18 00:41:52 ubuntu kernel: [1578849.085853]  [<ffffffff81094a8b>] worker_thread+0x4b/0x4c0
Jul 18 00:41:52 ubuntu kernel: [1578849.085856]  [<ffffffff81094a40>] ? process_one_work+0x440/0x440
Jul 18 00:41:52 ubuntu kernel: [1578849.085859]  [<ffffffff81094a40>] ? process_one_work+0x440/0x440
Jul 18 00:41:52 ubuntu kernel: [1578849.085863]  [<ffffffff8109ae28>] kthread+0xd8/0xf0
Jul 18 00:41:52 ubuntu kernel: [1578849.085866]  [<ffffffff8109ad50>] ? kthread_create_on_node+0x1f0/0x1f0
Jul 18 00:41:52 ubuntu kernel: [1578849.085869]  [<ffffffff817f319f>] ret_from_fork+0x3f/0x70
Jul 18 00:41:52 ubuntu kernel: [1578849.085872]  [<ffffffff8109ad50>] ? kthread_create_on_node+0x1f0/0x1f0
Jul 18 00:41:52 ubuntu kernel: [1578849.085874] Code: 00 00 eb c7 31 c0 eb cb e8 49 90 88 ff 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 40 00 48 89 f7 57 9d <0f> 1f 44 00 00 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 c6

as a result, it was common to see in the logs that another task was
killed during the busy loop, e.g.:

Jul 18 00:43:06 ubuntu kernel: [1578895.078397] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 25s! [thermald:714]
Jul 18 00:43:06 ubuntu kernel: [1578895.078470] CPU: 1 PID: 714 Comm: thermald Tainted: G        W  OEL  4.2.0-30-generic #36-Ubuntu
Jul 18 00:43:06 ubuntu kernel: [1578895.078472] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
07/02/2015
Jul 18 00:43:06 ubuntu kernel: [1578895.078475] task: ffff88003776e040 ti: ffff880038390000 task.ti: ffff880038390000
Jul 18 00:43:06 ubuntu kernel: [1578895.078477] RIP: 0010:[<ffffffff810fc9d8>]  [<ffffffff810fc9d8>] smp_call_function_single+0xd8/0x130
Jul 18 00:43:06 ubuntu kernel: [1578895.078486] RSP: 0018:ffff880038393c98  EFLAGS: 00000202
Jul 18 00:43:06 ubuntu kernel: [1578895.078487] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Jul 18 00:43:06 ubuntu kernel: [1578895.078489] RDX: 0000000000000001 RSI: 0000000000000100 RDI: 0000000000000292
Jul 18 00:43:06 ubuntu kernel: [1578895.078490] RBP: ffff880038393ce8 R08: 0000000000000000 R09: 0000000000000001
Jul 18 00:43:06 ubuntu kernel: [1578895.078492] R10: ffff8800175de000 R11: ffff880035123180 R12: ffff880034c1c2d0
Jul 18 00:43:06 ubuntu kernel: [1578895.078493] R13: 0000000000000292 R14: 0000000000000292 R15: ffff880038393c38
Jul 18 00:43:06 ubuntu kernel: [1578895.078495] FS:  00007fa118895700(0000) GS:ffff88003d640000(0000) knlGS:0000000000000000
Jul 18 00:43:06 ubuntu kernel: [1578895.078497] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 18 00:43:06 ubuntu kernel: [1578895.078499] CR2: 00007fabda352000 CR3: 0000000034c22000 CR4: 00000000001406e0
Jul 18 00:43:06 ubuntu kernel: [1578895.078508] Stack:
Jul 18 00:43:06 ubuntu kernel: [1578895.078510]  0000000000000000 ffffffff8121e2c4 ffff880038393cd8 0000000000000000
Jul 18 00:43:06 ubuntu kernel: [1578895.078513]  ffffffff813f86b0 ffff880038393d00 0000000000000003 000000005a1685d7
Jul 18 00:43:06 ubuntu kernel: [1578895.078515]  ffff880038393d4c ffff880038393d48 ffff880038393d38 ffffffff813f884d
Jul 18 00:43:06 ubuntu kernel: [1578895.078518] Call Trace:
Jul 18 00:43:06 ubuntu kernel: [1578895.078526]  [<ffffffff8121e2c4>] ? mntput+0x24/0x40
Jul 18 00:43:06 ubuntu kernel: [1578895.078532]  [<ffffffff813f86b0>] ? ucs2_strncmp+0x50/0x50
Jul 18 00:43:06 ubuntu kernel: [1578895.078642]  [<ffffffff813f884d>] rdmsr_on_cpu+0x5d/0x90
Jul 18 00:43:06 ubuntu kernel: [1578895.078650]  [<ffffffffc030b2e7>] show_temp+0xa7/0xe0 [coretemp]
Jul 18 00:43:06 ubuntu kernel: [1578895.078656]  [<ffffffff8151d1d0>] dev_attr_show+0x20/0x50
Jul 18 00:43:06 ubuntu kernel: [1578895.078662]  [<ffffffff817f0c16>] ? mutex_lock+0x16/0x40
Jul 18 00:43:06 ubuntu kernel: [1578895.078667]  [<ffffffff8127a3dc>] sysfs_kf_seq_show+0xbc/0x130
Jul 18 00:43:06 ubuntu kernel: [1578895.078669]  [<ffffffff81278c13>] kernfs_seq_show+0x23/0x30
Jul 18 00:43:06 ubuntu kernel: [1578895.078672]  [<ffffffff81221c9c>] seq_read+0xec/0x390
Jul 18 00:43:06 ubuntu kernel: [1578895.078675]  [<ffffffff8127953a>] kernfs_fop_read+0x10a/0x160
Jul 18 00:43:06 ubuntu kernel: [1578895.078679]  [<ffffffff81326a00>] ? security_file_permission+0xa0/0xc0
Jul 18 00:43:06 ubuntu kernel: [1578895.078682]  [<ffffffff811fd5b8>] __vfs_read+0x18/0x40
Jul 18 00:43:06 ubuntu kernel: [1578895.078685]  [<ffffffff811fdb87>] vfs_read+0x87/0x130
Jul 18 00:43:06 ubuntu kernel: [1578895.078687]  [<ffffffff811fe905>] SyS_read+0x55/0xc0
Jul 18 00:43:07 ubuntu kernel: [1578895.078690]  [<ffffffff817f2d72>] entry_SYSCALL_64_fastpath+0x16/0x75
Jul 18 00:43:07 ubuntu kernel: [1578895.078692] Code: 25 28 00 00 00 75 70 48 83 c4 40 5b 41 5c 5d c3 48 8d 75 c8 48 89 d1 89 df 4c 89 e2 e8 12 fe ff ff 8b 55 e0 83 e2 01 74 cf f3 90 <8b> 55 e0 83 e2 01 75 f6 eb c3 8b 05 f0 6c df 00 85 c0 75 85 80

Any other APIs wrapping smp_call_function_* are also affected by the
same problem.  However, functions like on_each_cpu() have additional
logic, making them less visible in the logs than functions using the
smp_call_function_single() API directly.

Another aspect of the described situation is that for stability of the
system as a whole, it might in fact be safer to kill the busy-waiting
task rather than the more problematic task that fails to handle an IPI.
Unfortunately, the resulting logs are very confusing.  They appear to
suggest that the problem resides on the correct execution context which
is killed and dumped, but not on the actually problematic context, which
might not be dumped.  This makes it hard to root-cause the problem even
if one is aware of this shortcoming of the killings and the logging.

Maybe better behavior in that case should be having the busy-looping
time out cleanly, with an error return from smp_call_function_single().
At the same time, we may send an NMI to all CPUs for printing a
backtrace, which will greatly help in diagnosing the problem, although
on systems with a lot of logical CPUs this may be impractical (e.g.,
with Knights Landing being up to 288 "CPUs" in one chip, such systems
may already be surprisingly common).

Alexander