Date:	Wed, 11 Feb 2015 11:19:11 -0200
From:	Rafael David Tinoco <inaddy@...ntu.com>
To:	LKML <linux-kernel@...r.kernel.org>
Cc:	torvalds@...ux-foundation.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Jens Axboe <axboe@...nel.dk>
Subject: smp_call_function_single lockups

Linus, Thomas, Jens..

During the 3.18 - 3.19 "frequent lockups" discussion, at some point you
observed possible synchronization problems between csd_lock() and
csd_unlock(). I think we have managed to reproduce that issue on a
consistent basis with 3.13 (Ubuntu) and 3.19 (latest vanilla).
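
For reference, the csd locking helpers in question live in kernel/smp.c;
paraphrased from memory of the 3.19 sources (simplified, not verbatim),
they look roughly like this:

static void csd_lock_wait(struct call_single_data *csd)
{
	/* Spin until whoever owns the csd drops CSD_FLAG_LOCK. */
	while (csd->flags & CSD_FLAG_LOCK)
		cpu_relax();
}

static void csd_lock(struct call_single_data *csd)
{
	csd_lock_wait(csd);
	csd->flags |= CSD_FLAG_LOCK;

	/* Order the flag update against later writes to csd->func/info. */
	smp_mb();
}

static void csd_unlock(struct call_single_data *csd)
{
	WARN_ON((csd->flags & CSD_FLAG_WAIT) && !(csd->flags & CSD_FLAG_LOCK));

	/* Make sure all csd accesses are done before releasing it. */
	smp_mb();
	csd->flags &= ~CSD_FLAG_LOCK;
}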

- When running "OpenStack Tempest" in a nested-KVM environment we are
able to cause a lockup within a matter of hours (usually 2 to 20).
Trace from the nested hypervisor (Ubuntu 3.13):

crash> bt
PID: 29130  TASK: ffff8804288ac800  CPU: 1   COMMAND: "qemu-system-x86"
 #0 [ffff88043fd03d18] machine_kexec at ffffffff8104ac02
 #1 [ffff88043fd03d68] crash_kexec at ffffffff810e7203
 #2 [ffff88043fd03e30] panic at ffffffff81719ff4
 #3 [ffff88043fd03ea8] watchdog_timer_fn at ffffffff8110d7c5
 #4 [ffff88043fd03ed8] __run_hrtimer at ffffffff8108e787
 #5 [ffff88043fd03f18] hrtimer_interrupt at ffffffff8108ef4f
 #6 [ffff88043fd03f80] local_apic_timer_interrupt at ffffffff81043537
 #7 [ffff88043fd03f98] smp_apic_timer_interrupt at ffffffff81733d4f
 #8 [ffff88043fd03fb0] apic_timer_interrupt at ffffffff817326dd
--- <IRQ stack> ---
 #9 [ffff8804284a3bc8] apic_timer_interrupt at ffffffff817326dd
    [exception RIP: generic_exec_single+130]
    RIP: ffffffff810dbe62  RSP: ffff8804284a3c70  RFLAGS: 00000202
    RAX: 0000000000000002  RBX: ffff8804284a3c40  RCX: 0000000000000001
    RDX: ffffffff8180ad60  RSI: 0000000000000000  RDI: 0000000000000286
    RBP: ffff8804284a3ca0   R8: ffffffff8180ad48   R9: 0000000000000001
    R10: ffffffff81185cac  R11: ffffea00109b4a00  R12: ffff88042829f400
    R13: 0000000000000000  R14: ffffea001017d640  R15: 0000000000000005
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
#10 [ffff8804284a3ca8] smp_call_function_single at ffffffff810dbf75
#11 [ffff8804284a3d20] smp_call_function_many at ffffffff810dc3a6
#12 [ffff8804284a3d80] native_flush_tlb_others at ffffffff8105c8f7
#13 [ffff8804284a3da8] flush_tlb_mm_range at ffffffff8105c9cb
#14 [ffff8804284a3dd8] tlb_flush_mmu at ffffffff811755b3
#15 [ffff8804284a3e00] tlb_finish_mmu at ffffffff81176145
#16 [ffff8804284a3e20] unmap_region at ffffffff8117e013
#17 [ffff8804284a3ee0] do_munmap at ffffffff81180356
#18 [ffff8804284a3f30] vm_munmap at ffffffff81180521
#19 [ffff8804284a3f60] sys_munmap at ffffffff81181482
#20 [ffff8804284a3f80] system_call_fastpath at ffffffff8173196d
    RIP: 00007fa3ed16c587  RSP: 00007fa3536f5c10  RFLAGS: 00000246
    RAX: 000000000000000b  RBX: ffffffff8173196d  RCX: 0000001d00000007
    RDX: 0000000000000000  RSI: 0000000000801000  RDI: 00007fa315ff4000
    RBP: 00007fa3167f49c0   R8: 0000000000000000   R9: 00007fa3f5396738
    R10: 00007fa3536f5a60  R11: 0000000000000202  R12: 00007fa3ed6562a0
    R13: 00007fa350ef19c0  R14: ffffffff81181482  R15: ffff8804284a3f78
    ORIG_RAX: 000000000000000b  CS: 0033  SS: 002b

- After applying the patch provided by Thomas, we were able to cause the
lockup only after 6 days (also locked inside
smp_call_function_single). Test performance (even for nested KVM)
was reduced substantially with 3.19 + this patch. Trace from the
nested hypervisor (3.19 + patch):

crash> bt
PID: 10467  TASK: ffff880817b3b1c0  CPU: 1   COMMAND: "qemu-system-x86"
 #0 [ffff88083fd03cc0] machine_kexec at ffffffff81052052
 #1 [ffff88083fd03d10] crash_kexec at ffffffff810f91c3
 #2 [ffff88083fd03de0] panic at ffffffff8176f713
 #3 [ffff88083fd03e60] watchdog_timer_fn at ffffffff8112316b
 #4 [ffff88083fd03ea0] __run_hrtimer at ffffffff810da087
 #5 [ffff88083fd03ef0] hrtimer_interrupt at ffffffff810da467
 #6 [ffff88083fd03f70] local_apic_timer_interrupt at ffffffff81049769
 #7 [ffff88083fd03f90] smp_apic_timer_interrupt at ffffffff8177fc25
 #8 [ffff88083fd03fb0] apic_timer_interrupt at ffffffff8177dcbd
--- <IRQ stack> ---
 #9 [ffff880817973a68] apic_timer_interrupt at ffffffff8177dcbd
    [exception RIP: generic_exec_single+218]
    RIP: ffffffff810ee0ca  RSP: ffff880817973b18  RFLAGS: 00000202
    RAX: 0000000000000002  RBX: 0000000000000292  RCX: 0000000000000001
    RDX: ffffffff8180e6e0  RSI: 0000000000000000  RDI: 0000000000000292
    RBP: ffff880817973b58   R8: ffffffff8180e6c8   R9: 0000000000000001
    R10: 000000000000b6e0  R11: 0000000000000001  R12: ffffffff811f6626
    R13: ffff880817973ab8  R14: ffffffff8109cfd2  R15: ffff880817973a78
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
#10 [ffff880817973b60] smp_call_function_single at ffffffff810ee1c7
#11 [ffff880817973b90] loaded_vmcs_clear at ffffffffa0309097 [kvm_intel]
#12 [ffff880817973ba0] vmx_vcpu_load at ffffffffa030defe [kvm_intel]
#13 [ffff880817973be0] kvm_arch_vcpu_load at ffffffffa01eba53 [kvm]
#14 [ffff880817973c00] kvm_sched_in at ffffffffa01d94a9 [kvm]
#15 [ffff880817973c20] finish_task_switch at ffffffff81099148
#16 [ffff880817973c60] __schedule at ffffffff817781ec
#17 [ffff880817973cd0] schedule at ffffffff81778699
#18 [ffff880817973ce0] kvm_vcpu_block at ffffffffa01d8dfd [kvm]
#19 [ffff880817973d40] kvm_arch_vcpu_ioctl_run at ffffffffa01ef64c [kvm]
#20 [ffff880817973e10] kvm_vcpu_ioctl at ffffffffa01dbc19 [kvm]
#21 [ffff880817973eb0] do_vfs_ioctl at ffffffff811f5948
#22 [ffff880817973f30] sys_ioctl at ffffffff811f5be1
#23 [ffff880817973f80] system_call_fastpath at ffffffff8177cc2d
    RIP: 00007f42f987fec7  RSP: 00007f42ef1bebd8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff8177cc2d  RCX: ffffffffffffffff
    RDX: 0000000000000000  RSI: 000000000000ae80  RDI: 000000000000000e
    RBP: 00007f430047b040   R8: 0000000000000000   R9: 00000000000000ff
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007f42ff920240
    R13: 0000000000000000  R14: 0000000000000001  R15: 0000000000000001
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

Not sure if you are still pursuing this; anyway, let me know if you
think of any other change worth trying, and I'll keep the environment around.

-Tinoco

On Mon, 17 Nov 2014, Thomas Gleixner wrote:
> On Mon, 17 Nov 2014, Linus Torvalds wrote:
> >  	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
> > -		csd->func(csd->info);
> > +		smp_call_func_t func = csd->func;
> > +		void *info = csd->info;
> >  		csd_unlock(csd);
> > +
> > +		func(info);
>
> No, that won't work for synchronous calls:
>
>     CPU 0                     CPU 1
>
>     csd_lock(csd);
>     queue_csd();
>     ipi();
>                               func = csd->func;
>                               info = csd->info;
>                               csd_unlock(csd);
>     csd_lock_wait();
>                               func(info);
>
> The csd_lock_wait() side will succeed and therefore assume that the
> call has been completed while the function has not been called at
> all. Interesting explosions to follow.
>
> The proper solution is to revert that commit and properly analyze the
> problem which Jens was trying to solve and work from there.
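
To make that window concrete, here is a rough sketch of the synchronous
caller path, paraphrasing the ~3.19 generic_exec_single() from memory
(simplified, not verbatim kernel code): the waiter treats "CSD_FLAG_LOCK
cleared" as "the callback has finished", so unlocking before calling
func() breaks that assumption.

static int generic_exec_single(int cpu, struct call_single_data *csd,
			       smp_call_func_t func, void *info, int wait)
{
	/* For the wait case the caller already did csd_lock(csd). */
	csd->func = func;
	csd->info = info;

	/* Queue the csd for the target CPU and kick it with an IPI. */
	if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
		arch_send_call_function_single_ipi(cpu);

	if (wait)
		csd_lock_wait(csd);	/* assumes func() has already run */

	return 0;
}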

So a combo of both (Jens and yours) might do the trick. Patch below.

I think what Jens was trying to solve is:

     CPU 0                     CPU 1

     csd_lock(csd);
     queue_csd();
     ipi();
                               csd->func(csd->info);
     wait_for_completion(csd);
                               complete(csd);
     reuse_csd(csd);
                               csd_unlock(csd);

Thanks,

tglx

Index: linux/kernel/smp.c
===================================================================
--- linux.orig/kernel/smp.c
+++ linux/kernel/smp.c
@@ -126,7 +126,7 @@ static void csd_lock(struct call_single_
 
 static void csd_unlock(struct call_single_data *csd)
 {
-	WARN_ON((csd->flags & CSD_FLAG_WAIT) && !(csd->flags & CSD_FLAG_LOCK));
+	WARN_ON(!(csd->flags & CSD_FLAG_LOCK));
 
 	/*
 	 * ensure we're all done before releasing data:
@@ -250,8 +250,23 @@ static void flush_smp_call_function_queu
 	}
 
 	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
-		csd->func(csd->info);
-		csd_unlock(csd);
+
+		/*
+		 * For synchronous calls we are not allowed to unlock
+		 * before the callback returned. For the async case
+		 * it's the responsibility of the caller to keep
+		 * csd->info consistent while the callback runs.
+		 */
+		if (csd->flags & CSD_FLAG_WAIT) {
+			csd->func(csd->info);
+			csd_unlock(csd);
+		} else {
+			smp_call_func_t func = csd->func;
+			void *info = csd->info;
+
+			csd_unlock(csd);
+			func(info);
+		}
 	}
 
 	/*
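
Purely as illustration (not from the patch above; the names are made up,
and it assumes the 3.14+ smp_call_function_single_async() interface from
<linux/smp.h>): an async user with an embedded csd, along the lines of
the blk-mq IPI completion path, can then rearm or free the csd from
inside its own callback, because the flush side unlocks before calling
func(info).

struct my_request {
	struct call_single_data csd;	/* embedded, owned by the request */
	void (*done)(struct my_request *rq);
};

static void my_remote_complete(void *info)
{
	struct my_request *rq = info;

	/* Safe: the csd was already unlocked before we were called. */
	rq->done(rq);			/* may free or immediately reuse rq */
}

static void my_complete_on(struct my_request *rq, int cpu)
{
	rq->csd.func = my_remote_complete;
	rq->csd.info = rq;
	rq->csd.flags = 0;		/* async: CSD_FLAG_WAIT not set */
	smp_call_function_single_async(cpu, &rq->csd);
}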
