Message-ID: <56E80D21.7010607@huawei.com>
Date:	Tue, 15 Mar 2016 21:24:49 +0800
From:	Weidong Wang <wangweidong1@...wei.com>
To:	<tglx@...utronix.de>, <mingo@...hat.com>, <hpa@...or.com>,
	<x86@...nel.org>, <linux-kernel@...r.kernel.org>,
	<torvalds@...ux-foundation.org>
CC:	Fengtiantian <fengtiantian@...wei.com>, <liuyongan@...wei.com>,
	<wangweidong1@...wei.com>
Subject: [Ask for help] met a deadlock with switch_fpu_finish on suse 3.0.93-0.8-default
 kernel

Hi all,

We found a deadlock in the SUSE 3.0.93-0.8-default kernel when restore_fpu_checking() returns an error during a task switch.
--------------------------------------------
The call trace is:
PID: 2415   TASK: ffff880b739d24c0  CPU: 5   COMMAND: "qemu-kvm"
 #0 [ffff880c7f6a6e40] crash_nmi_callback at ffffffff8102460f
 #1 [ffff880c7f6a6e50] notifier_call_chain at ffffffff81465027
 #2 [ffff880c7f6a6e80] __atomic_notifier_call_chain at ffffffff8146506d
 #3 [ffff880c7f6a6e90] notify_die at ffffffff814650bd
 #4 [ffff880c7f6a6ec0] default_do_nmi at ffffffff81462507
 #5 [ffff880c7f6a6ee0] do_nmi at ffffffff81462738
 #6 [ffff880c7f6a6ef0] restart_nmi at ffffffff81461c91
    [exception RIP: _raw_spin_lock+21]
    RIP: ffffffff814611e5  RSP: ffff8809d8d1ba80  RFLAGS: 00000093
    RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000093
    RDX: ffff8809d8d1ba80  RSI: 0000000000000018  RDI: 0000000000000001
    RBP: ffffffff814611e5   R8: ffffffff814611e5   R9: 0000000000000018
    R10: ffff8809d8d1ba80  R11: 0000000000000093  R12: ffffffffffffffff
    R13: ffff880c7f6b0a00  R14: 0000000000000005  R15: 000000000000e2b8
    ORIG_RAX: 000000000000e2b8  CS: 0010  SS: 0018
--- <DOUBLEFAULT exception stack> ---
 #7 [ffff8809d8d1ba80] _raw_spin_lock at ffffffff814611e5
 #8 [ffff8809d8d1ba80] try_to_wake_up at ffffffff81054afb
 #9 [ffff8809d8d1bad0] pollwake at ffffffff8116cfc6
#10 [ffff8809d8d1bb10] __wake_up_common at ffffffff81046e1a
#11 [ffff8809d8d1bb50] __wake_up at ffffffff8104bf43
#12 [ffff8809d8d1bb90] __send_signal at ffffffff81074bfd
#13 [ffff8809d8d1bbd0] force_sig_info at ffffffff81076194
#14 [ffff8809d8d1bc00] __switch_to at ffffffff81001930
#15 [ffff8809d8d1bcf0] reschedule_interrupt at ffffffff8146a06e
#16 [ffff8809d8d1bd58] vmx_handle_external_intr at ffffffffa03c3f4c [kvm_intel]
#17 [ffff8809d8d1bd80] vcpu_enter_guest at ffffffffa0363487 [kvm]
#18 [ffff8809d8d1be00] __vcpu_run at ffffffffa0363743 [kvm]
#19 [ffff8809d8d1be40] kvm_arch_vcpu_ioctl_run at ffffffffa0364438 [kvm]
#20 [ffff8809d8d1be70] kvm_vcpu_ioctl at ffffffffa0350cee [kvm]
#21 [ffff8809d8d1bf10] do_vfs_ioctl at ffffffff8116bd1b
#22 [ffff8809d8d1bf40] sys_ioctl at ffffffff8116c0e1
#23 [ffff8809d8d1bf80] system_call_fastpath at ffffffff81469172
--------------------------------------------

We found the following upstream commit:
commit 80ab6f1e8c981b1b6604b2f22e36c917526235cd
"i387: use 'restore_fpu_checking()' directly in task switching code"

This patch removes the __math_state_restore() call from switch_fpu_finish() and calls restore_fpu_checking() directly:

 static inline void switch_fpu_finish(struct task_struct *new, fpu_switch_t fpu)
 {
-       if (fpu.preload)
-               __math_state_restore(new);
+       if (fpu.preload) {
+               if (unlikely(restore_fpu_checking(new)))
+                       __thread_fpu_end(new);
+       }
 }

So with this patch, when restore_fpu_checking() fails in switch_fpu_finish(), the task only gives up FPU ownership via __thread_fpu_end(), and force_sig() is no longer called.


1. Would this patch fix the deadlock?
2. We don't understand why restore_fpu_checking() would fail. Does anyone know?
3. If the patch does fix the problem: when restore_fpu_checking(tsk) really
   fails and we no longer force a SIGSEGV on the task, could that introduce
   other issues?

Regards,
Weidong


