Message-ID: <20141004134729.GB5214@wfg-t540p.sh.intel.com>
Date: Sat, 4 Oct 2014 21:47:29 +0800
From: Fengguang Wu <fengguang.wu@...el.com>
To: Andi Kleen <ak@...ux.intel.com>
Cc: Jet Chen <jet.chen@...el.com>, Su Tao <tao.su@...el.com>,
Yuanhan Liu <yuanhan.liu@...el.com>, LKP <lkp@...org>,
linux-kernel@...r.kernel.org
Subject: [x86LKP] PANIC: double fault, error_code: 0xffffffffffffffff
Hi Andi,
The 0day kernel testing robot got the dmesg below, and the first bad commit is
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git x86/fsgs-2
commit b8a868e9ea876a1b40020397305533c095921d7a
Author: Andi Kleen <ak@...ux.intel.com>
AuthorDate: Wed Apr 23 13:26:20 2014 -0700
Commit: Andi Kleen <ak@...ux.intel.com>
CommitDate: Fri Oct 3 15:19:56 2014 -0700
x86: Add support for rd/wr fs/gs base
IvyBridge added new instructions to directly write the fs and gs
64bit base registers. Previously this had to be done with a system
call to write to MSRs. The main use case is fast user space threading
and switching the fs/gs registers quickly there. Another use
case is having (relatively) cheap access to a new address
register per thread.
The instructions are opt-in and have to be explicitly enabled
by the OS.
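As a rough illustration of the CPU side of this opt-in, the sketch below checks whether a processor advertises the instructions through the `fsgsbase` flag in /proc/cpuinfo. The helper name `has_fsgsbase` and its optional file argument are assumptions made for illustration; the flag reports CPU support only and says nothing about whether the OS has enabled the instructions.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given cpuinfo file lists "fsgsbase".
# Defaults to /proc/cpuinfo when no file is passed.
has_fsgsbase() {
    local cpuinfo=${1:-/proc/cpuinfo}
    # Keep only the "flags" line(s), then match the flag as a whole word.
    grep -E '^flags[[:space:]]*:' "$cpuinfo" | grep -qw fsgsbase
}
```

A caller could run `has_fsgsbase && echo "CPU advertises fsgsbase"` on the test machine or guest.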
Previously Linux couldn't support this because the paranoid
entry code relied on the gs base never being negative outside
the kernel to decide when to use swapgs. It would check the gs MSR
value and assume it was already running in kernel if the value
was already negative.
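The old heuristic can be sketched as a sign test on the 64-bit gs base. This is only an illustration in shell, not the kernel's entry code; the function name `gs_base_looks_like_kernel` is hypothetical, and it expects a hexadecimal value with an optional 0x prefix.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a 64-bit hex value is negative,
# i.e. bit 63 is set. Works on the string to avoid integer overflow.
gs_base_looks_like_kernel() {
    local hex=${1#0x}
    # Bit 63 is set exactly when all 16 hex digits are present
    # and the leading digit is 8 or above.
    [[ ${#hex} -eq 16 && $hex == [89abcdefABCDEF]* ]]
}

# A kernel-half address: treated as "already in kernel".
gs_base_looks_like_kernel 0xffff880012707e88 && echo "kernel-looking gs base"
# A user-half address: treated as "came from user space".
gs_base_looks_like_kernel 0x00007f0012345000 || echo "user-looking gs base"
```

Once user space can write the gs base directly, it can store a negative value there itself, so a test of this kind can no longer tell the two cases apart.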
To make this work we have to revamp the paranoid exception
path to not rely on this. We can use the new instructions
to get (relatively) quick access to the values.
This is also significantly faster than an MSR read, so it will speed
up NMIs (critical for profiling).
The original patch compared the gs with the kernel gs and
assumed that if it was the same, swapgs was not needed
(and no user space processing was needed). This
was nice and simple and didn't need a lot of changes.
But this had the side effect that if a user process set its
GS to the same as the kernel it may lose rescheduling
checks (so a racing reschedule IPI would have been
acted upon only at the next non-paranoid interrupt).
This version now switches to full save/restore of the GS.
This requires quite some changes in the paranoid path.
Unfortunately I didn't come up with a simpler scheme:
The kernel gs for the paranoid path is now stored at the
bottom of the IST stack (so that it can be derived from RSP).
For this we need to know the size of the IST stack
(4K or 8K), which is now passed in as a stack parameter
to save_paranoid.
Previously we had a flag in EBX that indicated whether
SWAPGS needs to be called later or not. In the new scheme
this turns into a tristate, with a new "restore from R15"
mode. The exit paths are all adjusted to handle this correctly.
There is one complication: to allow debuggers (especially
from the int3 or debug vectors) access to the user GS
we need to save it in the task struct. Normally
the next context switch would overwrite it with the wrong
value from kernel_gs, so we also set a new flag in task_struct
that prevents it.
Also to prevent recursive interrupts clobbering this
state in the task_struct this is only done for interrupts
coming from ring 3.
After a schedule comes back we check if the flag is still
set. If it isn't set, the GS is back in the (swapped) kernel
gs, so we revert to the SWAPGS mode instead of restoring GS.
Then after these changes we need to also use the new instructions
to save/restore fs and gs, so that the new values set by the
users won't disappear. This is also significantly
faster for the case when the 64bit base has to be switched
(that is when GS is larger than 4GB), as we can replace
the slow MSR write with a faster wr[fg]sbase execution.
The instructions do not context switch
the segment index, so the old invariant that fs or gs index
have to be 0 for a different 64bit value to stick is still
true. Previously it was enforced by arch_prctl, now the user
program has to make sure it keeps the segment indexes zero.
If it doesn't the changes may not stick.
This in turn enables fast switching when there are
enough threads that their TLS segment does not fit below 4GB;
alternatively, programs that use fs as an additional base
register will not get a significant context switch penalty.
It is all done in a single patch to avoid bisect crash
holes.
v2: Change to save/restore GS instead of using swapgs
based on the value. Large scale changes.
Signed-off-by: Andi Kleen <ak@...ux.intel.com>
+------------------------------------------+------------+------------+------------+
|                                          | 598d570a05 | b8a868e9ea | 8048975233 |
+------------------------------------------+------------+------------+------------+
| boot_successes                           | 900        | 280        | 79         |
| boot_failures                            | 0          | 20         | 2          |
| PANIC:double_fault,                      | 0          | 12         | 2          |
| Kernel_panic-not_syncing:Machine_halted  | 0          | 11         | 2          |
| BUG:unable_to_handle_kernel              | 0          | 5          |            |
| Oops                                     | 0          | 3          |            |
| RIP:pgd_free                             | 0          | 1          |            |
| BUG:kernel_boot_crashed                  | 0          | 4          |            |
| RIP:show_stack_log_lvl                   | 0          | 1          |            |
| Kernel_panic-not_syncing:Fatal_exception | 0          | 1          |            |
+------------------------------------------+------------+------------+------------+
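To put the counts above in proportion, a small sketch can turn each column into a boot failure rate. It assumes the total number of boots per commit is boot_successes plus boot_failures, which the table does not state explicitly; the helper name `failure_rate_pct` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: failure rate in percent, two decimals.
# Assumption: total boots = successes + failures (must be non-zero).
failure_rate_pct() {
    local successes=$1 failures=$2
    awk -v s="$successes" -v f="$failures" \
        'BEGIN { printf "%.2f\n", 100 * f / (s + f) }'
}

failure_rate_pct 900 0    # 598d570a05 (parent):    0.00
failure_rate_pct 280 20   # b8a868e9ea (first bad): 6.67
failure_rate_pct 79 2     # 8048975233 (head):      2.47
```

Under that assumption the failure is intermittent, at roughly one boot in fifteen on the first bad commit, so a reproduction attempt may need many boots.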
[ 5.087621] Freeing unused kernel memory: 1248K (ffff8800014c8000 - ffff880001600000)
[ 5.136856] Freeing unused kernel memory: 1936K (ffff88000181c000 - ffff880001a00000)
[ 5.167951] random: init urandom read with 5 bits of entropy available
[ 19.307116] PANIC: double fault, error_code: 0xffffffffffffffff
[ 19.309941] Kernel panic - not syncing: Machine halted.
[ 19.310083] CPU: 1 PID: 150 Comm: trinity-main Not tainted 3.17.0-rc7-00004-gb8a868e #130
[ 19.310083] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 19.310083] 0000000000000000 ffff880012707e88 ffffffff814befff ffffffff8174c441
[ 19.310083] ffff880012707f08 ffffffff814bcd7d 0000000000000008 ffff880012707f18
[ 19.310083] ffff880012707eb0 ffffffff81ba8d00 0000000000000046 ffff880010c7ffd8
[ 19.310083] Call Trace:
[ 19.310083] <#DF> [<ffffffff814befff>] dump_stack+0x4d/0x66
[ 19.310083] [<ffffffff814bcd7d>] panic+0xc4/0x1d6
[ 19.310083] [<ffffffff8102a96c>] df_debug+0x2c/0x2c
[ 19.310083] [<ffffffff81002df9>] do_double_fault+0x62/0x7d
[ 19.310083] [<ffffffff814c5d8e>] double_fault+0x2e/0x40
[ 19.310083] [<ffffffff814c610d>] ? async_page_fault+0xd/0x30
[ 19.310083] <<EOE>> <UNK>
[ 19.310083] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
git bisect start 80489752332f4e4f75343d6b539095b366013bc6 fe82dcec644244676d55a1384c958d5f67979adb --
git bisect bad 0e3e2d7a608587a01aaa9eaabd7f75cad56cb8ea # 07:18 19- 6 Merge 'ak/x86/fsgs-2' into devel-lkp-hsx02-x86_64-201410040643
git bisect good 63007bebe851b304d6e2d66fd08307a4fd35cc50 # 08:09 300+ 1 0day base guard for 'devel-lkp-hsx02-x86_64-201410040643'
git bisect bad b8a868e9ea876a1b40020397305533c095921d7a # 08:18 78- 19 x86: Add support for rd/wr fs/gs base
git bisect good a0b0be64599f50dc2c9fa85734026701221f186a # 08:27 300+ 0 x86: Naturally align the debug IST stack
git bisect good 598d570a05cd31500fb15a843a92f68ddb1b3618 # 08:33 300+ 0 x86: Add intrinsics/macros for new rd/wr fs/gs base instructions
# first bad commit: [b8a868e9ea876a1b40020397305533c095921d7a] x86: Add support for rd/wr fs/gs base
git bisect good 598d570a05cd31500fb15a843a92f68ddb1b3618 # 08:38 900+ 0 x86: Add intrinsics/macros for new rd/wr fs/gs base instructions
git bisect bad 80489752332f4e4f75343d6b539095b366013bc6 # 08:38 0- 2 0day head guard for 'devel-lkp-hsx02-x86_64-201410040643'
git bisect good 126d4576cb73c8a440adc37c129589cd66051bcc # 08:43 900+ 0 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
git bisect good 2e1d004b9645628c64a2db55ef6b81fadf5e6e91 # 08:55 900+ 0 Add linux-next specific files for 20141003
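When post-processing a log like the one above, the verdict line can be picked out mechanically. The sketch below assumes the log has been saved to a file; the helper name `first_bad_commit` and the file argument are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the 40-digit hash from the
# "# first bad commit: [<hash>] <subject>" line of a saved bisect log.
first_bad_commit() {
    sed -n 's/^# first bad commit: \[\([0-9a-f]\{40\}\)\].*/\1/p' "$1"
}
```

For the log in this report, the helper would print b8a868e9ea876a1b40020397305533c095921d7a.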
This script may reproduce the error.
----------------------------------------------------------------------------
#!/bin/bash
# Usage: pass the path to the kernel image under test as the first argument.
kernel=${1:?"usage: $0 KERNEL_IMAGE"}
initrd=quantal-core-x86_64.cgz
# Fetch the test initrd; --no-clobber skips the download if the file exists.
wget --no-clobber "https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd"
# QEMU/KVM command line.
kvm=(
qemu-system-x86_64
-cpu kvm64
-enable-kvm
-kernel "$kernel"
-initrd "$initrd"
-m 320
-smp 2
-net nic,vlan=1,model=e1000
-net user,vlan=1
-boot order=nc
-no-reboot
-watchdog i6300esb
-rtc base=localtime
-serial stdio
-display none
-monitor null
)
# Kernel command line, joined into a single string for --append.
append=(
hung_task_panic=1
earlyprintk=ttyS0,115200
debug
apic=debug
sysrq_always_enabled
rcupdate.rcu_cpu_stall_timeout=100
panic=-1
softlockup_panic=1
nmi_watchdog=panic
oops=panic
load_ramdisk=2
prompt_ramdisk=0
console=ttyS0,115200
console=tty0
vga=normal
root=/dev/ram0
rw
drbd.minor_count=8
)
"${kvm[@]}" --append "${append[*]}"
----------------------------------------------------------------------------
Thanks,
Fengguang
View attachment "dmesg-quantal-lkp-nex04-6:20141004081755:x86_64-randconfig-hxb1-1004::" of type "text/plain" (46299 bytes)
Download attachment "x86_64-randconfig-hxb1-1004-80489752332f4e4f75343d6b539095b366013bc6-PANIC:-double-116710.log" of type "application/octet-stream" (22127 bytes)
View attachment "config-3.17.0-rc7-00004-gb8a868e" of type "text/plain" (77153 bytes)