Message-ID: <5a194b7d.phrRCatyOVbLxz9h%fengguang.wu@intel.com>
Date: Sat, 25 Nov 2017 18:52:45 +0800
From: kernel test robot <fengguang.wu@...el.com>
To: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: LKP <lkp@...org>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...nel.org>,
wfg@...ux.intel.com
Subject: 06222d856e ("x86/mm/kaiser: Use PCID feature to make user and
.."): WARNING: CPU: 0 PID: 1 at mm/early_ioremap.c:114 __early_ioremap
Greetings,
0day kernel testing robot got the below dmesg and the first bad commit is
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/mm
commit 06222d856e45d727c18665ed37419d653f1dbef5
Author: Dave Hansen <dave.hansen@...ux.intel.com>
AuthorDate: Wed Nov 22 16:35:09 2017 -0800
Commit: Ingo Molnar <mingo@...nel.org>
CommitDate: Fri Nov 24 08:29:51 2017 +0100
x86/mm/kaiser: Use PCID feature to make user and kernel switches faster
Short summary: Use x86 PCID feature to avoid flushing the TLB at all
interrupts and syscalls. Speed them up. Makes context switches
and TLB flushing slower.
Background:
KAISER keeps two copies of the page tables. Switches between the
copies are performed by writing to the CR3 register. But, CR3
was really designed for context switches and writes to it also
flush the entire TLB (modulo global pages). This TLB flush
increases the cost of interrupts and context switches. For
syscall-heavy microbenchmarks it can cut the rate of syscalls by
2/3.
The kernel recently gained support for an Intel CPU feature
called Process Context IDentifiers (PCID) thanks to Andy
Lutomirski. This feature is intended to allow you to switch
between contexts without flushing the TLB.
Implementation:
PCIDs can be used to avoid flushing the TLB at kernel entry/exit.
This speeds up both interrupts and syscalls.
First, the kernel and userspace must be assigned different ASIDs.
On entry from userspace, move over to the kernel page tables
*and* ASID. On exit, restore the user page tables and ASID.
Fortunately, the ASID is programmed via CR3, which is already
being used to switch between the user and kernel page tables.
This gives us convenient, one-stop shopping.
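As a rough illustration of the "one-stop shopping" point above, here is a minimal user-space sketch of how a CR3 value combines a page-table base, a PCID in the low 12 bits, and the non-flushing bit 63. The helper names and the choice of bit 11 to mark the user ASID are assumptions made for this example, not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* With CR4.PCIDE set, CR3 bits 11:0 hold the PCID. */
#define CR3_PCID_MASK  UINT64_C(0xfff)
/* Bit 63 of the value written to CR3 requests "do not flush". */
#define CR3_NOFLUSH    (UINT64_C(1) << 63)
/* Hypothetical: derive the user ASID by setting a high PCID bit. */
#define USER_PCID_BIT  UINT64_C(0x800)

/* Compose a CR3 value from a 4 KiB-aligned page-table base and a PCID. */
static inline uint64_t build_cr3(uint64_t pgd_pa, uint16_t pcid, int noflush)
{
    assert((pgd_pa & CR3_PCID_MASK) == 0); /* base must be page aligned */
    assert(pcid <= CR3_PCID_MASK);         /* PCID is a 12-bit field */

    uint64_t cr3 = pgd_pa | pcid;
    if (noflush)
        cr3 |= CR3_NOFLUSH;
    return cr3;
}

/* Extract the PCID field from a CR3 value. */
static inline uint16_t cr3_pcid(uint64_t cr3)
{
    return (uint16_t)(cr3 & CR3_PCID_MASK);
}

/* Hypothetical mapping from a kernel ASID to its paired user ASID. */
static inline uint16_t user_pcid(uint16_t kern_pcid)
{
    return (uint16_t)(kern_pcid | USER_PCID_BIT);
}
```

A single value built this way selects both the page tables and the ASID, so one CR3 write on entry and one on exit is enough to switch both.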
The CR3 write which is used to switch between processes provides
all the TLB flushing normally required at context switch time.
But, with KAISER, that CR3 write only flushes the current
(kernel) ASID. An extra TLB flush operation is now required in
order to flush the user ASID. This new instruction (INVPCID) is
probably ~100 cycles, but this is done with the assumption that
the time lost in context switches is more than made up for by
lower cost of interrupts and syscalls.
Support:
PCIDs are generally available on Sandybridge and newer CPUs. However,
the accompanying INVPCID instruction did not become available until
Haswell (the Xeon parts marked "v3", or fourth-generation Core). This
instruction allows non-current-PCID TLB entries to be flushed without
switching CR3 and global pages to be flushed without a double
MOV-to-CR4.
Without INVPCID, PCIDs are much harder to use. TLB invalidation gets
much more onerous:
1. Every kernel TLB flush (even for a single page) requires an
interrupts-off MOV-to-CR4 which is very expensive. This is because
there is no way to flush a kernel address that might be loaded
in *EVERY* PCID. Right now, there are "only" ~12 of these per-cpu,
but that's too painful to use the MOV-to-CR3 to flush them. That
leaves only the MOV-to-CR4.
2. Every userspace flush (even for a single page) requires one of the
following:
a. A pair of flushing (bit 63 clear) CR3 writes: one for
the kernel ASID and another for userspace.
b. A pair of non-flushing CR3 writes (bit 63 set) with the
flush done for each. For instance, what is currently a
single instruction without KAISER:
invpcid_flush_one(current_pcid, addr);
becomes this with KAISER:
invpcid_flush_one(current_kern_pcid, addr);
invpcid_flush_one(current_user_pcid, addr);
and this without INVPCID:
__native_flush_tlb_single(addr);
write_cr3(mm->pgd | current_user_pcid | NOFLUSH);
__native_flush_tlb_single(addr);
write_cr3(mm->pgd | current_kern_pcid | NOFLUSH);
So, for now, fully disable PCIDs with KAISER when INVPCID is not
available. This is fixable, but it's an optimization that can be
performed later.
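To make the two-flush requirement above concrete, here is a toy user-space model of a PCID-tagged TLB. It is only a sketch: the structure, slot count, and function names are invented for illustration and do not correspond to kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_SLOTS 8

/* One cached translation, tagged with the PCID it was loaded under. */
struct tlb_entry {
    bool     valid;
    uint16_t pcid;
    uint64_t vaddr;
};

struct toy_tlb {
    struct tlb_entry slot[TLB_SLOTS];
};

/* Cache a translation for (pcid, vaddr); returns false if the TLB is full. */
static inline bool tlb_fill(struct toy_tlb *t, uint16_t pcid, uint64_t vaddr)
{
    assert(t != NULL);
    for (size_t i = 0; i < TLB_SLOTS; i++) {
        if (!t->slot[i].valid) {
            t->slot[i] = (struct tlb_entry){ true, pcid, vaddr };
            return true;
        }
    }
    return false;
}

/* True if a translation for vaddr is cached under this PCID. */
static inline bool tlb_hit(const struct toy_tlb *t, uint16_t pcid, uint64_t vaddr)
{
    assert(t != NULL);
    for (size_t i = 0; i < TLB_SLOTS; i++) {
        const struct tlb_entry *e = &t->slot[i];
        if (e->valid && e->pcid == pcid && e->vaddr == vaddr)
            return true;
    }
    return false;
}

/* Models a single-address flush: it only touches entries with this PCID. */
static inline void tlb_flush_one(struct toy_tlb *t, uint16_t pcid, uint64_t vaddr)
{
    assert(t != NULL);
    for (size_t i = 0; i < TLB_SLOTS; i++) {
        struct tlb_entry *e = &t->slot[i];
        if (e->valid && e->pcid == pcid && e->vaddr == vaddr)
            e->valid = false;
    }
}
```

In this model, if the same user address is cached under both the kernel and the user PCID, flushing it for only one of them leaves a stale entry under the other, which is why each single-page user flush turns into a pair of operations.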
Hugh Dickins also points out that PCIDs really have two distinct
use-cases in the context of KAISER. The first way they can be used
is as "TLB preservation across context-switch", which is what
Andy Lutomirski's 4.14 PCID code does. They can also be used as
a "KAISER syscall/interrupt accelerator". If we just use them to
speed up syscall/interrupts (and ignore the context-switch TLB
preservation), then the deficiency of not having INVPCID
becomes much less onerous.
Signed-off-by: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: Andy Lutomirski <luto@...nel.org>
Cc: Borislav Petkov <bp@...en8.de>
Cc: Brian Gerst <brgerst@...il.com>
Cc: Daniel Gruss <daniel.gruss@...k.tugraz.at>
Cc: Denys Vlasenko <dvlasenk@...hat.com>
Cc: H. Peter Anvin <hpa@...or.com>
Cc: Hugh Dickins <hughd@...gle.com>
Cc: Josh Poimboeuf <jpoimboe@...hat.com>
Cc: Kees Cook <keescook@...gle.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Michael Schwarz <michael.schwarz@...k.tugraz.at>
Cc: Moritz Lipp <moritz.lipp@...k.tugraz.at>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Richard Fellner <richard.fellner@...dent.tugraz.at>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: linux-mm@...ck.org
Link: http://lkml.kernel.org/r/20171123003509.EC42DD15@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@...nel.org>
5ab2af1e02 x86/mm: Allow flushing for future ASID switches
06222d856e x86/mm/kaiser: Use PCID feature to make user and kernel switches faster
850f70b234 x86/mm/kaiser: Add Kconfig
acdad0aa07 Merge branch 'WIP.x86/mm'
+-----------------------------------------------------------------------------+------------+------------+------------+------------+
|                                                                             | 5ab2af1e02 | 06222d856e | 850f70b234 | acdad0aa07 |
+-----------------------------------------------------------------------------+------------+------------+------------+------------+
| boot_successes                                                              | 37         | 0          | 2          | 0          |
| boot_failures                                                               | 37         | 23         | 43         | 31         |
| WARNING:at_kernel/locking/lockdep.c:#trace_hardirqs_off_caller              | 37         | 21         |            |            |
| RIP:trace_hardirqs_off_caller                                               | 37         | 21         |            |            |
| WARNING:at_drivers/pci/pci-sysfs.c:#pci_mmap_resource                       | 2          | 2          | 2          | 2          |
| RIP:pci_mmap_resource                                                       | 2          | 2          | 2          | 2          |
| WARNING:at_mm/early_ioremap.c:#__early_ioremap                              | 0          | 17         | 20         | 23         |
| RIP:__early_ioremap                                                         | 0          | 17         | 20         | 23         |
| Mem-Info                                                                    | 0          | 1          | 1          |            |
| BUG:kernel_hang_in_early-boot_stage,last_printk:early_console_in_setup_code | 0          | 2          | 19         | 8          |
| WARNING:possible_circular_locking_dependency_detected                       | 0          | 0          | 4          |            |
+-----------------------------------------------------------------------------+------------+------------+------------+------------+
[ 0.012000] futex hash table entries: 16 (order: -2, 1664 bytes)
[ 0.012000] regulator-dummy: no parameters
[ 0.012000] NET: Registered protocol family 16
[ 0.012178] cpuidle: using governor menu
[ 0.012464] ------------[ cut here ]------------
[ 0.012787] WARNING: CPU: 0 PID: 1 at mm/early_ioremap.c:114 __early_ioremap+0x21/0x188
[ 0.013441] Modules linked in:
[ 0.013658] CPU: 0 PID: 1 Comm: swapper Not tainted 4.14.0-01247-g06222d8 #1
[ 0.014129] task: ffff880000055100 task.stack: ffff880000058000
[ 0.014527] RIP: 0010:__early_ioremap+0x21/0x188
[ 0.014851] RSP: 0000:ffff88000005be00 EFLAGS: 00010202
[ 0.015205] RAX: 8000000000000163 RBX: 000000000000040e RCX: 0000000000000002
[ 0.015687] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 000000000000040e
[ 0.016000] RBP: 0000000000000002 R08: 0000000000000004 R09: 0000000000000000
[ 0.016000] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 0.016000] R13: ffffffffb60d7731 R14: 0000000000000000 R15: 0000000000000000
[ 0.016000] FS: 0000000000000000(0000) GS:ffffffffb5c35000(0000) knlGS:0000000000000000
[ 0.016000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.016000] CR2: 0000000000000000 CR3: 0000000012a18001 CR4: 00000000001606b0
[ 0.016000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.016000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.016000] Call Trace:
[ 0.016000] ? do_early_param+0x88/0x88
[ 0.016000] early_memremap+0x2c/0x2f
[ 0.016000] ? cnb20le_res+0x25a/0x25a
[ 0.016000] acpi_find_root_pointer+0x18/0x141
[ 0.016000] ? cnb20le_res+0x25a/0x25a
[ 0.016000] ? do_early_param+0x88/0x88
[ 0.016000] acpi_os_get_root_pointer+0x24/0x42
[ 0.016000] broadcom_postcore_init+0x5/0x3b
[ 0.016000] do_one_initcall+0x8f/0x170
[ 0.016000] ? do_early_param+0x88/0x88
[ 0.016000] kernel_init_freeable+0x111/0x19b
[ 0.016000] ? rest_init+0x130/0x130
[ 0.016000] kernel_init+0x5/0xe1
[ 0.016000] ret_from_fork+0x24/0x30
[ 0.016000] Code: fa fe 0f ff b8 01 00 00 00 c3 41 57 41 56 48 89 f1 41 55 41 54 55 53 48 83 ec 20 48 89 54 24 10 31 d2 83 3d 92 5d db ff 00 74 02 <0f> ff 48 8b 04 d5 e0 f4 16 b6 41 89 d6 48 85 c0 0f 84 1e 01 00
[ 0.016000] ---[ end trace 0313b755f35329ac ]---
[ 0.016015] ------------[ cut here ]------------
[ 0.016015] ------------[ cut here ]------------
[ 0.016338] WARNING: CPU: 0 PID: 1 at mm/early_ioremap.c:114 __early_ioremap+0x21/0x188
[ 0.016992] Modules linked in:
[ 0.017205] CPU: 0 PID: 1 Comm: swapper Tainted: G W 4.14.0-01247-g06222d8 #1
[ 0.017755] task: ffff880000055100 task.stack: ffff880000058000
[ 0.018156] RIP: 0010:__early_ioremap+0x21/0x188
[ 0.018468] RSP: 0000:ffff88000005be00 EFLAGS: 00010202
[ 0.018821] RAX: 8000000000000163 RBX: 000000000009fc00 RCX: 0000000000000400
[ 0.019299] RDX: 0000000000000000 RSI: 0000000000000400 RDI: 000000000009fc00
[ 0.019775] RBP: 0000000000000400 R08: 0000000000000002 R09: 0000000000000000
[ 0.020000] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 0.020000] R13: ffffffffb60d7731 R14: 0000000000000000 R15: 0000000000000000
[ 0.020000] FS: 0000000000000000(0000) GS:ffffffffb5c35000(0000) knlGS:0000000000000000
[ 0.020000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.020000] CR2: 0000000000000000 CR3: 0000000012a18001 CR4: 00000000001606b0
[ 0.020000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.020000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.020000] Call Trace:
[ 0.020000] ? do_early_param+0x88/0x88
[ 0.020000] early_memremap+0x2c/0x2f
[ 0.020000] acpi_find_root_pointer+0x60/0x141
[ 0.020000] ? cnb20le_res+0x25a/0x25a
[ 0.020000] ? do_early_param+0x88/0x88
[ 0.020000] acpi_os_get_root_pointer+0x24/0x42
[ 0.020000] broadcom_postcore_init+0x5/0x3b
[ 0.020000] do_one_initcall+0x8f/0x170
[ 0.020000] ? do_early_param+0x88/0x88
[ 0.020000] kernel_init_freeable+0x111/0x19b
[ 0.020000] ? rest_init+0x130/0x130
[ 0.020000] kernel_init+0x5/0xe1
[ 0.020000] ret_from_fork+0x24/0x30
[ 0.020000] Code: fa fe 0f ff b8 01 00 00 00 c3 41 57 41 56 48 89 f1 41 55 41 54 55 53 48 83 ec 20 48 89 54 24 10 31 d2 83 3d 92 5d db ff 00 74 02 <0f> ff 48 8b 04 d5 e0 f4 16 b6 41 89 d6 48 85 c0 0f 84 1e 01 00
[ 0.020000] ---[ end trace 0313b755f35329ad ]---
[ 0.020015] ------------[ cut here ]------------
[ 0.020015] ------------[ cut here ]------------
[ 0.020340] WARNING: CPU: 0 PID: 1 at mm/early_ioremap.c:114 __early_ioremap+0x21/0x188
[ 0.020991] Modules linked in:
[ 0.021207] CPU: 0 PID: 1 Comm: swapper Tainted: G W 4.14.0-01247-g06222d8 #1
[ 0.021757] task: ffff880000055100 task.stack: ffff880000058000
[ 0.022158] RIP: 0010:__early_ioremap+0x21/0x188
[ 0.022470] RSP: 0000:ffff88000005be00 EFLAGS: 00010202
[ 0.022822] RAX: 8000000000000163 RBX: 00000000000e0000 RCX: 0000000000020000
[ 0.023302] RDX: 0000000000000000 RSI: 0000000000020000 RDI: 00000000000e0000
[ 0.023778] RBP: 0000000000020000 R08: 0000000000000400 R09: 0000000000000000
[ 0.024000] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 0.024000] R13: ffffffffff200c00 R14: 0000000000000000 R15: 0000000000000000
[ 0.024000] FS: 0000000000000000(0000) GS:ffffffffb5c35000(0000) knlGS:0000000000000000
[ 0.024000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.024000] CR2: 0000000000000000 CR3: 0000000012a18001 CR4: 00000000001606b0
[ 0.024000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.024000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.024000] Call Trace:
[ 0.024000] early_memremap+0x2c/0x2f
[ 0.024000] acpi_find_root_pointer+0xb8/0x141
[ 0.024000] ? cnb20le_res+0x25a/0x25a
[ 0.024000] ? do_early_param+0x88/0x88
[ 0.024000] acpi_os_get_root_pointer+0x24/0x42
[ 0.024000] broadcom_postcore_init+0x5/0x3b
[ 0.024000] do_one_initcall+0x8f/0x170
[ 0.024000] ? do_early_param+0x88/0x88
[ 0.024000] kernel_init_freeable+0x111/0x19b
[ 0.024000] ? rest_init+0x130/0x130
[ 0.024000] kernel_init+0x5/0xe1
[ 0.024000] ret_from_fork+0x24/0x30
[ 0.024000] Code: fa fe 0f ff b8 01 00 00 00 c3 41 57 41 56 48 89 f1 41 55 41 54 55 53 48 83 ec 20 48 89 54 24 10 31 d2 83 3d 92 5d db ff 00 74 02 <0f> ff 48 8b 04 d5 e0 f4 16 b6 41 89 d6 48 85 c0 0f 84 1e 01 00
[ 0.024000] ---[ end trace 0313b755f35329ae ]---
[ 0.024250] dca service started, version 1.12.1
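
For readers who want to check the register dumps above by hand, the following is a small user-space sketch that pulls the CR3 and CR4 values out of a dmesg line and decodes the PCID-related bits. The function names are made up for this example. It relies only on two architectural facts: CR4 bit 17 is PCIDE, and CR3 bits 11:0 hold the PCID when PCIDE is set.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CR4_PCIDE      (UINT64_C(1) << 17)
#define CR3_PCID_MASK  UINT64_C(0xfff)

/* Find "NAME: <hex>" in a dmesg line and parse the hex value. */
static inline bool parse_reg(const char *line, const char *name, uint64_t *out)
{
    assert(line != NULL && name != NULL && out != NULL);

    const char *p = strstr(line, name);
    if (p == NULL)
        return false;
    p += strlen(name);
    if (*p != ':')
        return false;
    p++;

    char *end = NULL;
    errno = 0;
    unsigned long long v = strtoull(p, &end, 16);
    if (end == p || errno != 0)
        return false;

    *out = (uint64_t)v;
    return true;
}

/* True if the CR4 value has PCIDs enabled. */
static inline bool cr4_pcide(uint64_t cr4)
{
    return (cr4 & CR4_PCIDE) != 0;
}

/* PCID field of a CR3 value (only meaningful when CR4.PCIDE is set). */
static inline unsigned cr3_pcid(uint64_t cr3)
{
    return (unsigned)(cr3 & CR3_PCID_MASK);
}
```

Applied to the "CR2: ... CR3: ... CR4: ..." lines in the traces above, this reports PCIDE as set and a PCID field of 1. That only says PCIDs were in use when the warning fired. It says nothing by itself about the cause.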
# HH:MM RESULT GOOD BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start 7f01082da66851579d8b65ab8411f0f5b8579d34 bebc6082da0a9f5d47a1ea2edc099bf671058bd4 --
git bisect good 820362aa942b1041a29e8f321d08e3605e9c1685 # 06:53 G 19 0 2 2 Merge 'dhowells-fs/rxrpc-fixes' into devel-catchup-201711241945
git bisect good f9cae3481c51684d55138f8c5bd89aa8566b31af # 07:18 G 19 0 1 1 Merge 'gvt-linux/gvt-next' into devel-catchup-201711241945
git bisect good a06e67eb8396da5f7db69a4bd21ccbf063690048 # 08:02 G 19 0 1 1 Merge 'tip/sched/urgent' into devel-catchup-201711241945
git bisect bad 959501409bc607a337b13d71012e20c37437cd12 # 08:41 B 0 8 23 1 Merge 'tip/master' into devel-catchup-201711241945
git bisect good f6751f178eeaf3da8c156d2a2fd7a0feccfab5ae # 09:27 G 19 0 0 0 tools/headers: Synchronize kernel x86 UAPI headers
git bisect good f1edf50fe0b9f78df5214f96e9a02221c88bf243 # 10:13 G 18 0 1 1 Merge branch 'WIP.timers'
git bisect good 0f973021371ba1bc27ed30fee64756562113bc08 # 10:49 G 19 0 10 10 x86/mm/kaiser: Prepare assembly for entry/exit CR3 switching
git bisect good 7334239d11aef5603241e24173a32abf7f0b8a5a # 11:16 G 19 0 9 10 x86/mm: Move CR3 construction functions
git bisect bad 10cc1871b0f8c49bebc40c2394c0e125b7ebf660 # 12:01 B 3 4 3 7 x86/mm/kaiser: Add debugfs file to turn KAISER on/off at runtime
git bisect good 5ab2af1e02f3b2fba5364984c7ef781fac4561f6 # 12:39 G 19 0 12 12 x86/mm: Allow flushing for future ASID switches
git bisect bad 4132e51f20ea0d05fedf0f1ba89200e4334b067d # 13:15 B 0 6 21 1 x86/mm/kaiser: Disable native VSYSCALL
git bisect bad 06222d856e45d727c18665ed37419d653f1dbef5 # 14:43 B 0 2 20 4 x86/mm/kaiser: Use PCID feature to make user and kernel switches faster
# first bad commit: [06222d856e45d727c18665ed37419d653f1dbef5] x86/mm/kaiser: Use PCID feature to make user and kernel switches faster
git bisect good 5ab2af1e02f3b2fba5364984c7ef781fac4561f6 # 15:21 G 58 0 21 37 x86/mm: Allow flushing for future ASID switches
# extra tests with debug options
git bisect bad 06222d856e45d727c18665ed37419d653f1dbef5 # 15:59 B 16 3 16 16 x86/mm/kaiser: Use PCID feature to make user and kernel switches faster
# extra tests on HEAD of linux-devel/devel-catchup-201711241945
git bisect bad 7f01082da66851579d8b65ab8411f0f5b8579d34 # 16:05 B 111 125 0 106 0day head guard for 'devel-catchup-201711241945'
# extra tests on tree/branch tip/WIP.x86/mm
git bisect bad 850f70b2343d7c69c8af560aa883c238a5c89701 # 17:33 B 15 18 15 19 x86/mm/kaiser: Add Kconfig
# extra tests with first bad commit reverted
git bisect good 6d9e29038f8f424f717f848b43295ee06ac845c0 # 18:08 G 39 0 12 12 Revert "x86/mm/kaiser: Use PCID feature to make user and kernel switches faster"
# extra tests on tree/branch tip/master
git bisect bad acdad0aa07134a0b5a74d3a37e9069f990b01a08 # 18:50 B 5 11 5 5 Merge branch 'WIP.x86/mm'
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/lkp Intel Corporation
Download attachment "dmesg-yocto-lkp-hsw01-101:20171125133335:x86_64-randconfig-s1-11241903:4.14.0-01247-g06222d8:1.gz" of type "application/gzip" (19154 bytes)
Download attachment "dmesg-vm-intel12-yocto-x86_64-12:20171125123704:x86_64-randconfig-s1-11241903:4.14.0-01246-g5ab2af1:1.gz" of type "application/gzip" (10979 bytes)
View attachment "reproduce-yocto-lkp-hsw01-101:20171125133335:x86_64-randconfig-s1-11241903:4.14.0-01247-g06222d8:1" of type "text/plain" (896 bytes)
View attachment "config-4.14.0-01247-g06222d8" of type "text/plain" (105127 bytes)