Message-ID: <5a1e5611.ka9AywLfcGqFxI3n%fengguang.wu@intel.com>
Date: Wed, 29 Nov 2017 14:39:13 +0800
From: kernel test robot <fengguang.wu@...el.com>
To: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: LKP <lkp@...org>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>, wfg@...ux.intel.com
Subject: b345a34006 ("x86/mm/kaiser: Use PCID feature to make user and
.."): WARNING: possible circular locking dependency detected
Greetings,
0day kernel testing robot got the below dmesg and the first bad commit is
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
commit b345a34006d85c6cc2fd37baddce5bdbc0b3aef6
Author: Dave Hansen <dave.hansen@...ux.intel.com>
AuthorDate: Wed Nov 22 16:35:09 2017 -0800
Commit: Ingo Molnar <mingo@...nel.org>
CommitDate: Mon Nov 27 15:04:35 2017 +0100
x86/mm/kaiser: Use PCID feature to make user and kernel switches faster
Short summary: Use x86 PCID feature to avoid flushing the TLB at all
interrupts and syscalls. Speed them up. Makes context switches
and TLB flushing slower.
Background:
KAISER keeps two copies of the page tables. Switches between the
copies are performed by writing to the CR3 register. But, CR3
was really designed for context switches and writes to it also
flush the entire TLB (modulo global pages). This TLB flush
increases the cost of interrupts and context switches. For
syscall-heavy microbenchmarks it can cut the rate of syscalls by 2/3.
The kernel recently gained support for an Intel CPU feature
called Process Context IDentifiers (PCID) thanks to Andy
Lutomirski. This feature is intended to allow you to switch
between contexts without flushing the TLB.
Implementation:
PCIDs can be used to avoid flushing the TLB at kernel entry/exit.
This speeds up both interrupts and syscalls.
First, the kernel and userspace must be assigned different ASIDs.
On entry from userspace, move over to the kernel page tables
*and* ASID. On exit, restore the user page tables and ASID.
Fortunately, the ASID is programmed via CR3, which is already
being used to switch between the user and kernel page tables.
This gives us convenient, one-stop shopping.
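To illustrate the "one-stop shopping" point, here is a minimal sketch of how a single CR3 value can carry both the page-table base and the ASID. It assumes the architectural layout in which the PCID occupies the low 12 bits of CR3 and bit 63 of the written value asks the CPU not to flush. make_cr3() and the macro names are hypothetical and are not the kernel's own helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for this sketch; the bit layout is architectural. */
#define TOY_CR3_PCID_MASK 0xFFFull        /* PCID lives in CR3 bits 0..11 */
#define TOY_CR3_NOFLUSH   (1ull << 63)    /* set: do not flush on this write */

/*
 * Compose a CR3 value from a 4K-aligned page-table physical address,
 * a PCID, and a flag choosing a flushing or non-flushing write.
 */
static uint64_t make_cr3(uint64_t pgd_phys, uint16_t pcid, bool noflush)
{
    assert((pgd_phys & TOY_CR3_PCID_MASK) == 0); /* base must be page aligned */
    assert(pcid <= TOY_CR3_PCID_MASK);           /* PCID is a 12-bit field */

    uint64_t cr3 = pgd_phys | pcid;
    if (noflush)
        cr3 |= TOY_CR3_NOFLUSH;
    return cr3;
}
```

With this layout, switching from the user to the kernel page tables and switching the ASID are one register write: only the arguments to make_cr3() differ between the two directions.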
The CR3 write which is used to switch between processes provides
all the TLB flushing normally required at context switch time.
But, with KAISER, that CR3 write only flushes the current
(kernel) ASID. An extra TLB flush operation is now required in
order to flush the user ASID. This new instruction (INVPCID) is
probably ~100 cycles, but this is done with the assumption that
the time lost in context switches is more than made up for by
lower cost of interrupts and syscalls.
Support:
PCIDs are generally available on Sandybridge and newer CPUs. However,
the accompanying INVPCID instruction did not become available until
Haswell (the ones with "v4", or called fourth-generation Core). This
instruction allows non-current-PCID TLB entries to be flushed without
switching CR3 and global pages to be flushed without a double
MOV-to-CR4.
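As a rough sketch of how the two features can be told apart, the helpers below decode the CPUID feature bits. They assume the documented positions: CPUID leaf 1, ECX bit 17 for PCID, and CPUID leaf 7 (subleaf 0), EBX bit 10 for INVPCID. The function and macro names are hypothetical. The helpers take raw register values so that they stay independent of how CPUID is actually executed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical macro names; bit positions as documented for CPUID. */
#define TOY_CPUID_1_ECX_PCID      (1u << 17)  /* leaf 1, ECX bit 17 */
#define TOY_CPUID_7_0_EBX_INVPCID (1u << 10)  /* leaf 7 subleaf 0, EBX bit 10 */

/* True if the ECX value returned by CPUID leaf 1 advertises PCID. */
static bool cpu_has_pcid(uint32_t leaf1_ecx)
{
    return (leaf1_ecx & TOY_CPUID_1_ECX_PCID) != 0;
}

/* True if the EBX value returned by CPUID leaf 7, subleaf 0 advertises INVPCID. */
static bool cpu_has_invpcid(uint32_t leaf7_ebx)
{
    return (leaf7_ebx & TOY_CPUID_7_0_EBX_INVPCID) != 0;
}
```

A CPU for which cpu_has_pcid() is true but cpu_has_invpcid() is false is the case the text describes as PCID without the accompanying INVPCID instruction.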
Without INVPCID, PCIDs are much harder to use. TLB invalidation gets
much more onerous:
1. Every kernel TLB flush (even for a single page) requires an
interrupts-off MOV-to-CR4 which is very expensive. This is because
there is no way to flush a kernel address that might be loaded
in *EVERY* PCID. Right now, there are "only" ~12 of these per-CPU,
but that's too painful to use the MOV-to-CR3 to flush them. That
leaves only the MOV-to-CR4.
2. Every userspace flush (even for a single page) requires one of the
following:
a. A pair of flushing (bit 63 clear) CR3 writes: one for
the kernel ASID and another for userspace.
b. A pair of non-flushing CR3 writes (bit 63 set) with the
flush done for each. For instance, what is currently a
single instruction without KAISER:
invpcid_flush_one(current_pcid, addr);
becomes this with KAISER:
invpcid_flush_one(current_kern_pcid, addr);
invpcid_flush_one(current_user_pcid, addr);
and this without INVPCID:
__native_flush_tlb_single(addr);
write_cr3(mm->pgd | current_user_pcid | NOFLUSH);
__native_flush_tlb_single(addr);
write_cr3(mm->pgd | current_kern_pcid | NOFLUSH);
So, for now, fully disable PCIDs with KAISER when INVPCID is not
available. This is fixable, but it's an optimization that can be
performed later.
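The need for a pair of flushes can be seen in a toy model. This is only an illustrative sketch, not kernel code: it models the TLB as a small array of (PCID, address) entries, and every name in it is hypothetical. It shows that invalidating an address under one PCID leaves the copy cached under the other PCID in place, so a user page flush has to cover both the kernel and the user ASID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TLB_SLOTS 8

struct toy_tlb_entry {
    bool valid;
    uint16_t pcid;
    uint64_t addr;
};

struct toy_tlb {
    struct toy_tlb_entry slot[TOY_TLB_SLOTS];
};

/* Cache a translation for (pcid, addr); returns false if the toy TLB is full. */
static bool toy_tlb_fill(struct toy_tlb *t, uint16_t pcid, uint64_t addr)
{
    for (size_t i = 0; i < TOY_TLB_SLOTS; i++) {
        if (!t->slot[i].valid) {
            t->slot[i].valid = true;
            t->slot[i].pcid = pcid;
            t->slot[i].addr = addr;
            return true;
        }
    }
    return false;
}

/* True if a translation for (pcid, addr) is still cached. */
static bool toy_tlb_has(const struct toy_tlb *t, uint16_t pcid, uint64_t addr)
{
    for (size_t i = 0; i < TOY_TLB_SLOTS; i++) {
        if (t->slot[i].valid && t->slot[i].pcid == pcid &&
            t->slot[i].addr == addr)
            return true;
    }
    return false;
}

/* Models a single-address invalidation that is limited to one PCID. */
static void toy_flush_one(struct toy_tlb *t, uint16_t pcid, uint64_t addr)
{
    for (size_t i = 0; i < TOY_TLB_SLOTS; i++) {
        if (t->slot[i].valid && t->slot[i].pcid == pcid &&
            t->slot[i].addr == addr)
            t->slot[i].valid = false;
    }
}

/* With two ASIDs per process, a user page flush must hit both of them. */
static void toy_flush_user_page(struct toy_tlb *t, uint16_t kern_pcid,
                                uint16_t user_pcid, uint64_t addr)
{
    toy_flush_one(t, kern_pcid, addr);
    toy_flush_one(t, user_pcid, addr);
}
```

In this model, calling toy_flush_one() for the kernel PCID alone leaves the user PCID's entry cached, which is the stale-translation case the paired flush is meant to avoid.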
Hugh Dickins also points out that PCIDs really have two distinct
use-cases in the context of KAISER. The first way they can be used
is as "TLB preservation across context-switch", which is what
Andy Lutomirski's 4.14 PCID code does. They can also be used as
a "KAISER syscall/interrupt accelerator". If we just use them to
speed up syscall/interrupts (and ignore the context-switch TLB
preservation), then the deficiency of not having INVPCID
becomes much less onerous.
Signed-off-by: Dave Hansen <dave.hansen@...ux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
Cc: Andy Lutomirski <luto@...nel.org>
Cc: Borislav Petkov <bp@...en8.de>
Cc: Brian Gerst <brgerst@...il.com>
Cc: Denys Vlasenko <dvlasenk@...hat.com>
Cc: H. Peter Anvin <hpa@...or.com>
Cc: Josh Poimboeuf <jpoimboe@...hat.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Rik van Riel <riel@...hat.com>
Cc: daniel.gruss@...k.tugraz.at
Cc: hughd@...gle.com
Cc: keescook@...gle.com
Cc: linux-mm@...ck.org
Cc: michael.schwarz@...k.tugraz.at
Cc: moritz.lipp@...k.tugraz.at
Cc: richard.fellner@...dent.tugraz.at
Link: https://lkml.kernel.org/r/20171123003509.EC42DD15@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@...nel.org>
e794054d5a x86/mm: Allow flushing for future ASID switches
b345a34006 x86/mm/kaiser: Use PCID feature to make user and kernel switches faster
5bef2980ad Add linux-next specific files for 20171128
+-------------------------------------------------------+------------+------------+---------------+
| | e794054d5a | b345a34006 | next-20171128 |
+-------------------------------------------------------+------------+------------+---------------+
| boot_successes | 24 | 0 | 0 |
| boot_failures | 37 | 20 | 49 |
| WARNING:possible_circular_locking_dependency_detected | 37 | 11 | 45 |
| IP-Config:Auto-configuration_of_network_failed | 2 | 0 | 18 |
| kernel_BUG_at_arch/x86/kernel/mpparse.c | 0 | 9 | 4 |
| PANIC:early_exception | 0 | 9 | 4 |
| RIP:default_get_smp_config | 0 | 9 | 4 |
| kernel_BUG_at_drivers/base/driver.c | 0 | 7 | 8 |
| invalid_opcode:#[##] | 0 | 7 | 8 |
| RIP:driver_register | 0 | 7 | 8 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 7 | 8 |
+-------------------------------------------------------+------------+------------+---------------+
[ 0.983311] Unpacking initramfs...
[ 1.066328] Freeing initrd memory: 3932K
[ 1.067813] platform rtc_cmos: registered platform RTC device (no PNP device found)
[ 1.584102]
[ 1.584309] ======================================================
[ 1.584858] WARNING: possible circular locking dependency detected
[ 1.585360] 4.14.0-01253-gb345a340 #1 Not tainted
[ 1.585748] ------------------------------------------------------
[ 1.586248] kworker/0:1/14 is trying to acquire lock:
[ 1.586662] (ww_class_mutex){+.+.}, at: [<ffffffff810a9ca2>] test_abba_work+0x37/0xb7
[ 1.587312]
[ 1.587312] but now in release context of a crosslock acquired at the following:
[ 1.588004] ((completion)&abba.b_ready){+.+.}, at: [<ffffffff810aa5d6>] test_abba+0x146/0x234
[ 1.588004]
[ 1.588004] which lock already depends on the new lock.
[ 1.588004]
[ 1.588004] the existing dependency chain (in reverse order) is:
[ 1.588004]
[ 1.588004] -> #1 ((completion)&abba.b_ready){+.+.}:
[ 1.588004] __lock_acquire+0xb86/0xe99
[ 1.588004] test_abba+0x146/0x234
[ 1.588004] schedule_timeout+0x0/0xd3
[ 1.588004] lock_acquire+0x82/0xad
[ 1.588004] test_abba+0x146/0x234
[ 1.588004] __wait_for_common+0x57/0x219
[ 1.588004] test_abba+0x146/0x234
[ 1.588004] mark_held_locks+0x50/0x6d
[ 1.588004] _raw_spin_unlock_irqrestore+0x3d/0x61
[ 1.588004] test_abba+0x146/0x234
[ 1.588004] test_abba_work+0x0/0xb7
[ 1.588004] test_ww_mutex_init+0xe4/0x44d
[ 1.588004] test_ww_mutex_init+0x0/0x44d
[ 1.588004] do_one_initcall+0xd2/0x24c
[ 1.588004] parse_args+0x135/0x221
[ 1.588004] kernel_init_freeable+0x153/0x279
[ 1.588004] kernel_init+0x0/0xe6
[ 1.588004] kernel_init+0x5/0xe6
[ 1.588004] ret_from_fork+0x24/0x30
[ 1.588004]
[ 1.588004] -> #0 (ww_class_mutex){+.+.}:
[ 1.588004] test_abba_work+0x37/0xb7
[ 1.588004]
[ 1.588004] other info that might help us debug this:
[ 1.588004]
[ 1.588004] Possible unsafe locking scenario by crosslock:
[ 1.588004]
[ 1.588004]        CPU0                    CPU1
[ 1.588004]        ----                    ----
[ 1.588004]   lock(ww_class_mutex);
[ 1.588004]   lock((completion)&abba.b_ready);
[ 1.588004]                                lock(ww_class_mutex);
[ 1.588004]                                unlock((completion)&abba.b_ready);
[ 1.588004]
[ 1.588004] *** DEADLOCK ***
[ 1.588004]
[ 1.588004] 5 locks held by kworker/0:1/14:
[ 1.588004] #0: ((wq_completion)"events"){+.+.}, at: [<ffffffff8109036a>] process_one_work+0x164/0x303
[ 1.588004] #1: ((work_completion)(&abba.work)){+.+.}, at: [<ffffffff8109036a>] process_one_work+0x164/0x303
[ 1.588004] #2: (ww_class_acquire){+.+.}, at: [<ffffffff810a9c97>] test_abba_work+0x2c/0xb7
[ 1.588004] #3: (ww_class_mutex){+.+.}, at: [<ffffffff810a9ca2>] test_abba_work+0x37/0xb7
[ 1.588004] #4: (&x->wait#5){....}, at: [<ffffffff8109f90a>] complete+0x13/0x4b
[ 1.588004]
[ 1.588004] stack backtrace:
[ 1.588004] CPU: 0 PID: 14 Comm: kworker/0:1 Not tainted 4.14.0-01253-gb345a340 #1
[ 1.588004] Workqueue: events test_abba_work
[ 1.588004] Call Trace:
[ 1.588004] ? print_circular_bug+0x2a0/0x2ae
[ 1.588004] ? check_prev_add+0x95/0x253
[ 1.588004] ? look_up_lock_class+0x114/0x114
[ 1.588004] ? lock_commit_crosslock+0x322/0x3e1
[ 1.588004] ? complete+0x1f/0x4b
[ 1.588004] ? test_abba_work+0x43/0xb7
[ 1.588004] ? process_one_work+0x1d5/0x303
[ 1.588004] ? process_one_work+0x164/0x303
[ 1.588004] ? process_scheduled_works+0x27/0x27
[ 1.588004] ? worker_thread+0x1a7/0x25d
[ 1.588004] ? process_scheduled_works+0x27/0x27
[ 1.588004] ? kthread+0x126/0x12e
[ 1.588004] ? __list_del_entry+0x1d/0x1d
[ 1.588004] ? ret_from_fork+0x24/0x30
[ 2.080076] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x25641074d3b, max_idle_ns: 440795244898 ns
[ 7.641811] Initialise system trusted keyrings
[ 7.642238] workingset: timestamp_bits=46 max_order=17 bucket_order=0
[ 7.642782] zbud: loaded
[ 7.643661] QNX4 filesystem 0.2.3 registered.
# HH:MM RESULT GOOD BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start 5bef2980adef8a6032d4f4709aebe9486181052f 4fbd8d194f06c8a3fd2af1ce560ddb31f7ec8323 --
git bisect good 6dae39d936707941d0c1fce8426028c01203d050 # 07:00 G 13 0 8 10 Merge remote-tracking branch 'hwmon-staging/hwmon-next'
git bisect good 6959ac327b8a044f95a4485aaa6f4e2b1f7084a3 # 07:58 G 13 0 8 8 Merge remote-tracking branch 'vfio/next'
git bisect bad 2ca8454a2f35c2780adeadf8a92561e2bbfcd235 # 08:25 B 4 3 4 4 Merge remote-tracking branch 'staging/staging-next'
git bisect bad 0d1f02010f2b7705f76bab40d6ee8ea9c7d2598e # 08:53 B 2 9 2 2 Merge remote-tracking branch 'percpu/for-next'
git bisect bad bdc88674ac338d2d1f769f80b87dddf46e951c90 # 09:36 B 6 6 6 10 Merge remote-tracking branch 'clockevents/clockevents/next'
git bisect good f85fb1ec7c5d1334a2083f931386963db58e52f9 # 10:06 G 16 0 7 11 Merge remote-tracking branch 'spi/for-next'
git bisect bad 102f0423f953969b3191c603fabc64338f6adcd7 # 10:50 B 6 6 6 6 Merge remote-tracking branch 'tip/auto-latest'
git bisect good f6751f178eeaf3da8c156d2a2fd7a0feccfab5ae # 11:23 G 16 0 5 9 tools/headers: Synchronize kernel x86 UAPI headers
git bisect good e0c2535f18156c71b68b25861e76e06ca77151e5 # 11:44 G 16 0 10 12 Merge branch 'x86/urgent'
git bisect good 83529b2d6168ee82520a4ec7cc3df9b18603aea4 # 12:02 G 16 0 11 11 x86/mm/kaiser: Prepare the x86/entry assembly code for entry/exit CR3 switching
git bisect good 01e673bc37a640d9708bf9e7f30ad06a89ecafae # 12:15 G 15 0 11 11 x86/mm: Remove hard-coded ASID limit checks
git bisect bad 293a2ca794ee17d490daa036171f79dd090fbc8c # 12:30 B 2 6 2 3 x86/mm/kaiser: Respect disabled CPU features
git bisect bad b345a34006d85c6cc2fd37baddce5bdbc0b3aef6 # 12:47 B 9 7 9 11 x86/mm/kaiser: Use PCID feature to make user and kernel switches faster
git bisect good e794054d5a5dd62e38ed47be37072f7d2ed7879b # 13:17 G 17 0 6 8 x86/mm: Allow flushing for future ASID switches
# first bad commit: [b345a34006d85c6cc2fd37baddce5bdbc0b3aef6] x86/mm/kaiser: Use PCID feature to make user and kernel switches faster
git bisect good e794054d5a5dd62e38ed47be37072f7d2ed7879b # 13:34 G 51 0 27 37 x86/mm: Allow flushing for future ASID switches
# extra tests with debug options
git bisect bad b345a34006d85c6cc2fd37baddce5bdbc0b3aef6 # 13:50 B 5 12 5 5 x86/mm/kaiser: Use PCID feature to make user and kernel switches faster
# extra tests on HEAD of linux-next/master
git bisect bad 5bef2980adef8a6032d4f4709aebe9486181052f # 13:51 B 0 8 101 41 Add linux-next specific files for 20171128
# extra tests on tree/branch linux-next/master
git bisect bad 5bef2980adef8a6032d4f4709aebe9486181052f # 13:52 B 0 8 101 41 Add linux-next specific files for 20171128
# extra tests with first bad commit reverted
git bisect good a4f77bcdc73cb380ae912949b819359287ac7185 # 14:37 G 17 0 7 7 Revert "x86/mm/kaiser: Use PCID feature to make user and kernel switches faster"
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/lkp Intel Corporation
Download attachment "dmesg-yocto-lkp-hsw01-106:20171129124645:x86_64-randconfig-s2-11282118:4.14.0-01253-gb345a340:1.gz" of type "application/gzip" (18652 bytes)
Download attachment "dmesg-vm-ivb41-yocto-ia32-29:20171129130752:x86_64-randconfig-s2-11282118:4.14.0-01252-ge794054:1.gz" of type "application/gzip" (21730 bytes)
View attachment "reproduce-yocto-lkp-hsw01-106:20171129124645:x86_64-randconfig-s2-11282118:4.14.0-01253-gb345a340:1" of type "text/plain" (896 bytes)
View attachment "config-4.14.0-01253-gb345a340" of type "text/plain" (114116 bytes)