lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200729165524.GA4178@lca.pw>
Date:   Wed, 29 Jul 2020 12:55:25 -0400
From:   Qian Cai <cai@....pw>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        linux-arch@...r.kernel.org, Will Deacon <will@...nel.org>,
        Arnd Bergmann <arnd@...db.de>,
        Mark Rutland <mark.rutland@....com>,
        Kees Cook <keescook@...omium.org>,
        Keno Fischer <keno@...iacomputing.com>,
        Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
        Gabriel Krisman Bertazi <krisman@...labora.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        sfr@...b.auug.org.au, linux-next@...r.kernel.org
Subject: Re: [patch V5 05/15] entry: Provide infrastructure for work before
 transitioning to guest mode

On Wed, Jul 22, 2020 at 11:59:59PM +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@...utronix.de>
> 
> Entering a guest is similar to exiting to user space. Pending work like
> handling signals, rescheduling, task work etc. needs to be handled before
> that.
> 
> Provide generic infrastructure to avoid duplication of the same handling
> code all over the place.
> 
> The transfer to guest mode handling is different from the exit to usermode
> handling, e.g. vs. rseq and live patching, so a separate function is used.
> 
> The initial list of work items handled is:
> 
>     TIF_SIGPENDING, TIF_NEED_RESCHED, TIF_NOTIFY_RESUME
> 
> Architecture specific TIF flags can be added via defines in the
> architecture specific include files.
> 
> The calling convention is also different from the syscall/interrupt entry
> functions as KVM invokes this from the outer vcpu_run() loop with
> interrupts and preemption enabled. To prevent missing a pending work item
> it invokes a check for pending TIF work from interrupt disabled code right
> before transitioning to guest mode. The lockdep, RCU and tracing state
> handling is also done directly around the switch to and from guest mode.
> 
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>

SR-IOV will start trigger a warning below in this commit,

A reproducer:
# echo 1 > /sys/class/net/eno58/device/sriov_numvfs (bnx2x)
# git clone https://gitlab.com/cailca/linux-mm
# cd linux-mm; make
# ./random -x 0-100 -k0000:02:09.0

https://gitlab.com/cailca/linux-mm/-/blob/master/random.c#L1247
(x86.config is also included in the repo.)

[  765.409027] ------------[ cut here ]------------
[  765.434611] WARNING: CPU: 13 PID: 3377 at include/linux/entry-kvm.h:75 kvm_arch_vcpu_ioctl_run+0xb52/0x1320 [kvm]
[  765.487229] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio loop nls_ascii nls_cp437 vfat fat
[  766.118568] CPU: 13 PID: 3377 Comm: qemu-kvm Not tainted 5.8.0-rc7-next-20200729 #2
[  766.147011]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[  766.147016]  ret_from_fork+0x22/0x30
[  766.782999] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018
[  766.817799] RIP: 0010:kvm_arch_vcpu_ioctl_run+0xb52/0x1320 [kvm]
[  766.850054] Code: e8 03 0f b6 04 18 84 c0 74 06 0f 8e 4a 03 00 00 41 c6 85 50 2d 00 00 00 e9 24 f8 ff ff 4c 89 ef e8 93 a8 02 00 e9 3d f8 ff ff <0f> 0b e9 f2 f8 ff ff 48 81 c6 c0 01 00 00 4c 89 ef e8 18 4c fe
 ff
[  766.942485] RSP: 0018:ffffc90007017b58 EFLAGS: 00010202
[  766.970899] RAX: 0000000000000001 RBX: dffffc0000000000 RCX: ffffffffc0f0ef8a
[  767.008488] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88956a3821a0
[  767.045726] RBP: ffffc90007017bb8 R08: ffffed1269290010 R09: ffffed1269290010
[  767.083242] R10: ffff88934948007f R11: ffffed126929000f R12: ffff889349480424
[  767.120809] R13: ffff889349480040 R14: ffff88934948006c R15: ffff88980e2da000
[  767.160531] FS:  00007f12b0e98700(0000) GS:ffff88985fa40000(0000) knlGS:0000000000000000
[  767.203199] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  767.235083] CR2: 0000000000000000 CR3: 000000120d186002 CR4: 00000000001726e0
[  767.272303] Call Trace:
[  767.272303] Call Trace:
[  767.286947]  kvm_vcpu_ioctl+0x3e9/0xb30 [kvm]
[  767.311162]  ? kvm_vcpu_block+0xd30/0xd30 [kvm]
[  767.335281]  ? find_held_lock+0x33/0x1c0
[  767.357492]  ? __fget_files+0x1cb/0x300
[  767.378687]  ? lock_downgrade+0x730/0x730
[  767.401410]  ? rcu_read_lock_held+0xaa/0xc0
[  767.424228]  ? rcu_read_lock_sched_held+0xd0/0xd0
[  767.450078]  ? find_held_lock+0x33/0x1c0
[  767.471876]  ? __fget_files+0x1e5/0x300
[  767.493841]  __x64_sys_ioctl+0x315/0x102f
[  767.516579]  ? generic_block_fiemap+0x60/0x60
[  767.540678]  ? syscall_enter_from_user_mode+0x1b/0x210
[  767.568612]  ? rcu_read_lock_sched_held+0xaa/0xd0
[  767.593950]  ? lockdep_hardirqs_on_prepare+0x33e/0x4e0
[  767.621355]  ? syscall_enter_from_user_mode+0x20/0x210
[  767.649283]  ? trace_hardirqs_on+0x20/0x1b5
[  767.673770]  do_syscall_64+0x33/0x40
[  767.694699]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  767.723392] RIP: 0033:0x7f12b999a87b
[  767.744252] Code: 0f 1e fa 48 8b 05 0d 96 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 95 2c 00 f7 d8 64 89  8
[  767.837418] RSP: 002b:00007f12b0e97678 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  767.877074] RAX: ffffffffffffffda RBX: 000055712c857c60 RCX: 00007f12b999a87b
[  767.914242] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000014
[  767.951945] RBP: 000055712c857cfb R08: 000055712ac4fad0 R09: 000055712b3f2080
[  767.989036] R10: 0000000000000000 R11: 0000000000000246 R12: 000055712ac38100
[  768.025851] R13: 00007ffd996e9edf R14: 00007f12becc5000 R15: 000055712c857c60
[  768.063363] irq event stamp: 5257
[  768.082559] hardirqs last  enabled at (5273): [<ffffffffa1800ce2>] asm_sysvec_call_function_single+0x12/0x20
[  768.132704] hardirqs last disabled at (5290): [<ffffffffa179ae3d>] irqentry_enter+0x1d/0x50
[  768.176563] softirqs last  enabled at (5108): [<ffffffffa1a0070f>] __do_softirq+0x70f/0xa9f
[  768.221270] softirqs last disabled at (5093): [<ffffffffa1800ec2>] asm_call_on_stack+0x12/0x20
[  768.267273] ---[ end trace 8730450ad8cfee9f ]---

> ---
> V5: Rename exit -> xfer (Sean)
> 
> V3: Reworked and simplified version adopted to recent X86 and KVM changes
>     
> V2: Moved KVM specific functions to kvm (Paolo)
>     Added lockdep assert (Andy)
>     Dropped live patching from enter guest mode work (Miroslav)
> ---
>  include/linux/entry-kvm.h |   80 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/kvm_host.h  |    8 ++++
>  kernel/entry/Makefile     |    3 +
>  kernel/entry/kvm.c        |   51 +++++++++++++++++++++++++++++
>  virt/kvm/Kconfig          |    3 +
>  5 files changed, 144 insertions(+), 1 deletion(-)
> 
> --- /dev/null
> +++ b/include/linux/entry-kvm.h
> @@ -0,0 +1,80 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __LINUX_ENTRYKVM_H
> +#define __LINUX_ENTRYKVM_H
> +
> +#include <linux/entry-common.h>
> +
> +/* Transfer to guest mode work */
> +#ifdef CONFIG_KVM_XFER_TO_GUEST_WORK
> +
> +#ifndef ARCH_XFER_TO_GUEST_MODE_WORK
> +# define ARCH_XFER_TO_GUEST_MODE_WORK	(0)
> +#endif
> +
> +#define XFER_TO_GUEST_MODE_WORK					\
> +	(_TIF_NEED_RESCHED | _TIF_SIGPENDING |			\
> +	 _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
> +
> +struct kvm_vcpu;
> +
> +/**
> + * arch_xfer_to_guest_mode_work - Architecture specific xfer to guest mode
> + *				  work function.
> + * @vcpu:	Pointer to current's VCPU data
> + * @ti_work:	Cached TIF flags gathered in xfer_to_guest_mode()
> + *
> + * Invoked from xfer_to_guest_mode_work(). Defaults to NOOP. Can be
> + * replaced by architecture specific code.
> + */
> +static inline int arch_xfer_to_guest_mode_work(struct kvm_vcpu *vcpu,
> +					      unsigned long ti_work);
> +
> +#ifndef arch_xfer_to_guest_mode_work
> +static inline int arch_xfer_to_guest_mode_work(struct kvm_vcpu *vcpu,
> +					       unsigned long ti_work)
> +{
> +	return 0;
> +}
> +#endif
> +
> +/**
> + * xfer_to_guest_mode - Check and handle pending work which needs to be
> + *			handled before returning to guest mode
> + * @vcpu:	Pointer to current's VCPU data
> + *
> + * Returns: 0 or an error code
> + */
> +int xfer_to_guest_mode(struct kvm_vcpu *vcpu);
> +
> +/**
> + * __xfer_to_guest_mode_work_pending - Check if work is pending
> + *
> + * Returns: True if work pending, False otherwise.
> + *
> + * Bare variant of xfer_to_guest_mode_work_pending(). Can be called from
> + * interrupt enabled code for racy quick checks with care.
> + */
> +static inline bool __xfer_to_guest_mode_work_pending(void)
> +{
> +	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> +
> +	return !!(ti_work & XFER_TO_GUEST_MODE_WORK);
> +}
> +
> +/**
> + * xfer_to_guest_mode_work_pending - Check if work is pending which needs to be
> + *				     handled before returning to guest mode
> + *
> + * Returns: True if work pending, False otherwise.
> + *
> + * Has to be invoked with interrupts disabled before the transition to
> + * guest mode.
> + */
> +static inline bool xfer_to_guest_mode_work_pending(void)
> +{
> +	lockdep_assert_irqs_disabled();
> +	return __xfer_to_guest_mode_work_pending();
> +}
> +#endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
> +
> +#endif
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1439,4 +1439,12 @@ int kvm_vm_create_worker_thread(struct k
>  				uintptr_t data, const char *name,
>  				struct task_struct **thread_ptr);
>  
> +#ifdef CONFIG_KVM_XFER_TO_GUEST_WORK
> +static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
> +{
> +	vcpu->run->exit_reason = KVM_EXIT_INTR;
> +	vcpu->stat.signal_exits++;
> +}
> +#endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
> +
>  #endif
> --- a/kernel/entry/Makefile
> +++ b/kernel/entry/Makefile
> @@ -9,4 +9,5 @@ KCOV_INSTRUMENT := n
>  CFLAGS_REMOVE_common.o	 = -fstack-protector -fstack-protector-strong
>  CFLAGS_common.o		+= -fno-stack-protector
>  
> -obj-$(CONFIG_GENERIC_ENTRY) += common.o
> +obj-$(CONFIG_GENERIC_ENTRY) 		+= common.o
> +obj-$(CONFIG_KVM_XFER_TO_GUEST_WORK)	+= kvm.o
> --- /dev/null
> +++ b/kernel/entry/kvm.c
> @@ -0,0 +1,51 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/entry-kvm.h>
> +#include <linux/kvm_host.h>
> +
> +static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
> +{
> +	do {
> +		int ret;
> +
> +		if (ti_work & _TIF_SIGPENDING) {
> +			kvm_handle_signal_exit(vcpu);
> +			return -EINTR;
> +		}
> +
> +		if (ti_work & _TIF_NEED_RESCHED)
> +			schedule();
> +
> +		if (ti_work & _TIF_NOTIFY_RESUME) {
> +			clear_thread_flag(TIF_NOTIFY_RESUME);
> +			tracehook_notify_resume(NULL);
> +		}
> +
> +		ret = arch_xfer_to_guest_mode_work(vcpu, ti_work);
> +		if (ret)
> +			return ret;
> +
> +		ti_work = READ_ONCE(current_thread_info()->flags);
> +	} while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
> +	return 0;
> +}
> +
> +int xfer_to_guest_mode(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long ti_work;
> +
> +	/*
> +	 * This is invoked from the outer guest loop with interrupts and
> +	 * preemption enabled.
> +	 *
> +	 * KVM invokes xfer_to_guest_mode_work_pending() with interrupts
> +	 * disabled in the inner loop before going into guest mode. No need
> +	 * to disable interrupts here.
> +	 */
> +	ti_work = READ_ONCE(current_thread_info()->flags);
> +	if (!(ti_work & XFER_TO_GUEST_MODE_WORK))
> +		return 0;
> +
> +	return xfer_to_guest_mode_work(vcpu, ti_work);
> +}
> +EXPORT_SYMBOL_GPL(xfer_to_guest_mode);
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -60,3 +60,6 @@ config HAVE_KVM_VCPU_RUN_PID_CHANGE
>  
>  config HAVE_KVM_NO_POLL
>         bool
> +
> +config KVM_XFER_TO_GUEST_WORK
> +       bool
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ