Message-ID: <868qu63mdo.wl-maz@kernel.org>
Date: Tue, 29 Oct 2024 16:27:31 +0000
From: Marc Zyngier <maz@...nel.org>
To: Raghavendra Rao Ananta <rananta@...gle.com>
Cc: Oliver Upton <oliver.upton@...ux.dev>,
	linux-arm-kernel@...ts.infradead.org,
	kvmarm@...ts.linux.dev,
	linux-kernel@...r.kernel.org,
	kvm@...r.kernel.org,
	stable@...r.kernel.org,
	syzbot <syzkaller@...glegroups.com>
Subject: Re: [PATCH v2] KVM: arm64: Get rid of userspace_irqchip_in_use

On Mon, 28 Oct 2024 23:45:33 +0000,
Raghavendra Rao Ananta <rananta@...gle.com> wrote:
> 
> Improper use of userspace_irqchip_in_use led to syzbot hitting the
> following WARN_ON() in kvm_timer_update_irq():
> 
> WARNING: CPU: 0 PID: 3281 at arch/arm64/kvm/arch_timer.c:459
> kvm_timer_update_irq+0x21c/0x394
> Call trace:
>   kvm_timer_update_irq+0x21c/0x394 arch/arm64/kvm/arch_timer.c:459
>   kvm_timer_vcpu_reset+0x158/0x684 arch/arm64/kvm/arch_timer.c:968
>   kvm_reset_vcpu+0x3b4/0x560 arch/arm64/kvm/reset.c:264
>   kvm_vcpu_set_target arch/arm64/kvm/arm.c:1553 [inline]
>   kvm_arch_vcpu_ioctl_vcpu_init arch/arm64/kvm/arm.c:1573 [inline]
>   kvm_arch_vcpu_ioctl+0x112c/0x1b3c arch/arm64/kvm/arm.c:1695
>   kvm_vcpu_ioctl+0x4ec/0xf74 virt/kvm/kvm_main.c:4658
>   vfs_ioctl fs/ioctl.c:51 [inline]
>   __do_sys_ioctl fs/ioctl.c:907 [inline]
>   __se_sys_ioctl fs/ioctl.c:893 [inline]
>   __arm64_sys_ioctl+0x108/0x184 fs/ioctl.c:893
>   __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
>   invoke_syscall+0x78/0x1b8 arch/arm64/kernel/syscall.c:49
>   el0_svc_common+0xe8/0x1b0 arch/arm64/kernel/syscall.c:132
>   do_el0_svc+0x40/0x50 arch/arm64/kernel/syscall.c:151
>   el0_svc+0x54/0x14c arch/arm64/kernel/entry-common.c:712
>   el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
>   el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
> 
> The following sequence led to the scenario:
>  - Userspace creates a VM and a vCPU.
>  - The vCPU is initialized with KVM_ARM_VCPU_PMU_V3 during
>    KVM_ARM_VCPU_INIT.
>  - Without any other setup, such as vGIC or vPMU, userspace issues
>    KVM_RUN on the vCPU. Since the vPMU is requested, but not set up,
>    kvm_arm_pmu_v3_enable() fails in kvm_arch_vcpu_run_pid_change().
>    As a result, KVM_RUN returns after enabling the timer, but before
>    incrementing 'userspace_irqchip_in_use':
>    kvm_arch_vcpu_run_pid_change()
>        ret = kvm_arm_pmu_v3_enable()
>            if (!vcpu->arch.pmu.created)
>                return -EINVAL;
>        if (ret)
>            return ret;
>        [...]
>        if (!irqchip_in_kernel(kvm))
>            static_branch_inc(&userspace_irqchip_in_use);
>  - Userspace ignores the error and issues KVM_ARM_VCPU_INIT again.
>    Since the timer is already enabled, control moves through the
>    following flow, ultimately hitting the WARN_ON():
>    kvm_timer_vcpu_reset()
>        if (timer->enabled)
>           kvm_timer_update_irq()
>               if (!userspace_irqchip())
>                   ret = kvm_vgic_inject_irq()
>                       ret = vgic_lazy_init()
>                           if (unlikely(!vgic_initialized(kvm)))
>                               if (kvm->arch.vgic.vgic_model !=
>                                   KVM_DEV_TYPE_ARM_VGIC_V2)
>                                       return -EBUSY;
>                   WARN_ON(ret);
> 
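(Aside, not part of the quoted patch: the sequence above can be modelled in a few lines of plain userspace C. This is only a sketch under loud assumptions. Every toy_* name is hypothetical, the static key is reduced to an int counter, and the vGIC check is simplified to "not initialized means -EBUSY". It is not the kernel code; it only shows why the stale counter sends the reset path into the vGIC, and why testing '!irqchip_in_kernel' directly does not.)

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model only: every name here is a hypothetical stand-in, not a kernel symbol. */
struct toy_vm {
    bool irqchip_in_kernel;  /* an in-kernel vGIC was created */
    bool vgic_initialized;   /* that vGIC has been initialized */
    bool pmu_requested;      /* vCPU was initialized with the PMU feature */
    bool pmu_created;        /* userspace actually set the vPMU up */
    bool timer_enabled;
};

static int toy_userspace_irqchip_in_use; /* stands in for the static key */
static int toy_warn_count;               /* counts would-be WARN_ON() hits */

/* Simplification: any injection into an uninitialized vGIC fails. */
static int toy_vgic_inject_irq(const struct toy_vm *vm)
{
    return vm->vgic_initialized ? 0 : -EBUSY;
}

static void toy_timer_update_irq(const struct toy_vm *vm, bool use_static_key)
{
    bool no_kernel_irqchip = !vm->irqchip_in_kernel;
    /* Old scheme: the counter gates the check. New scheme: check directly. */
    bool userspace_irqchip = use_static_key
        ? (toy_userspace_irqchip_in_use > 0 && no_kernel_irqchip)
        : no_kernel_irqchip;

    if (!userspace_irqchip && toy_vgic_inject_irq(vm))
        toy_warn_count++;
}

/* First KVM_RUN: the timer is enabled before the vPMU check can fail. */
static int toy_first_run(struct toy_vm *vm)
{
    vm->timer_enabled = true;
    if (vm->pmu_requested && !vm->pmu_created)
        return -EINVAL;                  /* counter is never incremented */
    if (!vm->irqchip_in_kernel)
        toy_userspace_irqchip_in_use++;
    return 0;
}

/* Second vCPU init: reset touches the timer IRQ if the timer is enabled. */
static void toy_vcpu_reset(const struct toy_vm *vm, bool use_static_key)
{
    if (vm->timer_enabled)
        toy_timer_update_irq(vm, use_static_key);
}

/* Replays the reported sequence and returns the number of warnings hit. */
int toy_scenario(bool use_static_key)
{
    struct toy_vm vm = { .pmu_requested = true };

    toy_userspace_irqchip_in_use = 0;
    toy_warn_count = 0;
    (void)toy_first_run(&vm);            /* userspace ignores the error */
    toy_vcpu_reset(&vm, use_static_key);
    return toy_warn_count;
}
```

In this model, toy_scenario(true) takes the injection path and records one warning, while toy_scenario(false) records none, because the direct check never consults the counter that the failed first run left at zero.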
> Theoretically, since userspace_irqchip_in_use's functionality can be

nit: this isn't theoretical at all.

> simply replaced by '!irqchip_in_kernel()', get rid of the static key
> to avoid the mismanagement, which also helps with the syzbot issue.

Did you have a chance to check whether this had any negative impact on
actual workloads? Since the entry/exit code is a bit of a hot spot,
I'd like to make sure we're not penalising the common case (I only
wrote this patch while waiting in an airport, and didn't test it at
all).

Any such data about it would be very welcome in the commit message.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
