Message-Id: <20251008201955.3919537-1-salil.mehta@opnsrc.net>
Date: Wed, 8 Oct 2025 20:19:55 +0000
From: salil.mehta@...src.net
To: linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org
Cc: salil.mehta@...wei.com,
jonathan.cameron@...wei.com,
maz@...nel.org,
will@...nel.org,
catalin.marinas@....com,
mark.rutland@....com,
james.morse@....com,
sudeep.holla@....com,
lpieralisi@...nel.org,
jean-philippe@...aro.org,
tglx@...utronix.de,
oliver.upton@...ux.dev,
peter.maydell@...aro.org,
richard.henderson@...aro.org,
andrew.jones@...ux.dev,
mst@...hat.com,
david@...hat.com,
philmd@...aro.org,
ardb@...nel.org,
borntraeger@...ux.ibm.com,
alex.bennee@...aro.org,
gustavo.romero@...aro.org,
npiggin@...il.com,
linux@...linux.org.uk,
karl.heubaum@...cle.com,
miguel.luis@...cle.com,
darren@...amperecomputing.com,
ilkka@...amperecomputing.com,
vishnu@...amperecomputing.com,
gankulkarni@...amperecomputing.com,
salil.mehta@...src.net,
wangyanan55@...wei.com,
wangzhou1@...ilicon.com,
linuxarm@...wei.com
Subject: [RFC PATCH] KVM: arm64: vgic-v3: Cache ICC_CTLR_EL1 and allow lockless read when ready
From: Salil Mehta <salil.mehta@...wei.com>
[A rough illustration of the problem and the probable solution]
Userspace reads of ICC_CTLR_EL1 via KVM device attributes currently take a slow
path that may acquire all vCPU locks. Under workloads that exercise userspace
PSCI CPU_ON flows or frequent vCPU resets, this can cause vCPU lock contention
in KVM and, in the worst case, -EBUSY returns to userspace.
When PSCI CPU_ON and CPU_OFF calls are handled entirely in KVM, they execute
under KVM vCPU locks in the host kernel (EL1) and appear atomic to other vCPU
threads; system register accesses are likewise serialized under those locks,
ensuring atomicity with respect to other vCPUs. After SMCCC filtering was
introduced, PSCI CPU_ON and CPU_OFF calls can now exit to userspace (QEMU).
While handling a PSCI CPU_ON call in userspace, a cpu_reset() is invoked,
which reads ICC_CTLR_EL1 through KVM device attribute IOCTLs. To avoid
transient inconsistency and -EBUSY errors, QEMU is forced to pause all vCPUs
before issuing these IOCTLs. This should be avoided, as it freezes the VM
until the entire KVM device IOCTL operation completes (and it can happen every
time the guest requests PSCI CPU_ON). There are also known issues with the
implementation of {pause,resume}_all_vcpus() in QEMU which can lead to race
conditions.
Introduce a per-vCPU shadow of ICC_CTLR_EL1 that allows userspace to read this
register without taking any vCPU-wide locks once the VGIC has been initialized.
The new helper vgic_v3_try_icc_ctlr_el1_lockless_access() attempts a fast,
lockless read of the per-vCPU shadow and returns -EBUSY if the VGIC is not yet
initialized, allowing the caller to fall back to the existing contended path.
This reduces cross-vCPU contention and prevents spurious -EBUSY errors during
userspace-driven CPU_ON sequences, without changing architectural behaviour.
The shadow is updated in set_gic_ctlr(): after VMCR fields are written, we
recompute ICC_CTLR_EL1 via get_gic_ctlr(). Writes remain unchanged and continue
to use the existing synchronized path.
Without this change, QEMU must cache ICC_CTLR_EL1 itself to avoid racy
pause_all_vcpus() usage during cpu_reset(), as described in:
https://lore.kernel.org/qemu-devel/20251001010127.3092631-22-salil.mehta@opnsrc.net/
RFC Discussion (Architectural Considerations):
=============================================
It would have helped if the ARM architecture drew a clearer boundary between
dynamic and static (or pseudo-static) system registers. Registers such as
ICC_CTLR_EL1 contain configuration fields that are effectively static once the
VM is initialized (even EOImode, CBPR and PMHE are constants configured by the
guest's GICv3 driver), yet the architecture lets them coexist with the more
dynamic runtime fields of ICH_VMCR_EL2. Had these configuration registers been
architecturally immutable after VM initialization, their accesses could have
avoided the expensive all-vCPU locking otherwise required to guarantee atomic
visibility across vCPUs.
ICC_CTLR_EL1 [63:0]
63 32
+---------------------------------------------------------------------------+
| RES0 |
+---------------------------------------------------------------------------+
31 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
+------------+--+--+--+--+--+--+--+--+--+---+---+---+--+--+--+--+--+--+--+--+
| RES0 |Ex|RS|RES0 |A3|SE| IDbits | PRIbits |R0|PM| RES0 |EO|CB|
+------------+--+--+--+--+--+--+--+--+--+---+---+---+--+--+--+--+--+--+--+--+
| | | | | | | |
| | | | | | | +CBPR
| | | | | | +EOImode
| | | | | +-PMHE
| | | | +----RES0
| | | +--SEIS
| | +-----A3V
| +--------------RSS
+-----------------ExtRange
Access: {Ex, RS, A3, SE, IDbits, PRIbits} = RO;
{PMHE} = RW*;
{EO, CB} = RW**;
others = RES0.
Notes : * impl-def (may be RO when DS=0)
** CB may be RO when DS=0 (EO stays RW)
Source: Arm GIC Architecture Specification (IHI 0069H.b),
§12.2.6 “ICC_CTLR_EL1”, pp. 12-233…12-237
Signed-off-by: Salil Mehta <salil.mehta@...wei.com>
---
arch/arm64/kvm/vgic-sys-reg-v3.c | 14 ++++++++++++
arch/arm64/kvm/vgic/vgic-kvm-device.c | 32 +++++++++++++++++++++++++++
include/kvm/arm_vgic.h | 3 +++
3 files changed, 49 insertions(+)
diff --git a/arch/arm64/kvm/vgic-sys-reg-v3.c b/arch/arm64/kvm/vgic-sys-reg-v3.c
index bdc2d57370b2..80d6e38b4aa7 100644
--- a/arch/arm64/kvm/vgic-sys-reg-v3.c
+++ b/arch/arm64/kvm/vgic-sys-reg-v3.c
@@ -10,12 +10,17 @@
#include "vgic/vgic.h"
#include "sys_regs.h"
+static int get_gic_ctlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
+ u64 *valp);
+
static int set_gic_ctlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
u64 val)
{
u32 host_pri_bits, host_id_bits, host_seis, host_a3v, seis, a3v;
+ struct vgic_v3_cpu_if *cif = &vcpu->arch.vgic_cpu.vgic_v3;
struct vgic_cpu *vgic_v3_cpu = &vcpu->arch.vgic_cpu;
struct vgic_vmcr vmcr;
+ u64 sysreg;
vgic_get_vmcr(vcpu, &vmcr);
@@ -53,6 +58,15 @@ static int set_gic_ctlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
vmcr.eoim = FIELD_GET(ICC_CTLR_EL1_EOImode_MASK, val);
vgic_set_vmcr(vcpu, &vmcr);
+	/*
+	 * Recompute the architectural value and publish it to the
+	 * ICC_CTLR_EL1 shadow, so that KVM device IOCTLs can read its
+	 * static (or pseudo-static) fields without taking vCPU locks.
+	 */
+	get_gic_ctlr(vcpu, r, &sysreg);
+
+	WRITE_ONCE(cif->icc_ctlr_el1_shadow, (u32)sysreg);
+
return 0;
}
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index 3d1a776b716d..4a2971525c1a 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -522,6 +522,29 @@ static bool reg_allowed_pre_init(struct kvm_device_attr *attr)
}
}
+/*
+ * Fast, lockless read of ICC_CTLR_EL1 for a userspace ioctl. Spares callers
+ * the -EBUSY returned when the mutexes of all vCPUs cannot be acquired.
+ */
+static inline int
+vgic_v3_try_icc_ctlr_el1_lockless_access(struct kvm_device *dev,
+ const struct kvm_device_attr *attr,
+ struct kvm_vcpu *vcpu)
+{
+ struct vgic_v3_cpu_if *vgic_cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
+ u32 __user *uaddr;
+ u32 shadow;
+
+	/* Bail out if we are here before the VGIC is initialized */
+	if (!vgic_initialized(dev->kvm))
+		return -EBUSY;
+
+	shadow = READ_ONCE(vgic_cpu_if->icc_ctlr_el1_shadow);
+ uaddr = (u32 __user *)(unsigned long)attr->addr;
+
+ return put_user(shadow, uaddr) ? -EFAULT : 0;
+}
+
/*
* vgic_v3_attr_regs_access - allows user space to access VGIC v3 state
*
@@ -533,6 +556,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
struct kvm_device_attr *attr,
bool is_write)
{
+ u32 sysreg = attr->attr & ~KVM_DEV_ARM_VGIC_CPUID_MASK;
struct vgic_reg_attr reg_attr;
gpa_t addr;
struct kvm_vcpu *vcpu;
@@ -549,6 +573,14 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
switch (attr->group) {
case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
+		/* Try a lockless ICC_CTLR_EL1 read first */
+		if (!is_write && sysreg == SYS_ICC_CTLR_EL1) {
+			ret = vgic_v3_try_icc_ctlr_el1_lockless_access(dev,
+								       attr, vcpu);
+			if (!ret)
+				return ret;
+		}
+
/* Sysregs uaccess is performed by the sysreg handling code */
uaccess = false;
break;
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4000ff16f295..42c767c62291 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -334,6 +334,9 @@ struct vgic_v3_cpu_if {
struct its_vpe its_vpe;
unsigned int used_lrs;
+
+	/* ICC_CTLR_EL1 shadow, published to lockless readers */
+	u32 icc_ctlr_el1_shadow; /* bits [63:32] of the register are RES0 */
};
struct vgic_cpu {
--
2.34.1