linux-kernel - Re: [PATCH v4] RISC-V: KVM: add KVM_CAP_RISCV_USERSPACE_SBI

lists.openwall.net
Open Source and information security mailing list archives
Message-ID: <20250527-10d0318cf26e3602948545db@orel>
Date: Tue, 27 May 2025 08:58:45 +0200
From: Andrew Jones <ajones@...tanamicro.com>
To: Anup Patel <anup@...infault.org>
Cc: Radim Krčmář <rkrcmar@...tanamicro.com>, 
	kvm-riscv@...ts.infradead.org, kvm@...r.kernel.org, linux-riscv@...ts.infradead.org, 
	linux-kernel@...r.kernel.org, Atish Patra <atishp@...shpatra.org>, 
	Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt <palmer@...belt.com>, 
	Albert Ou <aou@...s.berkeley.edu>, Alexandre Ghiti <alex@...ti.fr>
Subject: Re: [PATCH v4] RISC-V: KVM: add KVM_CAP_RISCV_USERSPACE_SBI

On Tue, May 27, 2025 at 09:23:05AM +0530, Anup Patel wrote:
> On Mon, May 26, 2025 at 8:09 PM Andrew Jones <ajones@...tanamicro.com> wrote:
> >
> > On Mon, May 26, 2025 at 06:12:19PM +0530, Anup Patel wrote:
> > > On Mon, May 26, 2025 at 2:52 PM Andrew Jones <ajones@...tanamicro.com> wrote:
> > > >
> > > > On Fri, May 23, 2025 at 01:33:49PM +0200, Radim Krčmář wrote:
> > > > > The new capability allows userspace to implement SBI extensions that KVM
> > > > > does not handle.  This allows userspace to implement any SBI ecall as
> > > > > userspace already has the ability to disable acceleration of selected
> > > > > SBI extensions.
> > > > > The base extension is made controllable as well, but only with the new
> > > > > capability, because it was previously handled specially for some reason.
> > > > > *** The related compatibility TODO in the code needs addressing. ***
> > > > >
> > > > > This is a VM capability, because userspace will most likely want to have
> > > > > the same behavior for all VCPUs.  We can easily make it both a VCPU and
> > > > > a VM capability if there is demand in the future.
> > > > >
> > > > > Signed-off-by: Radim Krčmář <rkrcmar@...tanamicro.com>
> > > > > ---
> > > > > v4:
> > > > > * forward base extension as well
> > > > > * change the id to 242, because 241 is already taken in linux-next
> > > > > * QEMU example: https://github.com/radimkrcmar/qemu/tree/mp_state_reset
> > > > > v3: new
> > > > > ---
> > > > >  Documentation/virt/kvm/api.rst    | 11 +++++++++++
> > > > >  arch/riscv/include/asm/kvm_host.h |  3 +++
> > > > >  arch/riscv/include/uapi/asm/kvm.h |  1 +
> > > > >  arch/riscv/kvm/vcpu_sbi.c         | 17 ++++++++++++++---
> > > > >  arch/riscv/kvm/vm.c               |  5 +++++
> > > > >  include/uapi/linux/kvm.h          |  1 +
> > > > >  6 files changed, 35 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > > index e107694fb41f..c9d627d13a5e 100644
> > > > > --- a/Documentation/virt/kvm/api.rst
> > > > > +++ b/Documentation/virt/kvm/api.rst
> > > > > @@ -8507,6 +8507,17 @@ given VM.
> > > > >  When this capability is enabled, KVM resets the VCPU when setting
> > > > >  MP_STATE_INIT_RECEIVED through IOCTL.  The original MP_STATE is preserved.
> > > > >
> > > > > +7.44 KVM_CAP_RISCV_USERSPACE_SBI
> > > > > +--------------------------------
> > > > > +
> > > > > +:Architectures: riscv
> > > > > +:Type: VM
> > > > > +:Parameters: None
> > > > > +:Returns: 0 on success, -EINVAL if arg[0] is not zero
> > > > > +
> > > > > +When this capability is enabled, KVM forwards ecalls from disabled or unknown
> > > > > +SBI extensions to userspace.
> > > > > +
> > > > >  8. Other capabilities.
> > > > >  ======================
> > > > >
> > > > > diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> > > > > index 85cfebc32e4c..6f17cd923889 100644
> > > > > --- a/arch/riscv/include/asm/kvm_host.h
> > > > > +++ b/arch/riscv/include/asm/kvm_host.h
> > > > > @@ -122,6 +122,9 @@ struct kvm_arch {
> > > > >
> > > > >       /* KVM_CAP_RISCV_MP_STATE_RESET */
> > > > >       bool mp_state_reset;
> > > > > +
> > > > > +     /* KVM_CAP_RISCV_USERSPACE_SBI */
> > > > > +     bool userspace_sbi;
> > > > >  };
> > > > >
> > > > >  struct kvm_cpu_trap {
> > > > > diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
> > > > > index 5f59fd226cc5..dd3a5dc53d34 100644
> > > > > --- a/arch/riscv/include/uapi/asm/kvm.h
> > > > > +++ b/arch/riscv/include/uapi/asm/kvm.h
> > > > > @@ -204,6 +204,7 @@ enum KVM_RISCV_SBI_EXT_ID {
> > > > >       KVM_RISCV_SBI_EXT_DBCN,
> > > > >       KVM_RISCV_SBI_EXT_STA,
> > > > >       KVM_RISCV_SBI_EXT_SUSP,
> > > > > +     KVM_RISCV_SBI_EXT_BASE,
> > > > >       KVM_RISCV_SBI_EXT_MAX,
> > > > >  };
> > > > >
> > > > > diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
> > > > > index 31fd3cc98d66..497d5b023153 100644
> > > > > --- a/arch/riscv/kvm/vcpu_sbi.c
> > > > > +++ b/arch/riscv/kvm/vcpu_sbi.c
> > > > > @@ -39,7 +39,7 @@ static const struct kvm_riscv_sbi_extension_entry sbi_ext[] = {
> > > > >               .ext_ptr = &vcpu_sbi_ext_v01,
> > > > >       },
> > > > >       {
> > > > > -             .ext_idx = KVM_RISCV_SBI_EXT_MAX, /* Can't be disabled */
> > > > > +             .ext_idx = KVM_RISCV_SBI_EXT_BASE,
> > > > >               .ext_ptr = &vcpu_sbi_ext_base,
> > > > >       },
> > > > >       {
> > > > > @@ -217,6 +217,11 @@ static int riscv_vcpu_set_sbi_ext_single(struct kvm_vcpu *vcpu,
> > > > >       if (!sext || scontext->ext_status[sext->ext_idx] == KVM_RISCV_SBI_EXT_STATUS_UNAVAILABLE)
> > > > >               return -ENOENT;
> > > > >
> > > > > +     // TODO: probably remove; the extension originally couldn't be
> > > > > +     // disabled, but that doesn't seem necessary
> > > > > +     if (!vcpu->kvm->arch.userspace_sbi && sext->ext_idx == KVM_RISCV_SBI_EXT_BASE)
> > > > > +             return -ENOENT;
> > > > > +
> > > >
> > > > I agree that we don't need to babysit userspace and it's even conceivable
> > > > to have guests that don't need SBI. KVM should only need checks in its
> > > > UAPI to protect itself from userspace and to enforce proper use of the
> > > > API. It's not KVM's place to ensure userspace doesn't violate the SBI spec
> > > > or create broken guests (userspace is the boss, even if it's a boss that
> > > > doesn't make sense).
> > > >
> > > > So, I vote we drop the check.
> > > >
> > > > >       scontext->ext_status[sext->ext_idx] = (reg_val) ?
> > > > >                       KVM_RISCV_SBI_EXT_STATUS_ENABLED :
> > > > >                       KVM_RISCV_SBI_EXT_STATUS_DISABLED;
> > > > > @@ -471,8 +476,14 @@ int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > > > >  #endif
> > > > >               ret = sbi_ext->handler(vcpu, run, &sbi_ret);
> > > > >       } else {
> > > > > -             /* Return error for unsupported SBI calls */
> > > > > -             cp->a0 = SBI_ERR_NOT_SUPPORTED;
> > > > > +             if (vcpu->kvm->arch.userspace_sbi) {
> > > > > +                     next_sepc = false;
> > > > > +                     ret = 0;
> > > > > +                     kvm_riscv_vcpu_sbi_forward(vcpu, run);
> > > > > +             } else {
> > > > > +                     /* Return error for unsupported SBI calls */
> > > > > +                     cp->a0 = SBI_ERR_NOT_SUPPORTED;
> > > > > +             }
> > > > >               goto ecall_done;
> > > > >       }
> > > > >
> > > > > diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> > > > > index b27ec8f96697..0b6378b83955 100644
> > > > > --- a/arch/riscv/kvm/vm.c
> > > > > +++ b/arch/riscv/kvm/vm.c
> > > > > @@ -217,6 +217,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> > > > >                       return -EINVAL;
> > > > >               kvm->arch.mp_state_reset = true;
> > > > >               return 0;
> > > > > +     case KVM_CAP_RISCV_USERSPACE_SBI:
> > > > > +             if (cap->flags)
> > > > > +                     return -EINVAL;
> > > > > +             kvm->arch.userspace_sbi = true;
> > > > > +             return 0;
> > > > >       default:
> > > > >               return -EINVAL;
> > > > >       }
> > > > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > > > index 454b7d4a0448..bf23deb6679e 100644
> > > > > --- a/include/uapi/linux/kvm.h
> > > > > +++ b/include/uapi/linux/kvm.h
> > > > > @@ -931,6 +931,7 @@ struct kvm_enable_cap {
> > > > >  #define KVM_CAP_X86_GUEST_MODE 238
> > > > >  #define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
> > > > >  #define KVM_CAP_RISCV_MP_STATE_RESET 240
> > > > > +#define KVM_CAP_RISCV_USERSPACE_SBI 242
> > > > >
> > > > >  struct kvm_irq_routing_irqchip {
> > > > >       __u32 irqchip;
> > > > > --
> > > > > 2.49.0
> > > > >
> > > >
> > > > Otherwise,
> > > >
> > > > Reviewed-by: Andrew Jones <ajones@...tanamicro.com>
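For readers following the vcpu_sbi.c hunk above, the decision the patch adds to kvm_riscv_vcpu_sbi_ecall() can be modelled in a few lines. This is a minimal sketch, not KVM code: the enum, its values and classify_ecall() are hypothetical names, and 'ext_enabled' stands for the lookup of the guest's a7 finding an enabled extension with a handler.

```c
#include <stdbool.h>

/* Hypothetical outcome of one guest ecall under the v4 patch. */
enum ecall_action {
	ECALL_HANDLED_IN_KERNEL,    /* enabled extension: its handler runs */
	ECALL_FORWARD_TO_USERSPACE, /* KVM_EXIT_RISCV_SBI; sepc not advanced here */
	ECALL_RETURN_NOT_SUPPORTED, /* a0 = SBI_ERR_NOT_SUPPORTED (-2) */
};

/*
 * 'ext_enabled' models the lookup in kvm_riscv_vcpu_sbi_ecall() returning an
 * enabled extension with a handler; it is false for both disabled and
 * unknown extensions. 'userspace_sbi' models kvm->arch.userspace_sbi.
 */
static enum ecall_action classify_ecall(bool ext_enabled, bool userspace_sbi)
{
	if (ext_enabled)
		return ECALL_HANDLED_IN_KERNEL;
	if (userspace_sbi)
		return ECALL_FORWARD_TO_USERSPACE;
	return ECALL_RETURN_NOT_SUPPORTED;
}
```

The only change from current behaviour is the middle branch: without the capability, disabled and unknown extensions keep getting SBI_ERR_NOT_SUPPORTED from KVM itself.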
> > >
> > > We are not going ahead with this approach for the reasons
> > > mentioned in the v3 series [1].
> >
> > IIUC, the main concern in that thread is that userspace won't know what to
> > do with some of the exits it gets or that it'll try to take control of
> > extensions that it can't emulate. I feel like not exiting to userspace in
> > those cases is trying to second-guess it, i.e. KVM is trying to enforce a
> > policy on userspace. But, KVM shouldn't be doing that, as userspace should
> > be the policy maker. If userspace uses this capability to opt into getting
> > all the SBI exits (which it doesn't want KVM to handle), then it should be
> > allowed to get them -- and, if userspace doesn't know what it's doing,
> > then it can keep all the pieces.
> 
> Userspace already has a mechanism to opt in to selected SBI exits
> that it can implement, such as SBI DBCN and SBI SUSP. With SBI v3.0,
> userspace will be able to implement SBI MPXY, but SBI SSE, FWFT, and
> DBTR will be handled in kernel space for the reasons mentioned in the v3 series.
> 
> There is no point in forwarding all SBI exits to userspace when userspace
> has no mechanism to implement many critical SBI extensions.
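For comparison with the per-extension opt-in Anup describes, turning on the proposed catch-all behaviour would be a single VM-level ioctl. A sketch, assuming a VM file descriptor obtained elsewhere from KVM_CREATE_VM and the capability number 242 from this v4 patch (the capability is not in upstream headers, hence the fallback define); enable_userspace_sbi() is a hypothetical helper name.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Number proposed by this v4 patch; assumed, not from upstream headers. */
#ifndef KVM_CAP_RISCV_USERSPACE_SBI
#define KVM_CAP_RISCV_USERSPACE_SBI 242
#endif

/*
 * Hypothetical helper: ask KVM to forward ecalls from disabled or unknown
 * SBI extensions to userspace. Returns 0 on success or a negative errno.
 */
static int enable_userspace_sbi(int vm_fd)
{
	struct kvm_enable_cap cap;

	/* flags and args[] stay zero; the patched handler rejects nonzero flags */
	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_RISCV_USERSPACE_SBI;

	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		return -errno;
	return 0;
}
```

Zeroing the whole struct keeps both flags and args[0] at zero, which satisfies the check in the vm.c hunk and the one documented in the api.rst hunk alike.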

Userspace can implement all truly disabled extensions: it just returns
SBI_ERR_NOT_SUPPORTED. So, while I may be contriving this example a bit, enabling
userspace to log attempts to use disabled extensions would be one reason
to forward them.
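The logging use case could look roughly like this on the VMM side. The struct below is a hypothetical stand-in that mirrors the layout of the riscv_sbi member of struct kvm_run (extension_id, function_id, args[6], ret[2]) so the sketch builds without RISC-V headers; handle_disabled_sbi_call() is likewise an illustrative name. When the VCPU is resumed after a KVM_EXIT_RISCV_SBI exit, KVM copies ret[0] and ret[1] into the guest's a0 (error) and a1 (value).

```c
#include <stdio.h>

/* SBI specification error code for unimplemented calls. */
#define SBI_ERR_NOT_SUPPORTED (-2L)

/* Hypothetical stand-in mirroring kvm_run's riscv_sbi exit data. */
struct sbi_exit {
	unsigned long extension_id;
	unsigned long function_id;
	unsigned long args[6];
	unsigned long ret[2];
};

/* Hypothetical VMM handler: log the attempt, then report "not supported". */
static void handle_disabled_sbi_call(struct sbi_exit *sbi, FILE *log)
{
	fprintf(log, "guest used disabled SBI extension 0x%lx, function %lu\n",
		sbi->extension_id, sbi->function_id);

	/* ret[0] becomes the guest's a0 (error), ret[1] its a1 (value). */
	sbi->ret[0] = (unsigned long)SBI_ERR_NOT_SUPPORTED;
	sbi->ret[1] = 0;
}
```

The guest sees exactly what it would see if KVM had rejected the call itself; the only difference is that the VMM gets a record of the attempt.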

Thanks,
drew