linux-kernel - Re: [PATCH v4] RISC-V: KVM: add KVM_CAP_RISCV_USERSPACE_SBI

Message-ID: <CAAhSdy0uX9dEPdGiMgDcYbR6WS+Nc=V3rLvK_TEoA=fzeD9wBg@mail.gmail.com>
Date: Tue, 27 May 2025 09:23:05 +0530
From: Anup Patel <anup@...infault.org>
To: Andrew Jones <ajones@...tanamicro.com>
Cc: Radim Krčmář <rkrcmar@...tanamicro.com>, 
	kvm-riscv@...ts.infradead.org, kvm@...r.kernel.org, 
	linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org, 
	Atish Patra <atishp@...shpatra.org>, Paul Walmsley <paul.walmsley@...ive.com>, 
	Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>, 
	Alexandre Ghiti <alex@...ti.fr>
Subject: Re: [PATCH v4] RISC-V: KVM: add KVM_CAP_RISCV_USERSPACE_SBI

On Mon, May 26, 2025 at 8:09 PM Andrew Jones <ajones@...tanamicro.com> wrote:
>
> On Mon, May 26, 2025 at 06:12:19PM +0530, Anup Patel wrote:
> > On Mon, May 26, 2025 at 2:52 PM Andrew Jones <ajones@...tanamicro.com> wrote:
> > >
> > > On Fri, May 23, 2025 at 01:33:49PM +0200, Radim Krčmář wrote:
> > > > The new capability allows userspace to implement SBI extensions that KVM
> > > > does not handle.  This allows userspace to implement any SBI ecall as
> > > > userspace already has the ability to disable acceleration of selected
> > > > SBI extensions.
> > > > The base extension is made controllable as well, but only with the new
> > > > capability, because it was previously handled specially for some reason.
> > > > *** The related compatibility TODO in the code needs addressing. ***
> > > >
> > > > This is a VM capability, because userspace will most likely want to have
> > > > the same behavior for all VCPUs.  We can easily make it both a VCPU and
> > > > a VM capability if there is demand in the future.
> > > >
> > > > Signed-off-by: Radim Krčmář <rkrcmar@...tanamicro.com>
> > > > ---
> > > > v4:
> > > > * forward base extension as well
> > > > * change the id to 242, because 241 is already taken in linux-next
> > > > * QEMU example: https://github.com/radimkrcmar/qemu/tree/mp_state_reset
> > > > v3: new
> > > > ---
> > > >  Documentation/virt/kvm/api.rst    | 11 +++++++++++
> > > >  arch/riscv/include/asm/kvm_host.h |  3 +++
> > > >  arch/riscv/include/uapi/asm/kvm.h |  1 +
> > > >  arch/riscv/kvm/vcpu_sbi.c         | 17 ++++++++++++++---
> > > >  arch/riscv/kvm/vm.c               |  5 +++++
> > > >  include/uapi/linux/kvm.h          |  1 +
> > > >  6 files changed, 35 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > index e107694fb41f..c9d627d13a5e 100644
> > > > --- a/Documentation/virt/kvm/api.rst
> > > > +++ b/Documentation/virt/kvm/api.rst
> > > > @@ -8507,6 +8507,17 @@ given VM.
> > > >  When this capability is enabled, KVM resets the VCPU when setting
> > > >  MP_STATE_INIT_RECEIVED through IOCTL.  The original MP_STATE is preserved.
> > > >
> > > > +7.44 KVM_CAP_RISCV_USERSPACE_SBI
> > > > +--------------------------------
> > > > +
> > > > +:Architectures: riscv
> > > > +:Type: VM
> > > > +:Parameters: None
> > > > +:Returns: 0 on success, -EINVAL if arg[0] is not zero
> > > > +
> > > > +When this capability is enabled, KVM forwards ecalls from disabled or unknown
> > > > +SBI extensions to userspace.
> > > > +
> > > >  8. Other capabilities.
> > > >  ======================
> > > >
> > > > diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> > > > index 85cfebc32e4c..6f17cd923889 100644
> > > > --- a/arch/riscv/include/asm/kvm_host.h
> > > > +++ b/arch/riscv/include/asm/kvm_host.h
> > > > @@ -122,6 +122,9 @@ struct kvm_arch {
> > > >
> > > >       /* KVM_CAP_RISCV_MP_STATE_RESET */
> > > >       bool mp_state_reset;
> > > > +
> > > > +     /* KVM_CAP_RISCV_USERSPACE_SBI */
> > > > +     bool userspace_sbi;
> > > >  };
> > > >
> > > >  struct kvm_cpu_trap {
> > > > diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
> > > > index 5f59fd226cc5..dd3a5dc53d34 100644
> > > > --- a/arch/riscv/include/uapi/asm/kvm.h
> > > > +++ b/arch/riscv/include/uapi/asm/kvm.h
> > > > @@ -204,6 +204,7 @@ enum KVM_RISCV_SBI_EXT_ID {
> > > >       KVM_RISCV_SBI_EXT_DBCN,
> > > >       KVM_RISCV_SBI_EXT_STA,
> > > >       KVM_RISCV_SBI_EXT_SUSP,
> > > > +     KVM_RISCV_SBI_EXT_BASE,
> > > >       KVM_RISCV_SBI_EXT_MAX,
> > > >  };
> > > >
> > > > diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
> > > > index 31fd3cc98d66..497d5b023153 100644
> > > > --- a/arch/riscv/kvm/vcpu_sbi.c
> > > > +++ b/arch/riscv/kvm/vcpu_sbi.c
> > > > @@ -39,7 +39,7 @@ static const struct kvm_riscv_sbi_extension_entry sbi_ext[] = {
> > > >               .ext_ptr = &vcpu_sbi_ext_v01,
> > > >       },
> > > >       {
> > > > -             .ext_idx = KVM_RISCV_SBI_EXT_MAX, /* Can't be disabled */
> > > > +             .ext_idx = KVM_RISCV_SBI_EXT_BASE,
> > > >               .ext_ptr = &vcpu_sbi_ext_base,
> > > >       },
> > > >       {
> > > > @@ -217,6 +217,11 @@ static int riscv_vcpu_set_sbi_ext_single(struct kvm_vcpu *vcpu,
> > > >       if (!sext || scontext->ext_status[sext->ext_idx] == KVM_RISCV_SBI_EXT_STATUS_UNAVAILABLE)
> > > >               return -ENOENT;
> > > >
> > > > +     // TODO: probably remove, the extension originally couldn't be
> > > > +     // disabled, but it doesn't seem necessary
> > > > > +     if (!vcpu->kvm->arch.userspace_sbi && sext->ext_idx == KVM_RISCV_SBI_EXT_BASE)
> > > > +             return -ENOENT;
> > > > +
> > >
> > > I agree that we don't need to babysit userspace and it's even conceivable
> > > to have guests that don't need SBI. KVM should only need checks in its
> > > UAPI to protect itself from userspace and to enforce proper use of the
> > > API. It's not KVM's place to ensure userspace doesn't violate the SBI spec
> > > or create broken guests (userspace is the boss, even if it's a boss that
> > > doesn't make sense).
> > >
> > > So, I vote we drop the check.
> > >
> > > >       scontext->ext_status[sext->ext_idx] = (reg_val) ?
> > > >                       KVM_RISCV_SBI_EXT_STATUS_ENABLED :
> > > >                       KVM_RISCV_SBI_EXT_STATUS_DISABLED;
> > > > @@ -471,8 +476,14 @@ int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > > >  #endif
> > > >               ret = sbi_ext->handler(vcpu, run, &sbi_ret);
> > > >       } else {
> > > > -             /* Return error for unsupported SBI calls */
> > > > -             cp->a0 = SBI_ERR_NOT_SUPPORTED;
> > > > +             if (vcpu->kvm->arch.userspace_sbi) {
> > > > +                     next_sepc = false;
> > > > +                     ret = 0;
> > > > +                     kvm_riscv_vcpu_sbi_forward(vcpu, run);
> > > > +             } else {
> > > > +                     /* Return error for unsupported SBI calls */
> > > > +                     cp->a0 = SBI_ERR_NOT_SUPPORTED;
> > > > +             }
> > > >               goto ecall_done;
> > > >       }
> > > >
> > > > diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> > > > index b27ec8f96697..0b6378b83955 100644
> > > > --- a/arch/riscv/kvm/vm.c
> > > > +++ b/arch/riscv/kvm/vm.c
> > > > @@ -217,6 +217,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> > > >                       return -EINVAL;
> > > >               kvm->arch.mp_state_reset = true;
> > > >               return 0;
> > > > +     case KVM_CAP_RISCV_USERSPACE_SBI:
> > > > +             if (cap->flags)
> > > > +                     return -EINVAL;
> > > > +             kvm->arch.userspace_sbi = true;
> > > > +             return 0;
> > > >       default:
> > > >               return -EINVAL;
> > > >       }
> > > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > > index 454b7d4a0448..bf23deb6679e 100644
> > > > --- a/include/uapi/linux/kvm.h
> > > > +++ b/include/uapi/linux/kvm.h
> > > > @@ -931,6 +931,7 @@ struct kvm_enable_cap {
> > > >  #define KVM_CAP_X86_GUEST_MODE 238
> > > >  #define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
> > > >  #define KVM_CAP_RISCV_MP_STATE_RESET 240
> > > > +#define KVM_CAP_RISCV_USERSPACE_SBI 242
> > > >
> > > >  struct kvm_irq_routing_irqchip {
> > > >       __u32 irqchip;
> > > > --
> > > > 2.49.0
> > > >
> > >
> > > Otherwise,
> > >
> > > Reviewed-by: Andrew Jones <ajones@...tanamicro.com>
> >
> > We are not going ahead with this approach for the reasons
> > mentioned in v3 series [1].
>
> IIUC, the main concern in that thread is that userspace won't know what to
> do with some of the exits it gets or that it'll try to take control of
> extensions that it can't emulate. I feel like not exiting to userspace in
> those cases is trying to second guess it, i.e. KVM is trying to enforce a
> policy on userspace. But, KVM shouldn't be doing that, as userspace should
> be the policy maker. If userspace uses this capability to opt into getting
> all the SBI exits (which it doesn't want KVM to handle), then it should be
> allowed to get them -- and, if userspace doesn't know what it's doing,
> then it can keep all the pieces.

Userspace already has a mechanism to opt in to selected SBI exits that
it can implement, such as SBI DBCN and SBI SUSP. With SBI v3.0,
userspace will be able to implement SBI MPXY, but SBI SSE, FWFT, and
DBTR will be in kernel space for the reasons mentioned in the v3 series.

There is no point in forwarding all SBI exits to userspace when userspace
has no mechanism to implement many critical SBI extensions.

Regards,
Anup
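
For readers following the thread, the dispatch change under discussion can
be modeled in a few lines of standalone C. This is a simplified sketch, not
the kernel code: struct vm_model, enum ext_status, enum ecall_result and
dispatch_ecall() are hypothetical names made up for illustration. Only the
userspace_sbi flag and the SBI_ERR_NOT_SUPPORTED value (-2 in the SBI
specification) come from the patch and the spec. The model also ignores
that an enabled in-kernel handler may itself choose to exit to userspace.

```c
#include <stdbool.h>

/* Value defined by the SBI specification. */
#define SBI_ERR_NOT_SUPPORTED (-2)

/* Hypothetical: how KVM sees the extension targeted by an ecall. */
enum ext_status {
	EXT_UNKNOWN,   /* no in-kernel implementation at all */
	EXT_DISABLED,  /* implemented, but disabled by userspace */
	EXT_ENABLED,   /* implemented and enabled */
};

/* Hypothetical: what happens to the ecall. */
enum ecall_result {
	HANDLED_IN_KERNEL,
	FORWARDED_TO_USERSPACE,
	REJECTED_NOT_SUPPORTED,
};

/* Hypothetical stand-in for the per-VM flag added to struct kvm_arch. */
struct vm_model {
	bool userspace_sbi;
};

/*
 * Models the else-branch the patch changes: a disabled or unknown
 * extension is forwarded only when the VM opted in. Otherwise the
 * guest sees SBI_ERR_NOT_SUPPORTED in a0, as before the patch.
 */
static enum ecall_result dispatch_ecall(const struct vm_model *vm,
					enum ext_status status, long *a0)
{
	if (status == EXT_ENABLED)
		return HANDLED_IN_KERNEL;

	if (vm->userspace_sbi)
		return FORWARDED_TO_USERSPACE;

	*a0 = SBI_ERR_NOT_SUPPORTED;
	return REJECTED_NOT_SUPPORTED;
}
```

In this model, the disagreement in the thread is about the middle branch:
whether every disabled or unknown extension should reach userspace once the
VM opts in, or only the extensions userspace is known to be able to emulate.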