[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aR8DM6Xf8cmTGVR0@google.com>
Date: Thu, 20 Nov 2025 12:01:55 +0000
From: Vincent Donnefort <vdonnefort@...gle.com>
To: Marc Zyngier <maz@...nel.org>
Cc: rostedt@...dmis.org, mhiramat@...nel.org,
mathieu.desnoyers@...icios.com, linux-trace-kernel@...r.kernel.org,
oliver.upton@...ux.dev, joey.gouly@....com, suzuki.poulose@....com,
yuzenghui@...wei.com, kvmarm@...ts.linux.dev,
linux-arm-kernel@...ts.infradead.org, jstultz@...gle.com,
qperret@...gle.com, will@...nel.org, aneesh.kumar@...nel.org,
kernel-team@...roid.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8 21/28] KVM: arm64: Add tracing capability for the pKVM
hyp
On Wed, Nov 19, 2025 at 05:06:38PM +0000, Marc Zyngier wrote:
> On Fri, 07 Nov 2025 09:38:33 +0000,
> Vincent Donnefort <vdonnefort@...gle.com> wrote:
> >
> > When running with protected mode, the host has very little knowledge
> > about what is happening in the hypervisor. Of course this is an
> > essential feature for security but nonetheless, that piece of code
> > growing with more responsibilities, we need now a way to debug and
> > profile it. Tracefs by its reliability, versatility and support for
> > user-space is the perfect tool.
> >
> > There's no way the hypervisor could log events directly into the host
> > tracefs ring-buffers. So instead let's use our own, where the hypervisor
> > is the writer and the host the reader.
> >
> > Signed-off-by: Vincent Donnefort <vdonnefort@...gle.com>
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index 9da54d4ee49e..ad02dee140d3 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -89,6 +89,10 @@ enum __kvm_host_smccc_func {
> > __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
> > __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
> > __KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
> > + __KVM_HOST_SMCCC_FUNC___pkvm_load_tracing,
> > + __KVM_HOST_SMCCC_FUNC___pkvm_unload_tracing,
> > + __KVM_HOST_SMCCC_FUNC___pkvm_enable_tracing,
> > + __KVM_HOST_SMCCC_FUNC___pkvm_swap_reader_tracing,
> > };
> >
> > #define DECLARE_KVM_VHE_SYM(sym) extern char sym[]
> > diff --git a/arch/arm64/include/asm/kvm_hyptrace.h b/arch/arm64/include/asm/kvm_hyptrace.h
> > new file mode 100644
> > index 000000000000..9c30a479bc36
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/kvm_hyptrace.h
> > @@ -0,0 +1,13 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +#ifndef __ARM64_KVM_HYPTRACE_H_
> > +#define __ARM64_KVM_HYPTRACE_H_
> > +
> > +#include <linux/ring_buffer.h>
> > +
> > +struct hyp_trace_desc {
> > + unsigned long bpages_backing_start;
>
> Why is this an integer type? You keep casting it all over the place,
> which tells me that's not the ideal type.
That's because it is a kern VA the hyp needs to convert. However it would indeed
make my life easier to declare it as a struct simple_buffer_page * in the
struct hyp_trace_buffer below.
>
> > + size_t bpages_backing_size;
> > + struct trace_buffer_desc trace_buffer_desc;
> > +
> > +};
> > +#endif
> > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > index 4f803fd1c99a..580426cdbe77 100644
> > --- a/arch/arm64/kvm/Kconfig
> > +++ b/arch/arm64/kvm/Kconfig
> > @@ -83,4 +83,11 @@ config PTDUMP_STAGE2_DEBUGFS
> >
> > If in doubt, say N.
> >
> > +config PKVM_TRACING
> > + bool
> > + depends on KVM
> > + depends on TRACING
> > + select SIMPLE_RING_BUFFER
> > + default y
>
> I'd rather this is made to depend on NVHE_EL2_DEBUG, just like the
> other debug options.
NVHE_EL2_DEBUG is unsafe for production because of the stage-2 relax on panic.
While this one is. So ideally this should be usable even without NVHE_EL2_DEBUG.
I can remove this hidden PKVM_TRACING option and use everywhere CONFIG_TRACING. But
then I need something to select SIMPLE_RING_BUFFER.
Perhaps with the following?
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 2ae6bf499236..c561bf9d4754 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -38,6 +38,7 @@ menuconfig KVM
select SCHED_INFO
select GUEST_PERF_EVENTS if PERF_EVENTS
select KVM_GUEST_MEMFD
+ select SIMPLE_RING_BUFFER if CONFIG_TRACING
>
> > +
> > endif # VIRTUALIZATION
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/trace.h b/arch/arm64/kvm/hyp/include/nvhe/trace.h
> > new file mode 100644
> > index 000000000000..996e90c0974f
> > --- /dev/null
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/trace.h
> > @@ -0,0 +1,23 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +#ifndef __ARM64_KVM_HYP_NVHE_TRACE_H
> > +#define __ARM64_KVM_HYP_NVHE_TRACE_H
> > +#include <asm/kvm_hyptrace.h>
> > +
> > +#ifdef CONFIG_PKVM_TRACING
> > +void *tracing_reserve_entry(unsigned long length);
> > +void tracing_commit_entry(void);
> > +
> > +int __pkvm_load_tracing(unsigned long desc_va, size_t desc_size);
> > +void __pkvm_unload_tracing(void);
> > +int __pkvm_enable_tracing(bool enable);
> > +int __pkvm_swap_reader_tracing(unsigned int cpu);
> > +#else
> > +static inline void *tracing_reserve_entry(unsigned long length) { return NULL; }
> > +static inline void tracing_commit_entry(void) { }
> > +
> > +static inline int __pkvm_load_tracing(unsigned long desc_va, size_t desc_size) { return -ENODEV; }
> > +static inline void __pkvm_unload_tracing(void) { }
> > +static inline int __pkvm_enable_tracing(bool enable) { return -ENODEV; }
> > +static inline int __pkvm_swap_reader_tracing(unsigned int cpu) { return -ENODEV; }
> > +#endif
> > +#endif
> > diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> > index f55a9a17d38f..504c3b9caef8 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> > +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> > @@ -29,7 +29,7 @@ hyp-obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
> > ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
> > hyp-obj-y += ../../../kernel/smccc-call.o
> > hyp-obj-$(CONFIG_LIST_HARDENED) += list_debug.o
> > -hyp-obj-$(CONFIG_PKVM_TRACING) += clock.o
> > +hyp-obj-$(CONFIG_PKVM_TRACING) += clock.o trace.o ../../../../../kernel/trace/simple_ring_buffer.o
>
> Can we get something less awful here? Surely there is a way to get an
> absolute path from the kbuild infrastructure? $(objtree) springs to
> mind...
Ack.
[...]
> > +int __pkvm_load_tracing(unsigned long desc_hva, size_t desc_size)
> > +{
> > + struct hyp_trace_desc *desc = (struct hyp_trace_desc *)kern_hyp_va(desc_hva);
> > + int ret;
> > +
> > + if (!desc_size || !PAGE_ALIGNED(desc_hva) || !PAGE_ALIGNED(desc_size))
> > + return -EINVAL;
> > +
> > + ret = __pkvm_host_donate_hyp(hyp_virt_to_pfn((void *)desc),
> > + desc_size >> PAGE_SHIFT);
> > + if (ret)
> > + return ret;
> > +
> > + if (!hyp_trace_desc_validate(desc, desc_size))
> > + goto err_donate_desc;
> > +
> > + hyp_spin_lock(&trace_buffer.lock);
> > +
> > + ret = hyp_trace_buffer_load(&trace_buffer, desc);
> > +
> > + hyp_spin_unlock(&trace_buffer.lock);
> > +
> > +err_donate_desc:
> > + WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn((void *)desc),
> > + desc_size >> PAGE_SHIFT));
>
> That's basically a guaranteed panic if anything goes wrong. Are you
> sure you want to do that?
A failure would mean a lost page for the kernel. As there's really no reason for
this to happen (the host_donate_hyp worked few lines above), it sounds alright
to panic here in this case.
In reclaim_pgtable_pages() applies the same reasoning: if hyp_donate_host fails,
something really wrong happened.
>
> > + return ret;
> > +}
> > +
> > +void __pkvm_unload_tracing(void)
> > +{
> > + hyp_spin_lock(&trace_buffer.lock);
> > + hyp_trace_buffer_unload(&trace_buffer);
> > + hyp_spin_unlock(&trace_buffer.lock);
> > +}
> > +
> > +int __pkvm_enable_tracing(bool enable)
> > +{
> > + int cpu, ret = enable ? -EINVAL : 0;
> > +
> > + hyp_spin_lock(&trace_buffer.lock);
> > +
> > + if (!hyp_trace_buffer_loaded(&trace_buffer))
> > + goto unlock;
> > +
> > + for (cpu = 0; cpu < hyp_nr_cpus; cpu++)
> > + simple_ring_buffer_enable_tracing(per_cpu_ptr(trace_buffer.simple_rbs, cpu),
> > + enable);
> > +
> > + ret = 0;
> > +
> > +unlock:
> > + hyp_spin_unlock(&trace_buffer.lock);
> > +
> > + return ret;
> > +}
> > +
> > +int __pkvm_swap_reader_tracing(unsigned int cpu)
> > +{
> > + int ret;
> > +
> > + if (cpu >= hyp_nr_cpus)
> > + return -EINVAL;
> > +
> > + hyp_spin_lock(&trace_buffer.lock);
> > +
> > + if (hyp_trace_buffer_loaded(&trace_buffer))
> > + ret = simple_ring_buffer_swap_reader_page(
> > + per_cpu_ptr(trace_buffer.simple_rbs, cpu));
>
> Please keep these things on a single line. I don't care what people
> (of checkpatch) say.
Ack.
>
> > + else
> > + ret = -ENODEV;
> > +
> > + hyp_spin_unlock(&trace_buffer.lock);
> > +
> > + return ret;
> > +}
> > --
> > 2.51.2.1041.gc1ab5b90ca-goog
> >
> >
>
> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
Powered by blists - more mailing lists