Message-ID: <Z79ZJVOHtNu6YsVt@google.com>
Date: Wed, 26 Feb 2025 18:10:45 +0000
From: Quentin Perret <qperret@...gle.com>
To: Marc Zyngier <maz@...nel.org>
Cc: Oliver Upton <oliver.upton@...ux.dev>, Joey Gouly <joey.gouly@....com>,
Suzuki K Poulose <suzuki.poulose@....com>,
Zenghui Yu <yuzenghui@...wei.com>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, linux-arm-kernel@...ts.infradead.org,
kvmarm@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 3/4] KVM: arm64: Selftest for pKVM transitions

On Wednesday 26 Feb 2025 at 14:32:52 (+0000), Marc Zyngier wrote:
> On Tue, 25 Feb 2025 01:53:26 +0000,
> Quentin Perret <qperret@...gle.com> wrote:
> >
> > We have recently found a bug [1] in the pKVM memory ownership
> > transitions by code inspection, but it could have been caught with a
> > test.
> >
> > Introduce a boot-time selftest exercising all the known pKVM memory
> > transitions and, importantly, checking the rejection of illegal transitions.
> >
> > The new test is hidden behind a new Kconfig option separate from
> > CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the
> > transition checks ([1] doesn't reproduce with EL2 debug enabled).
>
> That's a bit annoying, isn't it? Without EL2_DEBUG selected, you won't
> get any stacktrace, and the WARN_ON()s are a guaranteed panic. Yes,
> this is better than nothing, but I'm a bit worried this is going to be
> hard to use.

Right, so you _can_ enable EL2_DEBUG on top of the selftest stuff, and
if you're not hitting one of those hard-to-find bugs described in the
commit message above, then you're golden. In practice I suspect that if
enabling the selftest alone leads to a panic, the next logical step is
to enable EL2_DEBUG and see what you get. If enabling EL2_DEBUG makes
the issue go away, then that'll require digging a bit deeper, but that
should be pretty rare I presume.

> Is there a way to reduce the impact the EL2 debug has on the rest of
> the code? It feels like it is more invasive than it should be...

Turns out I have a WiP series that moves the hypervisor ownership state
to the hyp_vmemmap, similar to what we did for the host ownership. A
nice property of that is that hyp state lookups become really cheap, no
page-table walks required. So we could probably afford to drop the
EL2_DEBUG ifdefery in host_share_hyp() and friends, and just
unconditionally cross-check the hyp state on all transitions where it is
involved. And with that we should probably just fold the pkvm selftest
under EL2_DEBUG and call it a day. Would that work?
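
To illustrate the idea, here is a minimal userspace sketch, not the actual
series. It assumes a flat per-page array standing in for hyp_vmemmap, and
every name in it (sketch_page, sketch_host_share_hyp(), the state enum) is
hypothetical. The point is only that once both the host and hyp states sit
in the same per-page entry, cross-checking the hyp side is an array lookup
rather than a page-table walk, so it can be done unconditionally:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ownership states, loosely modelled on the discussion above. */
enum sketch_state {
	SKETCH_OWNED,
	SKETCH_SHARED_OWNED,
	SKETCH_SHARED_BORROWED,
	SKETCH_NOPAGE,
};

/* One entry per physical page; host and hyp views live side by side. */
struct sketch_page {
	uint8_t host_state;
	uint8_t hyp_state;
};

static_assert(sizeof(struct sketch_page) == 2,
	      "per-page metadata should stay small");

#define SKETCH_NR_PAGES 16
/* Stand-in for hyp_vmemmap: indexed directly by pfn. */
static struct sketch_page sketch_vmemmap[SKETCH_NR_PAGES];

/* Initial state: the host owns every page, the hypervisor has none. */
static void sketch_init(void)
{
	for (size_t i = 0; i < SKETCH_NR_PAGES; i++) {
		sketch_vmemmap[i].host_state = SKETCH_OWNED;
		sketch_vmemmap[i].hyp_state = SKETCH_NOPAGE;
	}
}

/* O(1) lookup, no page-table walk involved. */
static struct sketch_page *sketch_pfn_to_page(uint64_t pfn)
{
	return pfn < SKETCH_NR_PAGES ? &sketch_vmemmap[pfn] : NULL;
}

/* Host shares a page with hyp; both sides are checked unconditionally. */
static int sketch_host_share_hyp(uint64_t pfn)
{
	struct sketch_page *p = sketch_pfn_to_page(pfn);

	if (!p)
		return -EINVAL;
	if (p->host_state != SKETCH_OWNED)
		return -EPERM;
	/* The cross-check that no longer needs a debug-only ifdef. */
	if (p->hyp_state != SKETCH_NOPAGE)
		return -EPERM;

	p->host_state = SKETCH_SHARED_OWNED;
	p->hyp_state = SKETCH_SHARED_BORROWED;
	return 0;
}

/* Reverse transition, again validating both views before touching them. */
static int sketch_host_unshare_hyp(uint64_t pfn)
{
	struct sketch_page *p = sketch_pfn_to_page(pfn);

	if (!p)
		return -EINVAL;
	if (p->host_state != SKETCH_SHARED_OWNED ||
	    p->hyp_state != SKETCH_SHARED_BORROWED)
		return -EPERM;

	p->host_state = SKETCH_OWNED;
	p->hyp_state = SKETCH_NOPAGE;
	return 0;
}
```

In this sketch, after sketch_init(), sharing a page twice in a row or
unsharing a page that was never shared is rejected with -EPERM, which is
the kind of illegal transition a selftest would want to exercise.
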
Thanks,
Quentin