linux-kernel - Re: [PATCH v3 1/1] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags

Message-ID: <86r033olwv.wl-maz@kernel.org>
Date: Tue, 11 Mar 2025 11:18:40 +0000
From: Marc Zyngier <maz@...nel.org>
To: Ankit Agrawal <ankita@...dia.com>
Cc: Jason Gunthorpe <jgg@...dia.com>,
	"oliver.upton@...ux.dev" <oliver.upton@...ux.dev>,
	"joey.gouly@....com" <joey.gouly@....com>,
	"suzuki.poulose@....com" <suzuki.poulose@....com>,
	"yuzenghui@...wei.com" <yuzenghui@...wei.com>,
	"catalin.marinas@....com" <catalin.marinas@....com>,
	"will@...nel.org" <will@...nel.org>,
	"ryan.roberts@....com" <ryan.roberts@....com>,
	"shahuang@...hat.com" <shahuang@...hat.com>,
	"lpieralisi@...nel.org" <lpieralisi@...nel.org>,
	"david@...hat.com" <david@...hat.com>,
	Aniket Agashe <aniketa@...dia.com>,
	Neo Jia <cjia@...dia.com>,
	Kirti Wankhede <kwankhede@...dia.com>,
	"Tarun Gupta (SW-GPU)" <targupta@...dia.com>,
	Vikram Sethi <vsethi@...dia.com>,
	Andy Currid <acurrid@...dia.com>,
	Alistair Popple <apopple@...dia.com>,
	John Hubbard <jhubbard@...dia.com>,
	Dan Williams <danw@...dia.com>,
	Zhi Wang <zhiw@...dia.com>,
	Matt Ochs <mochs@...dia.com>,
	Uday Dhoke <udhoke@...dia.com>,
	Dheeraj Nigam <dnigam@...dia.com>,
	Krishnakant Jaju <kjaju@...dia.com>,
	"alex.williamson@...hat.com" <alex.williamson@...hat.com>,
	"sebastianene@...gle.com" <sebastianene@...gle.com>,
	"coltonlewis@...gle.com" <coltonlewis@...gle.com>,
	"kevin.tian@...el.com" <kevin.tian@...el.com>,
	"yi.l.liu@...el.com" <yi.l.liu@...el.com>,
	"ardb@...nel.org" <ardb@...nel.org>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"gshan@...hat.com" <gshan@...hat.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"ddutile@...hat.com" <ddutile@...hat.com>,
	"tabba@...gle.com" <tabba@...gle.com>,
	"qperret@...gle.com" <qperret@...gle.com>,
	"seanjc@...gle.com" <seanjc@...gle.com>,
	"kvmarm@...ts.linux.dev" <kvmarm@...ts.linux.dev>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH v3 1/1] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags

On Tue, 11 Mar 2025 03:42:23 +0000,
Ankit Agrawal <ankita@...dia.com> wrote:
> 
> >> +     /*
> >> +      *  When FWB is unsupported KVM needs to do cache flushes
> >> +      *  (via dcache_clean_inval_poc()) of the underlying memory. This is
> >> +      *  only possible if the memory is already mapped into the kernel map.
> >> +      *
> >> +      *  Outright reject as the cacheable device memory is not present in
> >> +      *  the kernel map and not suitable for cache management.
> >> +      */
> >> +     if (cacheable_devmem && !stage2_has_fwb(pgt)) {
> >> +             ret = -EINVAL;
> >> +             goto out_unlock;
> >> +     }
> >> +
> >
> > These new error reasons should at least be complemented by an
> > equivalent check at the point where the memslot is registered. It
> 
> Understood. I can add such a check in kvm_arch_prepare_memory_region().
> 
> 
> > may be OK to blindly return an error at fault time (because userspace
> > has messed with the mapping behind our back), but there should at
> > least be something telling a well-behaved userspace that there is a
> > bunch of combinations we're unwilling to support.
> 
> How about WARN_ON() or BUG() for the faulty situation?

Absolutely not. Do you really want any user to randomly crash the
kernel because they flip a mapping, which they can do anytime they
want?

The way KVM works is that we return to userspace for the VMM to fix
things. Either by emulating something we can't do in the kernel, or by
fixing things so that the kernel can replay the fault and sort it out.

Either way, this requires some form of fault syndrome so that userspace
has a chance of understanding WTF is going on.
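As a rough illustration of that loop, here is a toy model in C (not KVM code): the kernel side records a syndrome and exits instead of crashing, and the VMM repairs the mapping and replays the fault. `struct vm_model`, `handle_guest_fault()`, `vmm_run_fault()` and the syndrome fields are all hypothetical; a real VMM would read the exit information from `struct kvm_run` and, for example, re-mmap() the affected region.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical exit reasons for this model only. */
enum vm_exit { VM_EXIT_NONE, VM_EXIT_MEMORY_FAULT };

/* Hypothetical syndrome handed back to the VMM. */
struct fault_syndrome {
	uint64_t gpa;   /* guest physical address that faulted */
	int error;      /* errno-style reason, e.g. EINVAL */
};

struct vm_model {
	bool mapping_ok;             /* false once userspace changed the VMA */
	struct fault_syndrome last;  /* filled in on a failed fault */
};

/* Kernel side: never crash on a userspace-controlled mapping; report it. */
static enum vm_exit handle_guest_fault(struct vm_model *vm, uint64_t gpa)
{
	if (!vm->mapping_ok) {
		vm->last.gpa = gpa;
		vm->last.error = EINVAL;
		return VM_EXIT_MEMORY_FAULT;
	}
	return VM_EXIT_NONE; /* mapping installed, guest resumes */
}

/* VMM side: inspect the syndrome, fix things, then replay the fault. */
static int vmm_run_fault(struct vm_model *vm, uint64_t gpa, int max_retries)
{
	assert(max_retries >= 0);
	for (int i = 0; i <= max_retries; i++) {
		if (handle_guest_fault(vm, gpa) == VM_EXIT_NONE)
			return 0;
		/* Stand-in for re-mmap()ing the region with suitable attributes. */
		if (vm->last.error == EINVAL)
			vm->mapping_ok = true;
	}
	return -EFAULT; /* VMM could not resolve the fault in time */
}
```

The retry bound is arbitrary; it only keeps a VMM that cannot fix the problem from spinning forever.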

> > Which brings me to the next point: FWB is not discoverable from
> > userspace. How do you expect a VMM to know what it can or cannot do?
> 
> Good point. I am not sure if it can. I suppose you are concerned about an error
> during fault handling when !FWB, without the VMM having any clear indication
> of the cause?

No, I'm concerned that a well-established API (populating a memslot)
works in some cases and doesn't work in others without a clear
indication of *why* we have this behaviour.

To me, this indicates that userspace needs to buy into this new
behaviour, and that behaviour needs to be advertised by a capability,
which is in turn conditional on FWB.

> Perhaps we can gracefully fall back to the default device mapping
> in such a case? But that would cause the VM to crash as soon as it makes some
> access violating DEVICE_nGnRE.

Which would now be a regression...

My take is that this cacheable PFNMAP contraption must only be exposed
to a guest if FWB is available. We can't prevent someone from doing an
mmap() behind our back, but we can at least:

- tell userspace whether this is supported
- only handle the fault if userspace has bought into this mode
- report the fault to userspace for it to fix things otherwise
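The three points above can be sketched as a single fault-time decision, again as a hypothetical C model rather than the patch under review. `decide_stage2_mapping()` and the `S2_*` outcomes are illustrative names; "default" stands for whatever KVM does today for other mappings (for instance, the Device-nGnRE mapping mentioned above for non-cacheable PFNMAP regions).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of a stage-2 fault in this model. */
enum s2_action {
	S2_MAP_DEFAULT,    /* existing behaviour for all other mappings */
	S2_MAP_CACHEABLE,  /* cacheable stage-2 mapping (FWB + opt-in) */
	S2_EXIT_TO_USER,   /* report the fault so the VMM can fix things */
};

static enum s2_action decide_stage2_mapping(bool vma_cacheable_devmem,
					    bool has_fwb, bool user_opted_in)
{
	if (!vma_cacheable_devmem)
		return S2_MAP_DEFAULT;
	/* Opt-in is refused without FWB, so this cannot be user-triggered. */
	assert(!user_opted_in || has_fwb);
	if (has_fwb && user_opted_in)
		return S2_MAP_CACHEABLE;
	/* No opt-in (or no FWB): hand the problem back to userspace. */
	return S2_EXIT_TO_USER;
}
```

The assert() only guards an internal invariant; anything userspace can cause, such as changing the VMA behind KVM's back, ends in S2_EXIT_TO_USER rather than a crash.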

	M.

-- 
Without deviation from the norm, progress is not possible.