linux-kernel - Re: [PATCH] KVM: arm64: Drop mte_allowed check during memslot creation

Message-ID: <86ikozqmsl.wl-maz@kernel.org>
Date: Mon, 24 Feb 2025 17:23:38 +0000
From: Marc Zyngier <maz@...nel.org>
To: Aneesh Kumar K.V <aneesh.kumar@...nel.org>
Cc: Catalin Marinas <catalin.marinas@....com>,
	linux-kernel@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org,
	kvmarm@...ts.linux.dev,
	Oliver Upton <oliver.upton@...ux.dev>,
	Joey Gouly <joey.gouly@....com>,
	Zenghui Yu <yuzenghui@...wei.com>,
	Will Deacon <will@...nel.org>,
	Suzuki K Poulose <Suzuki.Poulose@....com>,
	Steven Price <steven.price@....com>,
	Peter Collingbourne <pcc@...gle.com>
Subject: Re: [PATCH] KVM: arm64: Drop mte_allowed check during memslot creation

On Mon, 24 Feb 2025 16:44:06 +0000,
Aneesh Kumar K.V <aneesh.kumar@...nel.org> wrote:
> 
> Marc Zyngier <maz@...nel.org> writes:
> 
> > On Mon, 24 Feb 2025 14:39:16 +0000,
> > Catalin Marinas <catalin.marinas@....com> wrote:
> >>
> >> On Mon, Feb 24, 2025 at 12:24:14PM +0000, Marc Zyngier wrote:
> >> > On Mon, 24 Feb 2025 11:05:33 +0000,
> >> > Catalin Marinas <catalin.marinas@....com> wrote:
> >> > > On Mon, Feb 24, 2025 at 03:09:38PM +0530, Aneesh Kumar K.V (Arm) wrote:
> >> > > > This change is needed because, without it, users are not able to use MTE
> >> > > > with VFIO passthrough (currently the mapping is either Device or
> >> > > > NonCacheable for which tag access check is not applied.), as shown
> >> > > > below (kvmtool VMM).
> >> > >
> >> > > Another nit: "users are not able to use VFIO passthrough when MTE is
> >> > > enabled". At a first read, the above sounded to me like one wants to
> >> > > enable MTE for VFIO passthrough mappings.
> >> >
> >> > What the commit message doesn't spell out is how MTE and VFIO are
> >> > interacting here. I also don't understand the reference to Device or
> >> > NC memory here.
> >>
> >> I guess it's saying that the guest cannot turn MTE on (Normal Tagged)
> >> for these ranges anyway since Stage 2 is Device or Normal NC. So we
> >> don't break any use-case specific to VFIO.
> >>
> >> > Isn't the issue that DMA doesn't check/update tags, and therefore it
> >> > makes little sense to prevent non-tagged memory being associated with
> >> > a memslot?
> >>
> >> The issue is that some MMIO memory range that does not support MTE
> >> (well, all MMIO) could be mapped by the guest as Normal Tagged and we
> >> have no clue what the hardware does as tag accesses, hence we currently
> >> prevent it altogether. It's not about DMA.
> >>
> >> This patch still prevents such MMIO+MTE mappings but moves the decision
> >> to user_mem_abort() and it's slightly more relaxed - only rejecting it
> >> if !VM_MTE_ALLOWED _and_ the Stage 2 is cacheable. The side-effect is
> >> that it allows device assignment into the guest since Stage 2 is not
> >> Normal Cacheable (at least for now, we have some patches from Ankit but
> >> they handle the MTE case).
> >
> > The other side effect is that it also allows non-tagged cacheable
> > memory to be given to the MTE-enabled guest, and the guest has no way
> > to distinguish between what is tagged and what's not.
> >
> >>
> >> > My other concern is that this gives pretty poor consistency to the
> >> > guest, which cannot know what can be tagged and what cannot, and
> >> > breaks a guarantee that the guest should be able to rely on.
> >>
> >> The guest should not set Normal Tagged on anything other than what it
> >> gets as standard RAM. We are not changing this here. KVM then needs to
> >> prevent a broken/malicious guest from setting MTE on other (physical)
> >> ranges that don't support MTE. Currently it can only do this by forcing
> >> Device or Normal NC (or disable MTE altogether). Later we'll add
> >> FEAT_MTE_PERM to permit Stage 2 Cacheable but trap on tag accesses.
> >>
> >> The ABI change is just for the VMM, the guest shouldn't be aware as
> >> long as it sticks to the typical recommendations for MTE - only enable
> >> on standard RAM.
> >
> > See above. You fall into the same trap with standard memory, since you
> > now allow userspace to mix things at will, and only realise something
> > has gone wrong on access (and -EFAULT is not very useful).
> >
> >>
> >> Does any VMM rely on the memory slot being rejected on registration if
> >> it does not support MTE? After this change, we'd get an exit to the VMM
> >> on guest access with MTE turned on (even if it's not mapped as such at
> >> Stage 1).
> >
> > I really don't know what userspace expects w.r.t. mixing tagged and
> > non-tagged memory. But I don't expect anything good to come out of it,
> > given that we provide zero information about the fault context.
> >
> > Honestly, if we are going to change this, then let's make sure we give
> > enough information for userspace to go and fix the mess. Not just "it
> > all went wrong".
> >
> 
> What if we trigger a memory fault exit with the TAGACCESS flag, allowing
> the VMM to use the GPA to retrieve additional details and print extra
> information to aid in analysis? BTW, we will do this on the first fault
> in cacheable, non-tagged memory even if there is no tag access in that
> region. This can be further improved using the NoTagAccess series I
> posted earlier, which ensures the memory fault exit occurs only on
> actual tag access.
> 
> Something like below?

Something like that, only with:

- a capability informing userspace of this behaviour

- a per-VM (or per-VMA) flag as a buy-in for that behaviour

- the relaxation being conditional on the memslot not being memory
  (i.e. really MMIO-only).

and keep the current behaviour otherwise.
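
To make the three conditions concrete, here is a rough sketch of the decision as a standalone C model. It is not kernel code: `struct slot_info`, `memslot_policy()`, the `vm_opted_in` flag and the `SLOT_*` values are all hypothetical names made up for illustration, and the capability from the first point is assumed to exist only so that userspace knows it may set the opt-in flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the proposal, not actual KVM code. */
enum slot_decision {
	SLOT_ACCEPT,		/* memslot is fine as it is */
	SLOT_REJECT,		/* current behaviour: fail memslot creation */
	SLOT_DEFER_TO_FAULT,	/* relaxed: decide at stage-2 fault time */
};

struct slot_info {
	bool vm_has_mte;	/* the VM was created with MTE enabled */
	bool vma_mte_allowed;	/* models VM_MTE_ALLOWED on the backing VMA */
	bool is_mmio;		/* the memslot is not memory (MMIO-only) */
	bool vm_opted_in;	/* hypothetical per-VM buy-in flag */
};

static enum slot_decision memslot_policy(const struct slot_info *s)
{
	assert(s != NULL);

	/* Nothing to check without MTE, or when the backing can be tagged. */
	if (!s->vm_has_mte || s->vma_mte_allowed)
		return SLOT_ACCEPT;

	/* Relax only if userspace bought in and the slot really is MMIO. */
	if (s->vm_opted_in && s->is_mmio)
		return SLOT_DEFER_TO_FAULT;

	/* Everything else keeps the current behaviour. */
	return SLOT_REJECT;
}
```

With this shape, a VMM that never sets the opt-in flag sees no ABI change, and non-tagged memory (as opposed to MMIO) is still refused at registration time rather than on first access.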

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.