lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250318230909.GD9311@nvidia.com>
Date: Tue, 18 Mar 2025 20:09:09 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Oliver Upton <oliver.upton@...ux.dev>
Cc: Marc Zyngier <maz@...nel.org>,
	Catalin Marinas <catalin.marinas@....com>,
	Ankit Agrawal <ankita@...dia.com>,
	"joey.gouly@....com" <joey.gouly@....com>,
	"suzuki.poulose@....com" <suzuki.poulose@....com>,
	"yuzenghui@...wei.com" <yuzenghui@...wei.com>,
	"will@...nel.org" <will@...nel.org>,
	"ryan.roberts@....com" <ryan.roberts@....com>,
	"shahuang@...hat.com" <shahuang@...hat.com>,
	"lpieralisi@...nel.org" <lpieralisi@...nel.org>,
	"david@...hat.com" <david@...hat.com>,
	Aniket Agashe <aniketa@...dia.com>, Neo Jia <cjia@...dia.com>,
	Kirti Wankhede <kwankhede@...dia.com>,
	"Tarun Gupta (SW-GPU)" <targupta@...dia.com>,
	Vikram Sethi <vsethi@...dia.com>, Andy Currid <acurrid@...dia.com>,
	Alistair Popple <apopple@...dia.com>,
	John Hubbard <jhubbard@...dia.com>, Dan Williams <danw@...dia.com>,
	Zhi Wang <zhiw@...dia.com>, Matt Ochs <mochs@...dia.com>,
	Uday Dhoke <udhoke@...dia.com>, Dheeraj Nigam <dnigam@...dia.com>,
	Krishnakant Jaju <kjaju@...dia.com>,
	"alex.williamson@...hat.com" <alex.williamson@...hat.com>,
	"sebastianene@...gle.com" <sebastianene@...gle.com>,
	"coltonlewis@...gle.com" <coltonlewis@...gle.com>,
	"kevin.tian@...el.com" <kevin.tian@...el.com>,
	"yi.l.liu@...el.com" <yi.l.liu@...el.com>,
	"ardb@...nel.org" <ardb@...nel.org>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"gshan@...hat.com" <gshan@...hat.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"ddutile@...hat.com" <ddutile@...hat.com>,
	"tabba@...gle.com" <tabba@...gle.com>,
	"qperret@...gle.com" <qperret@...gle.com>,
	"seanjc@...gle.com" <seanjc@...gle.com>,
	"kvmarm@...ts.linux.dev" <kvmarm@...ts.linux.dev>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH v3 1/1] KVM: arm64: Allow cacheable stage 2 mapping using
 VMA flags

On Tue, Mar 18, 2025 at 12:30:43PM -0700, Oliver Upton wrote:
> On Tue, Mar 18, 2025 at 09:55:27AM -0300, Jason Gunthorpe wrote:
> > On Tue, Mar 18, 2025 at 09:39:30AM +0000, Marc Zyngier wrote:
> > 
> > > The memslot must also be created with a new flag ((2c) in the taxonomy
> > > above) that carries the "Please map VM_PFNMAP VMAs as cacheable". This
> > > flag is only allowed if (1) is valid.
> > > 
> > > This results in the following behaviours:
> > > 
> > > - If the VMM creates the memslot with the cacheable attribute without
> > >   (1) being advertised, we fail.
> > > 
> > > - If the VMM creates the memslot without the cacheable attribute, we
> > >   map as NC, as it is today.
> > 
> > Is that OK though?
> > 
> > Now we have the MM page tables mapping this memory as cachable but KVM
> > and the guest is accessing it as non-cached.
> > 
> > I thought ARM tried hard to avoid creating such mismatches? This is
> > why the pgprot flags were used to drive this, not an opt-in flag. To
> > prevent userspace from forcing a mismatch.
> 
> It's far more problematic the other way around, e.g. the host knows that
> something needs a Device-* attribute and the VM has done something
> cacheable. The endpoint for that PA could, for example, fall over when
> lines pulled in by the guest are written back, which of course can't
> always be traced back to the offending VM.
> 
> OTOH, if the host knows that a PA is cacheable and the guest does
> something non-cacheable, you 'just' have to deal with the usual
> mismatched attributes problem as laid out in the ARM ARM.

I think the issue is that KVM doesn't do that usual stuff (ie cache
flushing) for memory that doesn't have a struct page backing.

So nothing in the hypervisor does any cache flushing and I belive you
end up with a situation where the VMM could have zero'd this cachable
memory using cachable stores to sanitize it across VMs and then KVM
can put that memory into the VM as uncached and the VM could then
access stale non-zeroed data from a prior VM. Yes? This is a security
problem.

As I understand things KVM must either do the cache flushing, or must
not allow mismatched attributes, as a matter of security.

This is why FWB comes into the picture because KVM cannot do the cache
flushing of PFNMAP VMAs. So we force the MM and KVM S2 to be cachable
and use S2 FWB to prevent the guest from ever making it
uncachable. Thus the cache flushes are not required and everything is
security safe.

So I think the logic we want here in the fault handler is to:
  Get the mm's PTE
  If it is cachable:
    Check if it has a struct page:
       Yes - KVM flushes it and can use a non-FWB path
       No - KVM either fails to install it, or installs it using FWB
           to force cachability. KVM never allows degrading cachable
	   to non-cachable when it can't do flushing.
  Not cachable:
    Install it with Normal-NC as was previously discussed and merged

> Userspace should be stating intentions on the memslot with the sort of
> mapping that it wants to create, and a memslot flag to say "I allow
> cacheable mappings" seems to fit the bill.

I'm not sure about this, I don't see that the userspace has any
choice. As above, KVM has to follow whatever is in the PTEs, the
userspace can't ask for something different here. At best you could
make non-struct page cachable memory always fail unless the flag is
given - but why?

It seems sufficient for fast fail to check if the VMA has PFNMAP and
pgprot cachable then !FEAT_FWB fails the memslot. There is no real
recovery from this, the VMM is doing something that cannot be
supported.

>  - Stage-2 faults serviced w/ a non-cacheable mapping if flag is not
>    set

As above, I think this creates a bug :\

Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ