Message-ID: <20250422135452.GL823903@nvidia.com>
Date: Tue, 22 Apr 2025 10:54:52 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Oliver Upton <oliver.upton@...ux.dev>
Cc: Ankit Agrawal <ankita@...dia.com>,
	Sean Christopherson <seanjc@...gle.com>,
	Marc Zyngier <maz@...nel.org>,
	Catalin Marinas <catalin.marinas@....com>,
	"joey.gouly@....com" <joey.gouly@....com>,
	"suzuki.poulose@....com" <suzuki.poulose@....com>,
	"yuzenghui@...wei.com" <yuzenghui@...wei.com>,
	"will@...nel.org" <will@...nel.org>,
	"ryan.roberts@....com" <ryan.roberts@....com>,
	"shahuang@...hat.com" <shahuang@...hat.com>,
	"lpieralisi@...nel.org" <lpieralisi@...nel.org>,
	"david@...hat.com" <david@...hat.com>,
	Aniket Agashe <aniketa@...dia.com>, Neo Jia <cjia@...dia.com>,
	Kirti Wankhede <kwankhede@...dia.com>,
	"Tarun Gupta (SW-GPU)" <targupta@...dia.com>,
	Vikram Sethi <vsethi@...dia.com>, Andy Currid <acurrid@...dia.com>,
	Alistair Popple <apopple@...dia.com>,
	John Hubbard <jhubbard@...dia.com>, Dan Williams <danw@...dia.com>,
	Zhi Wang <zhiw@...dia.com>, Matt Ochs <mochs@...dia.com>,
	Uday Dhoke <udhoke@...dia.com>, Dheeraj Nigam <dnigam@...dia.com>,
	Krishnakant Jaju <kjaju@...dia.com>,
	"alex.williamson@...hat.com" <alex.williamson@...hat.com>,
	"sebastianene@...gle.com" <sebastianene@...gle.com>,
	"coltonlewis@...gle.com" <coltonlewis@...gle.com>,
	"kevin.tian@...el.com" <kevin.tian@...el.com>,
	"yi.l.liu@...el.com" <yi.l.liu@...el.com>,
	"ardb@...nel.org" <ardb@...nel.org>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"gshan@...hat.com" <gshan@...hat.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"ddutile@...hat.com" <ddutile@...hat.com>,
	"tabba@...gle.com" <tabba@...gle.com>,
	"qperret@...gle.com" <qperret@...gle.com>,
	"kvmarm@...ts.linux.dev" <kvmarm@...ts.linux.dev>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH v3 1/1] KVM: arm64: Allow cacheable stage 2 mapping using
 VMA flags

On Tue, Apr 22, 2025 at 12:49:28AM -0700, Oliver Upton wrote:
> The reality is that userspace is an equal participant in remaining coherent with
> the guest. Whether or not FWB is employed for a particular region of IPA
> space is useful information for userspace deciding what it needs to do to access guest
> memory. Ignoring the Nvidia widget for a second, userspace also needs to know this for
> 'normal', kernel-managed memory so it understands what CMOs may be necessary when (for
> example) doing live migration of the VM.

Really? How does it work today then? Is this another existing problem?
Userspace is doing CMOs during live migration that are not necessary?

> So this KVM CAP needs to be paired with a memslot flag.
> 
>  - The capability says KVM is able to enforce Write-Back at stage-2

Sure

>  - The memslot flag says userspace expects a particular GFN range to guarantee
>    Write-Back semantics. This can be applied to 'normal', kernel-managed memory
>    and PFNMAP thingies that have cacheable attributes at host stage-1.

Userspace doesn't actually know if it has a cacheable mapping from VFIO
though :(

I don't really see a point in this. If KVM has the cap then
userspace should assume the S2FWB behavior for all cacheable memslots.

What should happen if you have S2FWB but don't pass the flag? For
normal kernel memory it should still use S2FWB. Thus for cacheable
PFNMAP it makes sense that it should also still use S2FWB without the
flag?

So, if you set the flag and don't have S2FWB it will fail the memslot,
but then why not just rely on userspace to read the CAP and not create
the memslot in the first place?

If you don't set the flag then it should go ahead and use S2FWB anyhow
and not fail.

It doesn't make a lot of sense to me, and forcing userspace to discover
the cacheability of the VFIO side just adds more complexity.

>  - Under no situation do we allow userspace to create non-cacheable mapping at
>    stage-2 for something PFNMAP cacheable at stage-1.

Yes. memslot creation should fail, and page fault should fail.
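
As a rough sketch of the policy argued for in this reply (not code from the
patch), the outcome for a PFNMAP region can be written as a function of two
inputs only: whether KVM can enforce Write-Back at stage-2, and whether the
host stage-1 mapping is cacheable. All names below are hypothetical and are
not KVM identifiers. Mapping a non-cacheable stage-1 PFNMAP as non-cacheable
at stage-2 is an assumption added to make the function total.

```c
#include <stdbool.h>

/* Hypothetical outcomes for illustration; not KVM identifiers. */
enum s2_outcome {
	S2_MAP_CACHEABLE,	/* stage-2 Write-Back, enforced via S2FWB */
	S2_MAP_NONCACHEABLE,	/* assumed handling of non-cacheable PFNMAP */
	S2_REJECT,		/* memslot creation and page fault both fail */
};

/*
 * Decide how a PFNMAP region is treated at stage-2.
 *
 * There is no memslot flag parameter: in this sketch the result follows
 * from the capability and the host stage-1 attributes alone, which is
 * the point made above against adding a flag.
 */
enum s2_outcome pfnmap_s2_policy(bool has_s2fwb, bool s1_cacheable)
{
	if (!s1_cacheable)
		return S2_MAP_NONCACHEABLE;

	/* Never non-cacheable at stage-2 for cacheable stage-1 PFNMAP. */
	return has_s2fwb ? S2_MAP_CACHEABLE : S2_REJECT;
}
```

With this shape userspace only needs to read the capability. It never has to
find out whether the VFIO mapping is cacheable.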

Jason
