linux-kernel - Re: [PATCH v3 1/1] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags


Message-ID: <dff27537-97f9-ad46-b991-e1de6b589e3e@loongson.cn>
Date: Thu, 20 Mar 2025 15:24:12 +0800
From: bibo mao <maobibo@...ngson.cn>
To: Oliver Upton <oliver.upton@...ux.dev>,
 David Hildenbrand <david@...hat.com>
Cc: Catalin Marinas <catalin.marinas@....com>,
 Jason Gunthorpe <jgg@...dia.com>, Marc Zyngier <maz@...nel.org>,
 Ankit Agrawal <ankita@...dia.com>, "joey.gouly@....com"
 <joey.gouly@....com>, "suzuki.poulose@....com" <suzuki.poulose@....com>,
 "yuzenghui@...wei.com" <yuzenghui@...wei.com>,
 "will@...nel.org" <will@...nel.org>,
 "ryan.roberts@....com" <ryan.roberts@....com>,
 "shahuang@...hat.com" <shahuang@...hat.com>,
 "lpieralisi@...nel.org" <lpieralisi@...nel.org>,
 Aniket Agashe <aniketa@...dia.com>, Neo Jia <cjia@...dia.com>,
 Kirti Wankhede <kwankhede@...dia.com>,
 "Tarun Gupta (SW-GPU)" <targupta@...dia.com>,
 Vikram Sethi <vsethi@...dia.com>, Andy Currid <acurrid@...dia.com>,
 Alistair Popple <apopple@...dia.com>, John Hubbard <jhubbard@...dia.com>,
 Dan Williams <danw@...dia.com>, Zhi Wang <zhiw@...dia.com>,
 Matt Ochs <mochs@...dia.com>, Uday Dhoke <udhoke@...dia.com>,
 Dheeraj Nigam <dnigam@...dia.com>, Krishnakant Jaju <kjaju@...dia.com>,
 "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
 "sebastianene@...gle.com" <sebastianene@...gle.com>,
 "coltonlewis@...gle.com" <coltonlewis@...gle.com>,
 "kevin.tian@...el.com" <kevin.tian@...el.com>,
 "yi.l.liu@...el.com" <yi.l.liu@...el.com>, "ardb@...nel.org"
 <ardb@...nel.org>, "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
 "gshan@...hat.com" <gshan@...hat.com>,
 "linux-mm@...ck.org" <linux-mm@...ck.org>,
 "ddutile@...hat.com" <ddutile@...hat.com>,
 "tabba@...gle.com" <tabba@...gle.com>,
 "qperret@...gle.com" <qperret@...gle.com>,
 "seanjc@...gle.com" <seanjc@...gle.com>,
 "kvmarm@...ts.linux.dev" <kvmarm@...ts.linux.dev>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH v3 1/1] KVM: arm64: Allow cacheable stage 2 mapping using
 VMA flags



On 2025/3/20 11:30 AM, bibo mao wrote:
> 
> 
> On 2025/3/19 3:40 AM, Oliver Upton wrote:
>> On Tue, Mar 18, 2025 at 08:35:38PM +0100, David Hildenbrand wrote:
>>> On 18.03.25 20:27, Catalin Marinas wrote:
>>>> On Tue, Mar 18, 2025 at 09:55:27AM -0300, Jason Gunthorpe wrote:
>>>>> On Tue, Mar 18, 2025 at 09:39:30AM +0000, Marc Zyngier wrote:
>>>>>> The memslot must also be created with a new flag ((2c) in the
>>>>>> taxonomy above) that carries the "Please map VM_PFNMAP VMAs as
>>>>>> cacheable". This flag is only allowed if (1) is valid.
>>>>>>
>>>>>> This results in the following behaviours:
>>>>>>
>>>>>> - If the VMM creates the memslot with the cacheable attribute without
>>>>>>     (1) being advertised, we fail.
>>>>>>
>>>>>> - If the VMM creates the memslot without the cacheable attribute, we
>>>>>>     map as NC, as it is today.
>>>>>
>>>>> Is that OK though?
>>>>>
>>>>> Now we have the MM page tables mapping this memory as cachable but KVM
>>>>> and the guest is accessing it as non-cached.
>>>>
>>>> I don't think we should allow this.
>>>>
>>>>> I thought ARM tried hard to avoid creating such mismatches? This is
>>>>> why the pgprot flags were used to drive this, not an opt-in flag. To
>>>>> prevent userspace from forcing a mismatch.
>>>>
>>>> We have the vma->vm_page_prot when the memslot is added, so we could
>>>> use this instead of additional KVM flags.
>>>
>>> I thought we try to avoid messing with the VMA when adding memslots;
>>> because KVM_CAP_SYNC_MMU allows user space for changing the VMAs
>>> afterwards without changing the memslot?
>>
>> Any checks on the VMA at memslot creation is done out of courtesy to
>> userspace so it 'fails fast'. We repeat checks on the VMA at the time of
>> fault to handle userspace twiddling VMAs behind our back.
> yes, I think it is better to add cachable attribute in memslot, it can 
> be checked on the VMA at memslot creation. Also cache attribute can be 
> abstracted with cachable/uc/wc type rather than detailed arch specified.
Sorry, I did not state this clearly. What I mean is to add a cacheable
attribute to the memslot. It is derived from the VMA's page protection
at memslot creation and checked in the S2 page fault fast path.

That way there is no need to look up the VMA in the S2 page fault fast
path; the memslot alone is enough. A write-combined cache attribute
should also be added, since some GPU memory can be mapped with the WC
attribute in the vfio-gpu driver.
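
To make the idea concrete, here is a rough userspace sketch in C. It is
not kernel code and not a proposed patch. Every name in it (toy_vma,
toy_memslot, slot_cache_type, toy_memslot_create, toy_s2_fault) is
hypothetical, and the prot_cache field only stands in for decoding
vma->vm_page_prot. It records an arch-neutral cache type in the memslot
at creation time and lets the fault path consult the memslot alone. It
deliberately leaves out revalidation when userspace changes the VMA
after the memslot is created, which is the concern raised earlier in
this thread.

```c
#include <stdint.h>

/* Hypothetical arch-neutral cache types; not an existing KVM interface. */
enum slot_cache_type {
	SLOT_CACHE_UC,	/* uncached */
	SLOT_CACHE_WC,	/* write-combined */
	SLOT_CACHE_WB,	/* normal cacheable */
};

/* Toy stand-in for a VMA: only the fields this sketch needs. */
struct toy_vma {
	uint64_t start;				/* inclusive */
	uint64_t end;				/* exclusive */
	enum slot_cache_type prot_cache;	/* stands in for vm_page_prot */
};

/* Toy memslot that carries the cache type recorded at creation. */
struct toy_memslot {
	uint64_t hva;
	uint64_t size;
	enum slot_cache_type cache;
};

/*
 * Memslot creation: fail fast if the range is not covered by the VMA,
 * otherwise copy the VMA's cache type into the memslot.
 */
int toy_memslot_create(struct toy_memslot *slot, const struct toy_vma *vma,
		       uint64_t hva, uint64_t size)
{
	if (size == 0 || hva < vma->start || hva >= vma->end ||
	    size > vma->end - hva)
		return -1;

	slot->hva = hva;
	slot->size = size;
	slot->cache = vma->prot_cache;
	return 0;
}

/*
 * Stage 2 fault fast path: no VMA lookup, the attribute comes from the
 * memslot alone.
 */
int toy_s2_fault(const struct toy_memslot *slot, uint64_t hva,
		 enum slot_cache_type *out)
{
	if (hva < slot->hva || hva - slot->hva >= slot->size)
		return -1;

	*out = slot->cache;
	return 0;
}
```

With a WC VMA, a memslot created inside it reports SLOT_CACHE_WC on
every fault within its range, and a memslot that would extend past the
end of the VMA is rejected at creation.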

Regards
Bibo Mao

> 
> Regards
> Bibo Mao
>>
>> VM_MTE_ALLOWED is an example of this.
>>
>> Thanks,
>> Oliver
>>
>