Message-ID: <923d57f1-55f1-411f-b659-9fe4fafa734a@linux.intel.com>
Date: Tue, 3 Jun 2025 08:54:07 +0800
From: Binbin Wu <binbin.wu@...ux.intel.com>
To: Ackerley Tng <ackerleytng@...gle.com>
Cc: kvm@...r.kernel.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 x86@...nel.org, linux-fsdevel@...r.kernel.org, aik@....com,
 ajones@...tanamicro.com, akpm@...ux-foundation.org, amoorthy@...gle.com,
 anthony.yznaga@...cle.com, anup@...infault.org, aou@...s.berkeley.edu,
 bfoster@...hat.com, brauner@...nel.org, catalin.marinas@....com,
 chao.p.peng@...el.com, chenhuacai@...nel.org, dave.hansen@...el.com,
 david@...hat.com, dmatlack@...gle.com, dwmw@...zon.co.uk,
 erdemaktas@...gle.com, fan.du@...el.com, fvdl@...gle.com, graf@...zon.com,
 haibo1.xu@...el.com, hch@...radead.org, hughd@...gle.com,
 ira.weiny@...el.com, isaku.yamahata@...el.com, jack@...e.cz,
 james.morse@....com, jarkko@...nel.org, jgg@...pe.ca, jgowans@...zon.com,
 jhubbard@...dia.com, jroedel@...e.de, jthoughton@...gle.com,
 jun.miao@...el.com, kai.huang@...el.com, keirf@...gle.com,
 kent.overstreet@...ux.dev, kirill.shutemov@...el.com,
 liam.merwick@...cle.com, maciej.wieczor-retman@...el.com,
 mail@...iej.szmigiero.name, maz@...nel.org, mic@...ikod.net,
 michael.roth@....com, mpe@...erman.id.au, muchun.song@...ux.dev,
 nikunj@....com, nsaenz@...zon.es, oliver.upton@...ux.dev,
 palmer@...belt.com, pankaj.gupta@....com, paul.walmsley@...ive.com,
 pbonzini@...hat.com, pdurrant@...zon.co.uk, peterx@...hat.com,
 pgonda@...gle.com, pvorel@...e.cz, qperret@...gle.com,
 quic_cvanscha@...cinc.com, quic_eberman@...cinc.com,
 quic_mnalajal@...cinc.com, quic_pderrin@...cinc.com,
 quic_pheragu@...cinc.com, quic_svaddagi@...cinc.com, quic_tsoni@...cinc.com,
 richard.weiyang@...il.com, rick.p.edgecombe@...el.com, rientjes@...gle.com,
 roypat@...zon.co.uk, rppt@...nel.org, seanjc@...gle.com, shuah@...nel.org,
 steven.price@....com, steven.sistare@...cle.com, suzuki.poulose@....com,
 tabba@...gle.com, thomas.lendacky@....com, vannapurve@...gle.com,
 vbabka@...e.cz, viro@...iv.linux.org.uk, vkuznets@...hat.com,
 wei.w.wang@...el.com, will@...nel.org, willy@...radead.org,
 xiaoyao.li@...el.com, yan.y.zhao@...el.com, yilun.xu@...el.com,
 yuzenghui@...wei.com, zhiquan1.li@...el.com
Subject: Re: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce
 KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls



On 5/31/2025 4:10 AM, Ackerley Tng wrote:
> Binbin Wu <binbin.wu@...ux.intel.com> writes:
>
>> On 5/15/2025 7:41 AM, Ackerley Tng wrote:
>>
>> [...]
>>> +
>>> +static int kvm_gmem_convert_range(struct file *file, pgoff_t start,
>>> +				  size_t nr_pages, bool shared,
>>> +				  pgoff_t *error_index)
>>> +{
>>> +	struct conversion_work *work, *tmp, *rollback_stop_item;
>>> +	LIST_HEAD(work_list);
>>> +	struct inode *inode;
>>> +	enum shareability m;
>>> +	int ret;
>>> +
>>> +	inode = file_inode(file);
>>> +
>>> +	filemap_invalidate_lock(inode->i_mapping);
>>> +
>>> +	m = shared ? SHAREABILITY_ALL : SHAREABILITY_GUEST;
>>> +	ret = kvm_gmem_convert_compute_work(inode, start, nr_pages, m, &work_list);
>>> +	if (ret || list_empty(&work_list))
>>> +		goto out;
>>> +
>>> +	list_for_each_entry(work, &work_list, list)
>>> +		kvm_gmem_convert_invalidate_begin(inode, work);
>>> +
>>> +	list_for_each_entry(work, &work_list, list) {
>>> +		ret = kvm_gmem_convert_should_proceed(inode, work, shared,
>>> +						      error_index);
>> Since kvm_gmem_invalidate_begin() now also handles shared memory,
>> kvm_gmem_convert_invalidate_begin() will zap the page tables.
>> So the shared mapping could be zapped in kvm_gmem_convert_invalidate_begin()
>> even when kvm_gmem_convert_should_proceed() returns an error.
>> The sequence is a bit confusing to me, at least in this patch so far.
>>
> It is true that zapping of pages from the guest page table will happen
> before we figure out whether conversion is allowed.
>
> For a shared-to-private conversion, we will definitely unmap from the
> host before checking whether conversion is allowed, and there's no
> choice there: conversion is allowed only if there are no unexpected
> refcounts, and the way to drop the expected refcounts is to unmap from
> the host.
>
> Since we're unmapping before checking if conversion is allowed, I
> thought it would be fine to also zap from guest page tables before
> checking if conversion is allowed.
>
> Conversions are not meant to happen very regularly, and even if a page
> is unmapped or zapped, the next access will simply fault it back in, so
> there is a performance impact but not a functional one.
Yes, that's OK for the shared mapping.

>
> Hope that helps.

It helped, thanks!

> Is it still odd to zap before checking if conversion
> should proceed?
>
>>> +		if (ret)
>>> +			goto invalidate_end;
>>> +	}
>>> +
>>> +	list_for_each_entry(work, &work_list, list) {
>>> +		rollback_stop_item = work;
>>> +		ret = kvm_gmem_shareability_apply(inode, work, m);
>>> +		if (ret)
>>> +			break;
>>> +	}
>>> +
>>> +	if (ret) {
>>> +		m = shared ? SHAREABILITY_GUEST : SHAREABILITY_ALL;
>>> +		list_for_each_entry(work, &work_list, list) {
>>> +			if (work == rollback_stop_item)
>>> +				break;
>>> +
>>> +			WARN_ON(kvm_gmem_shareability_apply(inode, work, m));
>>> +		}
>>> +	}
>>> +
>>> +invalidate_end:
>>> +	list_for_each_entry(work, &work_list, list)
>>> +		kvm_gmem_convert_invalidate_end(inode, work);
>>> +out:
>>> +	filemap_invalidate_unlock(inode->i_mapping);
>>> +
>>> +	list_for_each_entry_safe(work, tmp, &work_list, list) {
>>> +		list_del(&work->list);
>>> +		kfree(work);
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>> [...]
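[Editor's note: the apply-then-rollback loop in kvm_gmem_convert_range() above
can be illustrated with a small standalone sketch. All names here are made up
for illustration: apply() stands in for kvm_gmem_shareability_apply(), an
integer array stands in for the per-range shareability state, and `stop` plays
the role of rollback_stop_item. The point is that on failure, only the items
applied *before* the failing one are restored; the failing item itself was
never applied and needs no rollback.]

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the rollback pattern in
 * kvm_gmem_convert_range(): apply a new state to each work item in
 * order; on failure, undo only the items applied before the failing
 * one, leaving the failing item (and everything after it) untouched.
 */

#define NR_ITEMS 4

/* Simulates kvm_gmem_shareability_apply(); fails at index fail_at. */
static int apply(int state[], size_t i, int val, size_t fail_at)
{
	if (i == fail_at)
		return -1;
	state[i] = val;
	return 0;
}

/* Returns 0 on success, -1 if any apply failed (after rolling back). */
static int convert_all(int state[], int new_val, int old_val, size_t fail_at)
{
	size_t stop = NR_ITEMS;	/* index of the item that failed, if any */
	int ret = 0;

	for (size_t i = 0; i < NR_ITEMS; i++) {
		stop = i;
		ret = apply(state, i, new_val, fail_at);
		if (ret)
			break;
	}

	if (ret) {
		/* Roll back items [0, stop): restore the old value. */
		for (size_t i = 0; i < stop; i++)
			apply(state, i, old_val, NR_ITEMS /* never fail */);
	}
	return ret;
}
```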
>>> @@ -186,15 +490,26 @@ static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
>>>    	unsigned long index;
>>>    
>>>    	xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
>>> +		enum kvm_gfn_range_filter filter;
>>>    		pgoff_t pgoff = slot->gmem.pgoff;
>>>    
>>> +		filter = KVM_FILTER_PRIVATE;
>>> +		if (kvm_gmem_memslot_supports_shared(slot)) {
>>> +			/*
>>> +			 * Unmapping would also cause invalidation, but cannot
>>> +			 * rely on mmu_notifiers to do invalidation via
>>> +			 * unmapping, since memory may not be mapped to
>>> +			 * userspace.
>>> +			 */
>>> +			filter |= KVM_FILTER_SHARED;
>>> +		}
>>> +
>>>    		struct kvm_gfn_range gfn_range = {
>>>    			.start = slot->base_gfn + max(pgoff, start) - pgoff,
>>>    			.end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
>>>    			.slot = slot,
>>>    			.may_block = true,
>>> -			/* guest memfd is relevant to only private mappings. */
>>> -			.attr_filter = KVM_FILTER_PRIVATE,
>>> +			.attr_filter = filter,
>>>    		};
>>>    
>>>    		if (!found_memslot) {
>>> @@ -484,11 +799,49 @@ EXPORT_SYMBOL_GPL(kvm_gmem_memslot_supports_shared);
>>>    #define kvm_gmem_mmap NULL
>>>    #endif /* CONFIG_KVM_GMEM_SHARED_MEM */
>>>    
>> [...]
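[Editor's note: the filter construction in the kvm_gmem_invalidate_begin()
hunk above boils down to a simple bitmask. Here is a minimal standalone
sketch with made-up flag values; the real enum kvm_gfn_range_filter and
KVM_FILTER_* constants live in the KVM headers. Private mappings are always
invalidated; shared is ORed in only when the memslot supports shared
guest_memfd memory.]

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for KVM's gfn-range filter flags. */
#define FILTER_PRIVATE (1u << 0)
#define FILTER_SHARED  (1u << 1)

/*
 * Mirrors the hunk above: start from the private filter, and widen it
 * to cover shared mappings when the memslot supports them, since the
 * mmu_notifier path cannot be relied on for memory that may not be
 * mapped into userspace.
 */
static unsigned int invalidate_filter(bool slot_supports_shared)
{
	unsigned int filter = FILTER_PRIVATE;

	if (slot_supports_shared)
		filter |= FILTER_SHARED;
	return filter;
}
```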

