Open Source and information security mailing list archives
 
Message-ID: <beb61666-020a-d99e-e84f-c16111039e66@huaweicloud.com>
Date: Wed, 24 Dec 2025 10:20:01 +0800
From: Hou Tao <houtao@...weicloud.com>
To: Logan Gunthorpe <logang@...tatee.com>, linux-kernel@...r.kernel.org
Cc: linux-pci@...r.kernel.org, linux-mm@...ck.org,
 linux-nvme@...ts.infradead.org, Bjorn Helgaas <bhelgaas@...gle.com>,
 Alistair Popple <apopple@...dia.com>, Leon Romanovsky <leonro@...dia.com>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Tejun Heo <tj@...nel.org>,
 "Rafael J . Wysocki" <rafael@...nel.org>, Danilo Krummrich
 <dakr@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
 David Hildenbrand <david@...nel.org>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Keith Busch
 <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>,
 Christoph Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
 houtao1@...wei.com
Subject: Re: [PATCH 10/13] PCI/P2PDMA: support compound page in
 p2pmem_alloc_mmap()



On 12/23/2025 1:04 AM, Logan Gunthorpe wrote:
>
> On 2025-12-19 21:04, Hou Tao wrote:
>> From: Hou Tao <houtao1@...wei.com>
>>
>> P2PDMA memory already supports compound pages, and the helpers for
>> inserting compound pages into a VMA are also in place, so add support
>> for compound pages in p2pmem_alloc_mmap() as well. This greatly reduces
>> the overhead of mmap() and get_user_pages() when compound pages are
>> enabled for p2pdma memory.
>>
>> The use of vm_private_data to save the alignment of p2pdma memory needs
>> explanation. The normal way to get the alignment is through pci_dev. It
>> can be achieved by either invoking kernfs_of() and sysfs_file_kobj() or
>> defining a new struct kernfs_vm_ops to pass the kobject to the
>> may_split() and ->pagesize() callbacks. The former approach depends too
>> much on kernfs implementation details, and the latter would lead to
>> excessive churn. Therefore, choose the simpler way of saving alignment
>> in vm_private_data instead.
>>
>> Signed-off-by: Hou Tao <houtao1@...wei.com>
>> ---
>>  drivers/pci/p2pdma.c | 48 ++++++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 44 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>> index e97f5da73458..4a133219ac43 100644
>> --- a/drivers/pci/p2pdma.c
>> +++ b/drivers/pci/p2pdma.c
>> @@ -128,6 +128,25 @@ static unsigned long p2pmem_get_unmapped_area(struct file *filp, struct kobject
>>  	return mm_get_unmapped_area(filp, uaddr, len, pgoff, flags);
>>  }
>>  
>> +static int p2pmem_may_split(struct vm_area_struct *vma, unsigned long addr)
>> +{
>> +	size_t align = (uintptr_t)vma->vm_private_data;
>> +
>> +	if (!IS_ALIGNED(addr, align))
>> +		return -EINVAL;
>> +	return 0;
>> +}
>> +
>> +static unsigned long p2pmem_pagesize(struct vm_area_struct *vma)
>> +{
>> +	return (uintptr_t)vma->vm_private_data;
>> +}
>> +
>> +static const struct vm_operations_struct p2pmem_vm_ops = {
>> +	.may_split = p2pmem_may_split,
>> +	.pagesize = p2pmem_pagesize,
>> +};
>> +
>>  static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
>>  		const struct bin_attribute *attr, struct vm_area_struct *vma)
>>  {
>> @@ -136,6 +155,7 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
>>  	struct pci_p2pdma *p2pdma;
>>  	struct percpu_ref *ref;
>>  	unsigned long vaddr;
>> +	size_t align;
>>  	void *kaddr;
>>  	int ret;
>>  
>> @@ -161,6 +181,16 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
>>  		goto out;
>>  	}
>>  
>> +	align = p2pdma->align;
>> +	if (vma->vm_start & (align - 1) || vma->vm_end & (align - 1)) {
>> +		pci_info_ratelimited(pdev,
>> +				     "%s: unaligned vma (%#lx~%#lx, %#lx)\n",
>> +				     current->comm, vma->vm_start, vma->vm_end,
>> +				     align);
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
> I'm a bit confused by some aspects of these changes. Why does the
> alignment become a property of the PCI device? It appears that if the
> CPU supports different sized huge pages then the size and alignment
> restrictions on P2PDMA memory become greater. So if someone is only
> allocating a few KB these changes will break their code and refuse to
> allocate single pages.
>
> I would have expected this code to allocate an appropriately aligned
> block of the p2p memory based on the requirements of the current
> mapping, not based on alignment requirements established when the device
> is probed.

The behavior mimics device-dax, where creating a device-dax device
requires specifying an alignment property. Supporting a different
alignment for each userspace mapping could work. However, there is no
way for userspace to tell the kernel whether the requested alignment is
mandatory. Take the following procedure as an example:

1) the size of the CMB BAR is 4MB
2) application 1 allocates 4KB; its mapping is 4KB-aligned
3) application 2 allocates 2MB. If the allocation from gen_pool is not
aligned, the mapping can only be 4KB-aligned. If gen_pool performs an
aligned allocation, the mapping can be 2MB-aligned. However, the mmap()
implementation in the kernel has no way of knowing which behavior is
appropriate. If the alignment is specified in the p2pdma instance, the
implementation knows that an aligned 2MB mapping is appropriate.

> Logan

