Date: Mon, 6 May 2024 14:47:01 +0800
From: Chengchang Tang <tangchengchang@...wei.com>
To: Jason Gunthorpe <jgg@...pe.ca>, Junxian Huang
	<huangjunxian6@...ilicon.com>
CC: <leon@...nel.org>, <linux-rdma@...r.kernel.org>, <linuxarm@...wei.com>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH for-next] RDMA/hns: Support flexible WQE buffer page size



On 2024/4/30 21:41, Jason Gunthorpe wrote:
> On Tue, Apr 30, 2024 at 05:28:45PM +0800, Junxian Huang wrote:
>> From: Chengchang Tang <tangchengchang@...wei.com>
>>
>> Currently, driver fixedly allocates 4K pages for userspace WQE buffer
>> and results in HW reading WQE with a granularity of 4K even in a 64K
>> system. HW has to switch pages every 4K, leading to a loss of performance.
> 
>> In order to improve performance, add support for userspace to allocate
>> flexible WQE buffer page size between 4K to system PAGESIZE.
>> @@ -90,7 +90,8 @@ struct hns_roce_ib_create_qp {
>>  	__u8    log_sq_bb_count;
>>  	__u8    log_sq_stride;
>>  	__u8    sq_no_prefetch;
>> -	__u8    reserved[5];
>> +	__u8    pageshift;
>> +	__u8    reserved[4];
> 
> It doesn't make any sense to pass in a pageshift from userspace.
> 
> Kernel should detect whatever underlying physical contiguity userspace
> has been able to create and configure the hardware optimally. The umem
> already has all the tools to do this trivially.
> 
> Why would you need to specify anything?
> 
> Jason
> 

For hns RoCE, a QP requires three WQE buffers: the SQ WQE buffer, the RQ WQE
buffer, and the EXT_SGE buffer. Due to HW constraints, all three must be
configured with the same page size. The memory for these buffers is currently
allocated by the user-mode driver. The user-mode driver calculates the size of
each region and aligns it to the page size. Finally, the driver merges the
three regions, allocates one buffer with contiguous virtual addresses, and
passes its address to the kernel-mode driver (during this process, the
user-mode driver and the kernel-mode driver exchange only the address, not
the sizes of the three regions or any other information).
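
As a rough illustration of the user-mode layout described above (a sketch
only: the struct, the function names, and the SQ/RQ/EXT_SGE ordering are
assumptions made for this example, not the actual hns user-mode code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the merged WQE buffer; not a real hns struct. */
struct wqe_buf_layout {
	size_t sq_offset;      /* start of the SQ WQE region */
	size_t rq_offset;      /* start of the RQ WQE region */
	size_t ext_sge_offset; /* start of the EXT_SGE region */
	size_t total_size;     /* size of the single merged allocation */
};

/* Round size up to a multiple of the page size (1 << page_shift). */
static size_t align_to_page(size_t size, unsigned int page_shift)
{
	size_t page_size = (size_t)1 << page_shift;

	return (size + page_size - 1) & ~(page_size - 1);
}

/*
 * Align each region to the same page size and place the regions back to
 * back. The region order here is an assumption for illustration.
 */
static struct wqe_buf_layout calc_wqe_buf_layout(size_t sq_size,
						 size_t rq_size,
						 size_t ext_sge_size,
						 unsigned int page_shift)
{
	struct wqe_buf_layout l;

	assert(page_shift >= 12 && page_shift < 8 * sizeof(size_t));

	l.sq_offset = 0;
	l.rq_offset = l.sq_offset + align_to_page(sq_size, page_shift);
	l.ext_sge_offset = l.rq_offset + align_to_page(rq_size, page_shift);
	l.total_size = l.ext_sge_offset +
		       align_to_page(ext_sge_size, page_shift);
	return l;
}
```

Only the base address of the merged buffer reaches the kernel-mode driver;
the offsets computed here stay in user mode.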

Since the three regions share one umem, umem's tools, such as
ib_umem_find_best_pgsz(), calculate the best page size for the entire umem,
not for each region. For this reason, and because currently only the address
is passed between the kernel-mode driver and the user-mode driver, it is
difficult for the kernel-mode driver to work out the page size that the
user-mode driver used. In this case, it is simpler to let user mode tell
kernel mode directly which pageshift it uses, and this is also easier in
terms of forward and backward compatibility.
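
A toy model of the mismatch (this is not how ib_umem_find_best_pgsz() works
internally; common_page_shift() is a hypothetical helper that looks only at
region offsets, and the 4K/64K bounds are example values):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the largest shift in [min_shift, max_shift] at which every offset
 * is page aligned. Offsets are assumed to be aligned to min_shift already.
 */
static unsigned int common_page_shift(const size_t *offsets, size_t n,
				      unsigned int min_shift,
				      unsigned int max_shift)
{
	unsigned int shift;

	assert(min_shift <= max_shift);

	for (shift = max_shift; shift > min_shift; shift--) {
		size_t mask = ((size_t)1 << shift) - 1;
		size_t i;

		/* Stop at the first region that is misaligned at this shift. */
		for (i = 0; i < n; i++)
			if (offsets[i] & mask)
				break;
		if (i == n)
			return shift;
	}
	return min_shift;
}
```

With region offsets {0, 4096, 8192} the result for a 12..16 range is 12, but
a caller that knows only the base offset {0} gets 16. That is the situation
of the kernel-mode driver, which sees the whole buffer and not the region
boundaries.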

Chengchang
