Date: Tue, 7 May 2024 17:22:13 +0800
From: Junxian Huang <huangjunxian6@...ilicon.com>
To: Jason Gunthorpe <jgg@...pe.ca>
CC: <leon@...nel.org>, <linux-rdma@...r.kernel.org>, <linuxarm@...wei.com>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH for-next] RDMA/hns: Support flexible WQE buffer page size



On 2024/4/30 21:41, Jason Gunthorpe wrote:
> On Tue, Apr 30, 2024 at 05:28:45PM +0800, Junxian Huang wrote:
>> From: Chengchang Tang <tangchengchang@...wei.com>
>>
>> Currently, the driver always allocates 4K pages for the userspace WQE
>> buffer, so HW reads WQEs with a granularity of 4K even on a 64K-page
>> system. HW has to switch pages every 4K, leading to a loss of performance.
> 
>> To improve performance, add support for userspace to allocate a
>> flexible WQE buffer page size between 4K and the system PAGESIZE.
>> @@ -90,7 +90,8 @@ struct hns_roce_ib_create_qp {
>>  	__u8    log_sq_bb_count;
>>  	__u8    log_sq_stride;
>>  	__u8    sq_no_prefetch;
>> -	__u8    reserved[5];
>> +	__u8    pageshift;
>> +	__u8    reserved[4];
> 
> It doesn't make any sense to pass in a pageshift from userspace.
> 
> Kernel should detect whatever underlying physical contiguity userspace
> has been able to create and configure the hardware optimally. The umem
> already has all the tools to do this trivially.
> 
> Why would you need to specify anything?
> 
> Jason

Hi Jason. Sorry for the late response.

The WQE buffer of hns HW actually consists of 3 regions: SQ WQE, RQ WQE
and ext SGE. Userspace and the kernel driver both compute the buffer size
and the start offsets of these 3 regions based on the page shift. The
kernel needs to obtain the page shift from userspace to ensure that the
buffer size and start offsets are the same on both sides and to avoid
invalid memory access.
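
To illustrate, here is a rough standalone sketch (the region sizes and
the alignment math are made up for illustration, not the exact hns
formulas): each region is aligned to the chosen page size, so if the
two sides assume different page shifts, the start offsets of the later
regions no longer match.

	#include <stdio.h>

	/* illustrative region sizes; real values come from QP attributes */
	#define SQ_WQE_BYTES	16384
	#define RQ_WQE_BYTES	8192

	static unsigned long align_up(unsigned long x, unsigned int shift)
	{
		unsigned long sz = 1UL << shift;

		return (x + sz - 1) & ~(sz - 1);
	}

	int main(void)
	{
		unsigned int shift;

		for (shift = 12; shift <= 16; shift += 4) {
			unsigned long sq_off  = 0;
			unsigned long rq_off  = align_up(sq_off + SQ_WQE_BYTES, shift);
			unsigned long sge_off = align_up(rq_off + RQ_WQE_BYTES, shift);

			printf("page shift %u: SQ@%lu RQ@%lu ext SGE@%lu\n",
			       shift, sq_off, rq_off, sge_off);
		}
		return 0;
	}

With a page shift of 12 the ext SGE region starts at 24576, while with a
page shift of 16 it starts at 131072, so a kernel assuming a different
shift than userspace would access the wrong offsets.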

The "tools of umem" you said refers to ib_umem_find_best_pgsz() I assume.
This API cannot ensure returning the same page size as userspace, and
kernel cannot determine the start offset of the 3 regions in userspace in
this case.
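
For reference, kernel-side use of that API looks roughly like this (a
sketch only; the bitmap macro name and the surrounding variables are
placeholders, not the actual hns_roce code):

	unsigned long pgsz;

	/* pick the largest HW-supported page size the umem layout allows */
	pgsz = ib_umem_find_best_pgsz(umem, HNS_SUPPORTED_PGSZ_BITMAP, virt_addr);
	if (!pgsz)
		return -EINVAL;

The returned size depends on the physical contiguity the kernel actually
observes, so it may differ from the page shift userspace assumed when
laying out the 3 regions, and then the offsets no longer agree.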

Junxian
