Message-ID: <20260203091458.GA89766@j66a10360.sqa.eu95>
Date: Tue, 3 Feb 2026 17:14:58 +0800
From: "D. Wythe" <alibuda@...ux.alibaba.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: "D. Wythe" <alibuda@...ux.alibaba.com>,
	Leon Romanovsky <leon@...nel.org>,
	Uladzislau Rezki <urezki@...il.com>,
	"David S. Miller" <davem@...emloft.net>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Dust Li <dust.li@...ux.alibaba.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Sidraya Jayagond <sidraya@...ux.ibm.com>,
	Wenjia Zhang <wenjia@...ux.ibm.com>,
	Mahanta Jambigi <mjambigi@...ux.ibm.com>,
	Simon Horman <horms@...nel.org>, Tony Lu <tonylu@...ux.alibaba.com>,
	Wen Gu <guwen@...ux.alibaba.com>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, linux-rdma@...r.kernel.org,
	linux-s390@...r.kernel.org, netdev@...r.kernel.org,
	oliver.yang@...ux.alibaba.com
Subject: Re: [PATCH net-next 2/3] mm: vmalloc: export find_vm_area()

On Fri, Jan 30, 2026 at 11:16:36AM -0400, Jason Gunthorpe wrote:
> On Fri, Jan 30, 2026 at 04:51:31PM +0800, D. Wythe wrote:
> > On Thu, Jan 29, 2026 at 09:20:58AM -0400, Jason Gunthorpe wrote:
> > > On Thu, Jan 29, 2026 at 07:36:09PM +0800, D. Wythe wrote:
> > > 
> > > > > From there you can check the resulting scatterlist and compute the
> > > > > page_size to pass to ib_map_mr_sg().
> > > 
> > > I should clarify that this is done after DMA mapping the scatterlist. DMA
> > > mapping can improve the page size.
> > > 
> > > And maybe the core code should be helping compute the MR's target page
> > > size for a scatterlist. We already have code to do this in umem, and
> > > it is pretty tricky considering the IOVA-related rules.
> > >
> > 
> > Hi Jason,
> > 
> > After a deep dive into ib_umem_find_best_pgsz(), I have to say it is
> > much more subtle than it first appears. The IOVA-to-PA relative offset
> > rules, in particular, make it quite easy to get wrong.
> > 
> > While SMC could duplicate this logic, it is certainly not ideal for
> > maintenance. Are there any plans to refactor this into a generic RDMA
> > core helper—for instance, one that can determine the best page size
> > directly from an sg_table or scatterlist?
> 
> I have not heard of anyone touching this.
> 
> It looks like there are only two users in the kernel that pass
> something other than PAGE_SIZE, so it seems nobody has cared about
> this till now.
> 
> With high-order folios becoming more common, this seems like a missing
> piece.
> 
> However, I wonder what the drivers do with the input page size;
> segmenting a scatterlist is a bit hard, and we already have helpers
> for that too.
> 
> It is a bigger project, but probably the right thing is to remove the
> page size input, wrap the scatterlist in a umem, and fix up the drivers
> to use the existing umem support for building MTTs, splitting
> scatterlists into blocks, and so on.
> 
> The kernel side here has been left alone for a long time.
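
For context on the "splitting scatterlists into blocks" step: once a page size is fixed, the driver walks the DMA-mapped segments and emits one MTT entry per page-size-aligned block. The sketch below is a simplified userspace model of that walk, loosely in the spirit of the kernel's rdma_umem_for_each_dma_block() iterator, not its actual implementation. struct dma_seg and split_into_blocks() are hypothetical names, and the sketch assumes the page size was already chosen so that every interior segment boundary is aligned to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist entry. */
struct dma_seg {
	uint64_t addr;	/* DMA address of the segment */
	uint64_t len;	/* length in bytes */
};

/*
 * Write the DMA address of every pgsz-aligned block covering the segments
 * into out[], roughly how a driver fills MTT entries. Assumes pgsz was
 * already chosen so that interior segment boundaries fall on pgsz
 * boundaries. Returns the number of blocks written (at most max).
 */
static size_t split_into_blocks(const struct dma_seg *segs, size_t nsegs,
				uint64_t pgsz, uint64_t *out, size_t max)
{
	size_t n = 0;

	assert(pgsz != 0 && (pgsz & (pgsz - 1)) == 0); /* power of two */
	for (size_t i = 0; i < nsegs; i++) {
		uint64_t blk = segs[i].addr & ~(pgsz - 1);	/* round down */
		uint64_t end = segs[i].addr + segs[i].len;	/* exclusive */

		for (; blk < end; blk += pgsz) {
			if (n == max)
				return n;
			out[n++] = blk;
		}
	}
	return n;
}
```

For example, two discontiguous 64 KiB segments at 0x100000 and 0x300000 need 32 entries with a 4 KiB page size but only 2 with 64 KiB, which is why choosing the largest valid page size matters.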

I am also curious about the original design intent behind requiring the
caller to pass `page_size` explicitly. From what I can see, its primary
role is to define how much memory each MTT entry (MTTE) covers, but
calculating the optimal value is surprisingly complex.
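
To illustrate why the calculation is subtle, here is a simplified userspace model of the mask logic in ib_umem_find_best_pgsz(), as I read it. Every address bit where a segment's DMA address differs from its IOVA, and every bit set in an interior IOVA boundary, caps the usable page size; the function then keeps the largest supported size below that cap. struct dma_seg and best_pgsz() are hypothetical names, the code relies on the GCC/Clang __builtin_clzll()/__builtin_ctzll() builtins, and it omits ODP, the first-page offset, and the PAGE_SIZE floor that the real helper handles.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist entry. */
struct dma_seg {
	uint64_t addr;	/* DMA address of the segment */
	uint64_t len;	/* length in bytes */
};

/* Bits h..63 set, like GENMASK(63, h); h >= 64 yields 0. */
static uint64_t mask_from(unsigned int h)
{
	return h >= 64 ? 0 : ~UINT64_C(0) << h;
}

/* Bits 0..l set, like GENMASK(l, 0). */
static uint64_t mask_upto(unsigned int l)
{
	return l >= 63 ? ~UINT64_C(0) : (UINT64_C(1) << (l + 1)) - 1;
}

/* Bits needed to represent x; 0 counts as 1 bit, like bits_per(). */
static unsigned int bits_per(uint64_t x)
{
	return x ? 64 - (unsigned int)__builtin_clzll(x) : 1;
}

/*
 * Return the largest size in pgsz_bitmap that can map all segments when
 * the MR starts at IOVA 'iova', or 0 if no supported size works.
 */
static uint64_t best_pgsz(const struct dma_seg *segs, size_t nsegs,
			  uint64_t pgsz_bitmap, uint64_t iova)
{
	uint64_t total = 0, va = iova, mask;

	for (size_t i = 0; i < nsegs; i++)
		total += segs[i].len;
	if (total == 0)
		return 0;

	/* Cap at the smallest supported size covering the IOVA range in one page. */
	mask = pgsz_bitmap & mask_from(bits_per((iova + total - 1) ^ iova));

	for (size_t i = 0; i < nsegs; i++) {
		/* Any bit where the IOVA and DMA address differ caps the size. */
		mask |= segs[i].addr ^ va;
		va += segs[i].len;
		/* Interior boundaries must also be aligned in IOVA space. */
		if (i != nsegs - 1)
			mask |= va;
	}

	if (mask)
		pgsz_bitmap &= mask_upto((unsigned int)__builtin_ctzll(mask));
	/* Keep only the highest remaining size (rounddown_pow_of_two). */
	return pgsz_bitmap ? UINT64_C(1) << (63 - __builtin_clzll(pgsz_bitmap)) : 0;
}
```

With supported sizes of 4 KiB, 64 KiB and 2 MiB, two 64 KiB segments at DMA addresses 0x100000 and 0x300000 mapped at IOVA 0 allow 64 KiB pages. Shifting the IOVA to 0x1000 drops the result to 4 KiB, because the IOVA and DMA addresses then differ in bit 12.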

I completely agree that calculating the best page size automatically
should be the responsibility of the drivers or the RDMA core. Handling
such low-level, hardware-specific details in a ULP like SMC feels
misplaced.

Since it appears this isn't a high-priority issue for the community at
the moment, and a proper fix requires a much larger architectural effort 
in the RDMA core, I will withdraw this patch series. 

I'll keep an eye on the RDMA subsystem's progress and see if a more 
generic solution emerges in the future.

Thanks,
D. Wythe
