Message-ID: <8d4b6398-1b7d-4c14-b390-0456a6158681@huawei.com>
Date: Mon, 5 Aug 2024 20:50:22 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Alexander Duyck <alexander.duyck@...il.com>, Yonglong Liu
	<liuyonglong@...wei.com>
CC: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
	<pabeni@...hat.com>, <hawk@...nel.org>, <ilias.apalodimas@...aro.org>,
	<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Alexei Starovoitov
	<ast@...nel.org>, "shenjian (K)" <shenjian15@...wei.com>, Salil Mehta
	<salil.mehta@...wei.com>, <iommu@...ts.linux.dev>
Subject: Re: [BUG REPORT]net: page_pool: kernel crash at
 iommu_get_dma_domain+0xc/0x20

On 2024/8/3 0:38, Alexander Duyck wrote:

...

> 
> The issue as I see it is that we aren't unmapping the pages when we
> call page_pool_destroy. There need to be no pages remaining with a DMA
> unmapping needed *after* that is called. Otherwise we will see this
> issue regularly.
> 
> What we probably need to look at doing is beefing up page_pool_release
> to add a step that will take an additional reference on the inflight
> pages, then call __page_pool_put_page to switch them to a reference
> counted page.

I am not sure I understand what you meant. Did you mean making
page_pool_destroy() synchronously wait for all the in-flight pages to
come back before returning to the driver?
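
If that is the intended reading, a rough sketch of what it could look
like (not upstream code; the helper below is hypothetical and simply
mirrors the hold/release counter arithmetic that page_pool_inflight()
already does):

static void page_pool_wait_for_inflight(struct page_pool *pool)
{
	u32 hold_cnt, release_cnt;

	for (;;) {
		/* pages handed out to users minus pages returned */
		release_cnt = atomic_read(&pool->pages_state_release_cnt);
		hold_cnt = READ_ONCE(pool->pages_state_hold_cnt);
		if ((s32)(hold_cnt - release_cnt) <= 0)
			break;
		msleep(100);
	}
}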

> 
> Seems like the worst case scenario is that we are talking about having
> to walk the page table to do the above for any inflight pages but it

Which page table are we talking about here?

> would certainly be a much more deterministic amount of time needed to
> do that versus waiting on a page that may or may not return.
> 
> Alternatively a quick hack that would probably also address this would
> be to clear pool->dma_map in page_pool_destroy or maybe in

It seems we may need to clear pool->dma_sync too, and there may still be
a time window between the clearing and the checking/dma_unmap?
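
For reference, a minimal sketch of that hack as I understand it,
assuming pool->dma_map and pool->dma_sync are the struct page_pool
bitfields gating unmapping/syncing on the return path (illustrative
only, not a tested change):

static void page_pool_disable_dma(struct page_pool *pool)
{
	/* Deliberately leak any residual mappings: once these bits are
	 * cleared, the return path no longer calls dma_unmap/dma_sync,
	 * so the iommu is never touched after the device is gone.
	 */
	pool->dma_map = false;
	pool->dma_sync = false;
}

A page being freed concurrently may already have seen dma_map set and
still go on to unmap after the device has disappeared, which is the
window mentioned above.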

> page_pool_unreg_netdev so that any of those residual mappings would
> essentially get leaked, but we wouldn't have to worry about trying to
> unmap while the device doesn't exist.

But how does the page_pool know whether this is the normal unloading
case without VF disabling, where the device still exists, or the
abnormal one caused by VF disabling, where the device will disappear?
If it is the former, does skipping some of the iommu calls cause a
resource leak in the iommu?
