netdev - Re: [PATCH net 2/2] page_pool: fix IOMMU crash when driver has already unbound

Message-ID: <cdfecd37-31d7-42d2-a8d8-92008285b42e@huawei.com>
Date: Thu, 19 Sep 2024 19:15:11 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>, Ilias Apalodimas
	<ilias.apalodimas@...aro.org>
CC: <davem@...emloft.net>, <kuba@...nel.org>, <pabeni@...hat.com>,
	<liuyonglong@...wei.com>, <fanghaiqing@...wei.com>, <zhangkun09@...wei.com>,
	Robin Murphy <robin.murphy@....com>, Alexander Duyck
	<alexander.duyck@...il.com>, IOMMU <iommu@...ts.linux.dev>, Wei Fang
	<wei.fang@....com>, Shenwei Wang <shenwei.wang@....com>, Clark Wang
	<xiaoning.wang@....com>, Eric Dumazet <edumazet@...gle.com>, Tony Nguyen
	<anthony.l.nguyen@...el.com>, Przemek Kitszel <przemyslaw.kitszel@...el.com>,
	Alexander Lobakin <aleksander.lobakin@...el.com>, Alexei Starovoitov
	<ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, John Fastabend
	<john.fastabend@...il.com>, Saeed Mahameed <saeedm@...dia.com>, Leon
 Romanovsky <leon@...nel.org>, Tariq Toukan <tariqt@...dia.com>, Felix Fietkau
	<nbd@....name>, Lorenzo Bianconi <lorenzo@...nel.org>, Ryder Lee
	<ryder.lee@...iatek.com>, Shayne Chen <shayne.chen@...iatek.com>, Sean Wang
	<sean.wang@...iatek.com>, Kalle Valo <kvalo@...nel.org>, Matthias Brugger
	<matthias.bgg@...il.com>, AngeloGioacchino Del Regno
	<angelogioacchino.delregno@...labora.com>, Andrew Morton
	<akpm@...ux-foundation.org>, <imx@...ts.linux.dev>, <netdev@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <intel-wired-lan@...ts.osuosl.org>,
	<bpf@...r.kernel.org>, <linux-rdma@...r.kernel.org>,
	<linux-wireless@...r.kernel.org>, <linux-arm-kernel@...ts.infradead.org>,
	<linux-mediatek@...ts.infradead.org>, <linux-mm@...ck.org>
Subject: Re: [PATCH net 2/2] page_pool: fix IOMMU crash when driver has
 already unbound

On 2024/9/19 17:42, Jesper Dangaard Brouer wrote:
> 
> On 18/09/2024 19.06, Ilias Apalodimas wrote:
>>> In order not to do the dma unmapping after driver has already
>>> unbound and stall the unloading of the networking driver, add
>>> the pool->items array to record all the pages including the ones
>>> which are handed over to network stack, so the page_pool can
>>> do the dma unmapping for those pages when page_pool_destroy()
>>> is called.
>>
>> So, I was thinking of a very similar idea. But what do you mean by
>> "all"? The pages that are still in caches (slow or fast) of the pool
>> will be unmapped during page_pool_destroy().
> 
> I really dislike this idea of having to keep track of all outstanding pages.
> 
> I liked Jakub's idea of keeping the netdev around for longer.
> 
> This is all related to destroying the struct device that points to
> the DMA engine, right?

Yes, the problem seems to be that once device_del() is called, there is
no guarantee that the hardware behind the 'struct device' will still be
usable, even if we call get_device() on it.
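
To make that point concrete, here is a toy userspace model, not kernel
code. Every toy_* name is hypothetical, and it assumes the worst case,
where the hardware side becomes unusable as soon as the device is
deleted. It only shows that a held reference keeps the structure alive
without keeping a later DMA unmap safe:

```c
/* Toy userspace model, NOT kernel code: all toy_* names are hypothetical. */
#include <stdbool.h>

struct toy_device {
	int refcount;   /* models the reference count behind get_device() */
	bool hw_usable; /* models whether the DMA/IOMMU side still works */
};

/* Models get_device(): only pins the lifetime of the structure. */
static void toy_get_device(struct toy_device *dev)
{
	dev->refcount++;
}

/* Models put_device(): drops one reference. */
static void toy_put_device(struct toy_device *dev)
{
	dev->refcount--;
}

/* Models device_del() in the assumed worst case: the hardware side
 * goes away no matter how many references are still held. */
static void toy_device_del(struct toy_device *dev)
{
	dev->hw_usable = false;
}

/* Models a DMA unmap: 0 on success, -1 if the hardware side is gone. */
static int toy_dma_unmap(const struct toy_device *dev)
{
	return dev->hw_usable ? 0 : -1;
}

/* Scenario: the pool takes a reference, the driver unbinds, and the pool
 * then tries to unmap an in-flight page. Returns the unmap result. */
static int toy_late_unmap_after_del(void)
{
	struct toy_device dev = { .refcount = 1, .hw_usable = true };
	int ret;

	toy_get_device(&dev);      /* page_pool holds a reference */
	toy_device_del(&dev);      /* driver unbinds while pages are in flight */
	ret = toy_dma_unmap(&dev); /* structure is alive, unmap is not safe */
	toy_put_device(&dev);
	return ret;
}
```

In this model toy_late_unmap_after_del() fails even though the reference
count never dropped to zero, which is the situation described above.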

> 
> Why don't we add an API that allows netdev to "give" struct device to
> page_pool.  And then the page_pool will take over when we can safely
> free the struct device?

By letting netdev "give" the struct device to page_pool, do you mean that
page_pool becomes the driver for the device?
If yes, that seems similar to Jakub's idea, as both seem to stall the
call to device_del() by not returning while the driver is unloading.
If no, it seems that the problem still exists once the driver for the
device has unbound after device_del() is called.

> 
> --Jesper