netdev - Re: [BUG REPORT]net: page_pool: kernel crash at iommu_get_dma

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5b0415ff-9bbe-4553-89d6-17d12fd44b47@huawei.com>
Date: Tue, 6 Aug 2024 19:54:33 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Robin Murphy <robin.murphy@....com>, Somnath Kotur
	<somnath.kotur@...adcom.com>, Jesper Dangaard Brouer <hawk@...nel.org>
CC: Yonglong Liu <liuyonglong@...wei.com>, "David S. Miller"
	<davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, <pabeni@...hat.com>,
	<ilias.apalodimas@...aro.org>, <netdev@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, Alexander Duyck <alexander.duyck@...il.com>,
	Alexei Starovoitov <ast@...nel.org>, "shenjian (K)" <shenjian15@...wei.com>,
	Salil Mehta <salil.mehta@...wei.com>, <joro@...tes.org>, <will@...nel.org>,
	<iommu@...ts.linux.dev>
Subject: Re: [BUG REPORT]net: page_pool: kernel crash at
 iommu_get_dma_domain+0xc/0x20

On 2024/8/5 20:53, Robin Murphy wrote:
>>>>
>>>> The page_pool bumps refcnt via get_device() + put_device() on the DMA
>>>> 'struct device', to avoid it going away, but I guess there is also some
>>>> IOMMU code that we need to make sure doesn't go away (until all inflight
>>>> pages are returned) ???
>>
>> I guess the above is why thing went wrong here, the question is which
>> IOMMU code need to be called here to stop them from going away.
> 
> This looks like the wrong device is being passed to dma_unmap_page() - if a device had an IOMMU DMA domain at the point when the DMA mapping was create, then neither that domain nor its group can legitimately have disappeared while that device still had a driver bound. Or if it *was* the right device, but it's already had device_del() called on it, then you have a fundamental lifecycle problem - a device with no driver bound should not be passed to the DMA API, much less a dead device that's already been removed from its parent bus.

Yes, the device *was* the right device, And it's already had device_del()
called on it.
page_pool tries to call get_device() on the DMA 'struct device' to avoid the
above lifecycle problem, it seems get_device() does not stop device_del()
from being called, and that is where we have the problem here:
https://elixir.bootlin.com/linux/v6.11-rc2/source/net/core/page_pool.c#L269

The above happens because driver with page_pool support may hand over
page still with dma mapping to network stack and try to reuse that page
after network stack is done with it and passes it back to page_pool to avoid
the penalty of dma mapping/unmapping. With all the caching in the network
stack, some pages may be held in the network stack without returning to the
page_pool soon enough, and with VF disable causing the driver unbound, the
page_pool does not stop the driver from doing it's unbounding work, instead
page_pool uses workqueue to check if there is some pages coming back from the
network stack periodically, if there is any, it will do the dma unmmapping
related cleanup work.

> 
> Thanks,
> Robin.
> 
>