Message-ID: <775234bc-84a2-458b-b047-ff6ab2607989@huawei.com>
Date: Tue, 20 Aug 2024 15:18:16 +0800
From: Yonglong Liu <liuyonglong@...wei.com>
To: Robin Murphy <robin.murphy@....com>, Yunsheng Lin
<linyunsheng@...wei.com>, Somnath Kotur <somnath.kotur@...adcom.com>, Jesper
Dangaard Brouer <hawk@...nel.org>
CC: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
<pabeni@...hat.com>, <ilias.apalodimas@...aro.org>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, Alexander Duyck <alexander.duyck@...il.com>,
Alexei Starovoitov <ast@...nel.org>, "shenjian (K)" <shenjian15@...wei.com>,
Salil Mehta <salil.mehta@...wei.com>, <joro@...tes.org>, <will@...nel.org>,
<iommu@...ts.linux.dev>
Subject: Re: [BUG REPORT]net: page_pool: kernel crash at
iommu_get_dma_domain+0xc/0x20
On 2024/8/6 20:50, Robin Murphy wrote:
> On 06/08/2024 12:54 pm, Yunsheng Lin wrote:
>> On 2024/8/5 20:53, Robin Murphy wrote:
>>>>>>
>>>>>> The page_pool bumps the refcnt via get_device() + put_device() on
>>>>>> the DMA 'struct device' to avoid it going away, but I guess there
>>>>>> is also some IOMMU code that we need to make sure doesn't go away
>>>>>> (until all inflight pages are returned)???
>>>>
>>>> I guess the above is why things went wrong here; the question is
>>>> which IOMMU code needs to be called here to stop them from going
>>>> away.
>>>
>>> This looks like the wrong device is being passed to dma_unmap_page()
>>> - if a device had an IOMMU DMA domain at the point when the DMA
>>> mapping was created, then neither that domain nor its group can
>>> legitimately have disappeared while that device still had a driver
>>> bound. Or if it *was* the right device, but it's already had
>>> device_del() called on it, then you have a fundamental lifecycle
>>> problem - a device with no driver bound should not be passed to the
>>> DMA API, much less a dead device that's already been removed from
>>> its parent bus.
>>
>> Yes, the device *was* the right device, and it has already had
>> device_del() called on it.
>> page_pool calls get_device() on the DMA 'struct device' to avoid the
>> above lifecycle problem, but it seems get_device() does not stop
>> device_del() from being called, and that is where we have the problem
>> here:
>> https://elixir.bootlin.com/linux/v6.11-rc2/source/net/core/page_pool.c#L269
>>
>>
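For reference, the protection at the linked spot is only a refcount on
the struct device itself. A rough sketch of that pattern, using
hypothetical wrapper names for illustration (paraphrased, not the
verbatim page_pool code):

        #include <linux/device.h>

        /* hypothetical wrappers around what page_pool_init() /
         * page_pool_uninit() do with pool->p.dev */
        static void page_pool_pin_dev(struct page_pool *pool)
        {
                /* get_device() only pins the kobject so the struct
                 * device memory stays valid; it does NOT prevent
                 * device_del() or driver unbind */
                if (pool->dma_map)
                        get_device(pool->p.dev);
        }

        static void page_pool_unpin_dev(struct page_pool *pool)
        {
                if (pool->dma_map)
                        put_device(pool->p.dev);
        }

So everything behind the device (driver, DMA ops, IOMMU state) can still
be torn down while the pool holds live mappings.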
>> The above happens because a driver with page_pool support may hand
>> over pages that still carry a DMA mapping to the network stack, and
>> then reuse those pages after the network stack is done with them and
>> has passed them back to the page_pool, to avoid the penalty of DMA
>> mapping/unmapping. With all the caching in the network stack, some
>> pages may be held there without returning to the page_pool soon
>> enough, and when a VF disable causes the driver to unbind, the
>> page_pool does not stop the driver from doing its unbinding work;
>> instead the page_pool uses a workqueue to periodically check whether
>> any pages have come back from the network stack, and if so it does
>> the DMA-unmapping-related cleanup work.
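The deferred cleanup described above is a delayed work item that keeps
rescheduling itself while pages are still in flight. A simplified
sketch of that pattern (identifiers approximate, not the verbatim
net/core/page_pool.c code):

        #include <linux/workqueue.h>

        static void page_pool_release_retry(struct work_struct *wq)
        {
                struct delayed_work *dwq = to_delayed_work(wq);
                struct page_pool *pool = container_of(dwq, typeof(*pool),
                                                      release_dw);

                /* unmap and free whatever pages the network stack has
                 * returned; result is the number still in flight */
                if (!page_pool_release(pool))
                        return;

                /* pages are still out in the stack: check again later,
                 * long after the driver's ->remove() has completed */
                schedule_delayed_work(&pool->release_dw, DEFER_TIME);
        }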
>
> OK, that sounds like a more insidious problem - it's not just IOMMU
> stuff, in general the page pool should not be holding and using the
> device pointer after the device has already been destroyed. Even
> without an IOMMU, attempting DMA unmaps after the driver has already
> unbound may leak resources or at worst corrupt memory. Fundamentally,
> the page pool code cannot allow DMA mappings to outlive the driver
> they belong to.
>
> Thanks,
> Robin.
>
I found an easier way to reproduce the problem:
1. enable a VF
2. use mz to send only one of the multiple fragments
3. disable the VF within 30 seconds

Jakub's patch:
https://lore.kernel.org/netdev/20240809205717.0c966bad@kernel.org/T/
can solve this case, but cannot solve the original case.

Based on this simple case, I found that v6.9 does not have the problem;
v6.10-rc1 introduces it.
I traced the difference between the two versions:

v6.9:
page_pool_return_page()->dma_unmap_page_attrs()->*dma_direct_unmap_page()*

v6.10-rc1:
page_pool_return_page()->dma_unmap_page_attrs()->*iommu_dma_unmap_page()*

That's because in v6.9 dev->dma_ops is NULL at this point, while in
v6.10-rc1 it is not. (The driver has already unbound, but the device is
still there.)
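The divergence is in how dma_unmap_page_attrs() dispatches on
dev->dma_ops. Roughly (paraphrased from kernel/dma/mapping.c, not
verbatim):

        void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
                                  size_t size, enum dma_data_direction dir,
                                  unsigned long attrs)
        {
                const struct dma_map_ops *ops = get_dma_ops(dev);

                if (dma_map_direct(dev, ops))
                        /* v6.9 after unbind: dev->dma_ops == NULL */
                        dma_direct_unmap_page(dev, addr, size, dir, attrs);
                else if (ops->unmap_page)
                        /* v6.10-rc1: dma_ops still point at the IOMMU
                         * ops, so this reaches iommu_dma_unmap_page()
                         * and crashes in iommu_get_dma_domain() */
                        ops->unmap_page(dev, addr, size, dir, attrs);
        }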
After reverting this patch:
https://lore.kernel.org/linux-arm-kernel/4a4482a0-4408-4d86-9f61-6589ab78686b@somainline.org/T/
the bug I reported can no longer be reproduced.

So maybe this problem was introduced by that patch, or by the following
patch set?
https://lore.kernel.org/linux-iommu/cover.1713523152.git.robin.murphy@arm.com/

Or has the page pool always had this problem: in v6.9 and earlier, in
this case, the page pool maps with iommu_dma_map_page() but unmaps with
dma_direct_unmap_page(), so the map and unmap are not paired?
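If so, the hazard would look roughly like this (illustrative sketch
only, assuming dev->dma_ops is cleared at driver unbind as on v6.9 and
earlier; pairing_hazard_sketch is a hypothetical function, not kernel
code):

        #include <linux/dma-mapping.h>

        static void pairing_hazard_sketch(struct device *dev,
                                          struct page *page, size_t size)
        {
                /* mapped while the driver was bound: dev->dma_ops set,
                 * so this goes through iommu_dma_map_page() and the
                 * returned handle is an IOVA */
                dma_addr_t dma = dma_map_page_attrs(dev, page, 0, size,
                                                    DMA_FROM_DEVICE, 0);

                /* ... driver unbinds here; dev->dma_ops torn down ... */

                /* the unmap now dispatches to dma_direct_unmap_page(),
                 * which treats the IOVA as a CPU physical address, so
                 * the IOMMU mapping is never released */
                dma_unmap_page_attrs(dev, dma, size, DMA_FROM_DEVICE, 0);
        }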