Message-ID: <CAHS8izP-MWSFJi8zMW2P144-5p+KyWwNT2+UXBwSf=ocseQycQ@mail.gmail.com>
Date: Tue, 20 Aug 2024 09:22:39 -0400
From: Mina Almasry <almasrymina@...gle.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Robin Murphy <robin.murphy@....com>, Yunsheng Lin <linyunsheng@...wei.com>, 
	Somnath Kotur <somnath.kotur@...adcom.com>, Jesper Dangaard Brouer <hawk@...nel.org>, 
	Yonglong Liu <liuyonglong@...wei.com>, "David S. Miller" <davem@...emloft.net>, 
	Jakub Kicinski <kuba@...nel.org>, pabeni@...hat.com, ilias.apalodimas@...aro.org, 
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Alexander Duyck <alexander.duyck@...il.com>, Alexei Starovoitov <ast@...nel.org>, 
	"shenjian (K)" <shenjian15@...wei.com>, Salil Mehta <salil.mehta@...wei.com>, joro@...tes.org, 
	will@...nel.org, iommu@...ts.linux.dev
Subject: Re: [BUG REPORT]net: page_pool: kernel crash at iommu_get_dma_domain+0xc/0x20

On Tue, Aug 6, 2024 at 9:50 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
>
> On Tue, Aug 06, 2024 at 01:50:08PM +0100, Robin Murphy wrote:
> > On 06/08/2024 12:54 pm, Yunsheng Lin wrote:
> > > On 2024/8/5 20:53, Robin Murphy wrote:
> > > > > > >
> > > > > > > The page_pool bumps refcnt via get_device() + put_device() on the DMA
> > > > > > > 'struct device', to avoid it going away, but I guess there is also some
> > > > > > > IOMMU code that we need to make sure doesn't go away (until all inflight
> > > > > > > pages are returned) ???
> > > > >
> > > > > I guess the above is why things went wrong here; the question is which
> > > > > IOMMU code needs to be called here to stop them from going away.
> > > >
> > > > This looks like the wrong device is being passed to
> > > > dma_unmap_page() - if a device had an IOMMU DMA domain at the
> > > > point when the DMA mapping was created, then neither that domain
> > > > nor its group can legitimately have disappeared while that device
> > > > still had a driver bound. Or if it *was* the right device, but
> > > > it's already had device_del() called on it, then you have a
> > > > fundamental lifecycle problem - a device with no driver bound
> > > > should not be passed to the DMA API, much less a dead device
> > > > that's already been removed from its parent bus.
> > >
> > > Yes, the device *was* the right device, and it has already had
> > > device_del() called on it.
> > > page_pool calls get_device() on the DMA 'struct device' to avoid the
> > > above lifecycle problem, but get_device() does not stop device_del()
> > > from being called, and that is where the problem arises:
> > > https://elixir.bootlin.com/linux/v6.11-rc2/source/net/core/page_pool.c#L269
> > >
> > > The above happens because a driver with page_pool support may hand
> > > pages that still carry a DMA mapping over to the network stack, and
> > > then reuse those pages after the network stack is done with them and
> > > returns them to the page_pool, to avoid the penalty of repeated DMA
> > > mapping/unmapping. With all the caching in the network stack, some
> > > pages may be held there without coming back to the page_pool soon
> > > enough, and when a VF disable causes the driver to unbind, the
> > > page_pool does not block the driver's unbinding work; instead the
> > > page_pool uses a workqueue to periodically check whether pages have
> > > come back from the network stack, and if so it does the DMA
> > > unmapping and related cleanup work for them.
> >
> > OK, that sounds like a more insidious problem - it's not just IOMMU stuff;
> > in general the page pool should not be holding and using the device pointer
> > after the device has already been destroyed. Even without an IOMMU,
> > attempting DMA unmaps after the driver has already unbound may leak
> > resources or at worst corrupt memory. Fundamentally, the page pool code
> > cannot allow DMA mappings to outlive the driver they belong to.
>
> +1
>
> There is more that gets broken here if these basic driver model rules
> are not followed!
>
> netdev must fix this by waiting during driver remove until all this DMA
> activity is finished somehow.
>
>

Sorry if this is a naive question, but why don't non-page_pool drivers
run into this issue? On driver unload, GVE seems to loop over the pages
in its rx queues and call gve_free_page_dqo(), which simply drops the
bias refcount, does dma_unmap_page(), and put_page(). The pages live on
if they still have references in the net stack, but the driver is free
to unload - roughly the pattern sketched below.
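
A simplified sketch of that pattern - not the actual gve code; the rx
slot layout and the helper name here are made up for illustration:

/*
 * Simplified sketch, not the real gve code: free an rx page at driver
 * unload while the device is still alive, so no DMA mapping outlives
 * the driver even if the net stack still holds page references.
 */
#include <linux/dma-mapping.h>
#include <linux/mm.h>

struct rx_slot {			/* made-up slot layout */
	struct page *page;
	dma_addr_t dma;
	unsigned int pagecnt_bias;	/* extra refs beyond the base one */
};

static void driver_free_rx_slot(struct device *dev, struct rx_slot *slot)
{
	if (!slot->page)
		return;

	/* Give back the extra bias references the driver took up front. */
	if (slot->pagecnt_bias)
		page_ref_sub(slot->page, slot->pagecnt_bias);

	/* Unmap now, while the device/driver is still around ... */
	dma_unmap_page(dev, slot->dma, PAGE_SIZE, DMA_FROM_DEVICE);

	/*
	 * ... and drop the driver's own reference. If the net stack still
	 * holds references, the page simply outlives the driver, but with
	 * no DMA mapping left behind.
	 */
	put_page(slot->page);
	slot->page = NULL;
}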

Is it for some reason not feasible for the page_pool to release the
pages on driver unload and destroy itself, rather than having to stick
around until all pending pages have been returned from the net stack?
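
Something along the lines of the sketch below - purely hypothetical,
not existing page_pool code; pool_for_each_cached_page() and the
dma_map_disabled flag are invented names, and this hand-waves what
keeps the pool struct itself alive for pages that come back later:

#include <linux/dma-mapping.h>
#include <net/page_pool/types.h>
#include <net/page_pool/helpers.h>

/*
 * Hypothetical sketch only: at driver unload, unmap whatever pages the
 * pool still owns (ptr_ring + alloc cache) while the device is alive,
 * then stop doing DMA work for anything that comes back afterwards.
 */
static void page_pool_unmap_on_unload(struct page_pool *pool)
{
	struct page *page;

	/*
	 * pool_for_each_cached_page() is a made-up iterator over the
	 * pages the pool still holds; it cannot see in-flight pages.
	 */
	while ((page = pool_for_each_cached_page(pool)) != NULL) {
		dma_unmap_page_attrs(pool->p.dev,
				     page_pool_get_dma_addr(page),
				     PAGE_SIZE << pool->p.order,
				     pool->p.dma_dir,
				     DMA_ATTR_SKIP_CPU_SYNC);
		put_page(page);
	}

	/*
	 * From here on, pages returning from the net stack would be
	 * treated as plain pages - just put_page(), no dma_unmap - so
	 * nothing touches the (possibly dead) device afterwards.
	 */
	pool->dma_map_disabled = true;	/* made-up flag for illustration */
}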

-- 
Thanks,
Mina
