lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAC_iWj+3JvPY2oqVOdu0T1Wt6-ukoy=dLc72u1f55yY23uOTbA@mail.gmail.com>
Date: Fri, 20 Sep 2024 08:29:18 +0300
From: Ilias Apalodimas <ilias.apalodimas@...aro.org>
To: Jesper Dangaard Brouer <hawk@...nel.org>
Cc: Yunsheng Lin <linyunsheng@...wei.com>, davem@...emloft.net, kuba@...nel.org, 
	pabeni@...hat.com, liuyonglong@...wei.com, fanghaiqing@...wei.com, 
	zhangkun09@...wei.com, Robin Murphy <robin.murphy@....com>, 
	Alexander Duyck <alexander.duyck@...il.com>, IOMMU <iommu@...ts.linux.dev>, 
	Wei Fang <wei.fang@....com>, Shenwei Wang <shenwei.wang@....com>, 
	Clark Wang <xiaoning.wang@....com>, Eric Dumazet <edumazet@...gle.com>, 
	Tony Nguyen <anthony.l.nguyen@...el.com>, Przemek Kitszel <przemyslaw.kitszel@...el.com>, 
	Alexander Lobakin <aleksander.lobakin@...el.com>, Alexei Starovoitov <ast@...nel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, John Fastabend <john.fastabend@...il.com>, 
	Saeed Mahameed <saeedm@...dia.com>, Leon Romanovsky <leon@...nel.org>, Tariq Toukan <tariqt@...dia.com>, 
	Felix Fietkau <nbd@....name>, Lorenzo Bianconi <lorenzo@...nel.org>, Ryder Lee <ryder.lee@...iatek.com>, 
	Shayne Chen <shayne.chen@...iatek.com>, Sean Wang <sean.wang@...iatek.com>, 
	Kalle Valo <kvalo@...nel.org>, Matthias Brugger <matthias.bgg@...il.com>, 
	AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>, 
	Andrew Morton <akpm@...ux-foundation.org>, imx@...ts.linux.dev, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, intel-wired-lan@...ts.osuosl.org, 
	bpf@...r.kernel.org, linux-rdma@...r.kernel.org, 
	linux-wireless@...r.kernel.org, linux-arm-kernel@...ts.infradead.org, 
	linux-mediatek@...ts.infradead.org, linux-mm@...ck.org
Subject: Re: [PATCH net 2/2] page_pool: fix IOMMU crash when driver has
 already unbound

Hi Jesper,

On Fri, 20 Sept 2024 at 00:04, Jesper Dangaard Brouer <hawk@...nel.org> wrote:
>
>
>
> On 19/09/2024 13.15, Yunsheng Lin wrote:
> > On 2024/9/19 17:42, Jesper Dangaard Brouer wrote:
> >>
> >> On 18/09/2024 19.06, Ilias Apalodimas wrote:
> >>>> In order not to do the dma unmmapping after driver has already
> >>>> unbound and stall the unloading of the networking driver, add
> >>>> the pool->items array to record all the pages including the ones
> >>>> which are handed over to network stack, so the page_pool can
> >>>> do the dma unmmapping for those pages when page_pool_destroy()
> >>>> is called.
> >>>
> >>> So, I was thinking of a very similar idea. But what do you mean by
> >>> "all"? The pages that are still in caches (slow or fast) of the pool
> >>> will be unmapped during page_pool_destroy().
> >>
> >> I really dislike this idea of having to keep track of all outstanding pages.
> >>
> >> I liked Jakub's idea of keeping the netdev around for longer.
> >>
> >> This is all related to destroying the struct device that have points to
> >> the DMA engine, right?
> >
> > Yes, the problem seems to be that when device_del() is called, there is
> > no guarantee hw behind the 'struct device ' will be usable even if we
> > call get_device() on it.
> >
> >>
> >> Why don't we add an API that allow netdev to "give" struct device to
> >> page_pool.  And then the page_poll will take over when we can safely
> >> free the stuct device?
> >
> > By 'allow netdev to "give" struct device to page_pool', does it mean
> > page_pool become the driver for the device?
> > If yes, it seems that is similar to jakub's idea, as both seems to stall
> > the calling of device_del() by not returning when the driver unloading.
>
> Yes, this is what I mean. (That is why I mentioned Jakub's idea).

Keeping track of inflight packets that need to be unmapped is
certainly more complex. Delaying the netdevice destruction certainly
solves the problem but there's a huge cost IMHO. Those devices might
stay there forever and we have zero guarantees that the network stack
will eventually release (and unmap) those packets. What happens in
that case? The user basically has to reboot the entire machine, just
because he tries to bring an interface down and up again.

Thanks
/Ilias
>
>
> > If no, it seems that the problem is still existed when the driver for
> > the device has unbound after device_del() is called.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ