netdev - Re: [PATCH net 2/2] page_pool: fix IOMMU crash when driver has already unbound

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240924064559.1681488-1-gur.stavi@huawei.com>
Date: Tue, 24 Sep 2024 09:45:59 +0300
From: Gur Stavi <gur.stavi@...wei.com>
To: <linyunsheng@...wei.com>
CC: <akpm@...ux-foundation.org>, <aleksander.lobakin@...el.com>,
	<alexander.duyck@...il.com>, <angelogioacchino.delregno@...labora.com>,
	<anthony.l.nguyen@...el.com>, <ast@...nel.org>, <bpf@...r.kernel.org>,
	<daniel@...earbox.net>, <davem@...emloft.net>, <edumazet@...gle.com>,
	<fanghaiqing@...wei.com>, <hawk@...nel.org>, <ilias.apalodimas@...aro.org>,
	<imx@...ts.linux.dev>, <intel-wired-lan@...ts.osuosl.org>,
	<iommu@...ts.linux.dev>, <john.fastabend@...il.com>, <kuba@...nel.org>,
	<kvalo@...nel.org>, <leon@...nel.org>,
	<linux-arm-kernel@...ts.infradead.org>, <linux-kernel@...r.kernel.org>,
	<linux-mediatek@...ts.infradead.org>, <linux-mm@...ck.org>,
	<linux-rdma@...r.kernel.org>, <linux-wireless@...r.kernel.org>,
	<liuyonglong@...wei.com>, <lorenzo@...nel.org>, <matthias.bgg@...il.com>,
	<nbd@....name>, <netdev@...r.kernel.org>, <pabeni@...hat.com>,
	<przemyslaw.kitszel@...el.com>, <robin.murphy@....com>,
	<ryder.lee@...iatek.com>, <saeedm@...dia.com>, <sean.wang@...iatek.com>,
	<shayne.chen@...iatek.com>, <shenwei.wang@....com>, <tariqt@...dia.com>,
	<wei.fang@....com>, <xiaoning.wang@....com>, <zhangkun09@...wei.com>
Subject: Re: [PATCH net 2/2] page_pool: fix IOMMU crash when driver has already unbound

>>>> With all the caching in the network stack, some pages may be
>>>> held in the network stack without returning to the page_pool
>>>> soon enough, and with VF disable causing the driver unbound,
>>>> the page_pool does not stop the driver from doing it's
>>>> unbounding work, instead page_pool uses workqueue to check
>>>> if there is some pages coming back from the network stack
>>>> periodically, if there is any, it will do the dma unmmapping
>>>> related cleanup work.
>>>>
>>>> As mentioned in [1], attempting DMA unmaps after the driver
>>>> has already unbound may leak resources or at worst corrupt
>>>> memory. Fundamentally, the page pool code cannot allow DMA
>>>> mappings to outlive the driver they belong to.
>>>>
>>>> Currently it seems there are at least two cases that the page
>>>> is not released fast enough causing dma unmmapping done after
>>>> driver has already unbound:
>>>> 1. ipv4 packet defragmentation timeout: this seems to cause
>>>>    delay up to 30 secs:
>>>>
>>>> 2. skb_defer_free_flush(): this may cause infinite delay if
>>>>    there is no triggering for net_rx_action().
>>>>
>>>> In order not to do the dma unmmapping after driver has already
>>>> unbound and stall the unloading of the networking driver, add
>>>> the pool->items array to record all the pages including the ones
>>>> which are handed over to network stack, so the page_pool can
>>>> do the dma unmmapping for those pages when page_pool_destroy()
>>>> is called.
>>>
>>> So, I was thinking of a very similar idea. But what do you mean by
>>> "all"? The pages that are still in caches (slow or fast) of the pool
>>> will be unmapped during page_pool_destroy().
>>
>> Yes, it includes the one in pool->alloc and pool->ring.
>
> It worths mentioning that there is a semantics changing here:
> Before this patch, there can be almost unlimited inflight pages used by
> driver and network stack, as page_pool doesn't really track those pages.
> After this patch, as we use a fixed-size pool->items array to track the
> inflight pages, the inflight pages is limited by the pool->items, currently
> the size of pool->items array is calculated as below in this patch:
>
> +#define PAGE_POOL_MIN_ITEM_CNT	512
> +	unsigned int item_cnt = (params->pool_size ? : 1024) +
> +				PP_ALLOC_CACHE_SIZE + PAGE_POOL_MIN_ITEM_CNT;
>
> Personally I would consider it is an advantage to limit how many pages which
> are used by the driver and network stack, the problem seems to how to decide
> the limited number of page used by network stack so that performance is not
> impacted.

In theory, with respect to the specific problem at hand, you only have
a limit on the number of mapped pages inflight. Once you reach this
limit you can unmap these old pages, forget about them and remember
new ones.