Message-ID: <be049c33-936a-4c93-94ff-69cd51b5de8e@kernel.org>
Date: Tue, 12 Nov 2024 15:19:53 +0100
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Yunsheng Lin <linyunsheng@...wei.com>,
Toke Høiland-Jørgensen <toke@...hat.com>,
davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com
Cc: zhangkun09@...wei.com, fanghaiqing@...wei.com, liuyonglong@...wei.com,
Robin Murphy <robin.murphy@....com>,
Alexander Duyck <alexander.duyck@...il.com>, IOMMU <iommu@...ts.linux.dev>,
Andrew Morton <akpm@...ux-foundation.org>, Eric Dumazet
<edumazet@...gle.com>, Ilias Apalodimas <ilias.apalodimas@...aro.org>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
kernel-team <kernel-team@...udflare.com>
Subject: Re: [PATCH net-next v3 3/3] page_pool: fix IOMMU crash when driver
has already unbound
On 12/11/2024 13.22, Yunsheng Lin wrote:
> On 2024/11/12 2:51, Toke Høiland-Jørgensen wrote:
>
> ...
>
>>>
>>> Is there any other suggestion/concern about how to fix the problem here?
>>>
>>> From the previous discussion, it seems the main concern about tracking the
>>> inflight pages is how many inflight pages need to be tracked.
>>
>> Yeah, my hardest objection was against putting a hard limit on the
>> number of outstanding pages.
>>
>>> If there is no other suggestion/concern, it seems the above concern might be
>>> addressed by using pre-allocated memory to satisfy the common case, and
>>> falling back to dynamically allocated memory if/when necessary.
>>
>> For this, my biggest concern would be performance.
>>
>> In general, doing extra work in rarely used code paths (such as device
>> teardown) is much preferred to adding extra tracking in the fast path.
>> Which would be an argument for Alexander's suggestion of just scanning
>> the entire system page table to find pages to unmap. Don't know enough
>> about mm system internals to have an opinion on whether this is
>> feasible, though.
>
> Yes, there are many MM system internals involved, like the CONFIG_SPARSEMEM*
> configs, memory offline/online and other MM-specific optimizations, so it
> is hard to tell whether it is feasible.
>
> It would be good if MM experts can clarify on this.
>
Yes, please. Can Alex Duyck or MM-experts point me at some existing code
that walks the entire system page table?

Then I'll write some kernel code (maybe a module) so I can benchmark how
long such a walk takes on my machine with 384GiB of memory. I do like
Alex'es suggestion, but I want to assess the overhead of doing this on
modern hardware.
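
Below is a rough, untested sketch of the kind of module I have in mind.
To be clear, this is only a guess at how the walk could look: I'm
assuming max_pfn and pfn_to_online_page() are usable from module
context, and I reuse the pp_magic/PP_SIGNATURE heuristic that
napi_pp_put_page() uses to spot page_pool-owned pages; the "ppscan"
name is made up.

#include <linux/module.h>
#include <linux/mm.h>
#include <linux/memory_hotplug.h>
#include <linux/ktime.h>
#include <linux/sched.h>

static int __init ppscan_init(void)
{
	unsigned long pfn, pp_pages = 0;
	ktime_t start = ktime_get();

	/* Walk every pfn; pfn_to_online_page() skips holes and
	 * offlined SPARSEMEM sections, which should cover the memory
	 * offline/online concern raised above.
	 */
	for (pfn = 0; pfn < max_pfn; pfn++) {
		struct page *page = pfn_to_online_page(pfn);

		if (!page)
			continue;

		/* Same heuristic napi_pp_put_page() uses to detect
		 * page_pool pages via page->pp_magic.
		 */
		if ((page->pp_magic & ~0x3UL) == PP_SIGNATURE)
			pp_pages++;

		/* Be nice to the scheduler on a 384GiB machine */
		if (!(pfn % (1UL << 22)))
			cond_resched();
	}

	pr_info("ppscan: %lu pp pages, full walk took %lld us\n",
		pp_pages, ktime_us_delta(ktime_get(), start));
	return 0;
}

static void __exit ppscan_exit(void) { }

module_init(ppscan_init);
module_exit(ppscan_exit);
MODULE_LICENSE("GPL");

Timing the init function like this should give a first-order estimate
of the full-scan cost; a real fix would of course also need to match
pages against the specific page_pool being destroyed and unmap them.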
>>
>> In any case, we'll need some numbers to really judge the overhead in
>> practice. So benchmarking would be the logical next step in any case :)
>
> POC code shows that dynamic memory allocation does not seem to add much
> more overhead than the pre-allocated memory used in this patch; the
> extra overhead is about 10~20ns, which is similar to the overhead this
> patch already adds.
>
An overhead of around 10~20ns is too large for page_pool, because the
XDP DDoS use-case has a very small per-packet time budget (which is
what page_pool was designed for).
[1]
https://github.com/xdp-project/xdp-project/blob/master/areas/hints/traits01_bench_kmod.org#benchmark-basics
| Link speed | Packet rate | Time-budget |
| | at smallest pkts size | per packet |
|------------+-----------------------+---------------|
| 10 Gbit/s | 14,880,952 pps | 67.2 nanosec |
| 25 Gbit/s | 37,202,381 pps | 26.88 nanosec |
| 100 Gbit/s | 148,809,523 pps | 6.72 nanosec |
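
For reference, the table follows directly from line-rate math: the
smallest Ethernet frame occupies 84 bytes on the wire (64B frame +
20B preamble/SFD/IFG), so 10 Gbit/s gives 10^10 / (84 * 8) =
14,880,952 pps, i.e. roughly 67.2 ns per packet; the 25G and 100G
rows scale linearly from that.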
--Jesper